diff --git "a/Data/decision.csv" "b/Data/decision.csv" new file mode 100644--- /dev/null +++ "b/Data/decision.csv" @@ -0,0 +1,29024 @@ +id,tcdate,tmdate,number,forum,replyto,title,decision,comment,conf_name,confidence +uv_wrStPTR,1642700000000.0,1642700000000.0,1,FFGDKzLasUa,FFGDKzLasUa,Paper Decision,Reject,"This paper presents a meta-learning method where the standard ReLU activation units are replaced by the stochastic LWTA units to learn data-driven sparse representation. Most of reviewers like the idea of embedding the stochastic LWTA into a MAML framework. Initial assessment pointed out the lack of clarity in various places in the paper. Authors did a good job in the author’s rebuttal period, to clarify the paper. Experiments demonstrated the competitive performance over existing meta-learning methods. Two of reviewers raised their overall score to 5 (from 3). However, all reviewers have concerns in the incremental technical novelty and feel that presentation should be improved to make the paper more clear and friendly to readers. While the idea is interesting, which is backed up by experiments, the paper is not ready for the publication at the current stage. I encourage to resubmit the paper after addressing these concerns.",ICLR2022, +UXXsqzi3CV,1642700000000.0,1642700000000.0,1,MGIg_Q4QtW2,MGIg_Q4QtW2,Paper Decision,Reject,"This paper proposes a learning-based method for shape registration that conditions on regions of the shape rather than learning from the entire point cloud in one shot. The reviewers point out several questions about the method, thanks to expository issues as well as missing comparisons/ablation studies. As the authors have chosen not to submit a rebuttal, I will refer them to the original reviews for details here for additional points of improvement.",ICLR2022, +ii_qWy8f4x0,1642700000000.0,1642700000000.0,1,TH7crDRRND,TH7crDRRND,Paper Decision,Reject,"Thanks for your submission to ICLR. + +This paper considers binary hashing schemes, and makes two related contributions. First, it analyzes a simple extension to SQ-RFF; second, it introduces and analyzes a novel metric for ranking called ranking efficiency. Some experiments are also performed on standard data sets. + +This is very much a borderline paper, and could go either way. I took a close look at the paper to offer my opinions in addition to the reviewers. The paper itself is well written and seems to be correct. I do like the simplicity of the proposed SignRFF method as well as the ranking efficiency measure. However, the contributions are somewhat limited, and it's in an area that hasn't seen much work in the last several years (this paper mainly builds off of methods from 10+ years ago). Further, it doesn't seem that methods such as KLSH and SignRFF are used much in practice, so I don't know if this will have substantial impact. So while it's a reasonably interesting paper with some nice insights, I think it falls just below the acceptance threshold for me.",ICLR2022, +hQSmalOLmrM,1610040000000.0,1610470000000.0,1,UuchYL8wSZo,UuchYL8wSZo,Final Decision,Accept (Oral),"Motivated by the importance of gameplay in the development of critical skills for humans and other biological species, this work aims to explore representation learning via gameplay in a realistic, high fidelity environment. 
+Inspired by childhood psychology, they propose a variant of the hide-and-seek game called ""Cache"", built on top of AI2-THOR, where one agent must place an object in a room such that another agent cannot find it, and demonstrate that the adversarial nature of the game helps the agents learn useful representations of the environment. They examine the difference in representations learned via such a dynamic, interactive adversarial gameplay approach, vs. other, more passive approaches involving static images.
+
+The paper is well written and motivated, and easy to follow. All reviewers agree that the paper will be a great contribution to the ICLR community. I believe this is an important work, because not only does it challenge the traditional way of training many components of our systems passively (via static image recognition models), it synthesizes ideas from various disciplines (psychology, embodiment, ML) and provides an excellent framework for future research. For these reasons I'm recommending we accept this work as an Oral presentation.",ICLR2021,
+B1gZR6YgxN,1544750000000.0,1545350000000.0,1,HkgmzhC5F7,HkgmzhC5F7,meta-review,Reject,"The paper revisits the traditional bias-variance trade-off for the case
+of large capacity neural networks. Reviewers requested several clarifications
+on the experimental setting and underlying results. The authors provided some,
+but they were deemed insufficient for the paper to be accepted.
+Reviewers discussed among themselves but think that given the paper is mostly
+experimental, it needs more experimental evidence to be acceptable.
+Overall, I found the paper borderline but concur with the reviewers to reject
+it in its current form.",ICLR2019,4: The area chair is confident but not absolutely certain
+QScSo_rdNA,1576800000000.0,1576800000000.0,1,Hkl4EANFDH,Hkl4EANFDH,Paper Decision,Reject,"The submission proposes a 'co-natural' gradient update rule to precondition the optimization trajectory using a Fisher information estimate acquired from previous experience. This results in reduced sensitivity and forgetting when new tasks are learned.
+
+The reviews were mixed on this paper, and unfortunately not all reviewers had enough expertise in the field. After reading the paper carefully, I believe that the paper has significance and relevance to the field of continual learning; however, it will benefit from more careful positioning with respect to other work as well as more empirical support. The application to the low-data regime is interesting and could be expanded and refined in a future submission.
+
+The recommendation is for rejection.",ICLR2020,
+7IcLkvgoJZf,1610040000000.0,1610470000000.0,1,b7ZRqEFXdQ,b7ZRqEFXdQ,Final Decision,Reject,"This paper improves calibration of neural networks by investigating its connection to adversarial robustness. Two reviewers suggested acceptance, and two suggested rejection. As the authors and some reviewers highlighted, AC also agreed that the correlation between adversarial robustness and calibration is interesting to explore. However, as R1 pointed out, AC also thinks that the experimental results are not strong enough to meet the high standard of ICLR, e.g., Mixup often outperforms the proposed method (without further post-processing) and the proposed method does not outperform the deep ensemble (although the deep ensemble is expensive and both methods can be combined). Due to this, AC doubts whether adversarial robustness is indeed the best way to improve calibration (it can be useful though). 
+Hence, AC recommends rejection.",ICLR2021,
+BJrnhMUdx,1486400000000.0,1486400000000.0,1,ByW2Avqgg,ByW2Avqgg,ICLR committee final decision,Reject,"The reviewers pointed out several issues with the paper, and all recommended rejection. The revision does not seem to have been enough to change their minds.",ICLR2017,
+HyVwnGLOx,1486400000000.0,1486400000000.0,1,SJc1hL5ee,SJc1hL5ee,ICLR committee final decision,Reject,"The submission describes a method for compressing shallow and wide text classification models. The paper is well written, but the proposed method is not particularly novel, as it is composed of existing model compression techniques. Overall, the contributions are incremental and the potential impact seems rather limited.",ICLR2017,
+SJgNUmk-gV,1544770000000.0,1545350000000.0,1,BygGNnCqKQ,BygGNnCqKQ,"Interesting idea, but requires additional experimentation and analyses",Reject,"The authors propose a scheme to learn a mapping from the discrete space of network architectures into a continuous embedding, and from the continuous embedding back into the space of network architectures. During the training phase, the models regress the number of parameters and the expected accuracy given the continuous embedding. Once trained, the model can be used for compression by first embedding the network structure and then performing gradient descent to maximize accuracy while minimizing the number of parameters. The optimized representation can then be mapped back into the discrete architecture space.
+Overall, the main idea of this work is very interesting, and the experiments show that the method has some promise. However, as was noted by the reviewers, the paper could be significantly strengthened by performing additional experiments and analyses. As such, the AC agrees with the reviewers that the paper in its present form is not suitable for acceptance, but the authors are encouraged to revise and resubmit this work to a future venue.
+",ICLR2019,4: The area chair is confident but not absolutely certain
+QKFhqS12ToV,1610040000000.0,1610470000000.0,1,HNytlGv1VjG,HNytlGv1VjG,Final Decision,Reject,"The paper introduces a simple and interesting method that adaptively smoothes the labels of augmented data based on a distance to the “clean” training data. The reviewers have raised concerns about limited novelty, minor improvement over baselines, and insufficient experiments. The authors’ response was not sufficient to eliminate these concerns. The AC agrees with the reviewers that the paper does not pass the acceptance bar of ICLR.",ICLR2021,
+aYB2hWSIubo,1642700000000.0,1642700000000.0,1,0xiJLKH-ufZ,0xiJLKH-ufZ,Paper Decision,Accept (Oral),"This paper presents an analytic approach for estimating the optimal reverse variance schedule given a pre-trained score-based model. The experimental results demonstrated the efficacy of the proposed method on several datasets across different sampling budgets. Given the recent interest in score-based generative models, I believe that the paper will find applications in various domains. I am pleased to recommend it for acceptance.",ICLR2022,
+2OgxtsbfcvU,1610040000000.0,1610470000000.0,1,b7ZRqEFXdQ,b7ZRqEFXdQ,Final Decision,Reject,"The work introduces a method that uses the Feature Statistics Alignment paradigm to improve sequence generation with GANs. The contribution is interesting and novel (although marginally so), and clarity is also good. 
+Experiments show an improvement compared to selected baselines, and the revised paper addressed, at least partially, a serious evaluation concern of one reviewer.
+Despite the excellent revision work, some important open questions still seem to remain, in particular the choice of alignment metrics and a thorough evaluation.
+",ICLR2021,
+xugQw8jZ-_wv,1642700000000.0,1642700000000.0,1,pgkwZxLW8b,pgkwZxLW8b,Paper Decision,Reject,"The paper revisits representation learning for extreme settings (large number of class categories) in a federated learning setup. The authors show how each client can sample a set of negative classes and optimize only the corresponding model parameters with respect to a sampled softmax objective that approximates the global full softmax objective. The authors investigate the interest of the approach for image classification and image retrieval.
+
+The reviewers appreciated the interest of the approach to reduce communication and the experimental evaluation on several datasets. The reviewers also expressed concerns about privacy, a central concern in federated learning. One reviewer noted for instance that ‘since every sampled set of each client has to include the classes that the client has, the central server can infer the classes the client has’. The reviewers would also have liked to see a more comprehensive evaluation, in the absence of theoretical guarantees. Finally, the reviewers expressed concerns regarding accuracy/efficiency trade-offs, one reviewer commenting that “the proposed method degrades the accuracy”.
+
+The authors submitted responses to the reviewers' comments. The authors discussed the challenges related to privacy. The authors also commented on other gradient sparsification communication-reducing competing approaches (FedAwS) and the choice of datasets. After reading the response, updating the reviews, and discussion, the reviewers found that ‘the current good results are only obtained on smaller-scale datasets with fewer classes [while] in machine learning, the phenomenon could be quite different at different scales’ and that ‘it is not clear if the proposed method can outperform TernGrad at the same amount of transferred data [and] TernGrad also has a better convergence proof compared to the proposed method’.
+
+We encourage the authors to pursue their approach further, taking into account the reviewers' comments, encouragements, and suggestions. Recent progress in theoretical privacy-protection frameworks in FL (secure multi-party computation, etc.; see the recent survey by Kairouz et al. in FnT in ML) should help the authors develop guarantees for their approach. Moreover, the reviewers suggested a clear path towards further improvements of the experimental evaluation.
+
+Revising the paper accordingly will generate a stronger submission to a future venue.
+
+Reject.",ICLR2022,
+lXbGBMe4d,1576800000000.0,1576800000000.0,1,HkxzNpNtDS,HkxzNpNtDS,Paper Decision,Reject,"The paper proposes a multitask navigation model that can be trained on both vision-language navigation (VLN) and navigation from dialog history (NDH) tasks. The authors provide experiments that demonstrate that their model can outperform single-task baseline models.
+
+The paper received borderline scores, with two weak accepts and one weak reject. Overall, the reviewers found the paper to be well-written and easy to understand, with thorough experiments.
+
+The reviewers had minor concerns about the following:
+1. The generalizability of the work. 
+No results are reported on the test set, only on val.
+2. The gains for val unseen are pretty small and there are other models (e.g. Ke et al, Tan et al) that have better results. Would the proposed environment-agnostic multitask learning be able to improve those models as well? Or are the gains limited to having a weak baseline?
+3. It's unclear if the gains are due to the multitasking or just having more data available to train on.
+4. There are some minor issues with misspellings/typos. Some examples are given:
+Page 1: ""Manolis Savva* et al"" --> ""Savva et al""
+Page 5: ""x_1, x2, ..., x_3"" --> Should the x_3 be something like x_k where k is the length of the utterance?
+
+The AC agrees with the reviewers that the paper is interesting and is mostly solid work. The AC also feels that there are some valid concerns about the generalizability of the work and that the paper would benefit from a more careful consideration of the issues raised by the reviewers. The authors are encouraged to refine the work and resubmit.",ICLR2020,
+Sw63ZR4me1q,1610040000000.0,1610470000000.0,1,DQpwoZgqyZ,DQpwoZgqyZ,Final Decision,Reject,"This work presented a broad set of interesting applications of model information toward understanding task difficulty, domain similarity, and more. However, reviewers were concerned about the validity and rigor of the conclusions. Going into more depth in a subset of the areas presented would strengthen the paper, as would further discussions and experiments around the limitations of model information with regard to specific models and dataset sizes (as you have begun to discuss in Section 8). Additionally, reviewers found the updated paper with connections to Kolmogorov complexity interesting, but wanted a more formal treatment and analysis of the relationship. ",ICLR2021,
+RXHopZ3Yp,1576800000000.0,1576800000000.0,1,Hye-p0VFPB,Hye-p0VFPB,Paper Decision,Reject,"This paper presents an energy-efficient architecture for quantized deep neural networks based on decomposable multiplication using MACs. Although the proposed approach is shown to be somewhat effective, two reviewers pointed out that a very similar idea was already proposed in previous work, BitBlade [1]. As the authors did not submit a rebuttal to defend this critical point, I’d like to recommend rejection. I recommend that the authors discuss and clarify the difference from [1] in a future version of the paper.
+
+[1] Sungju Ryu, Hyungjun Kim, Wooseok Yi, Jae-Joon Kim. BitBlade: Area and Energy-Efficient Precision-Scalable Neural Network Accelerator with Bitwise Summation. DAC'2019
+",ICLR2020,
+SJfj1HyBWqF,1642700000000.0,1642700000000.0,1,wQ7RCayXUSl,wQ7RCayXUSl,Paper Decision,Reject,"The paper attacks an interesting problem: accurately estimating uncertainties in action-value estimates in offline RL. It proposes a method based on ensembles of Q functions, where we alternately train an ensemble to estimate Q(s,a) for the current policy, and then adjust our policy based on the mean and uncertainty in this ensemble. By choosing mean + \beta * [standard deviation] as the basis for our policy updates, we can be either conservative (\beta < 0) or optimistic (\beta > 0). The paper analyzes the ensemble training using the Gaussian process (NTK) view of deep nets.
+
+The largest weakness of the paper is a lack of rigor in its analysis. While its main topic is uncertainty in Q estimates, the paper does not specify a valid probabilistic model on which such uncertainty estimates could be based. 
+The theorems analyze only a part of the algorithm (policy evaluation), and don't take into account the interplay between this evaluation and any policy updates. The theorems also do not show that the computed output distribution is relevant to the actual uncertainty of the algorithm; e.g., they do not describe a prior for which the ensemble approximates the correct posterior (nor any other similar notion). Despite these omissions, the theorems are nonetheless presented as providing a reason to trust the output of the algorithm.
+
+On the other hand, there definitely is valuable material in the paper; the experiments are interesting (and would be even more interesting if we could compare to some notion of a correct answer for at least the small ones), and the intuition and analysis could be enlightening if presented more clearly and formally, with a better description of the connection between theory and practice. Unfortunately, the paper as written doesn't enable the reader to accurately understand and assess the contributions.",ICLR2022,
+rJgd1KdCJV,1544620000000.0,1545350000000.0,1,S1eK3i09YQ,S1eK3i09YQ,ICLR 2019 decision,Accept (Poster),"This paper proves that gradient descent with random initialization converges to global minima for a squared loss penalty over a two-layer ReLU network and arbitrarily labeled data. The paper has several weaknesses, such as: 1) assuming the top layer is fixed, 2) a large number of hidden units 'm', 3) the analysis is for squared loss. Despite these weaknesses, the paper makes a novel contribution to a relatively challenging problem, and is able to show convergence results without strong assumptions on the input data or the model. Reviewers find the results mostly interesting and have some concerns about the \lambda_0 requirement. I believe the authors have sufficiently addressed this issue in their response and I suggest acceptance. ",ICLR2019,4: The area chair is confident but not absolutely certain
+SJlxA3vJx4,1544680000000.0,1545350000000.0,1,S1g2V3Cct7,S1g2V3Cct7,"Some insights into using ER to reduce catastrophic forgetting, but requires a bit better placement",Reject,"This paper and its revisions have some interesting insights into using ER for catastrophic forgetting, and comparisons to other methods for reducing catastrophic forgetting. However, the paper is currently pitched as the first to notice that ER can be used for this purpose, whereas it was well explored in the cited paper ""Selective Experience Replay for Lifelong Learning"", 2018. For example, the abstract says ""While various methods to counteract catastrophic forgetting have recently been proposed, we explore a straightforward, general, and seemingly overlooked solution – that of using experience replay buffers for all past events"". It seems unnecessary to claim this as a main contribution in this work. Rather, the main contributions seem to be to include behavioural cloning and to provide further empirical evidence that selective ER can be effective for catastrophic forgetting.
+
+Further, to make the paper even stronger, it would be interesting to better understand even smaller replay buffers. A buffer size of 5 million is still quite large. What is a realistic size for continual learning? Hypothesizing how ER can be part of a real continual learning solution, which will likely have more than 3 tasks, is important for understanding how to properly restrict the buffer size.
+
+Finally, it is recommended to reconsider the strong stance on catastrophic interference and forgetting. 
+Catastrophic interference has been considered for incremental training, where recent updates can interfere with estimates for older (or other) values. This definition does not precisely match the provided definition in the paper. Further, it is true that forgetting has often been used explicitly for multiple tasks, trained in sequence; however, the issues are similar (new learning overriding older learning). These two definitions need not be so separate, and further it is not clear that the provided definitions are congruent with the older literature on interference.
+
+Overall, there are most definitely useful ideas and experiments in this paper, but it is as yet a bit preliminary. Improvements in placement, motivation, and experimental choices would make this work much stronger, and provide needed clarity on the use of ER for forgetting.",ICLR2019,4: The area chair is confident but not absolutely certain
+HkxD_SCMg4,1544900000000.0,1545350000000.0,1,SygxYoC5FX,SygxYoC5FX,"Novelty, complexity and poor presentation are all of concern.",Reject,"AR2 is concerned about the marginal novelty, weak experiments and very high complexity of the algorithm. AR3 is concerned about the lack of theoretical analysis and the parameter setting. AR4 is concerned that the proposed method is useful in very
+restricted settings and the paper is incremental.
+
+Unfortunately, with strong critique from reviewers regarding the novelty, complexity, poor presentation and restricted setting, this draft cannot be accepted by ICLR.",ICLR2019,5: The area chair is absolutely certain
+S1gkRB8SeE,1545070000000.0,1545350000000.0,1,BkgzniCqY7,BkgzniCqY7,Accept.,Accept (Poster),This paper contributes a novel approach to evaluating the robustness of DNNs based on structured sparsity to exploit the underlying structure of the image and introduces a method to solve it. The proposed approach is well evaluated and the authors answered the main concerns of the reviewers. ,ICLR2019,4: The area chair is confident but not absolutely certain
+zw3fh1p4EI,1610040000000.0,1610470000000.0,1,igkmo23BgzB,igkmo23BgzB,Final Decision,Reject,"This paper introduced a log-barrier based regularization method to reduce the dynamic range of data types in neural networks. As pointed out by the reviewers, there are many technical issues. The authors agreed with the reviewers in the rebuttal, though they claimed that the issues are fixed in the revised version of the paper.
+
+Experimental results are not convincing. It is not clear how the proposed method is evaluated. The accuracy of MobileNet using the proposed method is significantly lower compared to previous works. The work needs additional results/comparisons with other highly relevant papers on fixed-point training.
+
+There are also many clarity issues that need to be fixed.",ICLR2021,
+xNIys1wbIAqt,1642700000000.0,1642700000000.0,1,K0E_F0gFDgA,K0E_F0gFDgA,Paper Decision,Accept (Spotlight),"This paper is a resource and numerical investigation into the variability of BERT checkpoints. It also provides a bootstrap method for making investigations on the checkpoints.
+
+All reviewers appreciate this contribution, which can be expected to be used by the NLP community.",ICLR2022,
+op722O20zF,1576800000000.0,1576800000000.0,1,S1lDkaEFwS,S1lDkaEFwS,Paper Decision,Reject,"This paper proposes a defense technique against query-based attacks based on randomization applied to a DNN's output layer. 
+It further shows that, for certain types of randomization, the probability of introducing errors can be bounded by carefully setting distributional parameters. It has some valuable contributions; however, the rebuttal does not fully address the concerns.",ICLR2020,
+em4lgR-UVk5,1610040000000.0,1610470000000.0,1,J_pvI6ap5Mn,J_pvI6ap5Mn,Final Decision,Reject,"The paper presents a novel framework for transfer learning over GNNs. Experiments ought to better substantiate how structural differences/similarities are measured, as well as rely on prior art to measure transferability success. A plan for incorporating (structural) features would also strengthen the present work.",ICLR2021,
+S1OjjGIue,1486400000000.0,1486400000000.0,1,HkzuKpLgg,HkzuKpLgg,ICLR committee final decision,Reject,"The authors propose improvements for the utilization of modern hardware when training using stochastic gradient methods. However, the reviewers bring up several issues with the paper, including major clarity issues as well as notational issues and some comments about theory vs. practice.",ICLR2017,
+Skl7iE201V,1544630000000.0,1545350000000.0,1,B1fysiAqK7,B1fysiAqK7,Limited novelty and non-impressive experimental results,Reject,"The paper proposes a probabilistic training method for binary neural networks with stochastic versions of Batch Normalization and max pooling.
+
+The reviewers and AC note the following potential weaknesses: (1) limited novelty and (2) preliminary experimental results.
+
+AC thinks the proposed method has potential and is interesting, but decided that more work is needed before publication.",ICLR2019,4: The area chair is confident but not absolutely certain
+QMyqAHBEV8u,1610040000000.0,1610470000000.0,1,yfKOB5CO5dY,yfKOB5CO5dY,Final Decision,Reject,"The paper presents a PAC-Bayesian approach for meta-learning that utilizes information about the task distribution in the prior. The presented localized approach allows the authors to derive an algorithm directly from the bound - this is a worthwhile contribution. Nevertheless, several concerns were raised by the reviewers, and in its current form the work is not ready to appear at ICLR.
+
+",ICLR2021,
+BklwuGtBxN,1545080000000.0,1545350000000.0,1,Skz3Q2CcFX,Skz3Q2CcFX,All reviewers agree that paper is not strong enough,Reject,Several visualizations are shown in this paper but it is unclear if they are novel.,ICLR2019,4: The area chair is confident but not absolutely certain
+gWO2AMm7VB3,1642700000000.0,1642700000000.0,1,U-GB_gONqbo,U-GB_gONqbo,Paper Decision,Reject,"This paper proposes SH-LDM, which approximates the LDM model with a hierarchy of clusters. The authors should discuss the details of the clustering and how this algorithm can benefit from sparsity in more rigorous language.
+
+The authors should review the rich literature on scaling up distance-based methods, such as kNN and kernel methods, to which this paper belongs. The title is also misleading; the paper mainly discusses scalable link prediction rather than learning new embeddings.
+
+The reviewers have raised several questions about the experiments. For example, the main results should be a table comparing the speed rather than the accuracy of the algorithms. Also, the original LDM should be included in the accuracy tables. 
+The settings in the experiments, such as the embedding dimensions, are not appropriate for large graphs.",ICLR2022,
+nY3vKS0rfMT,1610040000000.0,1610470000000.0,1,I6QHpMdZD5k,I6QHpMdZD5k,Final Decision,Reject,"This paper proposes using a neural network to learn an approximate solution for desired boundary conditions to accelerate semiconductor device simulation. The work shows that simulation speed is increased significantly. However, the major concern about this work is the limited contribution to the machine learning community, as pointed out by the reviewers. ",ICLR2021,
+mADsHcnUaL,1576800000000.0,1576800000000.0,1,S1gnxaVFDB,S1gnxaVFDB,Paper Decision,Reject,"This paper introduces a method for building interpretable classifiers, along with a measure of ""concept accuracy"" to evaluate interpretability, and primarily applies this method to text models, but includes a proof of concept on images in the appendix.
+
+The main contributions are sensible enough, but the main problems the reviewers had were:
+A) The performance of the proposed method
+B) The lack of human evaluation of interpretability, and
+C) Lack of background and connections to other work.
+
+The authors improved the paper considerably during the rebuttal period, and might have addressed point C) satisfactorily, but only after several back and forths, and at this point it's too late to re-evaluate the paper. I expect that a more polished version of this paper would be acceptable at a future conference.
+
+I mostly ignored R1's review as they didn't seem to put much thought into their review and didn't respond to requests for clarifications.",ICLR2020,
+LgMdEK4iNK,1576800000000.0,1576800000000.0,1,rJeIcTNtvS,rJeIcTNtvS,Paper Decision,Accept (Poster),"The paper considers the problem of knowledge-grounded dialogue generation with low resources. The authors propose to disentangle the model into three components that can be trained on separate data, and achieve SOTA on three datasets.
+
+The reviewers agree that this is a well-written paper with a good idea and strong empirical results, and I happily recommend acceptance.",ICLR2020,
+YEklyZ7ike,1576800000000.0,1576800000000.0,1,B1liraVYwr,B1liraVYwr,Paper Decision,Reject,"This paper tackles neural response generation with Generative Adversarial Nets (GANs), and to address the training instability problem with GANs, it proposes a local distribution oriented objective. The new objective is combined with the original objective and used as a hybrid loss for the adversarial training of response generation models, named LocalGAN. The authors responded with concerns about reviewer 3's comments, and I agree with the authors' explanation, so I am disregarding review 3 and am relying on my read-through of the latest version of the paper. The other reviewers think the paper has good contributions; however, they are not convinced about the clarity of the presentation and made many suggestions (even after the responses from the authors). I suggest a reject, as the paper should include a clear presentation of the approach and technical formulation (as also suggested by the reviewers).",ICLR2020,
+ITNJKX0OLk,1576800000000.0,1576800000000.0,1,S1gTAp4FDB,S1gTAp4FDB,Paper Decision,Reject,"This paper proposes 1) using neural-guided Monte-Carlo Tree Search to search for expressions that match a dataset and 2) augmenting the loss to match the asymptotics of the true function when these are given.
+
+The use of MCTS sounds more sensible than standard evolutionary search. 
+The augmented loss could make sense but seems extremely niche, requiring specific side information about the problem being solved.
+
+Overall, the task is so niche that I don't think it'll be of wide interest. It's not clear that it's solving a real problem.",ICLR2020,
+XT8ukYI8yqO,1610040000000.0,1610470000000.0,1,nRJ08rN_b17,nRJ08rN_b17,Final Decision,Reject,"This paper explores a network that has a parvo (fine, detailed, slow)
+and magno (low-res, quick) stream. The ideas are interesting and the
+results intriguing, and one reviewer is in favor of acceptance.
+Several reviewers criticized the clarity of the paper, and the lack of
+details for, explanations of, and critical evaluation of, the design
+decisions. For example, how do the results depend on certain design
+decisions? I think that with a bit more work, this paper has the potential
+to be very impactful. I would encourage the authors to follow the
+detailed suggestions and resubmit the work to a high-impact conference or
+journal.
+
+",ICLR2021,
+BkNvIJ6BG,1517250000000.0,1517260000000.0,801,ByYPLJA6W,ByYPLJA6W,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a method to map input probability distributions to output probability distributions with few parameters. They show the efficacy of their method on synthetic and real stock data. After revision, they seem to have added another dataset; however, it is not analyzed as carefully as the stock data. More rigorous experimentation needs to be done to justify the method.",ICLR2018,
+u8SmfzbxLT_,1642700000000.0,1642700000000.0,1,CBchIgBBrwj,CBchIgBBrwj,Paper Decision,Reject,"This paper evaluates interpretation methods of neural networks on time series data. The reviewers find some value in this work, but were also consistently concerned with the main theme and novelty of this work. The authors actively responded to reviewer comments, but the reviewers were not convinced by the major contributions and novelty. Thus the work is not ready for ICLR.",ICLR2022,
+86nIhKLUrJ-,1642700000000.0,1642700000000.0,1,lEB5Dnz_MmH,lEB5Dnz_MmH,Paper Decision,Reject,"This paper proposed a new approach to jointly model text and stock price information and fuse them for stock market forecasting. It encodes text and stock price information in parallel and then fuses them using a co-attention transformer. According to the reviewers, the design of the model is not very well justified and seems to be a little ad hoc. The authors spent quite a few pages introducing background knowledge, and the novelty of the proposed model is not sufficiently described. Some details of the experiments are missing, and it is not clear whether the results could be easily reproduced. There are many writing issues too. As a result, we do not think the paper is ready for publication at ICLR in its current form. BTW, after the reviewers posted their comments, the authors did not submit their rebuttals.",ICLR2022,
+r1rP81pSz,1517250000000.0,1517260000000.0,802,ry9tUX_6-,ry9tUX_6-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a new analysis of the optimization method called entropy-sgd, which seemingly leads to more robust neural network classifiers. This is a very important problem if successful. The reviewers are on the fence with this paper. On the one hand they appreciate the direction and theoretical contribution, while on the other they feel the assumptions are not clearly elucidated or justified. This is important for such a paper. The author responses have not helped in alleviating these concerns. As one of the reviewers points out, the writing needs a massive overhaul. I would suggest the authors clearly state their assumptions and corresponding justifications in future submissions of this work.",ICLR2018,
+VMtCB6gNFHs,1642700000000.0,1642700000000.0,1,vqGi8Kp0wM,vqGi8Kp0wM,Paper Decision,Accept (Poster),"This paper proposes a novel method for single-shot domain adaptation with the help of Generative Adversarial Nets. The proposed method is interesting, novel, and versatile. Moreover, the performance is impressive and better than that of existing methods. However, the writing needs some improvement for better readability. More quantitative results should be provided in the revision for completeness.",ICLR2022,
+s8aCY1w7xDd,1642700000000.0,1642700000000.0,1,yV4_fWe4nM,yV4_fWe4nM,Paper Decision,Reject,"This paper received a majority vote for rejection. In the internal discussion, no reviewer was willing to change their score in light of the author response. I have read all the materials of this paper, including the manuscript, appendix, comments and responses. Based on the information collected from all reviewers and my personal judgement, my recommendation on this paper is *rejection*. Here are the comments I have summarized, which include my opinion and evidence.
+
+**Motivation**
+
+The motivation of this paper is not strong. In this paper, the authors claimed that the fairness level of deep clustering methods is relatively poor compared with traditional fair clustering methods. The traditional fair clustering methods employ hard constraints to achieve fairness by sacrificing cluster utility. Instead, deep fair clustering methods seek a trade-off balance between fairness level and cluster utility; therefore, deep fair clustering can be regarded as using soft constraints. There is no need to compare two different kinds of fairness constraints. Even the proposed method is a trade-off balance between fairness level and cluster utility.
+
+**Self-augmented Training**
+
+The relationship between self-augmented learning and fairness learning is unclear. I guess that the authors added this module simply to enhance the cluster utility. However, such a loss or operator can also be applied to other (fair) clustering algorithms. The experimental comparisons in Section 5 are unfair. No ablation study on this is provided.
+
+**Novelty**
+
+One reviewer pointed out that there exists some work, proposed in 2017, that plugs integer linear programming into a probabilistic discriminative clustering model.
+
+**Experiments**
+
+(1) ScFC and DFCV release their codes; no results of these two methods were reported on HAR. (2) No standard deviations are reported. (3) The Initial ILP Results (Ours) and Ours Result in Table 1 on the HAR dataset are 0.653 and 0.468, both higher than the Ground Truth (Optimal) 0.458.
+
+**Presentation**
+
+A few statements are not well-supported, or require small changes to be made correct.
+
+No objection from reviewers was raised against this recommendation.",ICLR2022,
+5loUnTu3SZbn,1642700000000.0,1642700000000.0,1,Xx4MNjSmQQ9,Xx4MNjSmQQ9,Paper Decision,Reject,"This paper tackles an interesting problem: distribution shift generalization often requires parameter identification, but this is not possible for over-parameterized neural networks. This paper shows that, for quadratic neural networks, it is possible to identify the function without identifying the parameters.
+
+This is an interesting result. However, reviewers raise concerns about the assumptions and technical details. 
+The meta-reviewer agrees with these concerns.",ICLR2022,
+m9OgIvFJHk,1576800000000.0,1576800000000.0,1,H1ekF2EYDH,H1ekF2EYDH,Paper Decision,Reject,"This paper presents a large-scale automatically extracted knowledge base in Chinese which contains information about entities and their relations present in academic papers. The authors have collected several papers that come from around 38 different domains. As such, this is a dataset creation paper where the authors have used existing methodologies to perform relation extraction in Chinese.
+
+After having read the reviews and follow-up replies by the authors, the main criticisms of the paper still hold. In addition to the lack of technical contribution, I feel that the writing of the paper can be improved a lot; for example, I would like to see a table with some example entities and relations extracted. That said, with further improvements this paper could potentially be a good contribution to LREC, which is focused on dataset creation.
+
+In its current form, I recommend that the paper be rejected.",ICLR2020,
+0KB1BPApx,1576800000000.0,1576800000000.0,1,H1e31AEYwB,H1e31AEYwB,Paper Decision,Reject,"While there was some support for the ideas presented, the majority of reviewers felt that this submission is not ready for publication at ICLR in its present form.
+
+Concerns raised include a lack of sufficient motivation for the approach, and problems with the clarity of the exposition.",ICLR2020,
+mu7yY9Dkxrc,1610040000000.0,1610470000000.0,1,6FsCHsZ66Fp,6FsCHsZ66Fp,Final Decision,Reject,"In this paper, the authors propose a theoretically principled neural network that inherently resists ℓ∞ perturbations without the help of adversarial training. Although the authors insist on focusing on the novel design with comprehensive theoretical support, the reviewers are still concerned about the insufficient empirical evaluation despite the novel idea and theoretical analysis.",ICLR2021,
+ARDbUw6h8ZR,1642700000000.0,1642700000000.0,1,EgkZwzEwciE,EgkZwzEwciE,Paper Decision,Reject,"This manuscript proposes an adversarial method to address non-IID heterogeneity in federated learning. Distinct from existing methods, the mitigation is implemented by training and locally communicating learned representations, so the metric of success is indistinguishability of representations across devices.
+
+There are four reviewers, all of whom agree that the method addresses an interesting and timely issue. However, reviewers are mixed on the paper score, and many raised concerns about communication overhead, apparent privacy costs, and convergence concerns with the adversarial methods. There is also some limited concern about novelty compared to existing methods. The authors provide a good rebuttal addressing these issues -- based on experimental evidence (adding differential privacy), comparisons of communication overhead trade-offs as they vary with model and sample size, and some existing convergence analysis. However, after reviews and discussion, the reviewers are unconvinced that the method is sufficiently robust to the concerns outlined. Authors are encouraged to address the highlighted technical concerns in any future submission of this work.",ICLR2022,
+_iCFFAJ_rkP,1642700000000.0,1642700000000.0,1,KL5jILuehZ,KL5jILuehZ,Paper Decision,Reject,"The paper provides a new way of weighting data to build weighted estimators of causal effects (which themselves can be used in other contexts, e.g. doubly-robust estimators). 
+It's novel in the sense that it optimizes the choice of weighting based on information about the response function space. The approach is simple to implement, and opens up other possibilities for different classes of estimators.
+
+I liked it. I think the paper is nearly there in terms of a well-rounded contribution. But I have to say that I did share the concern about the choice of random response functions. It's not only a matter of function space (everybody wants the most flexible one), but also of the random measure that goes in it - so the more flexible the random space, the less understood (to me at least) is the influence of the random measure. Surely there are choices of function space distributions that can do worse than uniform weights for some classes of problems? It's not that it's an implausible starting point (Bayesians do it all the time in terms of prior distribution, on top of a full-blown likelihood function that is more often than not just a big nuisance parameter), but I think the paper covers this aspect just too lightly. I think it would be of benefit to the authors to release a published version of this paper once they have some more formal guidance or a more complex experimental setup providing more thorough insight into it. I do think the contribution is really promising, but it feels unfinished, and I'd be curious to see where it could go following this direction.",ICLR2022,
+sa9S-YWOQF,1576800000000.0,1576800000000.0,1,BJlAzTEKwS,BJlAzTEKwS,Paper Decision,Reject,"The reviewers generally expressed considerable reservation about the novelty of the proposed method. After reading the reviews in detail and looking at the paper, I'm inclined to agree that the contribution is rather incremental. Using normalizing flows for representing policies in RL has been studied in a number of prior works, including with soft actor-critic, and I think the novelty in this work is limited in relation to prior work. Therefore, I cannot recommend acceptance at this time.",ICLR2020,
+BZIvm0T2T,1576800000000.0,1576800000000.0,1,HJxf53EtDr,HJxf53EtDr,Paper Decision,Reject,"The paper makes an interesting attempt at connecting graph convolutional neural networks (GCN) with matrix factorization (MF) and then develops an MF solution that achieves similar prediction performance to GCN.
+
+While the work is a good attempt, it suffers from two major issues: (1) the connection between GCN and other related models has been examined recently, and the paper did not provide additional insights; (2) some parts of the derivations could be problematic.
+
+The paper could be a good publication in the future if the motivation of the work can be repositioned. ",ICLR2020,
+E6zaInPrGJU,1610040000000.0,1610470000000.0,1,xOBMyvoMQw8,xOBMyvoMQw8,Final Decision,Reject,"This paper describes a non-uniformly weighted version of SGMCMC, combining aspects of SG methods and importance sampling. The idea is interesting and novel, but unfortunately the authors have not made a compelling case for the resulting algorithm being a practical addition to the literature. The experimental analysis is not particularly compelling, and there are key concerns raised about the practical implementation and about the validity of the approximations. 
+I hope that the authors will continue along this interesting line of work and add additional explorations of the approximations and improved experimental analysis.",ICLR2021,
+rKoJ3HJ8Uc,1576800000000.0,1576800000000.0,1,S1lukyrKPr,S1lukyrKPr,Paper Decision,Reject,"The paper is well-written and presents an extensive set of experiments. The architecture is a simple yet interesting attempt at learning explainable rumour detection models. Some reviewers worry about the novelty of the approach, and whether the explainability of the model is in fact properly evaluated. The authors responded to the reviews and provided detailed feedback. A major limitation of this work is that explanations are at the level of input words. This is common in interpretability (LIME, etc.), but it is not clear that explanations/interpretations are best provided at this level and not, say, at the level of training instances or at a more abstract level. It is also not clear that this approach would scale to languages that are morphologically rich and/or harder to segment into words. Since modern approaches to this problem would likely include pretrained language models, it is an interesting problem to make such architectures interpretable. ",ICLR2020,
+gjQlZz_2-G,1610040000000.0,1610470000000.0,1,TEtO5qiBYvE,TEtO5qiBYvE,Final Decision,Reject,"This paper proposes an approach to allow a neural network to memorize and reason over a long time horizon. Experiments on synthetic datasets, question answering, and sequence recommendation are presented to evaluate the proposed method.
+
+The paper addresses an important problem of processing long sequences. However, all reviewers agree that the writing of the paper can be improved (i.e., motivation, details of experiment design/setup, and others below). Importantly, I think the authors need to elaborate on the differences between continual memory and existing episodic memory methods. The authors added a paragraph about continual learning during the rebuttal period, and mentioned that their continual memory focuses on remembering an infinite information stream without forgetting. Episodic memory models can be applied/adapted for this purpose, so the authors should at least compare with one of them (ideally more).",ICLR2021,
+8daT0dl44,1576800000000.0,1576800000000.0,1,BJexP6VKwH,BJexP6VKwH,Paper Decision,Reject,"This paper proposes a method to address the covariate shift and label shift problems simultaneously.
+
+The paper is an interesting attempt towards an important problem. However, the reviewers and AC commonly believe that the current version is not acceptable due to several major misconceptions and misleading presentations. In particular:
+- The novelty of the paper is not very significant.
+- The main concern of this work is that its shift assumption is not well justified.
+- The proposed method may be problematic due to its use of minimax entropy and self-training with resampling.
+- The presentation has many errors that require a full rewrite.
+
+Hence I recommend rejection.",ICLR2020,
+cD4VgiJEGGZ,1610040000000.0,1610470000000.0,1,W1G1JZEIy5_,W1G1JZEIy5_,Final Decision,Accept (Poster),"This work presents a novel approach to improving text decoding. This is backed up by a solid analysis of cross-entropy growth with top-k vs top-p and an interesting demonstration of repetition correlating with probability. The paper is well written and well organized. The authors' rebuttal was effective in convincing the reviewers. 
+The human evaluation (added during the rebuttal phase) is a good demonstration of the effectiveness of the approach, and so this paper's proposed decoding algorithm is likely to be impactful.
+
+Pros:
+- Well written.
+- Solid theoretical analysis of cross-entropy and its relation to top-p and top-k decoding. Good demonstration of how repetition is related to probability.
+- Interesting, novel and effective decoding algorithm.
+- Human evaluation of the algorithm's output.
+
+Cons:
+- The approach has not been tested with a variety of language models.
+- Decoding quality still depends on a target perplexity, which may need to be tuned.
+- Unnecessary dependence on Zipf's law in the basic decoding algorithm.",ICLR2021,
+BklA2K-elE,1544720000000.0,1545350000000.0,1,r1fE3sAcYQ,r1fE3sAcYQ,good work but more is needed to have impact,Reject,"
+pros:
+- nicely written paper
+- clear and precise with a derivation of the loss function
+
+cons:
+
+novelty/impact:
+I think all the reviewers acknowledge that you are doing something different in the neural brainwashing (NB) problem than is done in the typical catastrophic forgetting (CF) setting. You have one dataset and a set of models with shared weights; the CF setting has one model and trains on different datasets/tasks. But whereas solving the CF problem would solve a major problem of continual machine learning, the value of solving the NB problem is harder to assess from this paper... The main application seems to be improving neural architecture search. At the meta-level, the techniques used to derive the main loss are already well known and the result is similar to EWC, so they don't add a lot from the analysis perspective. I think it would be very helpful to revise the paper to show a range of applications that could benefit from solving the NB problem and that the technique you propose applies more broadly.",ICLR2019,3: The area chair is somewhat confident
+VIWTsDHD-6a,1610040000000.0,1610470000000.0,1,Wga_hrCa3P3,Wga_hrCa3P3,Final Decision,Accept (Poster),"This paper proposes a new method for conditional text generation that uses contrastive learning to mitigate the exposure bias problem in order to improve performance. Specifically, negative examples are generated by adding small perturbations to the input sequence to minimize its conditional likelihood, while positive examples are generated by adding large perturbations while enforcing a high conditional likelihood.
+
+This paper receives 2 reject and 2 accept recommendations, which is a borderline case. The reviewers raised many useful questions during the review process, while the authors have also done a good job during the rebuttal to address the concerns. After checking the paper and all the discussions, the AC feels that all the major concerns have been addressed, such as more clarification in the paper, more results on non-pretrained models, and a small-scale human evaluation.
+
+On one hand, reviewers found that the proposed method is interesting and novel to a certain extent, and the paper is also well written. On the other hand, even after adding all the additional results, the reviewers still feel it is not super-clear that the results would extend to better models, as most of the experiments are conducted on T5-small, and the final reported numbers in the paper are far from SOTA.
+
+As shown in Tables 1 & 2, the AC agrees that the final results are far from SOTA, and the authors should probably also study the incorporation of CLAPS into stronger backbones. 
+On the other hand, the AC also thinks that T5 is already a relatively strong baseline to start with (though it is T5-small), and it may not be necessary to chase SOTA. Under a fair comparison, the AC thinks that the authors have done a good job of demonstrating its improvements over T5-MLE baselines.
+
+In summary, the AC thinks that the authors have done a good job during the rebuttal. On balance, the AC is happy to recommend acceptance of the paper. The authors should add more careful discussions to reflect the reviewers' comments when preparing the camera-ready. ",ICLR2021,
+bnS--LV1x,1576800000000.0,1576800000000.0,1,SygW0TEFwH,SygW0TEFwH,Paper Decision,Accept (Poster),"This paper presents a novel black-box adversarial attack algorithm, which exploits a sign-based, rather than magnitude-based, gradient estimator for black-box optimization. It also adaptively constructs queries to estimate the gradient. The proposed approach outperforms many state-of-the-art black-box attack methods in terms of query complexity. There is unanimous agreement to accept this paper.",ICLR2020,
+GdrpX14XKFj,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"This paper shows that transformer models can be used to learn certain advanced mathematical concepts such as the local stability of differential equations. Reviewers found this surprising and useful for engineers, and the evaluation was adequate. They also felt that it opens the doors to similar studies on other aspects of mathematics.",ICLR2021,
+dRHjdYvuBtxN,1642700000000.0,1642700000000.0,1,EHaUTlm2eHg,EHaUTlm2eHg,Paper Decision,Accept (Poster),"This paper presents a new reinforcement learning algorithm for POMDPs that specifically deals with the credit assignment problem. The proposed algorithm consists in using, at each time-step t of a training trajectory, the subsequent future trajectory that starts at time t+1 as an additional input to the policy and value networks. Instead of using the trajectories directly, two RNNs are used to encode the trajectories into two latent variables that are then given as inputs to the policy and value networks. A key novel contribution of this work is the use of ""Z-forcing"" to help the RNNs learn the relevant information. Since future trajectories are not available during testing, a ""prior"" network is trained to predict the latent variable given a state. During testing, the latent variable is sampled from this network. Empirical experiments on simple simulated environments show that the proposed algorithm outperforms several baselines.
+
+Key issues raised by the reviewers include the complexity of the proposed algorithm, the fact that several interesting results are in the appendix rather than the main paper, and the weakness of certain baselines. The authors' responses helped clarify these issues, and additional experiments (such as a comparison to a DQN with n-step value updates) were performed and added to the paper. The reviews were updated accordingly.
+
+In summary, the paper contains several novel ideas in the context of learning in partially observable environments. 
+It is not entirely clear whether similar effects of the proposed algorithm can be obtained by using simpler tricks, but the evidence provided by the authors supports the claim that the algorithm outperforms several SOTA techniques in the context of POMDPs.",ICLR2022,
+XpUHcpfpncS,1642700000000.0,1642700000000.0,1,F9McnN1dITx,F9McnN1dITx,Paper Decision,Reject,"The paper addresses an interesting problem, namely how to evolve effective weight and activation update rules for online learning of a recurrent network. The work focuses on two specific tasks: character sequence memorisation and prediction. Two approaches based on meta-gradients and evolutionary strategies are explored. Unfortunately, the paper is missing some important related work. Moreover, the presentation needs to be improved, and the experimental assessment should be expanded both in terms of tasks and in terms of comparable models presented in the literature.",ICLR2022,
+nIqlNoo4jZR,1642700000000.0,1642700000000.0,1,1oEvY1a67c1,1oEvY1a67c1,Paper Decision,Reject,"The main contribution of the paper is to perform a systematic and large-scale study of self-training as a method to deal with distribution shifts. Reviewers have appreciated the clarity in the overall writing of the paper, and the rigor in the empirical analysis. However, the main concern from two of the reviewers is that the technical contributions of the paper are only marginal and incremental in nature. The premise that self-learning improves robustness is already somewhat well-established (Reviewer PUq6 has pointed out papers that focus on how self-training / self-learning improves distribution shift and how self-training and pre-training stack together), and the main contribution of the paper is a systematic application to different datasets. Given the existing work on the relevance of self-training in distribution shift, the paper falls below the acceptance bar for ICLR in my opinion.",ICLR2022,
+r1eK3RXJxE,1544660000000.0,1545350000000.0,1,SyVU6s05K7,SyVU6s05K7,Metareview,Accept (Poster),"The paper was judged by the reviewers as providing interesting ideas, being well written, and potentially having an impact on future research on NN optimization. The authors are asked to make sure they address the reviewers' comments clearly in the paper.",ICLR2019,5: The area chair is absolutely certain
+OFjHAoyL17,1610040000000.0,1610470000000.0,1,Cb54AMqHQFP,Cb54AMqHQFP,Final Decision,Accept (Poster),"This paper follows the observations of Renda et al. (2020) that the learning rate in the fine-tuning or retraining phase of neural network pruning is an under-considered component of the pruning process. Renda et al. (2020) argue for a technique that uses the learning rate schedule of the original training regime for fine-tuning. However, their work does not offer a hypothesis or an explanation for why this works.
+
+This work instead offers more insight into why reusing the original learning rate is productive. Specifically, it shows that using high learning rates is the key component. To demonstrate this, the paper includes a study of using the original step-wise learning rate schedule from the original training regimen, except accelerated for a given number of fine-tuning epochs. The paper also demonstrates that Cyclic Learning Rate Restarting (CLR) provides an effective, if not better, learning rate schedule for fine-tuning.
+
+As noted by the reviewers, the core observations and contributions of this work are modest, but are still a valuable addition to the literature in the pruning community.
+
+Having said that, there are some confounding issues with CLR. Specifically, CLR itself may simply be a more effective learning rate schedule for training neural networks, independent of the particular application to fine-tuning (Reviewer 1). The revision includes an additional appendix that dispels some of this concern. However, indeed, CLR does improve the base network performance for some configurations.
+
+Broadly, the value proposition here is a thorough demonstration of learning rate schedules for fine-tuning, with an overall take that comparisons between techniques need to be more sensitive to this choice, as previous work perhaps has not thoroughly considered alternative learning rates.
+
+",ICLR2021,
+CWlFj65ZqPo,1610040000000.0,1610470000000.0,1,cO1IH43yUF,cO1IH43yUF,Final Decision,Accept (Poster),"This paper addresses some of the well-documented instabilities that can arise from fine-tuning BERT on a dataset with few samples. 
+Through a thorough investigation, they highlight various bizarre behaviors that have a negative impact on stability: first, that BERT inexplicably uses an unusual variant of Adam that, in fact, harms behavior; and second, that people tend to undertrain BERT on some downstream tasks. Separately, they find that reinitializing some of the final layers in BERT can be helpful. Since fine-tuning BERT has become such a common way to attack NLP problems, these practical recommendations will be quite welcome to the community. These findings address issues raised by recent work, so the paper is timely and relevant. The paper has thorough empirical analysis and is clear to read. There is a concurrent ICLR submission with similar findings, and this paper stands on its own. Reviewers all agreed that this paper should be published.",ICLR2021,
+r6KRbe-NFY,1610040000000.0,1610470000000.0,1,I4c4K9vBNny,I4c4K9vBNny,Final Decision,Accept (Poster),"This work proposes a novel network structure, spatial dependency networks, which are introduced as an alternative to convolutional neural networks. 
This new architecture is used successfully to achieve state-of-the-art performance on a number of common image generation benchmarks when compared with non-autoregressive approaches (even much larger CNNs). There is a lot of useful feedback in the reviews themselves: a thing to consider in the final version is the fact that the authors had motivated SDNs as drop-in replacements for CNNs, but do experiments mostly in VAE-like settings. This is a point that was raised by multiple reviewers and is clearly something that should be dealt with as explicitly as possible.

While there are legitimate reasons to be wary of the increased computation time, I tend to side with the authors that the baselines being compared with SDNs are likely to have more optimized primitives. From the inference numbers presented in the rebuttal, it doesn't appear like the speed issues are insurmountable.

Given the high quality of writing, the excellent performance on image density modeling, and the various ablations and understanding of the disentangling effects, I think this is an interesting piece of work that the field would benefit from.",ICLR2021,
Ai-5rY9iPU,1576800000000.0,1576800000000.0,1,Skxn-JSYwr,Skxn-JSYwr,Paper Decision,Reject,"This paper proposes a solution to the problem of disease density estimation using satellite scene images. The method combines a classification and a regression task. The reviewers were unanimous in their recommendation that the submission not be accepted to ICLR. The main concern was a lack of methodological novelty. The authors responded to reviewer comments, and indicated a list of improvements that still remain to be done, suggesting that the paper should at least go through another review cycle.",ICLR2020,
rkgVHG5-gN,1544820000000.0,1545350000000.0,1,SkelJnRqt7,SkelJnRqt7,"Good formulation, but needs improvement in presentation and experiments",Reject,"This paper presents a novel technique for separating signals in a given mixture, a common problem encountered in audio and vision tasks. The algorithm assumes that training samples from only one of the sources and the mixture distributions are available, which is a realistic assumption in a lot of cases. It then iteratively learns a model that can separate the mixture by using the available samples in a clever fashion.

Strengths:
- The novelty lies in how the authors formulate the problem, and the iterative approach used to learn the unknown distribution and thereby improve source separation.
- The use of existing GLO masking techniques for initialization to improve performance is also novel and interesting.

Weaknesses:
- There are some concerns around guarantees of convergence. Empirically, the algorithm works well, but it is unclear when the algorithm will fail. Some analysis here would have greatly improved the quality of the paper.
- The reviewers also raised concerns around clarity of presentation and consistency of notation. While the presentation improved after revision, there are parts which remain unclear (e.g., those raised by R3) that may hinder readability and reproducibility.
- The mixing model assumed by the authors is additive, which may not always be the case, e.g. when noise is convolutive (room reverberation, for instance).
- (Minor) Experiments can also be improved. The vision tasks are not very realistic. For the speech separation task, relatively clean speech is easy to obtain. Therefore, it would be worth considering speech as observed, and noise as unobserved. 
The authors cite separating animal sounds from background, but the task chosen does not quite match that setup. + +Overall, the reviewers agree that the paper presents an interesting approach to separation. But given the issues with presentation and evaluations, the recommendation is to reject the paper. We strongly encourage the authors to address these concerns and resubmit in the future.",ICLR2019,4: The area chair is confident but not absolutely certain +HJxOG3CxlV,1544770000000.0,1545350000000.0,1,HkxStoC5F7,HkxStoC5F7,meta review,Accept (Poster),"The paper proposes a decision-theoretic framework for meta-learning. The ideas and analysis are interesting and well-motivated, and the experiments are thorough. The primary concerns of the reviewers have been addressed in new revisions of the paper. The reviewers all agree that the paper should be accepted. Hence, I recommend acceptance.",ICLR2019,5: The area chair is absolutely certain +r1eK3RXJxE,1544660000000.0,1545350000000.0,1,SyVU6s05K7,SyVU6s05K7,Metareview,Accept (Poster),"The paper was judged by the reviewers as providing interesting ideas, well-written and potentially having impact on future research on NN optimization. The authors are asked to make sure they addressed reviewers comments clearly in the paper.",ICLR2019,5: The area chair is absolutely certain +OFjHAoyL17,1610040000000.0,1610470000000.0,1,Cb54AMqHQFP,Cb54AMqHQFP,Final Decision,Accept (Poster),"This paper follows the observations of Renda et al. (2020) that the learning rate in the fine-tuning or retraining phase of neural network pruning is an under-considered component of the pruning process. Renda et al. (2020) argue for a technique that uses the learning rate schedule of the original training regime for fine-tuning. However, their work does not offer a hypothesis or an explanation for why this works. + +This work instead offers more insight into why reusing the original learning rate is productive. Specifically, it shows that using high learning rates is the key component. To demonstrate this, the paper includes a study of using the original step-wise learning from the original training regimen, except accelerated for a given number of fine-tuning epochs. The paper also demonstrates that Cyclic Learning Rate Restarting (CLR) also provides an effective, if not better, learning rate schedule for fine-tuning. + +As noted by the reviewers, the core observations and contributions of this work are modest, but are still a valuable addition to the literature in the pruning community. + +Having said that, there are some confounding issues with CLR. Specifically, that CLR itself may simply be a more effective learning rate schedule for training neural networks, independent of the particular application to fine-tuning (Reviewer 1). The revision includes an additional appendix that dispels some of this concern. However, indeed, the CLR does improve the base network performance for some configurations. + +Broadly, the value proposition here is a thorough demonstration of learning rate schedules for fine-tuning with an overall take that comparisons between techniques need be more sensitive to this choice as previous work perhaps has not thoroughly considered alternative learning rates. + +",ICLR2021, +MFt1eD_513,1610040000000.0,1610470000000.0,1,4D4Rjrwaw3q,4D4Rjrwaw3q,Final Decision,Reject,"This paper presents a benchmarking suite, primarily targeting the domain of evolutionary style optimization algorithms, and an effective heuristic algorithm selection procedure ABBO. 
The reviewers seemed quite split in their reviews with significant variance, particularly with one outlier review (9) lifting up the average. They all felt that there was significant value in the work presented and that the benchmark could be useful for designing and evaluating new methods. However, there were concerns regarding details about the contributions (e.g. a detailed description of ABBO and which contributions to the suite were novel vs obtained from other benchmarks), the relevance of this work to the ICLR community, and the choice of algorithms presented (i.e. not SOTA).

In general, this seems like a useful contribution for the evolutionary algorithm community, but this paper seems off-topic for the conference. Certainly optimization is important and of interest to the community. However, there is no machine learning component to the technical contribution of this paper, and it ignores many of the contributions to black-box optimization within this community (see e.g. the citations from AnonReviewer1, and the literature on surrogate-based black-box optimization, i.e. Bayesian optimization). The RL optimization problems are somewhat relevant, but AnonReviewer1 raises concerns about the reporting of those results and the representation of the current literature. There is an algorithm proposed in this work, but it's largely heuristic and no comparison is given to state-of-the-art portfolio optimization algorithms from the machine learning community (e.g. P3BO from Angermueller et al., ICML 2019). A venue such as GECCO seems much better suited to this work.",ICLR2021,
ScmRHSF_g,1576800000000.0,1576800000000.0,1,BygJKn4tPr,BygJKn4tPr,Paper Decision,Reject,"All reviewers recommend reject, and there is no rebuttal.",ICLR2020,
XFcL8TiJgX,1610040000000.0,1610470000000.0,1,tkAtoZkcUnm,tkAtoZkcUnm,Final Decision,Accept (Poster),"All reviewers tend towards accepting the paper, and I agree.",ICLR2021,
phRs_nG09Mq,1610040000000.0,1610470000000.0,1,O6LPudowNQm,O6LPudowNQm,Final Decision,Accept (Poster),"This work proposes an algorithm for generating training data to train automatic theorem proving models. In particular, it allows users to pose specific generalization challenges to theorem proving models and evaluate their performance. In doing so, it provides a degree of control over the task space that is greater than when working with 'organic' corpora of real theorems and proofs.

The authors demonstrated the utility of their generated data by training well-known models such as transformers and GNNs, and were able to derive insights such as the value of MCTS-style planning for finding proofs in particular settings.

After the rebuttal period, all reviewers agreed that the work was well executed and that the algorithm creates datasets that will be of value to the (learning-based) theorem proving community. As such, they all recommended acceptance to a greater or lesser degree. I am convinced by their arguments, because I think there is real value in using controlled synthetic data alongside real data when making scientific progress on hard problems like theorem proving. I am particularly convinced by the observation that the data generated by this method has already led to improved performance on a real corpus of proofs, as the authors state in their rebuttal. If they have not done so already, I encourage the authors to report this fact in the camera-ready version of their paper, citing the relevant work. 
",ICLR2021,
uyGglX0qW,1576800000000.0,1576800000000.0,1,HJe-oRVtPB,HJe-oRVtPB,Paper Decision,Reject,"The article studies the stability of ResNets in relation to initialisation and depth. The reviewers found that this is an interesting article with important theoretical and experimental results. However, they also pointed out that the results, while good, are based on adaptations of previous work and hence might not be particularly impactful. The reviewers found that the revision made important improvements, but not quite enough to meet the bar for acceptance, pointing out that the presentation and details in the proofs could still be improved. ",ICLR2020,
zUY-HI-6ANy,1642700000000.0,1642700000000.0,1,PilZY3omXV2,PilZY3omXV2,Paper Decision,Accept (Poster),"The paper proposes to learn disentangled trend and seasonal representations of time series for forecasting tasks. It shows that separating representation learning from the downstream forecasting task is a more promising paradigm than the standard end-to-end supervised training approach for time-series forecasting.

During the post-rebuttal phase, there were interactions from all the reviewers, and reviewer KrXv raised the score. The reviewers think the contrastive learning method is novel and the added experiments have strengthened the paper. The authors are encouraged to include more standard datasets (M5) in the final version.

Based on the above reasons, I am recommending accepting this paper.",ICLR2022,
mxD7HMlBEc,1576800000000.0,1576800000000.0,1,HJel76NYPS,HJel76NYPS,Paper Decision,Reject,"The presented work has worse accuracy than existing methods (and not all the baselines are reported correctly) and does not provide a running time comparison. All reviewers recommend rejection, and I am with them.",ICLR2020,
_Omdt8mZ9rQ,1642700000000.0,1642700000000.0,1,cGDAkQo1C0p,cGDAkQo1C0p,Paper Decision,Accept (Poster),"This paper introduces ""reversible instance normalization"" (RevIN), a method for addressing temporal distribution shift in time-series forecasting. RevIN consists of normalizing (subtracting the mean and dividing by the standard deviation) each layer of a deep neural network in a given temporal window for a given instance, and de-normalizing by introducing learnable shifting and scaling parameters.

The paper initially received one weak accept and two weak reject recommendations. The main limitations pointed out by reviewers relate to the limited novelty of the approach, the positioning with respect to window normalization methods and hybrid methods in time series, and clarifications on experiments. The authors' rebuttal did a good job in answering the main concerns: rV5fo increased their grade from weak reject to clear accept, and RuPmn maintained their weak acceptance recommendation.

The AC carefully read the submission. The AC considers that the idea is simple yet meaningful. The large set of experiments is well conducted and conclusive. The rebuttal successfully answers relevant issues raised by reviewers, regarding ablation studies (highlighting the importance of the learnable de-normalization), the impact of the temporal window, the comparison to hybrid approaches, and the difference with respect to Adaptive normalization. 
The AC thus acknowledges that this submission draws important take-home messages for the community, and therefore recommends acceptance.",ICLR2022,
ZFL7JCcsQAF,1642700000000.0,1642700000000.0,1,fRb9LBWUo56,fRb9LBWUo56,Paper Decision,Reject,"While all reviewers acknowledge the relevance of such an evaluation work for the MRI reconstruction field, they all agree that the contribution has a limited fit with an ML conference like ICLR. The work is solid experimentally and will surely interest the audience of conferences like ISMRM or MICCAI. For this reason, the work can unfortunately not be endorsed for publication.",ICLR2022,
SpKGoqpcUJ,1576800000000.0,1576800000000.0,1,H1eF3kStPS,H1eF3kStPS,Paper Decision,Reject,"This paper proposes a new graph hierarchy representation (HAG) which eliminates the redundancy during the aggregation stage and improves computation efficiency. It achieves good speedup and also provides a theoretical analysis. There have been several concerns from the reviewers; the authors' response addressed them partially. Despite this, due to the large number of strong papers, we cannot accept the paper at this time. We encourage the authors to further improve the work for a future version.
",ICLR2020,
B1O78yTHG,1517250000000.0,1517260000000.0,748,HJsk5-Z0W,HJsk5-Z0W,ICLR 2018 Conference Acceptance Decision,Reject,This paper has been withdrawn by the authors.,ICLR2018,
rJeoEo3A1E,1544630000000.0,1545350000000.0,1,r1fWmnR5tm,r1fWmnR5tm,Limited contribution,Reject,"The paper proposes to apply Neural Architecture Search for pruning DenseNet.

The reviewers and AC note the potential weaknesses of the paper in various aspects, and decided that more work is needed before the paper can be published. ",ICLR2019,5: The area chair is absolutely certain
eGcPQTXG-5,1576800000000.0,1576800000000.0,1,B1lCn64tvS,B1lCn64tvS,Paper Decision,Reject,"SAT is NP-complete (Karp, 1972) due to its intractable exhaustive search. As such, heuristics are commonly used to reduce the search space. While usually these heuristics rely on some in-domain expert knowledge, the authors propose a generic method that uses RL to learn a branching heuristic. The policy is parametrized by a GNN, and at each step selects a variable to expand; the process repeats until either a satisfying assignment has been found or the problem has been proved unsatisfiable. The main result of this is that the proposed heuristic results in fewer steps than VSIDS, a commonly used heuristic.

All reviewers agreed that this is an interesting and well-presented submission. However, both R1 and R2 (rightly according to my judgment) point out that at the moment the paper seems to be conducting an evaluation that is not entirely fair. Specifically, VSIDS has been implemented within a framework optimized for running time rather than number of iterations, whereas the proposed heuristic is doing the opposite. Moreover, the proposed heuristic is not stress-tested against larger datasets. So, the authors take a heuristic/framework that has been optimized to operate specifically well on large datasets (where running time is what ultimately makes the difference), scale it down to a smaller dataset, and evaluate it on a metric that the proposed algorithm is optimized for. At the same time, they do not consider evaluation in larger datasets and defer all concerns about scalability to the one of industrial use vs answering ML questions related to whether or not it is possible to “stretch existing RL techniques to learn a branching heuristic”. 
This is a valid point and not all techniques need to be super scalable from iteration day 0, but this being ML, we need to make sure that our evaluation criteria are fair and that we are comparing apples to apples in testing hypotheses. As such, I do not feel comfortable suggesting acceptance of this submission, but I do sincerely hope the authors will take the reviewers' feedback and improve the evaluation protocols of their manuscript, resulting in a stronger future submission.",ICLR2020, +-1lBPmGErV,1610040000000.0,1610470000000.0,1,LjFGgI-_tT0,LjFGgI-_tT0,Final Decision,Reject,"This paper aims at improving the adoption of Bayesian NNs by providing a practical and user friendly variational inference method. The main ideas consist of two parts: +1. Warm-start the variational inference from a pre-trained deterministic NN. It takes advantage of existing deep learning library features for easy implementation including weight decay, batch matrix multiplication, etc. +2. Calibrating uncertainty estimation for out-of-domain detection using adversarial examples. + +Pros: +1. A practical way of implementing DNN variational inference with reduced variance, without sacrificing classification accuracy of the pretrained NN model. +2. Significantly better OOD detection accuracy compared to other BNN approaches without taking OOD into account explicitly. + +Cons: +1. During discussion, it becomes clear that most of the techniques have been proposed similarly in the literature. Krishnan, 2020 applied BNN starting from MAP of NN, Flipout (Wen et. al., 2018) applies instance-wise sampling, Hendrycks et. al., 2018 and Hafner et. al., 2018 improves detection accuracy by training on OOD examples. The novelty of the proposed method is therefore limited. +2. There's not much benefit on the classification performance compared to the initial MAP and is inferior to MCMC-based SOTA BNNs. One of the reviewers considers the SGLD-type approach may be more appealing to ML practitioners with the overhead of VI in training additional variance parameters. +3. The authors argue MCMC-based BNN methods cannot achieve good performance without temperature scaling. But the main performance improvement of the paper is in the OOD detection with uncertainty regularization that modifies the posterior as well. The method of training with OOD samples is orthogonal to applying Bayesian inference to NNs, and the detection performance is limited to the distribution close to examples during training. + +This paper falls on the borderline for acceptance. With the goal of improving adoption of BNN in practice, it is not convincing yet making mean field VI easier to implement could realize it without achieving competitive performance. +",ICLR2021, +RYHraUEpYv9H,1642700000000.0,1642700000000.0,1,yhCp5RcZD7,yhCp5RcZD7,Paper Decision,Accept (Poster),"This paper provides a normal map-inspired implicit surface representation involving a smooth surface whose high frequency detail comes from normal displacements. Reviewers were impressed with the results and theoretical discussion in the paper. The AC agrees. + +The authors were responsive to reviewer feedback and addressed some questions about parameter choice during the rebuttal phase, including new experiments/discussion in the supplementary document. Note the response to reviewer WHEF notes that the authors will be releasing data/code; the AC strongly hopes the authors are true to their word in that regard. 
+ +The AC chose to disregard some comments from reviewer G54X regarding tests with noise, as this method appears to be tuned to computer graphics applications; the level of empirical work here aligns with past work in the area. Of course the authors are encouraged to include some tests responding to the reviewer comments in the camera ready. The AC also found the score from reviewer WHEF to be somewhat uncalibrated with the tone of their review, but of course their assessment is quite positive nonetheless. + + +One small comment: The abstract appears a bit strangely on the OpenReview site because of line breaks; if possible, please remove the line breaks. + +Another small comment: The ""spectral shape representation"" phrase used a bit in the discussion below might not be advisable, as this phrase typically refers to the intrinsic spectrum of a shape (e.g. Laplace-Beltrami analysis)",ICLR2022, +BydeLJpSM,1517250000000.0,1517260000000.0,711,HJg1NTGZRZ,HJg1NTGZRZ,ICLR 2018 Conference Acceptance Decision,Reject,"Pros: ++ The idea of end-to-end training that simultaneously learns the weights and appropriate precision for those weights is very appealing. + +Cons: +- Experimental results are far from the state-of-the-art, which makes the empirical evaluation unconvincing. +- More justification is needed for the update of the number of bits using the sign of the gradient. +",ICLR2018, +HuUJbZQLe7w,1642700000000.0,1642700000000.0,1,dtpgsBPJJW,dtpgsBPJJW,Paper Decision,Reject,"The paper seeks to improve straight-through estimators by combining them with the ideas for correcting the step direction to be closer to a natural gradient. + +While some (modest) improvements are demonstrated experimentally, the paper critically lacks technical correctness and has quite some gaps when trying to derive the algorithm from the natural gradient and Rao-Cramer bound. See public comments by reviewers and AC. The algorithm ends up to be a mirror descent with a mirror map, which is cheap to compute but not particularly well motivated. Moreover application of mirror descent to the activations (unlike the weights) is not well justified. The paper is rather unclear and hard to read also language-wise. Please proofread _before_ submitting.",ICLR2022, +rPClWhIXrf,1576800000000.0,1576800000000.0,1,BJxWx0NYPr,BJxWx0NYPr,Paper Decision,Accept (Poster),This paper is consistently supported by all three reviewers during initial review and discussions. Thus an accept is recommended.,ICLR2020, +M55Yc67LKOG,1642700000000.0,1642700000000.0,1,ZFIT_sGjPJ,ZFIT_sGjPJ,Paper Decision,Reject,"The idea to adapt the noise variance in the certification of a base classifier sounds natural and interesting, but unfortunately fundamentally flawed, as correctly pointed out by Reviewer viFi (also acknowledged in the authors' response): the author's main algorithm does not lead to any theoretical certification while the empirical fix (based on memory), however successful in one's experiment, does not rule out the possibility of failure when future test samples flood in. Incidentally, I believe this fallacy may have also answered Reviewer Xsdx's question (why this has not been done before). I agree with Reviewer viFi that the writing of this work is a bit deceptive and will require significant change. In particular, one cannot wave hands at claims on certification: you need to formally prove the memory-based empirical fix will provably certify a region for what classifier and under what assumption. Therefore, the current draft cannot be accepted. 
Please consider rethinking about the idea and rewriting the paper according to the reviewers' comments.",ICLR2022, +Zo0gLSCHFWj,1642700000000.0,1642700000000.0,1,Fh_NyEuejsZ,Fh_NyEuejsZ,Paper Decision,Reject,"This paper received scores of 6,6,6 after the reviewers succeeded in making two authors raise their scores from 5 to 6. However, even after this, none of the reviewers actively argued for the paper. The only positive point raised in the private discussion was that the results are strong. (However, there is still the question of how much of this was due to the different design space used.) +Negative points raised in the private discussion included that +- despite the authors clarification on the differences to Zen-NAS, the difference is perceived not to be large. +- there is no theoretical foundation behind the selection of a critical parameter, and this directly limits the applicability of ZenDet in searching for FPN connections. +- as a paper focused on detection NAS, a limitation to only search for the backbone may not be novel enough for publication at ICLR. + +Overall, I agree with this criticism and weakly recommend rejection.",ICLR2022, +#NAME?,1610040000000.0,1610470000000.0,1,6k7VdojAIK,6k7VdojAIK,Final Decision,Accept (Poster),"I think this is a very solid and good work in the topic of ""Practical Massive Parallel MCTS."" I think it will be good to open up perspectives among ICLR's audience going beyond just Deep Learning and Machine Learning. I also noted a lot of positive comments during the evaluation and discussion period. + +Still, it was a borderline case and not an easy decision (primarily because of the concerns raised by R3 towards the end of the discussion period). In the end the program committee decided that the paper does meet the bar. We think that the work is interesting and original, though not without weaknesses. +",ICLR2021, +tMq_Zy_e7wg,1610040000000.0,1610470000000.0,1,Rld-9OxQ6HU,Rld-9OxQ6HU,Final Decision,Reject,"The paper proposes a variant of recurrent neural networks based on Long Short-Term Memory. Unlike the standard LSTM, the proposed mass-conserving LSTM subtracts the output hidden state of the LSTM from its current cell state, thus preserving the ""mass"" stored in the cell states at each step. A left-stochastic recurrent weight matrix is also used to conserve the ""mass"" across the time steps. Empirical experiments demonstrated the effectiveness of the proposed MC-LSTM on a range of datasets such as addition & arithmetic tasks, traffic forecast, and rainfall modeling models. + +Several issues were clarified during the rebuttal period in a way that satisfied the reviewers. However, some concerns still remain unanswered: + +1) This is an empirical paper that proposes a modified LSTM that brings forward a few different ideas: L1 norm, stochastic transition matrices, and subtracting the output hidden states. An ablation study is a MUST in such an applied work. It has been pointed out by other reviewers that there are many prior references on LSTMs variants. It would greatly strengthen the paper by considering more diverse baselines. There is no experiment nor discussion on how much each modification helps wrt the final accuracy. Thus it remains unclear how the results can generalize to other problems. + +2) Although the results seem convincing across various datasets that mass conservation seems to help, the datasets are non-standard benchmarks in the machine learning conferences thus there is a lack of competitive prior baselines. 
As the proposed LSTM has a different number of parameters compared to the standard LSTM, is it fair to compare the different architectures under the same number of neurons? What happens if we compare the architectures with the same number of parameters? And how well does the model scale as we vary the hidden size? It would be helpful to keep the contributions into perspective by using standard RNN benchmark datasets such as Penn TreeBank or Wiki-8. + +Overall, the basic idea seems interesting, but the lack of ablation studies significantly hurt the contribution and the positioning of the paper. Given the current submission, the paper needs further development, and non-trivial modifications, to be broadly appreciated by the machine learning community. + +",ICLR2021, +BzL8yF1I5b,1642700000000.0,1642700000000.0,1,d2TT6gK9qZn,d2TT6gK9qZn,Paper Decision,Accept (Poster),The paper provides a unique contribution that uses Padde approximations to approximate non-linear operators for solving initial value problems in PDEs. The paper contains also a non-trivial experiment with a real-world dataset that showcase the impact of the proposed model. The authors have provided a strong rebuttal and therefore I recommend Accept.,ICLR2022, +ivyXrtBB200,1642700000000.0,1642700000000.0,1,eqaxDZg4MHw,eqaxDZg4MHw,Paper Decision,Reject,"This paper presents an empirical study of generalization in visual reinforcement learning. This study is carried out in the domain of video games and it addresses the benefits of techniques such as regularization, augmentation and training with auxiliary tasks. The reviewers for this submission were positive about the goal and setups in this paper. They agreed that understanding why present day methods that attempt to improve generalization continue to fall short, is an important problem. However, most reviewers were underwhelmed by the findings presented in the submission. As examples: Reviewer 185P mentions that ""the paper does not seem to provide a clear and definite answer to the question"" and "" I am not convinced the experiments described in this paper support the claims made by the authors."" and Reviewer SFef mentions that ""Most of the conclusions are already known"". Some reviewers also found a lack of clarity and several typos in the initial submission. The authors have provided detailed responses to the reviewers. In particular they have fixed most writing issues. They also detailed why certain algorithms and techniques were benchmarked in this submission and others were left out. I think this is reasonable. One cannot expect a paper to benchmark every algorithm out there, and choosing promising and representative ones is sufficient. My takeaway from detailed discussions about this paper are that: The paper is much improved from a writing point of view and the rebuttal addresses some concerns well. However, I do agree with the reviewers that the findings presented in the paper are for the most part expected. This reduces the value of the paper to readers. When this is the case, it may be beneficial to dig deeper into these findings and present a narrow but deep analysis. Please see Reviewer 185P's suggestions in this regard. 
Given the above, I am recommending rejection for this conference, but I encourage the authors to take the reviewers' suggestions into account and resubmit.",ICLR2022,
Utd6Q53q0wu,1610040000000.0,1610470000000.0,1,aGfU_xziEX8,aGfU_xziEX8,Final Decision,Accept (Poster),"This article proposes a latent variable augmentation scheme for inference in nonlinear multivariate Hawkes processes. It combines existing approaches (Polya-gamma and sparsity-inducing variables) in a sensible way and is clearly written. Concerns were raised with respect to the comparison to alternative baselines, and answered by the authors. As a result, some reviewers have increased their score, and I recommend acceptance.
",ICLR2021,
sIDO7B-T7Hz,1610040000000.0,1610470000000.0,1,PsdsEbzxZWr,PsdsEbzxZWr,Final Decision,Reject,"This paper conducts a theoretical and empirical analysis of the Generative Adversarial Training method (GAT). Although many comments have been addressed in the rebuttal, the reviewers still have a few (but important) concerns, including the memorization effects and the lack of comparisons. ",ICLR2021,
iGxapUXt0zc,1642700000000.0,1642700000000.0,1,e0TRvNWsVIH,e0TRvNWsVIH,Paper Decision,Reject,"In this paper, the problem of identifying a low-dimensional latent space for high-dimensional Bayesian optimization (BO) is considered. In particular, the authors focus on the problem of collision, where different points in the original space become identical in the latent space, and propose a regularization method to avoid this problem. Latent space identification for high-dimensional Bayesian optimization is an interesting problem, and the authors' approach sounds reasonable. However, many reviewers pointed out that the discussion and results in the paper do not provide sufficient evidence for the authors' claims. Therefore, we have to conclude that the paper cannot be accepted at this time.",ICLR2022,
v5MIY_-8NlEZ,1642700000000.0,1642700000000.0,1,EYCm0AFjaSS,EYCm0AFjaSS,Paper Decision,Reject,"This paper suggests an architecture with a deterministic initialization which has only 0/1 values.
The reviewers were mostly (marginally) negative, mainly because of the low novelty and significance of this work.

Specifically, the main novelty issues were:
1) Improving convergence speed and removing BatchNorm: this was already done, in a quite similar manner, and it achieves better or similar results (Fixup, ReZero: https://arxiv.org/abs/1901.09321, https://arxiv.org/abs/2003.04887, and a few others as well).
2) Initializing a network with deterministic initialization: this was also done (ConstNet, https://arxiv.org/abs/2007.01038). I think the main difference from the previous work is the additional Hadamard connections, which help break the symmetry. However, it is unclear what the benefit of this modification is, as the previous work could train without it (albeit on CIFAR).

Specifically, the main significance issues were:
1) Reducing standard deviation: The authors' response confirmed there is no statistically significant benefit (p ≈ 0.1) for variance reduction when comparing with Kaiming initialization for ImageNet.
2) General network performance: The results do not seem better than the baseline (Xavier init is not a proper baseline in a network with ReLUs).
3) Sparsity claims: The network appears to be losing accuracy even with 20% sparsity, which isn't even useful for efficiency. For comparison, the lottery ticket hypothesis showed you can get to 90% sparsity and get better results. 
So, this is a nice observation, but not a major contribution. + +Therefore, I recommend the authors to better distinguish themselves from previous works (What are the changes? Why are these important?), and improve their empirical results so they highlight the usefulness of the suggested method (e.g., improve the SOTA in some benchmark).",ICLR2022, +KkQlEXGQQU,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"This paper proposes replacing the softmax of deep NNs with a kernel-based Gaussian mixture model, to allow for per-class multi-modality. Results show that the method is competitive with other output modifications such as the large-margin softmax. + +The two primary concerns of the reviewers were the lack of large-scale image classification results and theoretical guarantees. The authors have added CIFAR-100 results. Moreover, the authors agree that theoretical results would be nice to have, but such results are non-trivial and likely require a PAC-Bayes treatment. + +I find the method to be well-motivated and that the paper demonstrates sufficient experimental rigor. Given the popularity of the softmax throughout deep learning, this paper will likely be of interest---or at least, be of potential use---to a large part of the ICLR community. I encourage the authors to add the ImageNet results to the final version. +",ICLR2021, +g-9p59N2O2M,1610040000000.0,1610470000000.0,1,sbyjwhxxT8K,sbyjwhxxT8K,Final Decision,Reject,"This paper relates the problem of influence maximization and adversarial attacks on GCNs. +The paper, and its formulation and assumptions stirred up quite a discussion among the reviewers and the authors. I do appreciate the thorough rebuttal that the authors provided, and the reviewers did take it into account (and revised their scores). +However, all in all, I am afraid that there are just a few too many concerns with this paper. +If the authors take the reviews to heart, they should be able to improve the manuscript and submit a stronger and improved version to the next conference. ",ICLR2021, +#NAME?,1610040000000.0,1610470000000.0,1,XoF2fGAvXO6,XoF2fGAvXO6,Final Decision,Reject,"This paper proposes a weakly supervised model for numerical reasoning. After discussion with the reviewers it seems that it is already known that training NMNs directly on DROP is not successful and requires taking additional measures. Past work (NERD) has resorted to using data augmentation, and this work encodes it directly to the model. This paper needs to show the advantages of their approach and that it generalizes better to other scenarios. Other minor issues include (a) clarity fo writing (b) focus on a subset of questions (c) no evaluation on other numerical datasets (d) mild inaccuracies w.r.t prior work (GenBERT)",ICLR2021, +Byg7RfXlJN,1543680000000.0,1545350000000.0,1,BJeapjA5FX,BJeapjA5FX,More convincing experiments are needed,Reject,"All three reviewers feel that the paper needs to provide more convincing results to support their robustness claim, in addition to a number of other issues that need to be clarified/improved. The authors did not provide any response. ",ICLR2019,5: The area chair is absolutely certain +nT8HK4UQWIXY,1642700000000.0,1642700000000.0,1,avgclFZ221l,avgclFZ221l,Paper Decision,Accept (Oral),"This paper proposes asymmetry learning for learning counterfactual classifiers, i.e. classifiers which are invariant to certain symmetry transformations w.r.t. hidden variables that differ between the training and test sets. 
+ +The reviewers universally agreed that the proposed setting, and theoretical contribution, were interesting and novel. They also praised the writing quality, but had some quibbles about the quality of the experiments, and discussion of prior work. Neither of these concerns were considered significant enough to be a barrier to acceptance, but the authors should try to improve them, if possible.",ICLR2022, +By8QQ1aBM,1517250000000.0,1517260000000.0,103,SJ-C6JbRW,SJ-C6JbRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper provides a game-based interface to have Turkers compete to analyze data for a learning task over multiple rounds. Reviewers found the work interesting and clear written, saying ""the paper is easy to follow and the evaluation is meaningful."" They also note that there is clear empirical benefit ""the results seem to suggest that MTD provides an improvement over non-HITL methods."" They also like the task compared to synthetic grounding experiments. There was some concern about the methodology of the experiments but the authors provide reasonable explanations and clarification. + +One final concern that I hope the readers take into account. While the reviewers were convinced by the work and did not require it, I feel like the work does not engage enough with the literature of crowd-sourcing in other disciplines. While there are likely some unique aspects to ML use of crowdsourcing, there are many papers about encouraging crowd-workers to produce more useful data. ",ICLR2018, +GNfm11hKcGn,1610040000000.0,1610470000000.0,1,U6Xpa5R-E1,U6Xpa5R-E1,Final Decision,Reject,"This is a creative piece of work wherein learning of what is normally family-specific Potts models is turned into an amortized optimization problem across different families of proteins. The Potts models are learned with a pseudolikelihood approach, and the evaluation of the model against baselines is performed only on a contact prediction problem. This last point is problematic, because on the one hand, the authors use this ""as a proxy for the underlying accuracy of the Potts model learned"", and on the other hand, claim that ""we do not want to claim our method is state-of-the-art for contact prediction, it is certainly not"". Overall, the paper is promising, but is too preliminary on the empirics to warrant publication at this time.",ICLR2021, +AJdOM_ahPX,1610040000000.0,1610470000000.0,1,q-qxdClTs0d,q-qxdClTs0d,Final Decision,Reject,"Loosely, while IRM aims to find a feature mapping Phi s.t. response Y given Phi(X) is independent of the environment variables E, they suggest that when E is strongly correlated with Y, then it is possible for Phi obtained via IRM to involve environment variables. They motivate this by suggesting that if there exists a feature mapping Phi(X) = E, it would satisfy the IRM aim, but that this is undesirable. + +They suggest instead requiring Phi(X)|Y being invariant to the environment. + +The reviewers bring up a couple of concerns. The first is that it is not clear outside some simple examples when Y given Phi(X) being independent of E does not suffice. The second is that the authors also do not empirically validate their fix outside a single simple dataset. Moreover, what are the pitfalls of having Phi(X) given Y being independent of E? + +Overall, this is an interesting kernel of an idea; it just needs to be fleshed out a bit more. 
+",ICLR2021, +6mTBiKb8l_K,1610040000000.0,1610470000000.0,1,T58qDGccG56,T58qDGccG56,Final Decision,Reject,"This paper studies the training of multi-branch networks, i.e., networks formed by linearly combining multiple disjoint branches of the same architecture. The four reviewers seem to reach a consensus that the paper is not ready for publication for ICLR. ",ICLR2021, +KpX5uMudwp,1576800000000.0,1576800000000.0,1,S1eRbANtDB,S1eRbANtDB,Paper Decision,Accept (Poster),All reviewers come to agreement that this is a solid paper worth publishing at ICLR; the authors are encouraged to incorporate additional comments suggested by reviewers.,ICLR2020, +KhDO2mnMiSD,1610040000000.0,1610470000000.0,1,Rw_vo-wIAa,Rw_vo-wIAa,Final Decision,Reject,"All reviewers expressed interest in this promising approach, but raised questions that were not addressed by the authors during the discussion period. As concerns raised included insufficient repeats of empirical experiments to draw conclusions and the paper appearing to be in an early draft format, we cannot support acceptance for publication at this time. I strongly encourage the authors to act on the feedback given to improve the paper for a future submission. ",ICLR2021, +kRaJGtFOTZ,1642700000000.0,1642700000000.0,1,shbAgEsk3qM,shbAgEsk3qM,Paper Decision,Accept (Poster),"This paper presents a study of the over parametrization of linear representations in the context of recursive value estimation. + +The reviewers could not reach a consensus over the quality of the paper, with a fairly wide range of scores even after the rebuttal. + +After considering the paper, the rebuttal, and the discussion, I lean towards accepting the paper. Despite the concerns voiced by some of the reviewers, the topic and analysis of the manuscript are novel and interesting, and it is my expectation that this manuscript will prove a valuable source of inspiration for future work. + +I invite the authors to carefully consider the feedback received by all the reviewers (and in particular Reviewers xq3y and gT5o and) and to revisit the manuscript accordingly.",ICLR2022, +S6Kp2IVPQBP,1610040000000.0,1610470000000.0,1,Fo6S5-3Dx_,Fo6S5-3Dx_,Final Decision,Reject,"This work combines deep generative models (variational autoencoders, FragVAE) and multi-objective evolutionary computation for molecular design. They use a multilayer perceptron as a predictor for properties. Evolutionary operations are used to explore the latent space of the generative model to produce novel competitive molecules. Experiments are executed to show the effectiveness of the proposed method with respect to Bayesian optimization-based methods. + +Strengths: + +1 - Combines multi-objective evolutionary computation and deep generative modeling, which is a promising approach to tackle multi-objective optimization in structured spaces. + +Weaknesses: + +All the reviewers agree that the paper is not yet ready for publication. They point out the following areas to improve: + +1 - The lack of details and clarity in the method. + +2 - The experimental section needs to be improved. The evaluation metrics and baselines are weak. + +3 - Describe better and more clearly the novelty of the proposed approach with respect to previous work in the area.",ICLR2021, +EuFDwiiWK47,1610040000000.0,1610470000000.0,1,dmCL033_YwO,dmCL033_YwO,Final Decision,Reject,"One referee supports acceptance, whereas three referees lean towards rejection. 
All referees agree that the idea introduced in the paper is interesting but find that the motivation and evaluation of the proposed aggregation functions could be significantly strengthened. The rebuttal addresses R1's concerns about novelty and unfair comparisons, R2's concerns about computational efficiency of the methods, R3's concerns about motivation of the proposed approach and some missing baselines, and R4's concerns about motivation. However, the rebuttal does not address the reviewers' concerns related to improvements achieved by the proposed approach, statistical significance nor appropriate comparison with SOTA. I agree with the reviewers that the paper tries to address a relevant problem and proposes interesting ideas, which are worth exploring. However, after discussion, the referees agree that further work should be devoted to strengthen the contribution. I agree with their assessment and hence must reject. In particular, I would strongly recommend to follow their suggestions to either provide strong theoretical motivation to support the claims of the paper or work on a strengthened empirical evaluation, following OGB guidelines to report the std of the results and including a proper comparison with the state of the art. ",ICLR2021, +e-Bp40tRmPx,1642700000000.0,1642700000000.0,1,siCt4xZn5Ve,siCt4xZn5Ve,Paper Decision,Accept (Spotlight),All the reviewers agree that this paper made a solid contribution of understanding the algorithmic regularization of SGD noise (in particular the label noise for regression) after reaching zero loss. The framework is novel and has the potential to extend to other settings.,ICLR2022, +B1wXTGIdl,1486400000000.0,1486400000000.0,1,r1YNw6sxg,r1YNw6sxg,ICLR committee final decision,Accept (Poster),"All reviewers viewed the paper favourably, although there were some common criticisms. In particular, the demonstration would be more convincing on a more difficult task, and this seems like an intermediate step on the way to an end-to-end solution. There were also questions of being able to reproduce the results. I would strongly recommend that the authors take this suggestions into account.",ICLR2017, +lJTwbSyIQV7,1610040000000.0,1610470000000.0,1,vyY0jnWG-tK,vyY0jnWG-tK,Final Decision,Accept (Poster),"The consensus of the reviews is to accept the paper. I agree. + +Reviewers highlighted many strengths, including a compelling main idea: +* R5: ""The paper presents an interesting and motivating case for Bayesian inference in probabilistic generative models: a problem that has inherent uncertainty along with the ability to incorporate domain knowledge that can reduce the inference complexity."" +* R3: ""Overall, the idea is interesting and supported by correct mathematical derivations and experimental proofs of concept."" +* R4: ""the generative approach is novel. Adding domain knowledge is relevant and significant when dealing with real world applications"" + +As well as compelling experiments, substantially improved in the discussion period: +* R1: ""The authors have shown some promising results in modeling particle dynamics."" +* R5: ""The addition of Appendix H, in my opinion, considerably strengthens the paper's story and case for acceptance. [... 
T]he authors have addressed most of my major concerns."" + +And clear writing: +* R5: ""In general, the paper is well written (apart from some higher-level structural issues discussed below) and the notation is clear and unambiguous."" +* R4: ""The paper is very well written, clear"" + +The main weaknesses highlighted were in experiments (lacking good baselines, as well as ablations), and in discussing some choices in the model's construction. These were effectively addressed in the discussion (though R5 still points to some places that could be improved).",ICLR2021, +9vzZVL0Y5,1576800000000.0,1576800000000.0,1,HJlrS1rYwH,HJlrS1rYwH,Paper Decision,Reject,"The consensus amongst the reviewers is that the paper discusses an interesting idea and shows significant promise, but that the presentation of the initial submission was not of a publishable standard. While some of the issues were clarified during discussion, the reviewers agree that the paper lacks polish and is therefore not ready. While I think Reviewer #3 is overly strict in sticking to a 1, as it is the nature of ICLR to allow papers to be improved through the discussion, in the absence of any of the reviewers being ready to champion the paper, I cannot recommend acceptance. I however have no doubt that with further work on the presentation of what sounds like a potentially fascinating contribution to the field, the paper will stand a chance at acceptance at a future conference.",ICLR2020, +mPe8At1XUcA,1610040000000.0,1610470000000.0,1,mCtadqIxOJ,mCtadqIxOJ,Final Decision,Accept (Poster),"Four expert reviewers (after much discussion, in which the authors seemed to do a pretty good job addressing a lot of the initial complaints) unanimously voted to accept this paper. + +Everyone seemed to agree that the idea was interesting, and it is indeed interesting. +There were generally complaints about benchmarking; there always are for papers about program synthesis. + +One complaint I have, but that I didn't really see mentioned, is that the system as described is pretty baroque. +I have a hard time imaging how you'd scale something like this up to more complicated contexts, +and honestly I'm not sure even in some of the contexts where it was tested if it would really outperform a well-engineered +top-down synthesizer. +Maybe this is just an aesthetic preference that only I have, and maybe ideas need to start out overly complicated +before the most useful bits can be extracted from them and refined. + +At any rate, I do think that this paper gives a cool new research contribution and that people will want to read it, so I am recommending acceptance. ",ICLR2021, +-5p4ADkiR,1576800000000.0,1576800000000.0,1,BkljIlHtvS,BkljIlHtvS,Paper Decision,Reject,"This paper presents a number of experiments involving the Model-Agnostic Meta-Learning (MAML) framework, both for the purpose of understanding its behavior and motivating specific enhancements. With respect to the former, the paper argues that deeper networks allow earlier layers to learn generic modeling features that can be adapted via later layers in a task-specific way. The paper then suggests that this implicit decomposition can be explicitly formulated via the use of meta-optimizers for handling adaptations, allowing for simpler networks that may not require generic modeling-specific layers. + +At the end of the rebuttal and discussion phases, two reviewers chose rejection while one preferred acceptance. 
In this regard, as AC I did not find clear evidence that warranted overriding the reviewer majority, and consistent with some of the evaluations, I believe that there are several points whereby this paper could be improved. + +More specifically, my feeling is that some of the conclusions of this paper would either already be expected by members of the community, or else would require further empirical support to draw more firm conclusions. For example, the fact that earlier layers encode more generic features that are not adapted for each task is not at all surprising (such low-level features are natural to be shared). Moreover, when the linear model from Section 3.2 is replaced by a deep linear network, clearly the model capacity is not changed, but the effective number of parameters which determine the gradient update will be significantly expanded in a seemingly non-trivial way. This is then likely to be of some benefit. + +Consequently, one could naturally view the extra parameters as forming an implicit meta-optimizer, and it is not so remarkable that other trainable meta-optimizers might work well. Indeed cited references such as (Park & Oliva, 2019) have already applied explicit meta-optimizers to MAML and few-shot learning tasks. And based on Table 2, the proposed factorized meta-optimizer does not appear to show any clear advantage over the meta-curvature method from (Park & Oliva, 2019). Overall, either by using deeper networks or an explicit trainable meta-optimizer, there are going to be more adaptable parameters to exploit and so the expectation is that there will be room for improvement. Even so, I am not against the message of this paper. Rather it is just that for an empirically-based submission with close ties to existing work, the bar is generally a bit higher in terms of the quality and scope of the experiments. + +As a final (lesser) point, the paper argues that meta-optimizers allow for the decomposition of modeling and adaptation as mentioned above; however, I did not see exactly where this claim was precisely corroborated empirically. For example, one useful test could be to recreate Figure 2 but with the meta-optimizer in place and a shallower network architecture. The expectation then might be that general features are no longer necessary.",ICLR2020, +rktnN1pHz,1517250000000.0,1517260000000.0,441,H1OQukZ0-,H1OQukZ0-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents an update to the method of Franceschi 2017 to optimize regularization hyperparameters, to improve stability. However, the theoretical story isn't so clear, and the results aren't much of an improvement. Overall, the presentation and development of the idea needs work.",ICLR2018, +rkxq4m9Me4,1544890000000.0,1545350000000.0,1,rJNwDjAqYX,rJNwDjAqYX,meta-review,Accept (Poster),"The authors have extended previous publications on curiosity driven, intrinsically motivated RL with this broad empirical study on the effectiveness of the curiosity algorithm on many game environments, the merits of different feature sets, and limitations of the approach. The paper is well-written and should be of interest to the community. The experiments are well conceived and seem to validate the general effectiveness of curiosity. However, the paper does not actually have any novel contribution compared against prior work, and there are no great insights or takeaways from the empirical study. Therefore, the reviewers were somewhat divided on how confident they were that the paper should be accepted. 
Overall, the AC agrees that it is a valuable paper that should be accepted even though it does not deliver any algorithmic novelty.",ICLR2019,5: The area chair is absolutely certain +ByYKUJ6Bf,1517250000000.0,1517260000000.0,831,HkbmWqxCZ,HkbmWqxCZ,ICLR 2018 Conference Acceptance Decision,Reject,"This is a well-written paper that aims to address an important problem. However, all the reviewers agreed that the experimental section is currently too weak for publication. They also made several good suggestions about improving the paper and the authors are encouraged to incorporate them before resubmitting.",ICLR2018, +SJIqoGI_x,1486400000000.0,1486400000000.0,1,S1J0E-71l,S1J0E-71l,ICLR committee final decision,Reject,"Based on the feedback, I'm going to be rejecting the paper on the following grounds: + 1. Results are not SOTA as reported. + 2. No real experiments other than cursory experiments on Hutter prize data. + 2. Writing is very poor. + + However, just for playing devil's advocate, to the reviewers, I would like to point out that I am in agreement with the author that dynamic evaluation is not equivalent to this method. The weights are not changed in this model, as far as I can see, for the test set. Surprisal is just an extra input to the model. I think the reviewers were puzzled by the fact that at test time, the actual sequence needs to be known. While this may be problematic for generative modeling, I do not see why this would be a problem for language modeling, where the goal of the model is only to provide a log prob to evaluate how good a sequence of text is. Long before language modeling started being used to generate text, this was the main reason to use it - in speech recognition, spelling correction etc..",ICLR2017, +KV7wpLNDP3z,1610040000000.0,1610470000000.0,1,Iz3zU3M316D,Iz3zU3M316D,Final Decision,Accept (Poster),"Clarity: The paper is well-written with illustrative figures. + +Originality: The originality of the paper is relatively restricted, mainly due to the resemblance with the work [1]. However, there are important differences, that the authors nicely pointed out, and we encourage them to include these in the final version of the paper. + +Significance: The paper points out a relevant issue in using normalization techniques such as batch normalization together with momentum-based optimization algorithms in training deep neural networks. While the paper could be considered ""another algorithms for training NNs"", the papers illustrates nicely the main arguments, and is backed up with more than sufficient experimental results. + +Main pros: +- In the main pros, AC and reviewers admit the phenomenal job in responding to reviewers' questions and requests +- The paper provides experimental results on various tasks and datasets to demonstrate the advantage of the proposed method. +- After the reviews, The authors also reinforced their empirical investigation by reporting standard deviation of the results, which allows to better appreciate the performances of SGDP and AdamP. Finally, they also added the experiments with higher weight decay, showing that indeed 1e-4 was the best value. + +Main cons: +- One reviewer requires more explanation why the proposed update in equation (12) yields smaller norms ||w_{t+1}|| than the momentum-based update in equation (8).",ICLR2021, +mXRTHjxmZ1o,1642700000000.0,1642700000000.0,1,OWZVD-l-ZrC,OWZVD-l-ZrC,Paper Decision,Accept (Poster),"This is a borderline paper. The scores were initially below the bar. 
The novelty of the work is limited and there are strong claims in the paper that should be revised. The authors can also do a better job in positioning their work with respect to the existing results. However, the authors managed to address several questions/concerns of the reviewers and convince them to raise their scores. I strongly recommend that the authors address the rest of the reviewers' comments, especially those related to the strong claims and the connection to related work, and further improve their work in preparing the final draft.",ICLR2022, +B1ftQyarM,1517250000000.0,1517260000000.0,182,rkN2Il-RZ,rkN2Il-RZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper initially received borderline reviews. The main concern raised by all reviewers was a limited experimental evaluation (synthetic only). In rebuttal, the authors provided new results on the CelebA dataset, which turned the first reviewer positive. The AC agrees there is merit to this approach, and generally appreciates the idea of compositional concept learning.",ICLR2018, +FrllGZ72D6n,1610040000000.0,1610470000000.0,1,UfJn-cstSF,UfJn-cstSF,Final Decision,Reject,"The paper received mixed reviews, with one reviewer voting for acceptance, one strongly opposed, and two borderline ones. The discussion essentially involved R1 and R2, who gave the most informative reviews. After discussion, they did not update their score, even though they appreciated the work and effort done by the authors during the rebuttal. + +In short, the paper has some merit, but several concerns were raised, which the area chair agrees with, leading to a rejection recommendation. The innovation was found to be limited, the relationship between practice and theory (i.e., the assumptions made in this work) is not discussed in a convincing manner, and these concerns remained after the rebuttal. The experiments were also in need of improvement. + +It is, however, likely that with a major revision, this work may become publishable at another venue.",ICLR2021, +2xGCbkP3_E,1576800000000.0,1576800000000.0,1,H1gB4RVKvB,H1gB4RVKvB,Paper Decision,Accept (Poster),"All the reviewers recommend acceptance, and they found the paper interesting and novel. 
",ICLR2020, +S1xxKLOQlE,1544940000000.0,1545350000000.0,1,rkgKBhA5Y7,rkgKBhA5Y7,Interesting analysis and insights into SWA for semisupervised learning,Accept (Poster),All reviewers appreciate the empirical analysis and insights provided in the paper. The paper also reports impressive results on SSL. It will be a good addition to the ICLR program. ,ICLR2019,5: The area chair is absolutely certain +Ye4T_ZHy9N8,1642700000000.0,1642700000000.0,1,EnwCZixjSh,EnwCZixjSh,Paper Decision,Accept (Poster),"The paper argues that existing evaluation metrics for GGMs are insufficient and perform an extensive empirical study questioning their ability to measure the diversity and fidelity of the generated graphs. To solve these limitations, they propose a new evaluation metric that computes the Maximum Mean Discrepancy (MMD) between graph representations of the sampled and real graphs, as extracted from an untrained GGM model. + +All the reviewers agreed that the research problem is interesting and the overall idea behind the proposed metric is sound and novel. While there were some concerns regarding some details/comparisons/conclusions of the experimental evaluation, the rebuttal managed to cleared up these concerns and all the reviewers eventually supported acceptance.",ICLR2022, +kh36lwVQiS4,1642700000000.0,1642700000000.0,1,k9bx1EfHI_-,k9bx1EfHI_-,Paper Decision,Accept (Poster),"This work tackles an important clinical application. It is experimentally solid and investigates +novel deep learning methodologies in a convincing way. + +For these reasons, this work is endorsed for publication at ICLR 2022.",ICLR2022, +ZCcoumT7i_,1576800000000.0,1576800000000.0,1,BygPq6VFvS,BygPq6VFvS,Paper Decision,Reject,"This paper proposes a phrase-based attention method to model word n-grams (as opposed to single words) as the basic attention units. Multi-headed phrasal attentions are designed within the Transformer architecture to perform token-to-token and token-to-phrase mappings. Some improvements are shown in English-German, English-Russian and English-French translation tasks on the standard WMT'14 test set, and on the one-billion-word language modeling benchmark. + +While the proposed approach is interesting and takes inspiration in the notion of phrases used in phrase-based machine translation, with some positive empirical results, the technical novelty of this paper is rather limited, and the experiments could be more solid. While it is understandable that lack of computational resources made it hard to experiment with larger models (e.g. Transformer-big), perhaps it would be interesting to try on language pairs with fewer resources (smaller datasets), where base models are more competitive.",ICLR2020, +7-hu9o_Y1I,1610040000000.0,1610470000000.0,1,OmtmcPkkhT,OmtmcPkkhT,Final Decision,Accept (Poster),"The paper proposes multiplicative filter networks (GaborNet and FourierNet) as functional approximations of deepnets. The proposed networks are a sequence of multiplications linear functions of sinusoidal or Gabor filters. The authors show that in some cases the performance of proposed networks outperforms the existing deepnets using ReLu activations. This representation is notably simpler as well. Moreover, compared to classical Fourier approach, the proposed method scales to higher dimensions in practice as well. +The downside of the paper is that it is not clear how to empirically use exponentially many Fourier functions. 
Moreover, the proposed methods have more parameters, and the additional parameters are linear in the size of the hidden layer. + +The paper is clearly written, and the authors improved the quality of the paper and added additional experiments to support their claims through the review process, which I appreciate.",ICLR2021, +LmXmaIilPiv,1610040000000.0,1610470000000.0,1,s4D2nnwCcM,s4D2nnwCcM,Final Decision,Reject,"All reviewers agree that the current approach is very similar to traditional uncertainty-based active learning, and that the empirical results are inconclusive, so at this point the paper is not ready for publication.",ICLR2021, +k2q6zxzttUC,1610040000000.0,1610470000000.0,1,WiGQBFuVRv,WiGQBFuVRv,Final Decision,Accept (Spotlight),"This paper proposes an approach to probabilistic time series forecasting based on combining autoregressive deep learning models with normalizing flows. In terms of strengths, time series forecasting is a fundamental problem. The proposed approach is a reasonable combination of existing model components that provides a flexible, end-to-end trainable framework for multivariate probabilistic forecasting. The experiments are well-conducted and the results outperform recently published methods. While the reviewers raised a number of questions, all of the reviewers agree that their questions have been answered satisfactorily by the authors during the discussion and the paper should be accepted. The authors should be sure to incorporate the reviewer suggestions and author responses into the final paper. ",ICLR2021, +H1ebD4WQgV,1544910000000.0,1545350000000.0,1,ryxSrhC9KX,ryxSrhC9KX,Reviewer consensus is accept,Accept (Poster),"The reviewers viewed the work favorably, with only one reviewer providing a score slightly below acceptance. The authors thoroughly addressed that reviewer's original concerns, and they adjusted their score upwards afterwards. The low-rating reviewer remains skeptical of the significance of the work, but the other two reviewers make firm cases for the appeal of the work to the ICLR audience. In follow-up discussion after the authors' responses were submitted and discussed, the low-rating reviewer did not make a clear case for rejecting the paper, and further, the higher-rating reviewers' arguments for the impact of the paper were convincing. Therefore, I recommend accepting this paper.",ICLR2019,4: The area chair is confident but not absolutely certain +Mpb1Rjjd7H,1576800000000.0,1576800000000.0,1,S1et1lrtwr,S1et1lrtwr,Paper Decision,Reject,"The paper discusses the relevant topic of unsupervised meta-learning in an RL setting. The topic is an interesting one, but the writing and motivation could be much clearer. I advise the authors to make a few more iterations on the paper taking into account the reviewers' comments and then resubmit to a different venue.",ICLR2020, +S1xVl8vPlE,1545200000000.0,1545350000000.0,1,r1eEG20qKQ,r1eEG20qKQ,"A useful approach to hyperparameter tuning, promising results",Accept (Poster),"The paper proposes an approach to hyperparameter tuning based on bilevel optimization, and demonstrates promising empirical results. Reviewers' concerns seem to be addressed well in the rebuttals and the extended version of the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +6uCHmWKy_o,1576800000000.0,1576800000000.0,1,HJgkj0NFwr,HJgkj0NFwr,Paper Decision,Reject,The paper is on the borderline. A rejection is proposed due to the acceptance-rate limit of ICLR. 
,ICLR2020, +4BcjwBb4dp,1610040000000.0,1610470000000.0,1,SUyxNGzUsH,SUyxNGzUsH,Final Decision,Reject,"The authors propose a neural module based approach for reasoning about video grounding. The goal is to provide performance and interpretability. Unfortunately, the reviewers found the paper opaque and the results confusing, and expressed repeated concerns about the novelty and fairness of comparisons, as well as concerns that the surprising results were not sufficiently well justified by the paper (or the authors' response).",ICLR2021, +7ccTtZVyR-c,1610040000000.0,1610470000000.0,1,3jJKpFbLkU2,3jJKpFbLkU2,Final Decision,Reject,"The paper had four borderline reviews, with none enthusiastic about championing the merits of the paper. While it was felt that the extension of an existing technique to deep learning via amortization is a useful procedure, it is also not very novel and the experiments didn't demonstrate a significant leap in performance.",ICLR2021, +z2W23eUZK7h,1642700000000.0,1642700000000.0,1,z8j0bPU4DIw,z8j0bPU4DIw,Paper Decision,Reject,"This paper presents the use of scalable evolution strategies (S-ES) in hierarchical reinforcement learning. After reviewing the paper and reading the comments from the reviewers, here are my comments: + +- The proposal is quite novel. It requires major improvements to clearly state how this proposal contributes to the field. +- The main concern is about the experimental results. There are some flaws in the comparative results. Also, they do not support the proposal.",ICLR2022, +rJx1naFxeN,1544750000000.0,1545350000000.0,1,ByxBFsRqYm,ByxBFsRqYm,meta-review,Accept (Poster),"The paper presents a new deep learning approach for combinatorial optimization +problems based on the Transformer architecture. The paper is well written +and several experiments are provided. A reviewer asked for more intuition about +the proposed approach, and the authors have responded accordingly. Reviewers are +also concerned with scalability and the theoretical basis. +Overall, all reviewers were positive in their scores, and I recommend accepting the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +a8Ktx4Avo7C,1642700000000.0,1642700000000.0,1,xw04RdwI2kS,xw04RdwI2kS,Paper Decision,Reject,"Summary: This paper studies an inverse (linear) contextual bandits (ICB) problem, where, given a $T$-round realization of a bandit policy’s actions and observed rewards, the goal is to design an algorithm to estimate the underlying environment parameter, along with the “belief trajectory” of the bandit policy. A particular emphasis is placed on the belief trajectory being “interpretable” and capturing changes in the policy’s “knowledge of the world” over time. + +The paper’s main contributions are (i) formalizing the inverse contextual bandits problem, (ii) designing two algorithms for this problem based on two different ways of modelling beliefs of the bandit policy, and (iii) providing empirical illustrations of how their algorithm can be used to investigate and explain changes in medical decision-making over time. + +Discussion: This paper has received high-quality, long, and detailed reviews that highlighted some flaws, in particular in the well-posedness of the problem and the clarity of the writing. The authors' response was long and detailed as well, and its quality was recognized by the committee. 
+However, the consensus is that this work would require a full revision pass to include most of the feedback received in the main text rather than in appendices, to discuss related problems in the literature in more depth, and perhaps to refocus the exposition on the problem considered. + +Recommendation: Reject.",ICLR2022, +rlKxugojQLJi,1642700000000.0,1642700000000.0,1,9qKAGxS1Tq2,9qKAGxS1Tq2,Paper Decision,Reject,"This paper attempts to rationalize data augmentation techniques for compositional generalization by invoking the principle of meaningful +learning, which posits that learning new concepts builds on previously learned concepts (which learners already understand). So for compositional generalization, this means that a model exposed to some new concepts in the test set should link them to known concepts which have already been attested in the training set. The links between concepts are presumed to be semantic, e.g., hyponyms, hypernyms, or lexical variants. Ideally, a model should perform semantic linking on its own; however, the authors do not propose a linking mechanism. 
Rather, they investigate data augmentation as a way of exposing a model to semantic links and then explore whether different operationalizations of semantic linking enable the model to generalize better. Inductive learning is a bottom-up approach, where links are created from general to specific concepts, whereas deductive learning is a top-down approach, where links are created from specific to general concepts. Experimental results indicate that inductive learning works better. + +The reviewers had the following issues with the submission: (a) the technical contribution is not very strong (the idea of data augmentation is not new, although the authors' meaningful learning perspective is); (b) semantic linking seems to be able to handle only cases pertaining to lexical generalization (even though the authors include examples with structural generalization in their splits, there is no reason to expect semantic linking to handle these cases); (c) it would be more interesting/useful to learn the linking than to assume it is given. The authors did their best to respond to the criticism, but ultimately addressing the criticism is future work. I would also recommend taking a look at this dataset, which might be useful for machine learning experiments: https://arxiv.org/abs/2105.14802",ICLR2022, +Syx8VyaHG,1517250000000.0,1517260000000.0,351,B1lMMx1CW,B1lMMx1CW,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Meta score: 6 + +This is a thorough empirical paper, demonstrating the effectiveness of a relatively simple model for recommendations: + +Pros: + - strong experiments + - always good to see simple models pushed to perform well + - presumably of interest to practitioners in the area + +Cons: + - quite oriented to the recommendation application + - technical novelty is in the experimental evaluation rather than any new techniques + +On balance I recommend the paper be invited to the workshop.",ICLR2018, +O-knKXhTY,1576800000000.0,1576800000000.0,1,Syetja4KPH,Syetja4KPH,Paper Decision,Reject,"This paper combines DQN and randomized value functions for exploration. + +All the reviewers agreed the paper is not yet ready for publication. The experiments lack appropriate baselines, and thus it is unclear how this new approach improves exploration in deep RL. The reviewers also found some of the algorithmic design decisions unintuitive and unexplained. The authors' main response was that the objective was to improve on and compare against vanilla DQN. This could be a valid goal, but it requires clear motivation (perhaps the focus is simply on algorithms that are commonly used in applications, or something similar). Even then, comparisons with other methods would be of interest to quantify how much the base algorithm is improved, and to justify empirically all the design decisions that went into building such an improvement (performance vs. complexity of implementation, etc.). + +The reviewers gave nice suggestions for improvements. This is a good area of study: keep going!",ICLR2020, +r1SBLyTrf,1517250000000.0,1517260000000.0,773,By9iRkWA-,By9iRkWA-,ICLR 2018 Conference Acceptance Decision,Reject,"Generally solid engineering work, but a bit lacking in terms of novelty, and with some issues with clarity. At the end of the day the empirical gains are not sufficient for acceptance - the results are state-of-the-art relative to published work, but not in the top 10 based on the official leaderboard (not even at time of submission). 
Since the technical contributions are small and the engineering contributions have been made obsolete by concurrent work, I suggest rejection.",ICLR2018, +7K9x-69-X1j,1642700000000.0,1642700000000.0,1,7x_47XJULn,7x_47XJULn,Paper Decision,Reject,"Meta Review of Federated Learning with Heterogeneous Architectures using Graph HyperNetworks + +This work investigates a method for federated learning in a neural architecture-agnostic setting. They do this by using a graph hypernetwork to predict the weights of given neural network architectures (which are not exactly known at the outset). The authors conduct federated learning experiments to demonstrate good performance on several real datasets, and also show that the trained GHN model can generalize (somewhat) to unseen architectures (which are mainly in the ResNet family). Personally, as AC, I find the results very promising, and the experiments show that GHNs are highly applicable to real-world applications. But the reviewers outline several weaknesses in the discussion that make it difficult to recommend acceptance of this paper for ICLR 2022. + +The main weaknesses of the work are that the application is mainly focused on a narrow family of ResNet architectures (can it be shown to go beyond this? If not, can the writing be improved to show that this is useful enough for many applications?). Reviewer U48w suggested improvements to the generalization experiments, and other details that can be addressed in the writing. Reviewer Tk9o mentioned that this work can be seen as a straightforward application of GHNs (limited novelty), while other reviewers do acknowledge the novelty of the work. I recommend improving the writing to clearly address this and defend why this is not a straightforward application of previous work. With these improvements, I'm confident that this work will be accepted at a future ML conference or journal. + +Even though I cannot recommend acceptance, both the other reviewers and I are looking forward to seeing improved versions of this work for publication in the future. As jPp2 also noted, “Previous works on federated learning either focus on the mechanism of parameter aggregation or the aspect of privacy. This paper opens a new direction in FL where clients may not be willing to share their unique model designs. From this perspective, I think this paper has promising impact on the research field of FL.” Good luck!",ICLR2022, +3jR4YK2-Fq,1576800000000.0,1576800000000.0,1,HJgdo6VFPH,HJgdo6VFPH,Paper Decision,Reject,"This paper presents OmniNet, an architecture based on the popular transformer for learning on data from multiple modalities and predicting on multiple tasks. The reviewers found the paper well written, technically sound, and empirically thorough. However, overall the scores fell below the bar for acceptance and none of the reviewers felt strongly enough to 'champion' the paper for acceptance. The primary concern cited by the reviewers was a lack of strong baselines, i.e., comparison to other methods for multi-task learning. Unfortunately, as such, the recommendation is to reject. However, adding a thorough comparison to existing literature, both empirically and in the related work, would make this a much stronger submission to a future conference.",ICLR2020, +HJx_lyzelV,1544720000000.0,1545350000000.0,1,H1xQVn09FX,H1xQVn09FX,novel approach with good results shown by extensive evaluation,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. 
+ +- novel approach to audio synthesis +- strong qualitative and quantitative results +- extensive evaluation + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- small grammatical issues (mostly resolved in the revision). + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +No major points of contention. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers reached a consensus that the paper should be accepted. +",ICLR2019,4: The area chair is confident but not absolutely certain +1N_h-fr9_k1,1642700000000.0,1642700000000.0,1,Ub1BQTKiwqg,Ub1BQTKiwqg,Paper Decision,Reject,"This submission proposes a method for learning sparse DNNs which consists of three components: First, a ""dense"" network is maintained and updated in each backwards pass, but the forward pass is done via a sparsified version of the network; sparsification is done via ""soft"" thresholding; and the sparsity ratio is increased over the course of training. Reviewers noted that each of these components had been previously proposed, and that the state-of-the-art baselines are not actually state-of-the-art anymore. They also noted that the paper read more like a draft and needs substantial improvement. The consensus was therefore to reject.",ICLR2022, +nPc6aXSFYQ5,1610040000000.0,1610470000000.0,1,SP5RHi-rdlJ,SP5RHi-rdlJ,Final Decision,Reject,"This paper compresses neural networks via so called Sparse Binary Neural Network designs. All reviewers agree that the paper has limited novelty. Experiments are only performed on small datasets with simple neural networks. However, even with toy experiments, results are very weak. There is no comparison with the SOTA. Numerous related works are missed by the authors. Besides, the paper is poorly written, and there are misleading notations.",ICLR2021, +Ogtx83cWJX0,1610040000000.0,1610470000000.0,1,ZwZ3sc0qad,ZwZ3sc0qad,Final Decision,Reject,The consensus view was that the reviewers were not convinced that the analysis done in the paper was sufficient motivated.,ICLR2021, +rkkcLy6Hf,1517250000000.0,1517260000000.0,836,HJ4IhxZAb,HJ4IhxZAb,ICLR 2018 Conference Acceptance Decision,Reject,"In general, this seems like a sensible idea, but in my opinion the empirical results do not show a very compelling margin between using *entropy* as an active learning selection criterion vs the proposed methods. The difference is small enough that in practice it is very hard for me to believe that many researchers would choose to use the meta-learning via deep RL method (given that they'd need to train on multiple datasets and tune REINFORCE which is not going to be obviously easy). For that reason I am inclined to reject the paper. + +In a follow-up version, I would heed the advice of Reviewer 1 and do more ablation analyses to understand the value of myopic vs non-myopic, cross-dataset vs. 
not, bandits vs RL, on the fly vs not (these are all intermingled issues). The relative lack of such analyses in the paper does not help in terms of it passing the bar.",ICLR2018, +O0M7CJNQxdX,1610040000000.0,1610470000000.0,1,npOuXc85I5k,npOuXc85I5k,Final Decision,Reject,"This paper aims to address the robustness issues by considering natural accuracy, sensitivity-based robustness and spatial robustness at the same. However, the reviewers pointed out that many things, like the expriment, the presentation, the algorithm, are not clear. In addition, the technique part is weak and below the bar of ICLR.",ICLR2021, +GaJqTZ0vce,1642700000000.0,1642700000000.0,1,famc03Gg231,famc03Gg231,Paper Decision,Reject,"The paper introduces a method to solve inverse problems: given y, find x such that P(x)=y, for a given physical simulator P. A standard approach is to learn a neural net such that the inverse x=NN(y;\theta). The authors state that this is problematic because it is difficult to take ""higher order"" gradient information into consideration when using this standard approach. The method assumes that there is an approximate inverse solver inv(P) and discusses an alternative ""Physical Gradient"" objective that can incorporate knowledge of an approximate inv(P) and a neural network. The experiments are good though comparing performance on an iteration basis is not always fair since an iteration of the PG method can be much more expensive than standard approaches. + +The biggest issue that reviewers had was the clarity of the presentation. The authors have made a reasonable attempt to correct this, but I'm inclined to agree with the general reviewer sentiment that the presentation is still not at the required level. I agree that there are many things that are not clear, including the confusing discussion in section 2.1 about how the method takes higher order information into consideration. It only becomes partially transparent later in the experiments what is meant by higher order information. + +Overall, I feel this is the basis of a potentially valuable contribution but that the current presentation is quite confusing. As mentioned by others, I would also suggest to find a different name since Physical Gradient is also rather misleading. + +The following points were not part of the review process and I do not base the final decision on them, but the authors may want to consider the following: + +I believe there is also an error in the basic approach, or at least an approximation is made which is not explained. The error is that the approximate inv(P) depends also on the parameter \theta (since this is used to initialise inv(P)). This dependency is not taken into consideration in the paper. For example, in theorem 1 in appendix A, the calculation of the gradient dM/d\theta is incorrect since the authors assume that inv(P) is independent of \theta, which it is not (since the preconditioner value depends on \theta). If we do take this into account, we would need to know the derivative of inv(P|x) with respect to the preconditioner x. This dependency would alter the gradient, potentially considerably. The gradient in figure 2 for the PG is also incorrect. 
One may of course simply say that the paper discounts this correction term in order to retain tractability; however, this would need to be stated as an approximation.",ICLR2022, +Byjd3zLdg,1486400000000.0,1486400000000.0,1,Bkfwyw5xg,Bkfwyw5xg,ICLR committee final decision,Reject,"Reviewers agree that the findings are not clear enough to be of interest, though the effort to do a controlled study is appreciated.",ICLR2017, +SkeCKvqfl4,1544890000000.0,1545350000000.0,1,BJesDsA9t7,BJesDsA9t7,Lack of sufficient justification of the privacy definition,Reject,"Following the unanimous vote of the reviewers, this paper is not ready for publication at ICLR. A significant concern is that the definition of privacy used here is not adequately justified. This opens up issues of: 1) possible attacks, 2) privacy-guarantees that are not worst-case, among others. ",ICLR2019,5: The area chair is absolutely certain +Tt6vG5Z-dfP,1610040000000.0,1610470000000.0,1,sy4Kg_ZQmS7,sy4Kg_ZQmS7,Final Decision,Accept (Poster),"The reviewers appreciated the paper's applied neural net approach to the problem of designing features for 2SLS regression for IV analysis as an alternative to sieve approaches. The paper would make a good contribution to ICLR. While the paper does not focus on theory -- learning data-driven features appears to be mostly heuristic -- it should still be grounded in a sound approach to the IV problem, and the reviewers recommend various important technical clarifications regarding the foundations of IV models; the authors should implement these suggestions very carefully and correctly in future versions. For example, even if the structural models are well-specified in that Eq. (5) holds for some parameters, since the dependence is non-linear on parameters, it is not clear when we should expect this to be identifying of (theta_X,theta_Z) (these are in fact not identifiable in general) and when we should expect the proposed method to be consistent.",ICLR2021, +HkxACit1gV,1544690000000.0,1545350000000.0,1,HJlEUoR9Km,HJlEUoR9Km,No reviewer has championed accepting this paper,Reject,"No reviewer has made a strong case for accepting this paper or championed it so I am recommending rejecting it. The unfavorable reviewers, although they mention real issues, have not highlighted some of the most important barriers to accepting this work. + +One major, but not necessarily dispositive, concern is that the paper only presents results on MNIST. However, even if we put aside this concern, there are several issues with the motivation and approach of this paper. If this technique is actually good at improving the model outside the clean image distribution, then the paper should show that and not just L2 worst case perturbations. To quote the intro of the paper: ""How can deep learning systems successfully generalise and at the same time be extremely vulnerable to minute changes in the input?"" The answer is: they don't generalize and this work does not show us improved generalization. Even a small amount of test error in the data distribution suggests that the closest test error to a given point will often be quite close to the starting point, although this is easier to see with linear models. 
The best way to fix this work would be to study (average case) error on noisy distributions (as in the concurrent submission https://openreview.net/forum?id=S1xoy3CcYX ).",ICLR2019,5: The area chair is absolutely certain +SJluFssAJ4,1544630000000.0,1545350000000.0,1,HkgxasA5Ym,HkgxasA5Ym,Limited experiments,Reject,"The paper studies the problem of uncertainty estimation of neural networks and proposes to use Bayesian approach with noice contrastive prior. + +The reviewers and AC note the potential weaknesses of experimental results: (1) lack of sufficient datasets with moderate-to-high dimensional inputs, (2) arguable choices of hyperparameters and (3) lack of direct evaluations, e.g., measuring network calibration is better than active learning. + +The paper is well written and potentially interesting. However, AC decided that the paper might not be ready to publish in the current form due to the weakness.",ICLR2019,4: The area chair is confident but not absolutely certain +mGdyvxNGYH,1576800000000.0,1576800000000.0,1,H1ezFREtwH,H1ezFREtwH,Paper Decision,Accept (Poster),"This paper considers deep reinforcement learning skill transfer and composition, through an attention model that weighs the contributions of several base policies conditioned on the task and state, and uses this to output an action. The method is evaluated on several Mujoco tasks. + +There were two main areas of concern. The first was around issues with using equivalent primitives and training times for comparison methods. The second was around the general motivation of the paper, and also the motivation for using a BiRNN. These issues were resolved in a comprehensive discussion, leaving this as an interesting paper that should be accepted.",ICLR2020, +ONZVM38Bum,1576800000000.0,1576800000000.0,1,B1eBoJStwr,B1eBoJStwr,Paper Decision,Reject,"This paper proposes a method for semi-supervised semantic segmentation through consistency (with respect to various perturbations) regularization. While the reviewers believe that this paper contains interesting ideas and that it has been substantially improved from its original form, it is not yet ready for acceptance to ICLR-2020. With a little bit of polish, this paper is likely to be accepted at another venue.",ICLR2020, +S1gEJ4jAkV,1544630000000.0,1545350000000.0,1,BylQV305YQ,BylQV305YQ,Good-quality paper,Accept (Poster),The reviewers that provided extensive and technically well-justified reviews agreed that the paper is of high quality. The authors are encouraged to make sure all concerns of these reviewers are properly addressed in the paper.,ICLR2019,5: The area chair is absolutely certain +HJPShzU_g,1486400000000.0,1486400000000.0,1,BJ0Ee8cxx,BJ0Ee8cxx,ICLR committee final decision,Reject,"This paper was reviewed by three experts. While they find interesting ideas in the manuscript, all three point to deficiencies (unconvincing results, etc) and unanimously recommend rejection.",ICLR2017, +zLF1G4aDBmO,1610040000000.0,1610470000000.0,1,VNJUTmR-CaZ,VNJUTmR-CaZ,Final Decision,Reject,"This paper presents a GNN architecture for policies that solve multi-robot task allocation problems. The proposed architecture extends Koul et al (2019) by adding payload constraints and task deadlines. The paper looks at routing problems of medium-to-large size, e.g. 20 robots and 200 tasks. 
The reviewers are happy that most of their concerns were addressed by they are still concerned that the experimental validation is focusing too much on the multi-TSP or Vehicle Routing Problems, and request more extensive experimental validation on similar optimization problems as in Nunes et al (2017). I tend to agree. The proposed method has a lot of merit and just needs one more iteration of improvements to incorporate further experiments, before it becomes ready for publication. + + ",ICLR2021, +wR36i6sKUCl,1642700000000.0,1642700000000.0,1,xwAw8QZkpWZ,xwAw8QZkpWZ,Paper Decision,Reject,"The paper provides an algorithmic framework to accelerate RL through Behavioral Priors, while having some notion of safety incorporated. The reviewers are divided about this paper: + +On the positive side, some of the reviewers consider the problem important, and the experimental results reasonable and promising. + +On the negative side, reviewers raised issues such as +1) The paper is on a heuristic side. +2) No formal guarantee on the safety is provided. +3) The paper is not as self-contained as it should be, as it relies much on Singh et al. (2021). +4) The algorithm requires access to unsafe offline data. + +I do not give the same weights to all these concerns. For example, even though (d) is an issue in some applications, it is alright for others. What concern me most are (1) and (2). + +A method for safety that is only evaluated empirically and does not have any formal guarantee cannot be used for safety critical tasks. I realize that some other published papers may have the same issue. But given that this is a real concern, and that two out of four reviewers believe that the paper should not be accepted, unfortunately I cannot recommend acceptance of this paper, especially given the competitiveness of this conference. + +P.S: I also noticed that in the proof of Proposition 3.1, an expectation term $E[p_\phi(a|s,c)]$ in Eq. (9) is replaced by a $\log p_\phi(a|s,c)$. This requires more justifications.",ICLR2022, +SkClpMUOe,1486400000000.0,1486400000000.0,1,B16dGcqlx,B16dGcqlx,ICLR committee final decision,Accept (Poster),"pros: + - new problem + - huge number of experimental evaluations, based in part on open-review comments + + cons: + - the main critiques related to not enough experiments being run; this has been addressed in the revised version + + The current reviewer scores do not yet reflect the many updates provided by the authors. + I therefore currently learn in favour of seeing this paper accepted.",ICLR2017, +19ysPpJsAMm,1642700000000.0,1642700000000.0,1,hqkN6lE1fFQ,hqkN6lE1fFQ,Paper Decision,Reject,"This paper extends the recent work on continuous-domain sparse attention mechanisms to use kernel parametrizations, and thus allow more flexible multi-modal shapes. Continuous attention extends the standard attention mechanisms to continuous-valued key/value/query functions, involving integrals over probability measures instead of sums over softmax-weighted sums. + +Kernel methods fit very well in the framework and provide great expressivity. Reviewers agree it is an interesting and well-motivated idea. The contribution of incorporating kernel families in continuous attention seems substantially novel in comparison to the previous work on the topic. + +The main concern, however, is that the paper focuses too much on the theory and not enough on the modeling benefits enabled by flexible kernels. 
I would stress that this isn't a question of *improving performance* purely (although quantitative results would help!) but perhaps more of qualitative results, demonstrating e.g. multimodality, selectivity, interpretability. + +I very much look forward to a revised version, which I expect would be a strong paper.",ICLR2022, +r2-uZTcpRb,1576800000000.0,1576800000000.0,1,rJx0Q6EFPB,rJx0Q6EFPB,Paper Decision,Reject,"This paper proposes a new distillation-based method for using large pretrained models like BERT to produce much *smaller* fine-tuned target-task models. + +This paper is low-borderline: It has merit and meets our basic standards, but owing to capacity limitations we had to give preference to papers we see as having a higher potential impact. Reviewers had some concerns about experimental design, but those seem to have been fully resolved after discussion. Reviewers were not convinced, even after some discussion, that the method and results were sufficiently novel and effective to have a substantial impact on the state of practice in this area.",ICLR2020, +hB_YYeNde1s,1642700000000.0,1642700000000.0,1,PDYs7Z2XFGv,PDYs7Z2XFGv,Paper Decision,Accept (Poster),The paper presents a simple and effective solution to tune the receptive field of CNNs for 1D time series classification. The reviewers think the idea is original and elegant but would appreciate more theoretical insights into the solution.,ICLR2022, +HkYRIJTHM,1517250000000.0,1517260000000.0,899,rkw-jlb0W,rkw-jlb0W,ICLR 2018 Conference Acceptance Decision,Reject,"Dear authors, + +While the reviewers appreciated your analysis, they all expressed concerns about the significance of the paper. Indeed, given the plethora of GAN variants, it would have been good to get stronger evidence about the advantages of the Dudley GAN. Even though I agree it is difficult to provide a clean comparison between generative models because of the lack of clear objectives, the LL on one dataset and images generated is limited. For instance, it would have been nice to show robustness results as this is a clear issue with GANs.",ICLR2018, +BJe-g33-gE,1544830000000.0,1545350000000.0,1,HygQBn0cYm,HygQBn0cYm,Paper decision,Accept (Poster),"Reviewers are in a consensus and recommended to accept after engaging with the authors. Please take reviewers' comments into consideration to improve your submission for the camera ready. +",ICLR2019,4: The area chair is confident but not absolutely certain +Kg-j6npq7E,1576800000000.0,1576800000000.0,1,HJeRveHKDH,HJeRveHKDH,Paper Decision,Reject,"The authors introducing programming puzzles as a way to help AI systems learn about reasoning. The authors then propose a GAN-like generation algorithm to generate diverse and difficult puzzles. + +This is a very novel problem and the authors have made an interesting submission. However, at least 2 reviewers have raised severe concerns about the work. In particular, the relation to existing work as pointed by R2 was not very clear. Further, the paper was also lacking a strong empirical evaluation of the proposed ideas. The authors did agree with most of the comments of the reviewers and made changes wherever possible. However, some changes have been pushed to future work or are not feasible right now. + +Based on the above observations, I recommended that the paper cannot be accepted now. 
The paper has a lot of potential and I would strongly encourage a revised submission addressing the questions/suggestions made by the reviewers.",ICLR2020, +ZMlXwySIXhr,1642700000000.0,1642700000000.0,1,xOeWOPFXrTh,xOeWOPFXrTh,Paper Decision,Reject,"The paper has received 5 reviews with 4 advocating for rejection (marginal or clear cut) and one borderline leaning towards a weak accept. The key concerns voiced by the reviewers are the lack of novelty (*the novelty of the proposed multi-derivative architecture is limited*), the lack of comparisons with specific architectures in appropriate setting (rPPGNet without the STVEN module, DeeprPPG, RhythmNet, CVD), and concerns about the use of synthetic data (although authors provide some justifications to that end). It appears that the key to reviewers' scores is that higher-order dynamics did not constitute a sufficient novelty. + +Given the post-rebuttal scores and discussions, AC has no option but to recommend a reject at this point.",ICLR2022, +YfCeyhV6Nz,1576800000000.0,1576800000000.0,1,HkghoaNYPB,HkghoaNYPB,Paper Decision,Reject,"The paper does not provide theory or experiment to justify the various proposed relaxations. In its current form, it has very limited scope.",ICLR2020, +yaS85Su8zX1Y,1642700000000.0,1642700000000.0,1,9vsRT9mc7U,9vsRT9mc7U,Paper Decision,Reject,"## A Brief Summary +Recent works in deep learning have shown that it is possible to solve [[combinatorial optimization]] problems (COP) with neural networks. However, generalization beyond the examples seen in the training set is still challenging, e.g., generalizing to TSP with more cities than the ones seen in the training set. This paper proposes the GANCO approach, where a separate generative neural network based on GAN generates new hard-to-solve training instances for the optimizer. The optimizer and the generative network are trained in an alternating fashion. The authors have run experiments with the attention model (AM) and POMO with their GAN-based data augmentation approach. The authors provide experimental results on several well-known COPs, such as the traveling salesman problem. + +## Reviewers' Feedback + +Below, I will summarize some reviewers' feedback and would like the authors to address the cons noted below. +### Reviewer sEuD + +*Pros:* +- Paper is well-written. +- Task is important and well-motivated. +- Good experimental results. +*Cons:* +- The paper's core contribution on the necessity of adversarial entities is not well-motivated. +- Missing baselines: +- RL agent trained on all target distributions. To figure out how far GANCO is from the optimal policy. +- The performance of an agent trained on a curriculum. +- Figure 2 is unnecessary/redundant in the paper. + + + +### Reviewer tjCH +*Cons:* +- The paper is reasonably written. However, it would be much easier to follow with a few changes. For example, section 3.1 explains the architecture and, in related works, a more comprehensive overview of the methods to improve the robustness of RL methods. +- It is widely known that data augmentation helps in deep learning. The paper's claims would be more convincing if it provided some crucial baselines, such as comparing different data augmentations methods and carefully ablating them. + +### Reviewer N945 +*Pros:* +- Well-written +- Good evaluation +- Simple model with good results +*Cons:* +- Missing citation to the PAIRED paper. +- How important are the adversarial entities generated? 
Is it possible to achieve similar results by just training on more samples? +- Missing baselines: Instead of training in stages, alternate optimizer and generator network per step basis. + +### Reviewer mumN +*Pros:* +- The proposed approach is novel. +- Comprehensive and extensive experiments. +- Figure 1 provides a good summary of the approach. + +*Cons:* +- Motivation is for the GANCO is not very convincing. +- Concerns about the capacity of the neural nets used in the paper. +- Concerns on forgetting the original distribution. +- Concerns about experimental evaluation protocol. +- Including experiments on routing problems to show the generality of the proposed approach. +- Request for improvements in the writing and the formatting of the paper. + +## Key Takeaways and Thoughts +I think this paper attacks an interesting problem. As far as I am aware of the approach is novel. However, generative adversarial networks have been used in the machine learning literature for data augmentation and RL for augmenting the environment (see the PAIRED paper.) GAN type of approaches hasn't been used to improve the generalization of the deep learning approaches for COP. The results look promising. However, as pointed out by Reviewer mumN and tjCH, this paper would benefit more from further ablations, particularly the necessity of adversarial generation part to make the arguments more convincing. As it stands now, it is not clear where exactly the improvements are coming from. Reviewer mumN also raised some concerns about the poorly configured LHK3 baseline in the discussion period. Furthermore, I agree with the reviewer mumN and tjcH that this paper would benefit from restructuring to make it flow better. I do think that this paper needs another round of reviews. I would recommend the authors go over the feedback provided here and address them for future submission.## References",ICLR2022, +pCldbIKtL7o,1642700000000.0,1642700000000.0,1,GiddFXGDmqp,GiddFXGDmqp,Paper Decision,Reject,"This paper introduces a VAE-based generative model of 3D point-clouds inspired by SPAIR that can do unsupervised segmentation, named SPAIR3D. The model uses both global and local latent variables to encode global scene structure as well as individual objects. + +The proposed model is relatively complex, but the presentation is overall clear. + +Experimental results on simple synthetic datasets look promising. However, one might argue that for these simple tasks a direct application of a simpler mixture of VAEs (such as IODINE) might be sufficient, so it would be informative to make a direct comparison between these methods and/or show results on a problem clearly out of the scope of these simpler methods (e.g. with high imbalance in the point clouds).",ICLR2022, +mhgf1AmSSgK,1610040000000.0,1610470000000.0,1,zDy_nQCXiIj,zDy_nQCXiIj,Final Decision,Accept (Spotlight),"The paper gives an elegant and efficient closed form solution for steering directions in the latent space of a pretrained GAN to to produce transformations in the image domain such as scaling and rotation etc, this also extended to attribute transfer. The new method leads to ""speed up, analytical transformation end points, and better disentanglement"" w.r.t to competitive methods. All reviewers agreed on the merits of this work, and the good qualitative and quantitative results . The rebuttal addressed reviewers questions and concerns regarding the structure of the paper and its coherence. 
Accept",ICLR2021, +It4E2xBvILL,1642700000000.0,1642700000000.0,1,6y2KBh-0Fd9,6y2KBh-0Fd9,Paper Decision,Accept (Poster),"The paper investigates the use of flow models for out-of-distribution detection. The paper proposes to use a combination of random projections in the latent space of flow models and one-sample / two-sample statistical tests for detecting OOD inputs. The authors present results on image benchmarks as well as non-image benchmarks. + +The reviewers found the approach well-motivated and appreciated the ablations. The authors did a good job of addressing reviewer concerns during the rebuttal. During the discussion phase, the consensus decision leaned towards acceptance. I recommend accept and encourage the reviewers to address any remaining concerns in the final version. + +It might be worth discussing this paper in the related work: Density of States Estimation for Out-of-Distribution Detection https://arxiv.org/abs/2006.09273",ICLR2022, +osJmSuRKc7h,1642700000000.0,1642700000000.0,1,7VH_ZMpwZXa,7VH_ZMpwZXa,Paper Decision,Reject,"The paper investigates how the geometrical compactness of in-distribution examples affects OOD detection performance and proposes architectural modifications to enable compact in-distribution embeddings. All the reviewers agreed that the paper has several interesting contributions. I agree with the authors that simplicity is a strength, not a weakness. + +My main concern is that the paper's contributions feel a bit scattered. For instance, the paper does a detailed evaluation of normalization and compactness, but makes a few other minor contributions (as detailed by +the authors at https://openreview.net/forum?id=7VH_ZMpwZXa¬eId=m-1y5byLbwS​). However, the latter contributions feel a bit narrow to specific methods and are not as comprehensively tested as the claims around normalization. + +Overall, the reviewers and I think that the current version falls below the acceptance threshold. I encourage the authors to revise the draft and resubmit to a different venue.",ICLR2022, +w5mBS0mfI9h,1642700000000.0,1642700000000.0,1,L7wzpQttNO,L7wzpQttNO,Paper Decision,Accept (Poster),"This work suggests an extension of diffusion-based generative models, where both the forward and reverse process have learnable parameters (rather than just the reverse process). This is then applied to speech synthesis, with high-fidelity audio generated in very few sampling steps compared to what is typical for this class of models. The proposed model is specifically compared to other diffusion-based approaches for speech synthesis in terms of inference speed. + +Reviewers highlighted the novelty of the idea and the convincing experimental results. Concerns were raised about the accessibility and clarity of the presentation (structure, too many technical details), lack of a related work section, and the methodology used to compare the proposed model against baselines. The authors have attempted to address these issues, and two reviewers raised their scores as a result. All reviewers now recommend acceptance. 
+ +I am therefore recommending acceptance as well, but I would like to encourage the authors to polish the presentation further, in order to make the work maximally accessible to a wide audience.",ICLR2022, +r1eQLvrbgN,1544800000000.0,1545350000000.0,1,ByME42AqK7,ByME42AqK7,"interesting method, promising results",Accept (Poster),"The paper proposes an evolutionary architecture search method which uses weight inheritance through network morphism to avoid training candidate models from scratch. The method can optimise multiple objectives (e.g. accuracy and inference time), which is relevant for practical applications, and the results are promising and competitive with the state of the art. All reviewers are generally positive about the paper. Reviewers’ feedback on improving presentation and adding experiments with a larger number of objectives has been addressed in the new revision. + +I strongly encourage the authors to add experiments on the full ImageNet dataset (not just 64x64) and/or language modelling -- the two benchmarks widely used in neural architecture search field.",ICLR2019,5: The area chair is absolutely certain +TC51GzTzCU,1642700000000.0,1642700000000.0,1,aKZeBGUJXlH,aKZeBGUJXlH,Paper Decision,Reject,"This paper introduces a defense method (gradient broadcast adaptation) against backdoor attacks on pretrained language models. It proposes to utilize prompt tuning to guide the perturbed weights back to a normal state and thus helps avoid the degradation of model's generalization ability. + +Strengths: +- Experiments are conducted across multiple datasets with different types of backdoor attacks, demonstrating the effectiveness of the proposed approach +- The proposed idea is well motivated and intuitive + +Weakness: +- Improvement on experiment results seems marginal +- Some technical details of the attack setup are unclear +- Writing of the paper needs improvement",ICLR2022, +NORyVj32-PG,1642700000000.0,1642700000000.0,1,k-sNDIPY-1T,k-sNDIPY-1T,Paper Decision,Reject,"This paper explores the use of recurrent neural networks to model neural activity time-series data. The hope is that computationally demanding biophysical models of neural circuits could be replaced by RNNs when the goal is simply to capture the right input-output functions. The authors show that they can fit RNNs to the behaviour of a complex, biophysical model of the C elegans nervous system, and they explore the space of hyperparameter and network choices that lead to the best fits. + +The reviews for this paper were borderline, with scores of 3, 6, and 8. On the positive side, the reviewers agreed that the paper is very effective in demonstrating that the input-output behaviour of the biophysical model of C elegans can be replicated by RNNs. But, on the negative side there were concerns about the limited nature of the empirical results, lack of details about the simulation, too much emphasis in describing well-known RNN architectures, and lack of systematic strategy for applying this technique in other systems. The rebuttals did not change the borderline scores. + +Thus, this is an instance where the AC must be a bit more involved in the decision. After reading the paper and reviews, the AC felt that this work was not sufficiently general in its application. Ultimately, using artificial neural networks to fit neural data is common practice nowadays, so really, this paper serves as a proof-of-concept for replacing a complex biophysical model with a simpler RNN. 
But, given that RNNs are quite good at modelling sequence data, it's not terribly surprising that this works. Moreover, though the authors do a very careful search over network design decisions, they don't provide a systematic strategy for others to employ if they so wished. Also, the authors do not provide much insight into what the RNNs learn that might help us to better understand the modelled neural circuits. And most importantly, this only demonstrates the effectiveness for systems where we have biophysical models with well-established accuracy, which is not the case for most neural circuits. Given these considerations, a reject decision was reached.",ICLR2022,
a3VUgzbBmE5,1642700000000.0,1642700000000.0,1,e_FK_rDajEv,e_FK_rDajEv,Paper Decision,Reject,"This paper proposes an active intervention-targeting mechanism for causal structure discovery. After the discussion, there was a consensus among the reviewers that this paper needs another round of revision to address the lingering concerns. These concerns include providing a fairer experimental setup (e.g., by properly distinguishing and designing proper experiments for the observational, random-intervention, and targeted-intervention settings). Since the paper lacks theoretical guarantees (which is OK and not a requirement for acceptance), the merits rest on providing a thorough and fair experimental evaluation.",ICLR2022,
j0PvSA8cLn9,1610040000000.0,1610470000000.0,1,Ef1nNHQHZ20,Ef1nNHQHZ20,Final Decision,Reject,"This submission aims to improve adversarial training by also involving layer-wise (instead of only input-wise) perturbations. This is an interesting idea and it is accompanied by an interesting ODE-based perspective on the resulting dynamics. However, as the comments and reviews detail, the current manuscript misses the discussion of very relevant previous work, does not specify important details of the approach (e.g., how to bound the extent of the perturbations used), and relies on weak primitives (FGSM vs PGD).

The consensus is that this would be an interesting and valuable contribution, but only after addressing the above shortcomings.",ICLR2021,
r1D3G1Trz,1517250000000.0,1517260000000.0,18,HkfXMz-Ab,HkfXMz-Ab,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"This paper presents a novel and interesting sketch-based approach to conditional program generation. I will say upfront that it is worthy of acceptance, based on its contribution and the positivity of the reviews. I am annoyed to see that the review process has not called out the authors' lack of references to the decent body of existing work on generating structure in neural sketch programming and on generating under grammatical constraint. The authors will need to look no further than the proceedings of the *ACL conferences of the last few years to find papers such as:
* Dyer, Chris, et al. ""Recurrent Neural Network Grammars."" Proceedings of NAACL-HLT (2016).
* Kuncoro, Adhiguna, et al. ""What Do Recurrent Neural Network Grammars Learn About Syntax?"" Proceedings of EACL (2017).
* Yin, Pengcheng, and Graham Neubig. ""A Syntactic Neural Model for General-Purpose Code Generation."" Proceedings of ACL (2017).
* Rabinovich, Maxim, Mitchell Stern, and Dan Klein. ""Abstract Syntax Networks for Code Generation and Semantic Parsing."" Proceedings of ACL (2017).

Or other work on neural program synthesis, with sketch-based methods:
* Gaunt, Alexander L., et al. 
""Terpret: A probabilistic programming language for program induction."" arXiv preprint arXiv:1608.04428 (2016). +* Riedel, Sebastian, Matko Bosnjak, and Tim Rocktäschel. ""Programming with a differentiable forth interpreter."" CoRR, abs/1605.06640 (2016). + +Likewise the references to the non-neural program synthesis and induction literature are thin, and the work is poorly situated as a result. + +It is a disappointing but mild failure of the scientific process underlying peer review for this conference that such comments were not made. The authors are encouraged to take heed of these comments in preparing their final revision, but I will not object to the acceptance of the paper on these grounds, as the methods proposed therein are truly interesting and exciting.",ICLR2018, +Hkshoz8Ox,1486400000000.0,1486400000000.0,1,SJU4ayYgl,SJU4ayYgl,ICLR committee final decision,Accept (Poster),The reviewers are in agreement that this paper is well written and constitutes a solid contribution to graph-based semi-supervised learning based on variants of CNNs.,ICLR2017, +qt8rN439b8,1576800000000.0,1576800000000.0,1,HylLq2EKwS,HylLq2EKwS,Paper Decision,Reject,"The paper proposes to learn a ""virtual user"" while learning a ""recommender"" model, to improve the performance of the recommender system. A reinforcement learning algorithm is used for address the problem the authors defined. Multiple reviewers raised several concerns regarding its technical details including the feedback signal F, but the authors have not responded to any of the concerns raised by the reviewers. The lack of authors involvement in the discussion suggest that this paper is not at the stage to be published.",ICLR2020, +qnRJu100aag,1642700000000.0,1642700000000.0,1,w4cXZDDib1H,w4cXZDDib1H,Paper Decision,Accept (Poster),"The paper introduces an object detection method that integrates vision and detection transformers through a novel Reconfigured Attention Module (RAM). Among other questions, the reviewers raised concerns about fair comparison with baselines, limited novelty of the RAM module, completeness of experiments, and missing details. The rebuttal adequately addressed these concerns with clarifications and additional experiments. R1 remained unconvinced that a simple modification to YOLOS could not be devised to improve the speed similar to the proposed method, but stated he/she wouldn’t argue strongly for rejection. While this is a legitimate concern, the AC agrees with R2 and R3 that the paper has enough merits to be accepted at ICLR, as the results are strong and are likely to have significant practical value.",ICLR2022, +21Gokiocin,1576800000000.0,1576800000000.0,1,B1xgQkrYwS,B1xgQkrYwS,Paper Decision,Reject,"This is an observational work with experiments for comparing iterative pruning methods. + +I agree with the main concerns of all reviewers: + +(a) Experimental setups are of too small-scale or with easy datasets, so hard to believe they would generalize for other settings, e.g., large-scale residual networks. This aspect is very important as this is an observational paper. +(b) The main take-home contribution/message is weak considering the high-standard of ICLR. + +Hence, I recommend rejection. + +I would encourage the authors to consider the above concerns as it could yield a valuable contribution.",ICLR2020, +ByxetpSJgN,1544670000000.0,1545350000000.0,1,r1gRCiA5Ym,r1gRCiA5Ym,Meta-Review,Reject,"The paper introduces a new variant of the Dropout method. The reviewers agree that the procedure is clear. 
However, the motivation behind the method is heuristic and unclear, so the case for it must lean heavily on empirical evidence. Furthermore, the empirical evidence is lacking in detail and could use better comparisons with the existing literature.",ICLR2019,4: The area chair is confident but not absolutely certain
SJehiI0ryN,1544050000000.0,1545350000000.0,1,B14rPj0qY7,B14rPj0qY7,"Simple design to address generalizability and interpretability, but needs more work",Reject,"The paper presents a unified system for perception and control that is trained in a step-wise fashion, with visual decoders to inspect scene parsing and understanding. Results demonstrate improved performance under certain conditions. But reviewers raise several concerns that must be addressed before the work is accepted.

Reviewer Pros:
+ simple, elegant design, easy to understand
+ provides some insight into system function during failure conditions (error in perception vs control)
+ improves performance under a subset of tested conditions

Reviewer Cons:
- Concern about lack of novelty
- Evaluation is limited in scope
- References incomplete
- Missing implementation details, hard to reproduce
- Paper still contains many writing errors",ICLR2019,4: The area chair is confident but not absolutely certain
qt8rN439b8,1576800000000.0,1576800000000.0,1,HylLq2EKwS,HylLq2EKwS,Paper Decision,Reject,"The paper proposes to learn a ""virtual user"" while learning a ""recommender"" model, to improve the performance of the recommender system. A reinforcement learning algorithm is used for addressing the problem the authors defined. Multiple reviewers raised several concerns regarding its technical details, including the feedback signal F, but the authors have not responded to any of them. The lack of author involvement in the discussion suggests that this paper is not ready to be published.",ICLR2020,
63MySLlyvP,1576800000000.0,1576800000000.0,1,BJes_xStwS,BJes_xStwS,Paper Decision,Reject,This paper proposes a scalable approach for graph learning from data. The reviewers think the approach appears heuristic and it is not clear the algorithm is optimizing the proposed sparse graph recovery objective. 
,ICLR2020,
grAfSt5iTJ,1576800000000.0,1576800000000.0,1,r1lUl6NFDH,r1lUl6NFDH,Paper Decision,Reject,"The paper proposes to use the mirror descent algorithm for binary networks. It is easy to read. However, novelty over ProxQuant is somewhat limited. The theoretical analysis is weak, in that there is no convergence analysis, nor any guidance on how to choose the projection for constructing the mirror map. Experimental results could also be made more convincing by adding comparisons on bigger datasets and SOTA networks, and an ablation study to demonstrate why mirror descent is better than proximal gradient descent in this application.",ICLR2020,
pewV34cNX,1576800000000.0,1576800000000.0,1,r1xa9TVFvH,r1xa9TVFvH,Paper Decision,Reject,"As the reviewers have pointed out and the authors have confirmed, the original version of this paper was not a significant leap beyond combining recent understanding of Neural Tangent Kernels and previous techniques for kernelized bandits. In a revision, the authors updated their draft to allow the point around which gradients are centered, theta_0, to now equal theta_t. This seems like a more reasonable algorithm and it is satisfying that the authors were able to maintain their regret bound for this dynamic setting. 
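(To make the revision concrete, here is the usual NTK-style linearization in this metareview's own notation -- these symbols are illustrative choices, not the paper's. The original draft expanded the network around the fixed initialization theta_0; the revision moves the expansion point to the current iterate theta_t:)

```latex
% Illustrative sketch only; notation chosen for this note, not taken from the paper.
f(x;\theta) \;\approx\; f(x;\theta_t) + \nabla_{\theta} f(x;\theta_t)^{\top}(\theta - \theta_t)
\qquad \text{(the original version fixed } \theta_t \equiv \theta_0\text{)}.
```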
However, the revision is substantial and it seems unreasonable to expect reviewers to read the revised results in detail--the reviewers also felt it may be unfair to other ICLR submissions. All reviewers believe the paper has introduced valuable contributions to the area but should go under a full review process at a future venue. A reviewer would also like to see a comparison to Kernel UCB run on the true NTK (or a good approximation thereof). ",ICLR2020, +TpIAyFbOXpn,1610040000000.0,1610470000000.0,1,TV9INIrmtWN,TV9INIrmtWN,Final Decision,Reject,"Inspired by biological agents that have developed mechanisms like attention as an information bottleneck to help function more effectively under various constraints of life, this paper looks at an approach of learning a hard attention scheme by leveraging off the prediction errors of an internal world model. They demonstrate their approach via a simple but easy to understand 2D pixel multi-agent game, a gridworld env, and also PhysEnv, to show the effectiveness of the learned hard attention, and go on to discuss interesting aspects such as curiosity attention. + +Overall, I thought more highly of the paper than the reviewers, and might have proposed a score of 6 if I were a reviewer, but I also read each review and respect the points given by all four reviewers, and also agree with much of their feedback in the end. I think if this work was submitted to ALife (Journal or Conference), it might have been accepted. Not that those venues are easier, if anything they can often be more selective, but I think what ICLR (and similar venues like ICML) tends to expect is a bit different than what this paper offers. + +To improve this work, I recommend following some of the reviewers' advice (especially R4), particularly on experimental design. Reviewers suggest that the current experiments are small and simple, but while true to some extent, I think more importantly missing are clear baseline methods to compare your approach against. What can your approach do that existing popular approaches in RL will totally fail at doing? It can be worthwhile to try your approach on a larger task domain, such as Atari (but perhaps modified) or ProcGen [1] to show the benefits of hard attention compared to existing approaches. For instance, some recent work [2] demonstrated that hard attention can help agents generalize to out-of-training domain tasks the agent has not seen before during training - something that traditional approaches without attention tend to fail at doing. + +In the current state, the work will be a great workshop paper. But I recommend the authors to continue improving the work in the direction that can help the idea gain acceptance by the broader community. + +[1] https://openai.com/blog/procgen-benchmark/ +[2] https://arxiv.org/abs/2003.08165 +",ICLR2021, +SyTifkpHf,1517250000000.0,1517260000000.0,9,Hk6kPgZA-,Hk6kPgZA-,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"This paper attracted strong praise from the reviewers, who felt that it was of high quality and originality. The broad problem that is being tackled is clearly of great importance. + +This paper also attracted the attention of outside experts, who were more skeptical of the claims made by the paper. The technical merits do not seem to be in question, but rather, their interpretation/application. 
The perception by a community as to whether an important problem has been essentially solved can affect the choices made by other reviewers when they decide what work to pursue themselves, evaluate grants, etc. It's important that claims be conservative and highlight the ways in which the present work does not fully address the broader problem of adversarial examples. + +Ultimately, it has been decided that the paper will be of great interest to the community. The authors have also been entrusted with the responsibility to consider the issues raised by the outside expert (and then echoed by the AC) in their final revisions. + +One final note: In their responses to the outside expert, the authors several times remark that the guarantees made in the paper are, in form, no different from standard learning-theoretic claims: ""This criticism, however, applies to many learning-theoretic results (including those applied in deep learning)."" I don't find any comfort in this statement. Learning theorists have often focused on the form of the bounds (sqrt(m) dependence and, say, independence from the # of weights) and then they resort to empirical observations of correlation to demonstrate that the value of the bound is predictive for generalization. because the bounds are often meaningless (""vacuous"") when evaluated on real data sets. (There are some recent examples bucking this trend.) In a sense, learning theorists have gotten off easy. Adversarial examples, however, concern security, and so there is more at stake. The slack we might afford learning theorists is not appropriate in this new context. I would encourage the authors to clearly explain any remaining work that needs to be done to move from ""good enough for learning theory"" to ""good enough for security"". The authors promise to outline important future work / open problems for the community. I definitely encourage this. + + + + +",ICLR2018, +HycpjG8ux,1486400000000.0,1486400000000.0,1,r1tHvHKge,r1tHvHKge,ICLR committee final decision,Reject,"This paper presents a few interesting ideas, namely the idea of keeping around a set of ""danger states"" and treating these states with some special consideration in reply to make sure that their impact is not neglected after collecting a lot of additional data. + + However, there are two main problems: 1) the actual implementation here seems fairly ad-hoc, and it's not at all clear to me that this particular algorithm (building a classifier with equal numbers of good and danger states, and then injecting an additional reward into the Q-learning task based upon this classifier), is the right way to go about this. The presentation is also difficult to follow, and the final results imply aren't that compelling (though this is improving after the revisions, but still has a way to go. We therefore encourage the authors to resubmit their work at a future conference venue. + + Pros: + + Interesting idea of keeping around danger states and injecting them into training + + Cons: + - Algorithm doesn't seem that well motivated + - Presentation is a bit unclear, takes until page 6 to actually present the basic approach. + - Experiments aren't that convincing (better after revisions, but still need work)",ICLR2017, +snOCEfDTMER,1642700000000.0,1642700000000.0,1,i8d2kdxii1L,i8d2kdxii1L,Paper Decision,Reject,"This work studies a variant of a message-passing scheme, aiming to improve the efficiency of GNNs to heterophilic graphs, as well as improving its stability to noise. 
The authors provide a new architecture, called $p$-Laplacian message passing, as well as some theoretical analysis and empirical evaluation. +Reviewers highlighted several positive aspects on this work, such as the general idea of considering p-Laplacians, as well as the extensive empirical evaluation. However, during the review discussions, several important issues arose, namely important concerns regarding the theoretical contributions, as well as concerns in calibrating the baselines in some empirical evaluations. Overall, the AC is of the opinion that this paper requires a further iteration before it can be considered for publication, and encourages the authors to take the time to address the comments raised by the reviewers.",ICLR2022, +r8LwmZUNOjh,1642700000000.0,1642700000000.0,1,SwIp410B6aQ,SwIp410B6aQ,Paper Decision,Accept (Poster),"Based on the previously observed neural collapse phenomenon that the features learned by over-parameterized classification networks show an interesting clustering property, this paper provides an explanation for this behavior by studying the transfer learning capability of foundation models for few-shot downstream tasks. Both theoretical and empirical justifications are presented to elaborate that neural collapse generalizes to new samples from the training classes, and to new classes as well. + +The problem that this paper delves into is important. The paper is well-motivated, and well structured with a good flow. Both theoretical and empirical analyses of the paper are solid. Preliminary ratings are mixed, but during rebuttal, multi-round responses and in-depth discussions were carried out between authors and reviewers, and the final scores are all positive with major concerns well addressed. AC considers the paper itself and all relevant threads, and recommends the paper for acceptance. Authors shall incorporate all response materials into the future version.",ICLR2022, +HyggiieNx4,1544980000000.0,1545350000000.0,1,S1xoy3CcYX,S1xoy3CcYX,Major revisions required.,Reject,"In light of the reviews and the rebuttal, it seems that the paper needs to be rewritten to head off some of the confusions and criticisms that the reviewers have made. That said, the main argument seems to contradict some of the lower bounds recently established by Madry and colleagues, showing the existence of distributions where the sample complexity for finding robust classifiers is arbitrarily larger than that for finding low-risk classifiers. I recommend the authors take a closer look at this apparent contradiction when revising.",ICLR2019,4: The area chair is confident but not absolutely certain +Bka1N16Bz,1517250000000.0,1517260000000.0,270,r1HhRfWRZ,r1HhRfWRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Since this seems interesting, I suggest to accept this paper at the conference. However, there are still some serious issues with the paper, including missing references. ",ICLR2018, +FvC-sGoBl7D,1642700000000.0,1642700000000.0,1,BrFIKuxrZE,BrFIKuxrZE,Paper Decision,Accept (Poster),"This paper addresses fair representation learning, with the aim of obstructing the recovery of sensitive features from the learned representation, hence enforcing the fairness of subsequent prediction tasks. In the setting where probability density can be estimated for sensitive groups, Fair Normalizing Flows (FNF) tries to minimize the statistical distance between group-wise latent representations, thereby providing theoretical fairness guarantees. 
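(Schematically, and in my notation rather than the paper's: with invertible flow encoders $f_0, f_1$ for the two sensitive groups, whose input densities are $p_0, p_1$, the training objective minimizes a statistical distance $\Delta$ between the induced latent densities:)

```latex
% Illustrative sketch; symbols are this note's choices, not necessarily the paper's.
\min_{f_0,\,f_1}\; \Delta\big((f_0)_{\#}p_0,\; (f_1)_{\#}p_1\big),
\qquad
(f_i)_{\#}p_i(z) = p_i\big(f_i^{-1}(z)\big)\,\big|\det J_{f_i^{-1}}(z)\big|.
```

(When the two latent densities are close under $\Delta$, any downstream classifier built on the representation is limited in how differently it can treat the two groups, which is the source of the guarantees.)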
Experiments confirm the effectiveness of FNF in fairness, transferability, and interpretability.

The paper received extensive and in-depth discussion. The rebuttal did an excellent job of clarification. Although there are still some concerns about the theoretical properties of the optimal solution, overall the reviewers and I find this paper interesting and worth publishing.",ICLR2022,
hO5jKgFXEs,1642700000000.0,1642700000000.0,1,24N4XH2NaYq,24N4XH2NaYq,Paper Decision,Reject,"The submission introduces the sparse hierarchical table ensemble (S-HTE), based on oblivious decision trees for tabular data. The reviewers acknowledged the clarity of the presentation and the importance of the computational complexity analysis. However, they also raised concerns regarding the novelty of the proposed method and the significance of the results compared to competing methods (e.g., CatBoost). Given the consensus that the submission is not ready for publication at ICLR, I recommend rejection at this point.",ICLR2022,
5qlnRAraEu,1610040000000.0,1610470000000.0,1,LFs3CnHwfM,LFs3CnHwfM,Final Decision,Reject,"This paper proposes to solve the fuel optimization problem in hybrid electric vehicles using reinforcement learning. The work is interesting, but the reviewers consider that it lacks novelty and raise various concerns about the modeling assumptions. The paper is also quite difficult to follow.",ICLR2021,
Lcxa32Kdsbv,1610040000000.0,1610470000000.0,1,paUVOwaXTAR,paUVOwaXTAR,Final Decision,Reject,"This paper presents an approach for modular multi-task learning. All the reviewers believe the goals are appealing and the idea is reasonable. However, R2 and R4 raise concerns with respect to novelty. There are also strong concerns regarding experiments, ranging from reproducibility to small improvements and the choice of baselines. The rebuttal fails to provide any new experiments or address the reviewer concerns. All reviewers and the AC agree that the paper is not yet ready for publication.",ICLR2021,
0-1SLUZBdQ3,1610040000000.0,1610470000000.0,1,3FAl0W6gZ_e,3FAl0W6gZ_e,Final Decision,Reject,"Paper proposes and demonstrates a method to reconstruct 3d shape for a tree, from drone data. 
While the reviewers all appreciated the work, all felt there were many shortcomings of the paper with respect to an ICLR audience:
(a) no machine learning novelty
(b) highly interactive data processing method
(c) only one example processed tree shown
(d) inadequate connections with relevant literature on 3D reconstruction, both general-purpose methods and examples applied to vegetation
(e) incomplete presentation of the method: no ablation studies, no listing of the times required for individual steps of the processing.

In view of these concerns, we have decided to reject the paper. But we hope you find the reviewers' comments helpful, and make use of them in a revision of the work.
",ICLR2021,
FvC-sGoBl7D,1642700000000.0,1642700000000.0,1,fOsN52jn25l,fOsN52jn25l,Paper Decision,Accept (Poster),"I recommend this paper for acceptance but I do so with significant reservations. Since this metareview will be public for all time, I direct this metareview to future readers of this paper so that they can weigh its merits and drawbacks in a clear-minded way.

This paper proposes a ""dual lottery ticket hypothesis."" For those unfamiliar, the original lottery ticket hypothesis (Frankle & Carbin, ICLR 2019) states approximately that any randomly initialized neural network contains a subnetwork that can be trained in isolation to full accuracy in the same number of steps as the original network. That is, $\forall$ neural networks, $\exists$ a subnetwork such that $\text{Accuracy}(\text{Train}(\text{subnetwork})) \geq \text{Accuracy}(\text{Train}(\text{network}))$ for a standard, fixed training procedure $\text{Train}$. (For the sake of posterity, note that this claim was supported on small-scale neural networks but there is no evidence that it holds in general; only that it holds on the state of networks *early* in training. See *Linear Mode Connectivity and the Lottery Ticket Hypothesis* by Frankle et al. 2020.) To support this claim, Frankle & Carbin develop a procedure that finds such subnetworks, demonstrating that they exist in certain settings.

As far as I understand, the dual lottery ticket hypothesis states that, $\forall$ subnetworks of a neural network, $\exists$ a setting of the weights such that $\text{Accuracy}(\text{Train}(\text{subnetwork})) \geq \text{Accuracy}(\text{Train}(\text{network}))$. Like the original lottery ticket paper, this paper shows that such subnetworks exist: it trains the subnetwork with an L2 penalty on all of the weights except those of the subnetwork, allowing them to gradually fade away and leaving a new setting of the weights for the subnetwork that then allows it to train in isolation to full accuracy (like those subnetworks found in the original lottery ticket hypothesis paper).

The reason that I have reservations about this approach is that the subnetwork found by the dual lottery ticket hypothesis procedure contains fully trained weights. This is novel but - to me - much less surprising and interesting: a randomly sparse subnetwork can be set with trained weights such that, after all of the other weights are fully pruned away, it can recover full accuracy. 
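(For concreteness, the following is a minimal sketch of the kind of procedure described above. It is emphatically not the authors' code: the mask format, optimizer, and hyperparameters are placeholders of my own choosing.)

```python
# Minimal sketch of a DLTH-style transformation (placeholder code, not the paper's).
# `mask` is a list of 0/1 tensors, one per parameter tensor, marking the
# randomly selected subnetwork.
from itertools import cycle

import torch

def sparsify_with_l2_decay(model, mask, loader, steps=10_000, lam=1e-4, lr=0.1):
    """Train normally while L2-penalizing every weight OUTSIDE the mask,
    so the non-subnetwork weights gradually fade toward zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    batches = cycle(loader)
    for _ in range(steps):
        x, y = next(batches)
        penalty = sum((((1 - m) * w) ** 2).sum()
                      for w, m in zip(model.parameters(), mask))
        loss = loss_fn(model(x), y) + lam * penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():  # finally prune: keep only the masked weights
        for w, m in zip(model.parameters(), mask):
            w.mul_(m)
    return model  # the surviving sparse weights are trained, not re-initialized
```

(The final comment is the crux of my reservation: whatever survives this process has been fully trained.)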
On the one hand, this is almost reminiscent of a standard pruning procedure where the network is both trained and pruned until a sparse subnetwork reaches full accuracy, with the dense network needed for much or all of training. On the other hand, the impressive part is that this can be done with a *randomly selected* sparse network rather than one chosen by a pruning heuristic. To me, that is the most interesting part of the paper. (And, for those readers wondering why specifically this paper is distinct from standard pruning, this is it.) + +I wonder about the significance of this finding given that the subnetwork is set by training (not by random initialization or a tiny amount of training as in work on the lottery ticket hypothesis), but it's a novel idea and I think future scholars and future research should be the judge of that significance, not me or the reviewers. The novelty alone merits publication, and we will have to wait and see about the significance. Thus, I weigh in favor of acceptance, although with reservations.",ICLR2022, +HkMeI1THG,1517250000000.0,1517260000000.0,706,rJ4uaX2aW,rJ4uaX2aW,ICLR 2018 Conference Acceptance Decision,Reject,"Pros: ++ The proposed large-batch, synchronous SGD method is able to generalize at larger batch sizes than previous approaches (e.g., Goyal et al., 2017). + +Cons: +- Evaluation on more than one task would make the paper more convincing. +- The addition of more hyperparameters makes the proposed algorithm less appealing. +- Some theoretical justifiction of the layer-wise rate scaling would help. +- It isn't clear that the comparison to Goyal et al., 2017 is entirely fair, because that paper also had recommendations for the implementation of batch normalization, weight decay, and a momentum correction as the learning rate is scaled up, but this submission does not address any of those. + +Although the revised paper addressed many of the reviewers' concerns, they still did not feel it was quite strong enough to be accepted to ICLR. +",ICLR2018, +BJe8q8VwlE,1545190000000.0,1545350000000.0,1,rkeqCoA5tX,rkeqCoA5tX,"Interesting and promising novel approach for demixing, but with no theoretical grounding and limited experimental evaluation",Reject,"The paper proposes two simple generator architecture variants enabling the use of GAN training for the tasks of denoising (from known noise types) and demixing (of two added sources). While the denoising approach is very similar to AmbientGAN and could thus be considered somewhat incremental, all reviewers and the AC agree that the developed use of GANs for demixing is an interesting novel direction. The paper is well written, and the approach is supported by encouraging experimental results on MNIST and Fashion-MNIST. +Reviewers and AC noted the following weaknesses of the paper: a) no theoretical support or analysis is provided for the approach, this makes it primarily an empirical study of a nice idea. +b) For an empirical study, the experimental evaluation is very limited, both in terms of dataset/problems it is tested on; and in terms of algorithms for demixing/source-separation that it is compared against. +Following these reviews, the authors added the experiments on Fashion-MNIST and comparison with ICA which are steps in the right direction. This improvement moved one reviewer to positively update his score, but not the others. 
+Taking everything into account, the AC judges that it is a very promising direction, but that more extensive experiments on additional benchmark tasks for demixing and comparison with other demixing algorithms are needed to make this work a more complete contribution. +",ICLR2019,3: The area chair is somewhat confident +3x4CbROGp,1576800000000.0,1576800000000.0,1,SJxAlgrYDr,SJxAlgrYDr,Paper Decision,Reject,"The paper explores the use of RL (actor-critic) for planning the expansion of a metro subway network in a City. The reviewers felt that novelty was limited and there was not enough motivation on what is special about this application, and what lessons can be learned from this exercise. ",ICLR2020, +cOG4avzM-R8,1610040000000.0,1610470000000.0,1,XKgo1UfNRx8,XKgo1UfNRx8,Final Decision,Reject,"Following a strong consensus across the reviewers, the paper is recommended for rejection. +They have all acknowledged some weaknesses of the paper, for instance + +* Inadequate reference to prior work +* Unsatisfactory level of polishing +* Too limited evaluation, with more comparisons to baselines required +* The proposed approach (""Dijkstra algorithm"") is not enough justified and motivated +* Clarity (missing definitions of key components). + +This list, together with the detailed comments of the reviewers, highlight opportunities to improve the manuscript for a future resubmission. +",ICLR2021, +vxKV0_7ABn,1576800000000.0,1576800000000.0,1,Hkl6i0EFPH,Hkl6i0EFPH,Paper Decision,Reject,"This paper addresses the problem of differential private data generator. The paper presents a novel approach called G_PATE which builds on the existing PATE framework. The main contribution is in using a student generator with an ensemble of teacher discriminators and in proposing a new private gradient aggregation mechanism which ensures differential privacy in the information flow from discriminator to generator. + +Although the idea is interesting, there are significant concerns raised by the reviewers about the experiments and analysis done in the paper which seem to be valid and have not been addressed yet in the final revision. I believe upon making significant changes to the paper, this could be a good contribution. Thus, as of now, I am recommending a Rejection.",ICLR2020, +eCJb6T11-fE,1642700000000.0,1642700000000.0,1,6Jf6HX4MoLH,6Jf6HX4MoLH,Paper Decision,Reject,"The paper proposes a planning framework that uses a transformer-based architecture as an attention mechanism that guides the search of a traditional sample-based planner (e.g., RRT*). More specifically, features extracted from a sliding window over the 2D search space serve as input to a transformer that produces a mask indicating where to draw samples from. By constraining the search space for the sample-based planner, the method reduces the time required for planning. The method is compared to both traditional and learning-based planners on different 2D navigation tasks and found to improve sample complexity (and, in turn, computation time), while also being capable of generalizing to unseen and real-world maps. + +The manner by which the method combines the advantages of sample-based planning with an attentional mechanism as a way to constrain the sampling process is interesting. 
As the reviewers emphasize, the experimental evaluation shows that this approach results in performance gains over both traditional (sample-based) and learning-based planners, while also being able to scale to larger maps as well as better generalize to out-of-distribution settings (compared to learning-based methods). These results support the value of both the overall approach as well as the architectural components (e.g., the transformer and the use of positional encoding). The reviewers initially raised a few concerns with the paper, the most notable of which are the need to include preprocessing in the overall computation time, the accuracy of some of the claims in the paper (e.g., with regards to generalizability), generalization to higher-dimensional domains, and the performance on the Dubins car domain. The authors responded to each of the reviews and updated the submission to address many of these concerns. However, questions still remain regarding whether or not the approach can be adapted to state/configuration spaces with more than two dimensions, something that traditional planners are readily capable of, and the unconvincing results on the Dubins car domain. + +Overall, the paper proposes an interesting approach to an important problem that is relevant to the robotics and machine learning communities. The paper makes promising contributions to improve the efficiency of planning, however the significance of these contributions needs to be made clearer.",ICLR2022, +BkyWhf8dl,1486400000000.0,1486400000000.0,1,H12GRgcxg,H12GRgcxg,ICLR committee final decision,Accept (Poster),"Reviewers agreed that the problem was important and the method was interesting and novel. The main (shared) concerns were preliminary nature of the experiments and questions around scalability to more classes. + + During the discussion phase, the authors provided additional CIFAR-100 results and introduced a new approximate but scalable method for performing inference. I engaged the reviewers in discussion, who were originally borderline, to see what they thought about the changes. R2 championed the paper, stating that the additional experiments and response re: scalability were an improvement. On the balance, I think the paper is a poster accept.",ICLR2017, +rJgZqXnAyE,1544630000000.0,1545350000000.0,1,S1x8WnA5Ym,S1x8WnA5Ym,Limited novelty,Reject,"The paper proposes GAN regularized by Determinantal Point Process to learn diverse data samples. + +The reviewers and AC commonly note the critical limitation of novelty of this paper. The authors pointed out + +""To the best of our knowledge, we are the first to introduce modeling data diversity using a Point process kernel that we embed within a generative model. "" + +AC does not think this is convincing enough to meet the high standard of ICLR. + +AC decided the paper might not be ready to publish in the current form.",ICLR2019,4: The area chair is confident but not absolutely certain +rJgNieMelV,1544720000000.0,1545350000000.0,1,SkE6PjC9KX,SkE6PjC9KX,somewhat limited novelty but good performance,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +- The paper is clear and well-motivated. +- The experimental results indicate that the proposed method outperforms the SOTA + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. 
Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- The novelty is somewhat minor. +- An interesting (but not essential) ablation study is missing (but the authors promised to include it in the final version). + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +There were no major points of contention. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers reached a consensus that the paper should be accepted. +",ICLR2019,4: The area chair is confident but not absolutely certain +F0tWpnIGyD,1576800000000.0,1576800000000.0,1,SkeYUkStPr,SkeYUkStPr,Paper Decision,Reject,"The authors propose a clustering algorithm for users in a system based on their lifetime distribution. The reviewers acknowledge the novelty of the proposed clustering algorithm, but one concern left unresolved is how the results of the analysis can be of use in the real world examples used. ",ICLR2020, +WkL57E2_5K,1642700000000.0,1642700000000.0,1,bPadTQyLb2_,bPadTQyLb2_,Paper Decision,Reject,"This paper improves on the efficiency of prior work that uses homomorphic +encryption to perform privacy-preserving inference. There are two main +concerns raised by the reviewers. First, multiple reviewers (and I) found +this paper difficult to read. Multiple pieces of the problem are not +clearly presented especially with respect to the technical contributions. +This was fixed in part in the rebuttal but more could still be done here. +But more importantly, three reviewers raise concerns about the evaluation +methodology, especially with respect to comparisons to prior work. On top +of this, there are valid criticisms raised by the reviewers about if the +contribution here is that significant when compared to prior work. (This +is something that both more clear writing and more careful experiments +could hep address.) Taken together I do not believe this paper is yet ready +for publication.",ICLR2022, +ryPAX1TBM,1517250000000.0,1517260000000.0,250,H1VjBebR-,H1VjBebR-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers were generally positive about this paper with a few caveats: + +PROS: +1. Important and challenging topic to analyze and any progress on unsupervised learning is interesting. +2. the paper is clear, although more formalization would help sometimes +3. The paper presents an analysis for unsupervised learning of mapping between 2 domains that is totally new as far as I know. +4. A large set of experiments + +CONS: +1. Some concerns about whether the claims are sufficiently justified in the experiments +2. The paper is very long and quite dense +",ICLR2018, +cCNiiQwOw21k,1642700000000.0,1642700000000.0,1,xVGrCe5fCXY,xVGrCe5fCXY,Paper Decision,Reject,"This paper explores replacing the Gaussian noise typically used in diffusion-based generative models with noise from other distributions, specifically the Gamma distribution. The effect of this change is studied empirically for both image and speech generation. 
+ +Reviewers welcomed the exploration of the design space of diffusion models, and several reviewers consider the study of alternative noise distributions in particular an important contribution. They also raised several issues with precision and clarity (several mistakes in the manuscript were pointed out), the quality of the experiments, and, especially, a lack of convincing motivation for this exploration / sufficient demonstration of its impact. + +While the authors have made a significant effort to address the reviewers' comments and suggestions, which includes running additional experiments, all reviewers have nevertheless chosen borderline ratings, with half erring on the side of rejection, and the other half tentatively recommending acceptance. + +I am inclined to agree that, as it stands, the benefit of the proposed change of noise distribution is not convincingly shown to outweigh the additional complexity this introduces, so I am also recommending rejection.",ICLR2022, +endRhWx8EK,1576800000000.0,1576800000000.0,1,S1eQuCVFvB,S1eQuCVFvB,Paper Decision,Reject,"This paper proposes a family of new methods, based on Bayesian Truth Serum, that are meant to build better ensembles +from a fixed set of constituent models. + +Reviewers found the problem and the general research direction interesting, but none of the three of them were convinced that the proposed methods are effective in the ways that the paper claims, even after some discussion. It seems as though this paper is dealing with a problem that doesn't generally lend itself to large improvements in results, but reviewers weren't satisfied that the small observed improvements were real, and urged the authors to explore additional settings and baselines, and to offer a full significance test.",ICLR2020, +HUT7RfOwmEp,1642700000000.0,1642700000000.0,1,ks_uMcTPyW4,ks_uMcTPyW4,Paper Decision,Reject,"Due to the delayed rebuttal made it very hard for reviewers to react. + +The paper proposes a new sub-type of POMDPs dubbed AFA-POMDP. The proposed approach first learns a sequential VAE, then an RL approach learns control and feature acquisition policies jointly. The approach is evaluated on two tasks and shows very promising results compared to baselines. Overall the setting and the approach are very interesting. + +The replies and revised paper managed to address some of the concerns of the reviewers. However, there remain a few open questions and doubts (see updated reviews), in particular as some of the arguments of the authors remain in the hypothetical, and the reviewers are still not entirely convinced by the choice of the experimental tasks.",ICLR2022, +pn48nn0tMWNH,1642700000000.0,1642700000000.0,1,fSeD40P0XTI,fSeD40P0XTI,Paper Decision,Reject,"This paper presents a reinforcement learning algorithm to target variable in every time step. Although the paper proposes an important problem in many real-world applications, there were various major criticisms raised by reviewers. Most importantly, technical novelty is not well motivated or justified. There is also a significant lack of a specific description of the proposed method, discussion of computational complexity, clarity and presentation, and evaluation metrics, which decreased the enthusiasm of the reviewers.",ICLR2022, +vNHFAwedUo-,1610040000000.0,1610470000000.0,1,OZgVHzdKicb,OZgVHzdKicb,Final Decision,Reject,"Summary: +This paper introduces a method to try to learn in environments where a person specifies successful outcomes but there is no environmental reward signal. 
+ +I'd personally be interested in knowing where people were able to easily provide such successful outcomes instead of, for instance, providing demonstrations or reward feedback. Similarly, I'd be interested in how other methods of providing human prior knowledge compared. + +Discussion: +Reviewers agreed the paper was interesting, but none of the 4 thought the paper should be accepted. + +Recommendation: +While I do not think this paper should be accepted in its current form, I hope the authors will find the comments and constructive criticism useful.",ICLR2021, +IxJJVr5g9W,1576800000000.0,1576800000000.0,1,H1eRYxHYPB,H1eRYxHYPB,Paper Decision,Reject,"The paper examines the problem of unsupervised domain translation. It poses the problem in a rigorous way for the first time and examines the shortcomings of existing CycleGAN-based methods. Then the authors propose to consider the problem through the lens of Optimal Transport theory and formulate a practical algorithm. + +The reviewers agree that the paper addresses an important problem, brings clarity to existing methods, and proposes an interesting approach / algorithm, and is well-written. However, there was a shared concern about whether the new approach just moves the complexity elsewhere (into the design of the cost function). The authors claim to have addressed in the rebuttal by adding an extra experiment, but the reviewers remained unconvinced. + +Based on the reviewer discussion, I recommend rejection at this time, but look forward to seeing the revised paper at a future venue.",ICLR2020, +BJ6eEy6HM,1517250000000.0,1517260000000.0,284,SywXXwJAb,SywXXwJAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper seemingly joins a cohort of ICLR submissions which attempt to port mature concepts from physics to machine learning, make a complex and non-trivial theoretical contribution, and fall short on the empirical front. The one aspect that sets this apart from its peers is that the reviewers agree that the theoretical contribution of this work is clear, interesting, and highly non-trivial. While the experiment sections (MNIST!) is indubitably weak, when treating this as a primarily theoretical contribution, the reviewers (in particular 6 and 3) are happy to suggest that the paper is worth reading. Taking this into account, and discounting somewhat the short (and, by their own admission, uncertain) assessment of reviewer 5, I am leaning towards pushing for the acceptance of this paper. At very least, it would be a shame not to accept it to the workshop track, as this is by far the strongest paper of this type submitted to this conference.",ICLR2018, +B1ej4RBfxV,1544870000000.0,1545350000000.0,1,HylSk205YQ,HylSk205YQ,"An interesting take on partially observable MARL, without enough supporting evidence",Reject,"The paper presents an extension of MADDPG, adding communication between agents. The methods targets extremely noisy observations settings, so that agents need to decide if they communicate their private observations (or not). There is no intrinsic/explicit reward to guide the learning of the communication, only the extrinsic/implicit reward of the downstream task. + +The paper is clear and easy to follow, in particular after the updated writing. I believe some of the reviewers' points were addressed by the rebuttal. Nonetheless, some of the weaknesses of the paper still hold: namely the complexity of the approach compounded with a very specific experimental evaluation. 
The more complex an approach is (and it may be justified by the complexity of the setting!), the more varied its supporting evidence should be. + +In its current form, the paper would constitute a good workshop contribution (to discuss the approach), but I believe it needs more varied (and/or harder) experiments to be published at ICLR.",ICLR2019,3: The area chair is somewhat confident +3T_5mkzPyqG,1610040000000.0,1610470000000.0,1,pg9c6etTWXR,pg9c6etTWXR,Final Decision,Reject,"The paper proposes a simple modification to Laplace approximation to improve the quality of uncertainty estimates in neural networks. + +The key idea is to add “uncertainty units” which do not affect the predictions but change the Hessian of the loss landscape, thereby improving the quality of uncertainty estimates. The “uncertainty units” are themselves trained by minimizing a non-Bayesian objective that minimizes variance on in-distribution data and maximizes variance on known out-of-distribution data. Unlike previous work on outlier exposure and prior networks, the known out-of-distribution data is used only post-hoc. + +While the idea is interesting and intriguing, the reviewers felt that the current version of the paper falls a bit short of the acceptance threshold (see detailed comments by R3 and R2’s concerns about Bayesian justification for this idea). I encourage the authors to revise and resubmit to a different venue. +",ICLR2021, +9nltW7EvRLY,1610040000000.0,1610470000000.0,1,H-AAaJ9v_lE,H-AAaJ9v_lE,Final Decision,Reject,"All three reviews for this paper were negative, and the authors did not provide rebuttals or comments on the reviews. The main drawback of this work identified by the reviewers is that the empirical study is not sufficient (e.g., limited comparisons and ablation studies as well as low-dimensional examples).",ICLR2021, +MOrPDxVCpYDq,1642700000000.0,1642700000000.0,1,MQuxKr2F1Xw,MQuxKr2F1Xw,Paper Decision,Reject,"The paper discusses an approach for privacy preservation in the context of multi-task classification. All reviewers struggled to follow the paper and had fundamental questions about the motivation, methods and technical contributions. Unfortunately there was no feedback from the authors to help support the submission.",ICLR2022, +n90gSW1u9ud8,1642700000000.0,1642700000000.0,1,LNmNWds-q-J,LNmNWds-q-J,Paper Decision,Reject,"This paper aims to use pre-training to bridge the gap in performance between 2D GNN and 3D GNN. Specifically, during pretraining, it trains both 2D GNNs and 3D GNNs on data equipped with 3D geometry to maximize the mutual information between the 2D GNN representation with the 3D GNN representation. The proposed approach is interesting and novel and the paper presents some promising results showing that the pre-training does provide some benefits for downstream tasks where 3D geometry information is not available in comparison to several other baseline pretraining methods. While the reviewers agree that property prediction without only 2D graph is a practically important setting for high throughput screening, there are concerns about whether the current set of results paint a clear picture on the benefits and superiority of the proposed methods to alternatives (e.g., vs conf-gen) even after the revision. This is not due to lacking of results, but more of a presentation issue where results are not organized and discussed clearly to provide a coherent story. 
We do see clear and strong potential for this paper but it needs a careful rewrite/re-organization to tease out the key messages and how the experiments support them.",ICLR2022, +RBkRBjAlPNd,1610040000000.0,1610470000000.0,1,iAX0l6Cz8ub,iAX0l6Cz8ub,Final Decision,Accept (Oral),"The paper proposes an insightful study on the robustness and accuracy of the model. It was hard to simultaneously keep the robustness and accuracy. A few works tried to improve accuracy while maintaining the robustness by investigating more data, early stopping or dropout. From a different perspective, this paper aims to improve robustness while maintaining accuracy. + +There are some interesting findings in this paper, which could deepen our understanding of adversarial training. For example, the authors conducted experiments with different sizes of the network in standard training and adversarial training. The capacity of an overparameterized network can be sufficient for standard training, but it may be far from enough to fit adversarial data, because of the smoothing effect. Hence given the limited model capacity, adversarial data all have unequal importance. Though this technique is simple and widely studied in traditional ML, it is an interesting attempt in adversarial ML and the authors provide extensive experimental results to justify its effectiveness. + +In the authors' responses, the concerns raised by the reviewers have been well addressed. The new version becomes more complete by including more results on different PGD steps and the insights on designing weight assignment function. Also, the authors gave an interesting discussion on enough model size for the adversarial training, though it is still kind of an open question. I would thus like to recommend the acceptance of this paper. ",ICLR2021, +oX0czEdmWgMB,1642700000000.0,1642700000000.0,1,t3E10H8UNz,t3E10H8UNz,Paper Decision,Reject,"The paper proposes a hierarchical meta imitation learning framework for few-shot transfer in the context of long-horizon control tasks. Underlying the framework is a hierarchical adaptation of model-agnostic meta learning (MAML) that jointly learns the high-level policy together with the set of modular low-level policies (sub-skills), both of which are fine-tuned at test time based on a small number of demonstrations. Experimental evaluations on the meta-world benchmark as well as a kitchen environment benchmark compare the proposed framework with recent baselines. + +As several reviewers note, the problem of jointly learning modular policies together with the high-level policy for composing these sub-skills is both challenging and interesting to the robotics and learning communities. The manner by which the paper extends existing work in meta-learning (MAML) and hierarchical imitation learning is novel and technically sound. The reviewers raised some concerns, notably those regarding (1) the the framework's sensitivity to various hyperparameter settings and its ability to generalize to other domains; (2) the merits of joint optimization over decoupled optimization of the sub-skills and high-level policy; and (3) the need for experiments/evaluations on different domains. The authors provided a detailed response to each of the reviewers that includes the addition of a different benchmark evaluation (the kitchen environment), new ablation studies, and updates to the text. 
After a thorough review, however, concerns remain regarding the reproducibility of the results, which call into question some of the key contributions that the paper claims to provide over the existing state-of-the-art. The authors are encouraged to provide a more balanced discussion of the contributions along with evidence to support reproducibility in any future version of the paper.",ICLR2022, +rSjcxo6Juq,1610040000000.0,1610470000000.0,1,jWXBUsWP7N,jWXBUsWP7N,Final Decision,Reject,"The paper proposes a distributional perspective on the value function and uses it to modify PPO for both discrete and continuous control reinforcement learning tasks. The referees had noticed a number of wrong/misleading statements in the initial version of the submission, and the AC had also pointed out several problematic statements in a revised version. While the authors had acknowledged these mistakes and made appropriate corrections, there are several places that still need clear improvement before the paper is ready for publication. The paper seems to introduce a novel actor-critic algorithm. However, the correctness of its key step, the SR($\lambda$) algorithm, has not been rigorously justified. For example, it is unclear how the geometric random variables would arise in that algorithm. For experiments, the AC seconds the comments provided by Reviewer 2 during the discussion: ""The empirical comparisons are overall still lacking: for the smaller-scale experiments, whilst the authors have been actively engaged in improving these comparisons during the rebuttal, at present, they are still in need of updating to make a fair comparison, for example in terms of the number of parameters included. The authors have acknowledged this, although the rebuttal period ran out before they were able to post new plots. The large-scale empirical results are still lacking reasonable baselines against existing distributional RL agents."" + +",ICLR2021, +xlztmPw1a0Y,1610040000000.0,1610470000000.0,1,aLtty4sUo0o,aLtty4sUo0o,Final Decision,Reject,"The paper treats a relevant and challenging problem in sequential learning scenarios -- how to detect distributional change over time when the pre- and post-change distributions are not known with certainty. All reviewers more or less acknowledge that the paper presents a new approach towards solving this inference problem, where the high-level idea is to approximately learn the pre- and post-change distribution parameters online using gradient descent and then apply well-known tests for change detection (e.g., the Shiryaev or CUSUM rules) with these assumed to be the pre- and post-change parameters. + +However, beyond the concerns expressed by the reviewers, my finding after going through the manuscript myself is that the presentation of the paper's results leaves a lot to be desired in terms of clarity of exposition, comprehensiveness of performance benchmarking, and comparison to existing approaches. Despite some of the reviewers' scores being revised upwards, the overall evaluation of the paper according to me is not adequate to merit acceptance, as per the concerns listed below. + +1. There are two settings assumed in the paper (beginning of Sec. 2): (a) a completely Bayesian one, with the pre- and post-change distributional parameters drawn from a prior $\mathcal{F}$ and the change time lambda drawn from a prior pi, and (b) a minmax one, where everything is the same as in (a) except that there is no prior over the change time lambda.
However, it is not at all clear, in the algorithm design of the paper, where the prior $\mathcal{F}$ over the distributions is used in computing (or approximating) conditional probabilities such as P[lambda | v_alpha = n]. + +2. There seem to be meaningless (or ill-defined) expressions in the paper's crucial portion motivating the algorithm design, such as P(X_t ~ f_{theta_0} | v_alpha = n), P(X_t ~ f_{theta_1} | v_alpha = n). It is hard to understand what the event ""X_t ~ f_{theta_0}"" even means -- I find it impossible to relate it to a sample path property. This leads me to question the validity of the technical development in the paper. + +3. Another undefined term is ""r-quickly"" in eq. (4); I had to dig through the classical works of Lai and Tartakovsky-Veeravalli to get a formal definition for this term. This is not to be expected of a paper that attempts to develop a new change point detection procedure from scratch, especially to an audience (ICLR) that may largely be unfamiliar with classical change detection theory. + +4. There are several technical statements made without adequate formal proof, e.g., ""Given the optimal stopping time \nu, it's possible to evaluate the posterior distribution of the change point P(lambda=t | v_alpha=n), which in turn is a good classifier of the pre and post change observation"". What the precise meaning of the term ""classifier"" is, what its ""goodness"" is, and how exactly it is related to the posterior distribution of lambda given the value of v_alpha, is not formally spelt out for a paper that largely uses formal probability language to develop its main results. + +5. While I understand that the final algorithm to detect the change involves several approximations and heuristics along the way, which may very well be intuitively appealing, I do not understand (even after repeated passes over the submission) several key aspects -- a concern also expressed by Reviewer 3. Why is it reasonable to assume that the conditional distribution of the change time lambda given the algorithm's stop time v_alpha would be logistic, and with the specific parameters mu and s given in the section ""Distribution Approximation""? Moreover, it is hard to discern from the crucial Section 3.2 why the functions f_0^n, f_1^n should be useful in practice as proxies to the actual expected log-likelihood ratios under the true parameters -- despite Lemma 2 showing that they converge to the true expectations (again, the sense in which this convergence occurs is omitted, leading to imprecision in the statement), the rates as a function of n, t_2 may be slow. I agree in this regard with the same concern voiced by Reviewer 1, and do not see a satisfactory explanation for it in the paper's discussion. + +6. Comparison to literature. Contrary to the general picture painted in the paper about the lack of sufficient investigation of the ""unknown pre and post change parameter"" case, there does seem to be a rigorous body of existing work in this line that is not discussed in the manuscript. For instance, ""SEQUENTIAL CHANGE-POINT DETECTION WHEN THE PRE- AND POST-CHANGE PARAMETERS ARE UNKNOWN"", Lai and Xing, 2009, and ""A BAYESIAN APPROACH TO SEQUENTIAL SURVEILLANCE IN EXPONENTIAL FAMILIES"", Lai-Liu-Xing, 2009, are both works that address this very setting, and in a comprehensive manner with theoretical guarantees. What the current manuscript does, in the context of both these works, is highly unclear.
Whether it is trying to suggest an approximate way of computing the natural posterior distribution of the change time lambda given all data up to now, using the proxy P(lambda | v_alpha = n), or whether it is using a completely different approach altogether, is not adequately discussed at all, which makes the motivating arguments for the algorithm vague. + +7. Finally, but in no lesser measure, the Experimental Results section features a rather narrow set of (two) scenarios for which it presents numerics. For a paper that claims to demonstrate ""experimental results (over a wide variety of settings)"" [from the author response], this is quite telling, as it renders the argument in favor of the paper's approach quite weak. Here again, for the first (synthetic) setting, I do not understand the relevance of the neural network adopted to fix the parameters of a Gaussian distribution. Moreover, the reported distributions of the ""regretted detection delay"" seem to be quite wide for all the approaches compared (unknown params, adaptive, GLR), precluding a reasonable comparison of their performance. The author(s) would do well to expand the scope of both synthetic and non-synthetic experiments to show the validity of their approach, and in each case carry out many more independent trials than just 500 for more accurate benchmarks. + +I do note that more experimental results have been reported in the appendix, but I would presume that they have more value being in the main body after the algorithm design is explained in a more succinct and clear manner. This can only come about by a significant rewriting and reorganizing of the paper, which I am confident the author(s) can carry out in order to make this into a much stronger submission. I wish the author(s) good luck on this, and hope to see the strengths of this new approach brought out in a more impactful manner in the next revision.",ICLR2021, +zIPrpEwbT,1576800000000.0,1576800000000.0,1,BklxI0VtDB,BklxI0VtDB,Paper Decision,Reject,"This paper introduces a two-level hierarchical reinforcement learning approach, applied to the problem of a robot searching for an object specified by an image. The system incorporates a human-specified subgoal space, and learns low-level policies that balance the intrinsic and extrinsic rewards. The method is tested in simulations against several baselines. + +The reviewer discussion highlighted strengths and weaknesses of the paper. One strength is the extensive comparisons with alternative approaches on this task. The main weakness is that the paper did not adequately distinguish which aspects of the system are generic to HRL and which are particular to robot object search. The paper was not general enough to be understood as a generic HRL method. It also ignored much relevant background knowledge (robot mapping and navigation) if the paper is intended to be primarily about robot object search. The paper did not convince the reviewers that the proposed method was desirable for either hierarchical reinforcement learning or for robot object search. + +This paper is not ready for publication as the contribution was not sufficiently clear to the readers.",ICLR2020, +H1e6zIcMxE,1544890000000.0,1545350000000.0,1,SkgkJn05YX,SkgkJn05YX,Some interesting ideas but is not mature enough for publication,Reject,"This paper presents a new technique for modifying neural network structure, and suggests that this structure provides improved robustness to black-box attacks, as compared to standard architectures.
The paper is very thorough in its experimentation, and the method is simple and quite easy to understand. It also raises some important questions about adversarial examples. + +However, there are serious concerns regarding the evaluation methodology. In particular, the authors claim ""black-box robustness"" but do not test against any query-based attacks, which are known to perform better against gradient masking-based adversarial defenses. Furthermore, it is not clear why one would expect adversarial examples to transfer between models representing two completely different functions (i.e. from a standard model to a random mask model). So, the gray-box evaluation is much more informative and, unfortunately, random-mask seems to provide little to no robustness in this setting. + +Given how fundamental sound and convincing evaluation is for proposed defense methods, the submission is not ready for publication yet. In particular, the authors are urged to (a) evaluate on stronger black-box attacks, and (b) compare to a baseline that is known to be non-robust (e.g., JPEG encoding or SAP), to verify that these results are actually due to black-box robustness and not simply obfuscation.",ICLR2019,5: The area chair is absolutely certain +gy43aB4wZP3,1610040000000.0,1610470000000.0,1,iMKvxHlrZb3,iMKvxHlrZb3,Final Decision,Reject,"This paper proposed an extension of the SIGN model as an efficient and scalable solution to handle prediction problems on heterogeneous graphs with multiple edge types. The approach is quite simple: (1) sample subsets of edge types and construct graphs with these subsets of edge types, (2) compute node features on each such graph as if it had only a single edge type, (3) aggregate the representations from multiple graphs into one using an attention mechanism, and (4) train MLPs on node representations as in SIGN. Results show that such a simple method can produce quite good results, and is very efficient and scalable. + +The reviewers of this paper put it on the borderline, with 3 out of 4 leaning toward rejection. The most common criticism is the lack of novelty. Indeed this paper is an extension of prior work SIGN, and the proposed approach is simple. However, I personally think the simplicity and the great empirical results are rather the strength of this paper. + +The authors also did a good job addressing reviewers’ comments and concerns in the discussions, but a few reviewers unfortunately didn’t actively engage in the process. + +I'd really encourage the authors to improve and highlight the strength of this paper more and submit to the next venue.",ICLR2021, +Gypvo2DLyD8,1610040000000.0,1610470000000.0,1,muu0gF6BW-,muu0gF6BW-,Final Decision,Reject,"This paper introduces a form of cubic smoothing for use with ODE-RNNs, to remove the jump when new observations occur. I think this paper's motivation is based on a misunderstanding of what the hidden state of an RNN represents. Specifically, an RNN hidden state is a belief state, not the estimated state of the system. + +I think R2 is right that it's correct for a filter to jump when seeing new data. It's not a matter of whether the phenomenon being modeled is slow-changing or not. The filtering output is a belief state, which can change instantaneously even if the true state does not. + +The important distinction to make is filtering (conditioning only on previous-in-time data) vs. smoothing (conditioning on all data). The smoothing posterior should generally be smooth if the true state changes slowly.
+ +As R4 notes, all of the tasks are based on interpolation, which is not what the ODE-RNN is trying to do, and the proposed method would make the same predictions as a standard ODE-RNN. Finally, as R4 notes, ""The authors do not provide any experimentation on real-world irregularly sampled time series"".",ICLR2021, +B2RWA3OW45,1576800000000.0,1576800000000.0,1,HkgxheBFDS,HkgxheBFDS,Paper Decision,Reject,"The paper investigates the sensitivity of a QA model to perturbations in the input, by replacing content words, such as named entities and nouns, in questions to make the question not answerable by the document. Experimental analysis demonstrates that, while the original QA performance is not hurt, the models become significantly less vulnerable to such attacks. Reviewers all agree that the paper includes a thorough analysis; at the same time, they all suggested extensions to the paper, such as comparisons to earlier work and additional experimental results, which the authors made in the revision. However, reviewers also question the novelty of the approach, given data augmentation methods. Hence, I suggest rejecting the paper.",ICLR2020, +iY17PLkKrBI,1642700000000.0,1642700000000.0,1,3Skn65dgAr4,3Skn65dgAr4,Paper Decision,Reject,"The paper deals with the problem of adjusting the learning rate during gradient descent optimisation. Unfortunately, the proposed approach is very similar to methods already presented in the literature and no significant contribution can be recognised. During the rebuttal, the author(s) have acknowledged their ignorance about the relevant literature and provided some further clarifications that did not turn into a revision of the reviewers’ initial assessment of the work.",ICLR2022, +Hkltjjuee4,1544750000000.0,1545350000000.0,1,HJe3TsR5K7,HJe3TsR5K7,Shows promise but requires improvements to presentation to make contribution clear,Reject,"This paper proposes a new image-to-image translation technique, presenting a theoretical extension of Wasserstein GANs to the bidirectional mapping case. + +Although the work presents promise, the extent of miscommunication and errors in the original presentation was too great to draw confident conclusions about the contribution of this work. + +The authors have already included extensive edits and comments in response to the reviews to improve the clarity of the method, experiments, and statement of contribution. We encourage the authors to further incorporate the suggestions, seek to clarify points of confusion from other reviewers, and submit a revised version to a future conference.",ICLR2019,4: The area chair is confident but not absolutely certain +NPUDy-sR4Gft,1642700000000.0,1642700000000.0,1,HCelXXcSEuH,HCelXXcSEuH,Paper Decision,Accept (Poster),"The paper presents a novel approximate second-order optimization method for convex and nonconvex optimization problems.
The search direction is obtained by preconditioning the gradient information with a diagonal approximation of the Hessian via Hutchinson's method and exponential averaging. The learning rate is updated using an estimate of the smoothness parameter. + +The merit of the paper has to be evaluated from the theoretical and empirical points of view. + +From the internal discussion, the reviewers agreed that the new algorithm is a mix of known methods, mainly present in AdaHessian, with a small tweak on the exponential average. Moreover, the theoretical guarantees do not seem to capture the empirical performance of the algorithm, nor do they provide any hint on how to set the algorithm's hyperparameters. For example, in Theorem 4.6 the optimal setting of $\beta_2$ is 1. That said, the most important theoretical contribution seems to lie in the fact that AdaHessian did not have any formal guarantee. Hence, this paper is the first one to show a formal guarantee for this type of algorithm. + +From the empirical point of view, the empirical evidence is very limited by today's standards in empirical machine learning papers. The reviewers and I do not actually believe that the proposed algorithm dominates the state-of-the-art optimization algorithms used in machine learning. However, in the internal discussion we agreed that the algorithm still has potential and should be added to the pool of optimization algorithms people can try. + +Overall, considering the paper in a holistic way, there seems to be enough novelty and results to be accepted at this conference. + +That said, I would urge the authors to take into account the reviewers' comments (and I also add some personal ones here). In particular, a frank discussion of the current theoretical analysis and empirical evaluation is needed. + +Some specific comments: +- AdaGrad was proposed by two different groups at COLT 2010, so both papers should be cited. So, please add a citation to: +McMahan and Streeter. Adaptive bound optimization for online convex optimization. COLT 2010. +- Remark 4.7, second item: Neither Reddi et al. (2019) nor Duchi et al. (2011) *assume* bounded iterates; that must be proved, not assumed. Instead, they explicitly project onto a domain that they assumed to be bounded. +- The convergence of the gradient to zero does not imply convergence to a critical point. To prove convergence to a critical point you should prove that the iterates converge, which in general is false even for lower-bounded functions. Indeed, consider $f(x)=\log(1+\exp(-x))$: the iterates would actually diverge while the gradient still goes to zero.",ICLR2022, +BlOF5X3Ikkd,1642700000000.0,1642700000000.0,1,T6lAFguUbw,T6lAFguUbw,Paper Decision,Reject,"The paper proposes a multi-agent RL framework that makes decisions in a more human-like manner by incorporating rational inattention. The approach is evaluated on two game-theoretic problems. The reviewers agree that the topic of the paper is interesting. However, there are concerns about the significance of the proposed approach. As the method incorporates human-inspired limitations, its aim is not to outperform SOTA RL methods on regularly considered domains; at the same time, as the approach is only evaluated on two simulation-based tasks, it is unclear how it would perform in more realistic scenarios that may benefit from human-like decision making.
For these reasons, I recommend rejection.",ICLR2022, +AKuwKcph79E,1610040000000.0,1610470000000.0,1,7JSTDTZtn7-,7JSTDTZtn7-,Final Decision,Reject,"This paper presents an algorithm for distributed optimization that aims to be ""Byzantine-robust"", in the sense that it learns successfully when some of the workers send arbitrary messages. The goal in this work is to remain robust when each worker samples data from a different distribution. + +While reviewers found the work interesting, issues about the theoretical development arose during the discussion period, and it appears that the paper cannot be accepted in its current form. + +The most serious issue was with Proposition I, which appears to be incorrect. In its putative proof, the authors write that each gradient is sampled at most $s$ times. This naturally leads to the conclusion that, in Algorithm 2, when the Break statement is reached, $g_{j_i}$ is not used to compute $\bar{g}_t$. + +Given this interpretation of Algorithm 2, it seems that Proposition I cannot be true. For example, if $s=1$, once $t \geq n/2$, fairly often, gradients will be sampled that had previously been sampled. In such cases, the corresponding contribution to $\bar{g}_t$ would be zero, so that, on average, $\bar{g}_t$ would be biased toward zero in later rounds. + +Their putative proof of Proposition I refers to a whole chapter of a statistics text. We couldn't find anything in that chapter that implies what they claim about Algorithm 2 (or that treats a sampling scheme like Algorithm 2 at all). + +Throughout the paper, when the authors took expectations, it was not always clear what was random and what was fixed. After some discussion, disagreement remained about how to interpret some of the assumptions. This was true in particular about the assumption in the first displayed equation on page two. ",ICLR2021, +FL_d4bw0Zu,1610040000000.0,1610470000000.0,1,tYxG_OMs9WE,tYxG_OMs9WE,Final Decision,Accept (Poster),"This paper proposes a new approach to learning deep generative models with induced structure in the latent representation. All four reviewers gave the same score of 6 to this paper, showing a consensus that the paper is above the bar for acceptance. The authors did a commendable job of detailed replies to reviewer comments, which, as R1, R3, and R4 all note, has improved the clarity and quality of the paper, addressing their concerns.",ICLR2021, +Ehwwk01bkD,1576800000000.0,1576800000000.0,1,r1lZ7AEKvB,r1lZ7AEKvB,Paper Decision,Accept (Spotlight),The paper focuses on characterizing the expressiveness of graph neural networks. The reviewers were satisfied that the authors answered their questions sufficiently and uniformly agree that this is a strong paper that should be accepted.,ICLR2020, +u_JpDM4H31,1642700000000.0,1642700000000.0,1,_4GFbtOuWq-,_4GFbtOuWq-,Paper Decision,Accept (Poster),The authors provide a discussion of Cover's Theorem in the setting of equivariance. The reviewers consider the work well explained and interesting, especially after the revisions, and so I will vote to accept.,ICLR2022, +H1lqyTzegV,1544720000000.0,1545350000000.0,1,ryxHii09KQ,ryxHii09KQ,Meta-Review,Reject,"This paper presents an interesting strategy of curriculum learning for training neural networks, where mini-batches of samples are formed with a gradually increasing level of difficulty.
+While reviewers acknowledge the importance of studying curriculum learning and the potential usefulness of the proposed approach for training neural networks, they raised several important concerns that place this paper below the acceptance bar: (1) empirical results are not convincing (R2, R3); comparisons on other datasets (large-scale) and with state-of-the-art methods would substantially strengthen the evaluation (R3); see also R2’s concerns regarding the comprehensive study; (2) important references and baseline methods are missing – see R2’s suggestions on how to improve; (3) limited technical novelty -- R1 has provided a very detailed review questioning novelty of the proposed approach w.r.t. Weinshall et al., 2018. +Another suggestion to further strengthen and extend the manuscript is to consider curriculum and anti-curriculum learning for increasing performance (R1). +The authors provided an additional experiment on a subset of 7 classes from the ImageNet dataset, but this does not show the advantage of the proposed model in a large-scale learning setting. +The AC decided that addressing (1)-(3) is indeed important for understanding the contribution in this work, and it is difficult to assess the scope of the contribution without addressing them. +",ICLR2019,5: The area chair is absolutely certain +B1gZR6YgxN,1544750000000.0,1545350000000.0,1,ByldlhAqYQ,ByldlhAqYQ,Meta Review,Accept (Poster),"This paper presents a method for transferring source information via the hidden states of recurrent networks. The transfer happens via an attention mechanism that operates between the target and the source. Results on two tasks are strong.
+I found this paper similar in spirit to Hypernetworks (David Ha, Andrew Dai, Quoc V Le, ICLR 2016), since there, too, there is dynamic weight generation for one network given another network, although this method did not use an attention mechanism. + +However, reviewers thought that there is merit in this paper (albeit they pointed the authors to other related work) and the empirical results are solid. +",ICLR2019,4: The area chair is confident but not absolutely certain +SJetruXgeV,1544730000000.0,1545350000000.0,1,Ske1-209Y7,Ske1-209Y7,insufficient novelty,Reject,"The paper presents an architecture search method which jointly optimises the architecture and its weights. As noted by reviewers, the method is very close to Shirakawa et al., with the main innovation being the use of categorical distributions to model the architecture. This is a minor innovation, and while the results are promising, they are not strong enough to justify acceptance based on the results alone.",ICLR2019,5: The area chair is absolutely certain +_YWKKbORrZo,1642700000000.0,1642700000000.0,1,_faKHAwA8O,_faKHAwA8O,Paper Decision,Reject,"Authors present an approach to consolidate multiple teachers into a single student model that can be adapted to new tasks. The method involves using a proxy dataset to facilitate distillation, to prevent having to replay images from the teacher datasets. A multi-task, multi-head objective is utilized that is agnostic to the loss function, of which two are studied. Downstream task performance is used as the performance measure. + +Pros: +- The problem of how to best leverage multiple teachers for a downstream task is important and interesting. +- Presents a method to generate distilled students that can be fine-tuned to tasks, demonstrating performance gains over baselines (ImageNet alone or a task-specific teacher). +- Easy to follow and implement. +- Analysis across multiple datasets. + +Cons: +- Multiple reviewers expressed concerns about the current level of novelty/contribution. In some sense, it is natural to expect that combinations of task-related and generalist distillation would improve performance. +- Main results demonstrate improvements in performance when teacher and tasks are related to one another. But the authors do not address how to select task-specific teachers for distillation. Related tasks and their matching to the target task are assumed to be known. The authors cited related prior works that attempt to do this matching, but do not apply it to their study for a full solution. +- Authors do not study variations of generalist teachers. How does changing the generalist teacher impact performance? +- Some reviewers expressed concern that the presentation is not clear. In particular, the style of figures may not be appropriate to best convey results and analyses of this type of work. Comparing different approaches is difficult looking at thin lines. Tables are perhaps better suited to convey these results. +- Multiple reviewers expressed concerns that full-finetuning results are not convincing (Fig. 4), though few-shot results look more convincing. + +Authors and reviewers had interaction, but reviewers maintained their recommendation of weak reject. All reviews were unanimous in their decisions.
Authors are encouraged to take into consideration all the comments and submit to another venue.",ICLR2022, +gcahMv4sAHf,1642700000000.0,1642700000000.0,1,nhnJ3oo6AB,nhnJ3oo6AB,Paper Decision,Accept (Spotlight),"The paper addresses vision-based and proprioception-based policies for learning quadrupedal locomotion, using simulation and real-robot experiments with the A1 robot dog. The reviewers agree on the significance of the algorithmic, simulation, and real-world results. Given that there are also real-robot evaluations, and an interesting sim-to-real transfer, the paper appears to be an important acceptance to ICLR.",ICLR2022, +z4j4TXaTk,1576800000000.0,1576800000000.0,1,r1e8qpVKPS,r1e8qpVKPS,Paper Decision,Reject,"This paper theoretically and empirically studies the inner and outer learning rate of the MAML algorithm and their role in convergence. While the paper presents some interesting ideas and adds to our theoretical understanding of meta-learning algorithms, the reviewers raised concerns about the relevance of the theory. Further, the empirical study is somewhat preliminary and doesn't compare to prior works that also try to stabilize the MAML algorithm, further bringing into question its usefulness. As such, the current form of the paper doesn't meet the bar for ICLR.",ICLR2020, +gV0ujR7cQf,1576800000000.0,1576800000000.0,1,ByxY8CNtvr,ByxY8CNtvr,Paper Decision,Accept (Poster),"Main content: + +Blind review #2 summarizes it well: + +Summary: This paper deals with the representation degeneration problem in neural language generation, as some prior works have found that the singular value distribution of the (input-output-tied) word embedding matrix decays quickly. The authors proposed an approach that directly penalizes deviations of the SV distribution from the two prior distributions, as well as a few other auxiliary losses on the orthogonality of U and V (which are now learnable). The experiments were conducted on small and large scale language modeling datasets as well as the relatively small IWSLT 2014 De-En MT dataset. + +Pros: ++ The paper is well-written with great clarity. The dimensionalities of the involved matrices (and their decompositions) are clearly provided, and the approach is clearly described. The authors also did a great job providing the details of their experimental setup. ++ The experiments seem to show consistent improvements over the baseline methods (at least the ones listed by the authors) on a relatively extensive set of tasks (e.g., of both small and large scales, of two different NLP tasks). Via WT2 and WT103, the authors also showed that their method worked on both LSTMs and Transformers (which it should, as the SVD on word embedding should be independent of the underlying architecture). ++ I think studying the expressivity of the output embedding matrix layer is a very interesting (and important) topic for NLP. (e.g., while models like BERT are widely used, the actual most frequently re-used module of BERT is its pre-trained word embeddings.) + +-- + +Discussion: + +The reviewers agree that it is a very well written paper, and this is important as a conference paper to illuminate readers. + +The one main objection is that spectrum control regularization was previously proposed and applied to GANs (Jiang et al., ICLR 2019).
However, the authors convincingly point out that the technique is widely used, not only for GANs, and that application to neural language generation has quite different characteristics requiring a different, new approach: ""our proposed prior distributions as shown in Figure 2 in our paper are fundamentally different from the singular value distributions learned using their penalty functions (See Figure 1 and Table 7 in Jiang et al.’s paper). Figure 1 in their paper suggests that their penalty function, i.e., D-optimal Reg, will encourage all the singular values close to 1, which is well aligned with their motivation for training GAN. However, if we use such penalty function to train neural language models, the learned word representations will lose the power of modeling contextual information, and can result in much worse results than the baseline methods."" + +-- + +Recommendation and justification: + +I concur with the majority of reviewers that this paper is a weak accept. Though not revolutionary, it is well written, has usefully broad application, and is supported well empirically.",ICLR2020, +HkmySJprf,1517250000000.0,1517260000000.0,476,Hy_o3x-0b,Hy_o3x-0b,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a VAE variant by embedding spatial information with multiple layers of latent variables. Although the paper reports state-of-the-art results on multiple datasets, some results may be due to a bug. This has been discussed, and the author acknowledges the bug. We hope the problem can be fixed, and the paper reconsidered at another venue. +",ICLR2018, +bjt_OodcwA2,1642700000000.0,1642700000000.0,1,FxBdFwFjXX,FxBdFwFjXX,Paper Decision,Reject,"Though some concepts discussed in the submission are interesting, there are many major concerns: there is a lack of literature review, comparison experiments with the state-of-the-art methods are missing, and the technical novelty of the proposed method is very limited. +In the rebuttal, the authors agreed with reviewers' comments and did not provide responses to address reviewers' concerns. + +Therefore, based on its current form, this submission does not meet the standard of publication at ICLR.",ICLR2022, +Wfe1py7mele,1610040000000.0,1610470000000.0,1,P0p33rgyoE,P0p33rgyoE,Final Decision,Accept (Poster),"This paper revisits the under-explored ""implicit"" variant of Variational Intrinsic Control introduced by Gregor et al. They identify a flaw that biases the original formulation in stochastic environments and propose a fix. + +Reviewers agree that there is a [at least a potential, R4] contribution here: ""even the description of what implicit VIC is trying to do is a novel contribution of this work"", in the words of R2, and ""the derivation has theoretical value and is not a simple re-derivation of VIC"", in R4's post-rebuttal remarks. Several reviewers raised significant concerns around clarity, which were addressed in an updated manuscript, which also provided new visualizations and new experiments which reviewers found compelling. All reviewers agreed that the revised manuscript was considerably improved. + +R4's score stands at 5, with the other reviewers all standing at 6. R4's main concerns are around whether the missing term in the mutual information identified by the authors is a problem in practice on non-toy tasks (echoing somewhat R3's concerns re: high-dimensional tasks).
While this is a valid concern, the function of a conference paper needn't necessarily be to (even attempt to) provide the final word on a matter. Identifying subtle issues such as the one brought forth in this manuscript and re-examining old ideas is a valuable service to the community, and this paper will serve as a beginning to a conversation rather than an end. The AC also considers themselves rather familiar with the original VIC paper, and found the results herein somewhat surprising and noteworthy. + +I recommend acceptance, but encourage the authors to incorporate remaining feedback in the camera-ready.",ICLR2021, +dWoqrw8qIb,1642700000000.0,1642700000000.0,1,LcF-EEt8cCC,LcF-EEt8cCC,Paper Decision,Accept (Poster),"This paper discusses an issue with decomposing a conditional generative model into an unconditional model and a separate classifier using Bayes' theorem, which is an approach that has recently received increased attention in the context of score-based generative models. It explores several alternatives for mitigating this issue, including a novel one, which is to use a different loss function to train the classifier. + +Reviewers praised the writing and the way this work draws attention to an issue that is underappreciated in the community. Although several weaknesses (clarity, scale of experiments, appropriateness of baselines, missing experiments) were also highlighted in the original reviews, all reviewers agree after discussion that the authors have adequately addressed these for the paper to be considered for acceptance. I will follow their recommendation and recommend acceptance as well.",ICLR2022, +HUXUeHJuSRJ,1642700000000.0,1642700000000.0,1,VLgmhQDVBV,VLgmhQDVBV,Paper Decision,Accept (Poster),"There was a healthy discussion with all the reviewers, with a consensus that the results are somewhat expected and unlikely to shed light beyond the NTK regime; yet within the confines of the NTK there is a solid and nicely written technical contribution.",ICLR2022, +CszKZkVN74p,1610040000000.0,1610470000000.0,1,aFvG-DNPNB9,aFvG-DNPNB9,Final Decision,Reject,"The paper proposes a variant of the hierarchical VAE architectures. All reviewers felt that the paper's clarity was lacking. While the authors made very significant improvements during the feedback phase, which were recognized by reviewers, the paper could use a revision that takes clarity into account from the ground up. I also think that the ablation studies should be expanded (if SOTA is not the goal, then science should be), e.g., compare to the setting in which q does not share the bijective layers with p.",ICLR2021, +NRWkkNdx1h,1610040000000.0,1610470000000.0,1,pbXQtKXwLS,pbXQtKXwLS,Final Decision,Reject,"Addressing the initialization issue in DNNs is an important topic, and the proposed approach is found by the reviewers to be interesting. However, the reviewers feel that to clearly promote this research beyond the 'proof of concept' phase, deeper investigation of multi-layer architectures would be required. This would raise the significance of the paper. Besides extending the study to deeper networks, the paper could also benefit from more elaborate experiments to make the results more convincing, in particular by addressing R4's concerns regarding robustness of performance, e.g., on small dataset sizes.
Finally, the methodology is sound and the authors clarify the significance of the ReLU-associated covariance; however, overall the paper does not offer significant technical advancements that could make up for the shortcomings in the areas discussed above. ",ICLR2021, +f00u1ucEPfQ,1642700000000.0,1642700000000.0,1,_3bwD_KXl5K,_3bwD_KXl5K,Paper Decision,Reject,"The authors propose in this manuscript to use spiking neural networks (SNNs) as an efficient alternative to dilated temporal convolutions. They propose to utilize the membrane time constant of neurons instead of synaptic delays for memory efficiency. Training such networks with BPTT achieves better performance than other SNN-based methods and comes close to SOTA ANN solutions for keyword spotting. + +Pros: +- The manuscript addresses an interesting problem. +- Performance is good. + +Cons: +- Limited evaluations regarding efficiency, although this is a main point of the paper. +- The technical novelty is limited. +- One reviewer noted that the model is not actually an SNN, due to the use of multiple spikes per time step. +- Benchmarking is weak. Little comparison with previous work. +- Structure and writing of the paper need improvement. + +The authors did not reply to any of these critical points. In summary, although the idea seems interesting, the manuscript is not ready for publication.",ICLR2022, +6vuHw07gIaH,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This work takes ReQuest, an approach for safe deep reinforcement learning utilizing human feedback, and studies its feasibility in pixel-based 3D environments (previously it was only shown to work in simple 2D environments). In order to apply ReQuest in this much more challenging setting, this novel instantiation of ReQuest learns a pixel-based dynamics model from a lot of human demonstration data, and uses a different (as compared to the ""base"" ReQuest) reward sketching approach to infer the reward function from human feedback. + +**Strengths** +globally a well-motivated and well-written/structured manuscript +Addresses an important problem, and shows promising results + +**Weaknesses** +on the more detailed level, there are some clarity concerns (even with the lengthy appendix) +evaluation was missing some more relevant comparisons (partially fixed after rebuttal/revision) +lack of technical novelty, and lack of in-depth analysis of results +motivation of algorithmic choices: why did you choose the reward sketching approach that you chose? How is it different, and does it improve performance? + +**Rebuttal** +The authors addressed most questions/things that were unclear and updated the paper to include an additional baseline. + +Additional baseline: First, it's great that you added this additional baseline! Yet, to me it's unclear what that additional baseline really represents (ReQuest + sparse rewards). The original ReQuest paper also learns reward from human feedback, is that what you did for this paper? If yes, then what does the sparse reward mean? Why does this version of ReQuest perform worse than your proposed version? + +**Summary** +I agree with the reviewers and authors that this is a promising direction. However, in its current form this manuscript is not ready yet for publication. My concerns are centered around the motivation of algorithmic choices: The reward sketching part (while not novel in itself) is a novel component of the ReQuest pipeline, but you do not evaluate what it adds, and neither do you motivate that choice.
Furthermore, the additional baseline is not clearly described; it is unclear how it differs from your proposed approach and why we see the performance improvement of your ReQuest version vs. the baseline ReQuest version.",ICLR2022, +uAl3JbXlyjW,1642700000000.0,1642700000000.0,1,327eol9Xgyi,327eol9Xgyi,Paper Decision,Reject,"The submission receives mixed ratings initially. Three reviewers are on the borderline and one reviewer, EG97, leans negatively. The raised issues mainly reside in the technical contribution, technical correctness, and experimental validation. In the rebuttal, the authors have tried to address the raised issues and discussed them in depth with reviewers. However, the discussion does not change the reviewers' minds. After checking all the reviews, rebuttals, and discussions, the AC sides with the reviewers that the technical contribution is a major issue that ought to be resolved. The proposed TPN comes from a summarization of the existing FPN-based structure, and there are not sufficient insights to make significant improvements. Besides, there are still unsolved issues regarding the technical presentation and experimental validation. The authors are encouraged to improve the current manuscript based on these reviews and are welcome to submit to the next venue.",ICLR2022, +kP-qCO6VbAq,1610040000000.0,1610470000000.0,1,mo3Uqtnvz_,mo3Uqtnvz_,Final Decision,Reject,"This submission got three rejections and one rating marginally below the threshold. In the original reviews, most of the concerns lie in the limited novelty, the inferior performance relative to some existing similar works, and the limited scalability of the proposed method. Though the authors provided some additional experiments, the reviewers still feel the experiments are not convincing and keep their ratings. The AC agrees with the reviewers' comments on this paper. Though achieving SOTA performance is not necessary for every submission, a NAS-like method is purely pursuing better performance (either higher accuracy or better efficiency). Thus, the performance is also important for evaluating a NAS paper. From the reviews, the proposed method does not show better performance than some existing works, like BiFPN. This makes the value of the paper unclear, in particular considering that the methodological novelty is limited. The authors could consider improving the experiments in the submission to better justify the proposed method, either achieving better performance or higher efficiency than existing works. In its current state, the AC cannot make an accept recommendation. ",ICLR2021, +qpIiMuuBzSZ,1642700000000.0,1642700000000.0,1,on54StZqGQ_,on54StZqGQ_,Paper Decision,Reject,"The work focuses on the observation that, given a certified epsilon-robust model and a certified clean input x, many inputs within the epsilon ball around x are themselves not epsilon-certifiable although they are correctly classified. The authors argue that an adversary can exploit this property to produce inputs which are correctly classified by the model yet are not certifiably robust. Reviewers agreed that the paper was overall well written, the methods were clear and overall evaluated thoroughly, and many felt that the main idea was interesting. There were some concerns regarding the significance of the contribution: the primary observation itself is arguably novel but somewhat obvious, and the proposed algorithm for finding non-certifiable points isn't a significant contribution when standard techniques like PGD are sufficient.
Much of the reviewer discussion concerned whether or not the proposed attack made sense as a threat model. It is the AC's opinion that this discussion did not reach any meaningful conclusions. It is important to remember that the lp threat model is intended as an abstract toy game so that a formal theory of neural network certification can be developed under idealized settings. It is not intended to model any realistic security scenarios, and even more generalized notions of ""imperceptible"" or ""subtle"" attacks aren't realistic when, for the bulk of applied settings, real adversaries are not restricted to small perturbations in the first place [1]. The example provided by the authors regarding small perturbations of a stop sign isn't a relevant example when the adversary has more effective options, e.g., knocking over stop signs [1, Figure 3]. + +For the sake of discussion, one could consider whether or not a degradation attack would make sense under more general threat models such as content-preserving perturbations. An example discussed in [1] concerns adversaries uploading copyrighted content to public streaming services; this attacker-defender game is being actively played in the wild, where defenders produce statistical models which attempt to flag content as semantically matching existing copyrighted content in a private database, while attackers make large semantically-preserving modifications in order to evade statistical detection. An example attack would be cropping 20% of the boundary pixels of a movie and replacing the cropped portion with arbitrary adversarially constructed backgrounds. Epsilon perturbations are possible, but are almost a measure-zero subset of the full attacker action space. Suppose in the far future neural network certification advanced to the point where we could certify that a classifier was robust to all possible content-preserving perturbations of a specific movie. In this case the defender would be using the certification method on their private database of copyrighted content; they would not be running the certifier on any content uploaded by users. If a movie in the private database is certified, then we already know that an attacker cannot successfully upload an adversarial version of it; it would be unnecessary to certify whether or not user-uploaded content could be further perturbed in a way to become adversarial. Perhaps degradation attacks could be possible as training poisoning attacks, but this seems a bit far-fetched when more traditional training poisoning attacks would be preferred. Given this, at least in this example the AC does not see how a degradation attack would make sense as a threat model. + +Given that the primary contribution of this work is a novel threat model for ML security, it is crucial that the authors rewrite their work to make more realistic assumptions about the capabilities of realistic adversaries. Starting with some of the examples discussed in [1] may be useful to the authors. Although the example of adversarial attacks on copyright detection classifiers doesn't seem to fit the degradation attack threat model, perhaps other scenarios would. + +1. Gilmer et al., Motivating the Rules of the Game for Adversarial Example Research, https://arxiv.org/pdf/1807.06732.pdf",ICLR2022, +v0-AWjsH3Yy,1610040000000.0,1610470000000.0,1,QQzomPbSV7q,QQzomPbSV7q,Final Decision,Reject,"This paper is truly borderline.
On one hand, the theoretical contribution seems novel and interesting; however, there appears to be somewhat of a gap between theory and practice. + +There is unfortunately another problem. According to the authors, the main contribution of this publication is arguably the introduction of the nearest neighbor as the positive example in the triplet loss. However, the authors seem to be unaware of the history of the triplet loss. It was originally introduced by Schultz & Joachims 2004 as a loss over all triplets. Weinberger et al. 2005 changed it and used the nearest neighbor as ""target neighbor"", which is called ""easy positives"" here, as the objective of LMNN. In 2009 Chechik et al. subsequently relaxed this positive neighbor formulation to any similarly labeled sample (going back to the Schultz & Joachims formulation) but sampling triplets. The re-introduction of the nearest neighbor as ""easy positive"" was then covered by Xuan et al. 2020. + +Unfortunately all of this diminishes the novelty significantly and it is clear that the paper in its current form does not have a strong enough contribution. I do encourage the authors to take a close look at the original LMNN publication and Xuan et al. and write an improved re-submission for the next conference that maybe focuses more on the theoretical contribution. +Good luck, + +AC",ICLR2021, +B8AlMtkrDU,1610040000000.0,1610470000000.0,1,6Tm1mposlrM,6Tm1mposlrM,Final Decision,Accept (Spotlight),"The paper proposes to minimize the loss while regularizing its sharpness, so that the minimum will lie in a region with uniformly low loss. +The reviewers uniformly appreciated the paper. They have made a number of suggestions for improving the paper, which the authors should consider incorporating in their final version. + +",ICLR2021, +Hy43fkaSG,1517250000000.0,1517260000000.0,15,HkwZSG-CZ,HkwZSG-CZ,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"Viewing language modeling as a matrix factorization problem, the authors argue that the low rank of word embeddings used by such models limits their expressivity and show that replacing the softmax in such models with a mixture of softmaxes provides an effective way of overcoming this bottleneck. This is an interesting and well-executed paper that provides potentially important insight. It would be good to at least mention prior work related to the language modeling as matrix factorization perspective (e.g. Levy & Goldberg, 2014).",ICLR2018, +NND8P2M2iY,1642700000000.0,1642700000000.0,1,daYoG2O4TtU,daYoG2O4TtU,Paper Decision,Reject,"This paper introduces a deep neural network sequence-to-sequence framework for modifying the length of a speech sequence. It employs a convolutional encoder-decoder architecture optimized under a Bayesian formulation with variational inference. The proposed framework is evaluated on a voice conversion task and three emotion conversion tasks. The results show that it can successfully change the duration of an utterance without accessing the target utterance. Almost all reviewers raised concerns with some strong or inaccurate claims made by the authors in the paper. The literature review on related work also needs to be significantly improved. Another major concern is on experiments. Other than the DTW approach compared in the work, the proposed method should also be compared with existing duration modification techniques. The MOS evaluation seems to be limited and needs further improvement to make the results stronger and more convincing.
Since the authors did not provide a rebuttal, all these major concerns remain unanswered.",ICLR2022, +9URc6c_RNIJ,1610040000000.0,1610470000000.0,1,Dh29CAlnMW,Dh29CAlnMW,Final Decision,Reject," This paper presents ""Automunge"", a Python library for pre-processing tabular data. +The authors develop a useful library that can be used by practitioners for data engineering in NN applications. +The reviewers raised a common concern regarding the lack of focus on the actual usefulness of the library in improving the +performance of the models it is applied to. A common concern was the lack of performance plots compared to other alternatives. +In the response the authors have done a rather thorough job of addressing the reviewers' comments and +adding material in the supplementary. However, given the current presentation, the manuscript needs a considerable amount of rewriting to incorporate the suggested changes into the main paper. As it is, I don't think ICLR is the right venue for the manuscript. It might reach its audience better in venues like SysML or PyCon, also suggested by a reviewer. +",ICLR2021, +9kYo8wadbeg,1610040000000.0,1610470000000.0,1,zspml_qcldq,zspml_qcldq,Final Decision,Reject,"The paper discusses the problem of how to augment cross-modal retrieval for the task of multi-modal classification -- it uses image caption pairs to improve downstream multimodal learning, and shows improvement in the task of visual question answering.
+ +Many ideas in Algorithm 1 were proposed in the prior ES literature: + +- Evaluating the weights using a pair of noise perturbations, -\delta W and +\delta W, in Alg. 1, Lines 13-14, is known as antithetic sampling (Geweke, 1988), also known as mirrored sampling (Brockhoff et al., 2010) in the ES literature. + +- Updating the weights based on whether the objective function has improved was proposed in Wierstra et al. (2014) and is known as fitness shaping. + +Given the current submission, it is difficult to discern the contribution of the proposed method when compared to the prior work. In addition, convergence analysis of zero-order optimization was studied in Duchi et al. (2015), which includes a special coordinate descent version closely related to the proposed algorithm. + +2) Experiments: + +- Although the experiments showcase the performance of sequential RSO, the x-axis in Figure 4 only reports the iterations after updating the entire network. The true computational cost of the proposed RSO is #forward-passes x #parameters, which is much more costly than the paper currently acknowledges. Also, RSO requires drawing 5000 random samples and performing forward passes on all 5000 samples for every single weight update. It would be a great addition to include the number of multiplications and the computational complexity of RSO and the baseline algorithms. + +- More importantly, the paper only compared RSO with SGD in all the experiments. Including some of the existing ES algorithms would significantly strengthen the paper. + + +In summary, the basic idea is interesting, but the current paper is not ready for publication and will need further development and non-trivial modification. + +",ICLR2021, +HJewlYGWgV,1544790000000.0,1545350000000.0,1,HyxPx3R9tm,HyxPx3R9tm,Intuitive idea that leads to impressive results! ,Accept (Poster),"The paper proposes a simple and general technique based on the information bottleneck to constrain the information flow in the discriminator of adversarial models. It helps training by maintaining informative gradients. While the information bottleneck is not novel, its application in adversarial learning to my knowledge is, and the empirical evaluation demonstrates impressive performance on a broad range of applications. Therefore, the paper should clearly be accepted. +",ICLR2019,5: The area chair is absolutely certain +ryXpXJTSG,1517250000000.0,1517260000000.0,233,HyRVBzap-,HyRVBzap-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper forms a good contribution to the active area of adversarial training. The main issue with the original submission was presentation quality and excessive length. The revised version is much improved. However, it still needs some work on the writing, in large part in the transferability section but also to clean up a large number of non-native formulations like missing/extra determiners and some awkward phrasing. It should be carefully proofread by a native English speaker if possible. Also, the citation formatting is incorrect (frequently using \citet instead of \citep).",ICLR2018, +9kYo8wadbeg,1610040000000.0,1610470000000.0,1,zspml_qcldq,zspml_qcldq,Final Decision,Reject,"The paper discusses the problem of how to augment cross-modal retrieval for the task of multi-modal classification -- it uses image-caption pairs to improve downstream multimodal learning, and shows improvement in the task of visual question answering. 
However, the paper has the following weaknesses: (a) lack of novelty, (b) lack of thorough empirical evaluation, and (c) the complex model did not give significant gains.",ICLR2021, +2ymtuG5ZHqm3,1642700000000.0,1642700000000.0,1,FqMXxvHquTA,FqMXxvHquTA,Paper Decision,Reject,"This paper deals with segmentation of time series. The paper has received quite detailed reviews, and the approach seems to have several interesting aspects (interesting architecture choice, stepwise classification approach, ability to capture long-range dependencies). However, there is a consensus that the paper would definitely benefit from a further iteration before publication in ICLR or in any other similar venue. The authors in their final response have already identified the improvement points raised by the reviewers. In addition to these, I believe it would be helpful to put the contributions better into perspective with the existing literature. I think all of this would require a major rewrite, and I encourage the authors to make a fresh submission in a future venue.",ICLR2022, +F03E44pYtE,1576800000000.0,1576800000000.0,1,rJxX8T4Kvr,rJxX8T4Kvr,Paper Decision,Accept (Poster),"The authors consider a parameter-server setup where the learner acts as a server communicating updated weights to workers and receiving gradient updates from them. A major question then relates to the synchronisation of the gradient updates, for which a couple of *fixed* heuristics exist that trade off accuracy of updates (BSP) for speed (ASP), or even combine the two, allowing workers to be at most k steps out-of-sync. Instead, the authors propose to learn a synchronisation policy using RL. The authors report results on simulated and real environments. Overall, the RL-based method seems to provide some improvement over the fixed protocols; however, the margin between the fixed protocols and the RL approach gets smaller in the real clusters. This is actually the main concern raised by the reviewers as well (especially R2) -- the paper in its initial submission did not include the real cluster results; rather, these were added at the rebuttal. I find this to be an interesting real-world application of RL and I think it provides an alternative environment for testing RL algorithms beyond simulated environments. As such, I'm recommending acceptance. However, I do ask the authors to be upfront with the real cluster results and move them into the main paper.",ICLR2020, +MNWJNVw__Wt,1642700000000.0,1642700000000.0,1,bsycpMi00R1,bsycpMi00R1,Paper Decision,Accept (Poster),"The paper studies the convergence of a generalized gradient descent ascent (GDA) flow in certain classes of nonconvex-nonconcave min-max setups. While nonconvex-nonconcave setups are computationally intractable and GDA is known to fail to converge even in some convex-concave setups, the paper argues that (generalized) GDA can converge on what is dubbed ""Hidden Convex-Concave Games"" in the paper, and argues that this class contains GANs as a special case. + +The reviewers all found the paper interesting and a worthy contribution to the literature on nonconvex-nonconcave zero-sum games. Main concerns expressed by the reviewers were w.r.t. the lack of a convergence rate established for the considered dynamics, novelty compared to existing work, and the practical usefulness of the considered scheme, as it involves preconditioning/matrix inversion. The authors made an effort to address all the concerns, to the extent possible. 
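For intuition, the hidden convex-concave structure can be sketched schematically (a generic form for illustration -- assumed notation, not the paper's exact definition): the objective is nonconvex-nonconcave in the raw parameters but convex-concave after reparametrization.

\[
\min_{\theta}\,\max_{\phi}\; H(\theta,\phi) \;=\; f\bigl(u(\theta),\, v(\phi)\bigr),
\qquad f(x,y)\ \text{convex in } x,\ \text{concave in } y,
\]
where \(u\) and \(v\) are smooth (e.g., neural-network) reparametrizations, so that \(H\) itself need not be convex-concave in \((\theta,\phi)\).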
+ +Given the complexity of nonconvex-nonconcave min-max setups, their importance in GAN training, and the insightful perspective on hidden convexity/concavity in typical problem instances, I would like to see this paper published at ICLR. I would, however, strongly advise the authors to take all of the reviewers' comments into account when preparing a revision.",ICLR2022, +Zn3c95qXChv,1610040000000.0,1610470000000.0,1,EKw6nZ4QkJl,EKw6nZ4QkJl,Final Decision,Reject,"The paper combines logical reasoning and statistical methods to improve knowledge graph completion. Rules are mined from the KG using AMIE, and recursive backward steps are taken, using the mined rules, to determine if a fact is true. The reviewers agree that the paper can be improved by explaining more details of the method to make it easier to understand.",ICLR2021, +o4Y8X0k8lH2,1610040000000.0,1610470000000.0,1,fMHwogGqTYs,fMHwogGqTYs,Final Decision,Reject,"This paper proposes to learn representations in an unsupervised manner using a generative model in which observations are generated by combining independent causal mechanisms (ICMs), in combination with a global mechanism. The authors introduce an unconventional mixture prior for the shared and independent components of the representation and train an encoder, discriminator and generator using a Wasserstein GAN with additional terms that enforce consistency in the data and latent space. Experiments consider variations of MNIST and Fashion-MNIST and perform comparisons against a standard VAE, a β-VAE, and the Ada-GVAE. + +Reviewers are broadly in agreement that this submission is not ready for publication in its current form. R4 in particular has left very detailed comments regarding clarity. The authors were able to partly address these comments, and R4 raised their score in response. That said, from a read of the manuscript in its latest form, the metareviewer (who is very familiar with the literature on disentangled representations) is inclined to agree with the reviewers that this is work that has value, but is very difficult to follow in its current form. The metareviewer would like to suggest that the authors regroup, think carefully about how to improve clarity (in addition to addressing concrete points raised in reviews) and resubmit to a different venue.",ICLR2021, +a1W_5eQg0nj,1642700000000.0,1642700000000.0,1,ucASPPD9GKN,ucASPPD9GKN,Paper Decision,Accept (Poster),"Heterophily is known to degrade the performance of graph neural networks. This paper explores whether, for graph convolutional networks (GCNs), this is a general phenomenon, or if there are some circumstances under which a GCN can still perform well in a heterophilous setting. It characterizes one such setting under a contextual stochastic block model (CSBM) distribution with two classes (generalized in the appendix to multiple classes). The main takeaway is that there are indeed scenarios where a GCN can be expected to perform well, even under heterophilic neighborhoods. + +There are limitations, and the reviewers have been fairly thorough in pointing these out: the analysis is specific to GCNs under CSBM, and there are a number of assumptions on the node label/feature/neighborhood distributions. The non-linear operations in the GCN have also been dropped. Even still, the reviewers were generally satisfied that the experiments backed up the claims in this specific scenario. + +There is still quite a bit more to do in order to make this a more general result. 
Essentially, this paper shows that heterophily is not always a problem. One reviewer has stated that it is not always considered a problem anyway, but at least this paper outlines a specific scenario in which this is theoretically true. However, there is still a large space of ""bad"" heterophily, and this paper leaves open what those cases are and how to deal with them. It is also possible that there are other ""good"" scenarios that remain unexplored. + +Still, within the narrow scope of the analysis, the paper is clear and accomplishes what it sets out to do. I would encourage the authors to ensure that the paper incorporates the suggestions of the reviewers, particularly with regard to scope, to ensure that the paper is properly grounded in its claims. + +All reviewers leaned towards the side of acceptance, except one who did not engage in post-review discussion. After reading over their review, and the subsequent response, I am satisfied that their concerns have been adequately addressed.",ICLR2022, +BJxPqvDZg4,1544810000000.0,1545350000000.0,1,BJgEjiRqYX,BJgEjiRqYX,related literature and evaluations,Reject,"The paper proposes a generative model that generates one object at a time, and uses a relational network to encode cross-object relationships. A similar object-centric generation scheme and object-object relational network are proposed in ""sequential attend, infer, repeat"" of Kosiorek et al. for video generation, which first appeared on arXiv on June 5th 2018 and was officially accepted in NIPS 2018 before the submission deadline for ICLR 2019. Moreover, several recent generative models have been proposed that consider object-centric biases, which the current paper references but does not compare against, e.g., 'attend, infer, repeat' of Eslami et al., or ""DRAW: A Recurrent Neural Network For Image Generation"" of Gregor et al. Though the CLEVR dataset considered contains real images, its intrinsic image complexity is low because it features a small number of objects against a table background. As a result, the novelty of the proposed work may not be sufficient in light of recent literature, despite the fact that the paper presents a reasonable and interesting approach for image generation. +",ICLR2019,4: The area chair is confident but not absolutely certain +X9ao1AbUse,1576800000000.0,1576800000000.0,1,SyxL2TNtvr,SyxL2TNtvr,Paper Decision,Accept (Poster),"The authors address the important and understudied problem of tuning unsupervised models, in particular variational models for learning disentangled representations. They propose an unsupervised measure for model selection that correlates well with performance on multiple tasks. After significant fruitful discussion with the reviewers and resulting revisions, many reviewer concerns have been addressed. There are some remaining concerns that there may still be a gap in the theoretical basis for the application of the proposed measure to some models, that for different downstream tasks the best model selection criteria may vary, and that the method might be too cumbersome and not quite reliable enough for practitioners to use it broadly. All of that being said, the reviewers (and I) agree that the approach is sufficiently interesting, and the empirical results sufficiently convincing, to make the paper a good contribution and hopefully motivation for additional methods addressing this problem.",ICLR2020, +FbM9YFgsS1,1576800000000.0,1576800000000.0,1,BJxSWeSYPB,BJxSWeSYPB,Paper Decision,Reject,"This work proposes a self-supervised segmentation method: building upon Crawford and Pineau (2019), this work adds a Monte-Carlo based training strategy to explore object proposals. 
+Reviewers found the method interesting and clever, but shared concerns about the lack of a better comparison to Crawford and Pineau, as well as a general lack of care in comparisons to others, which were not satisfactorily addressed by the authors' response. +For these reasons, we recommend rejection.",ICLR2020, +VNsifw-wTJ,1642700000000.0,1642700000000.0,1,Nfl-iXa-y7R,Nfl-iXa-y7R,Paper Decision,Accept (Spotlight),"This is an intriguing work that introduces a novel sparse training technique. The core insight is a novel reparametrization, or sparsity pattern, based on the so-called butterfly matrices, which enables fast training and good generalization. The theory is solid and useful. Most importantly, the method is novel and is likely to become impactful. Better understanding what contributes to the excellent performance is an interesting question for future work. In agreement with all the reviewers, it is my pleasure to accept the work.",ICLR2022, +woR_yDUm7SR,1610040000000.0,1610470000000.0,1,Z4YatHL7aq,Z4YatHL7aq,Final Decision,Reject,"The paper proposes an upsampling layer design for converting layouts to images. Three reviewers rate the paper below the bar, while one reviewer rates the paper marginally above the bar. The main concern that several reviewers raise is novelty. In particular, R1 and R3 point out that the proposed method shares great similarity with CARAFE [Wang et al. ICCV 2019]. The AC agrees with the reviewers. +",ICLR2021, +2AGaOiknJhj,1642700000000.0,1642700000000.0,1,9W2KnHqm_xN,9W2KnHqm_xN,Paper Decision,Reject,"Despite some positive points, the criticisms (and overall scores) put this paper below the bar. The reviewers raise issues of novelty as well as problems with the experiments, and argue that some claims are unsupported.",ICLR2022, +U2dft_wCCCf,1610040000000.0,1610470000000.0,1,TaUJl6Kt3rW,TaUJl6Kt3rW,Final Decision,Reject,"The authors propose an approach for the task of categorizing competencies in terms of worker skillsets. This is a potentially useful (if somewhat niche) task, and one strength here is a resource to support further research on the topic. However, the contribution here is limited: the methods considered are not new, and while the problem has some practical importance, it does not seem likely to be of particular interest to the broader ICLR community. 
",ICLR2021, +1DzVgEGyNe,1610040000000.0,1610470000000.0,1,4I5THWNSjC,4I5THWNSjC,Final Decision,Reject,"The reviews were largely split in the beginning. Some of the concerns are firmly addressed, e.g. new results evaluating the actual latency in real hardware, and one reviewer raised the score from 5 to 6. However, another reviewer is not fully convinced by the response and decided to keep the original score of ""3: Clear rejection"". + +There are mainly two issues here. One is about the novelty of the method. The reviewer asked about the novelty and difference from CondConv/DynamicConv, however the authors emphasized the techniques to successfully train conditional computation models in general. As the focus of the paper is the proposal of the new (claimed as better) method, I have to say it is missing the points. (The authors could have organized the storyline of the paper as ""best practices for training conditional computation models"" or something like that, if that is the true contribution the paper.) Another issue is about the advantage over the CondConv method. In the newly added results, BasisNet does not show clear advantage in terms of accuracy-speed trade-off without early exiting. Although the authors simply state that it is ""infeasible"" to do early termination on CondConv, the reason is not clear. Indeed, one can easily try layer-level early exiting as done in BranchNet for example, if not the model-level early exiting assumed for the BasisNet. + +Base on the discussion above, I conclude that the paper has to clarify many issues before publication and thus recommend rejection.",ICLR2021, +r1xg0J3lxN,1544760000000.0,1545350000000.0,1,SyG4RiR5Ym,SyG4RiR5Ym,Meta-Review for Neural Distribution Learning,Reject,"All reviewers agree to reject. While there were many positive points to this work, reviewers believed that it was not yet ready for acceptance.",ICLR2019,5: The area chair is absolutely certain +r19L7J6HM,1517250000000.0,1517260000000.0,146,By4HsfWAZ,By4HsfWAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes to use data-driven deep convolutional architectures for modeling advection diffusion. It is well motivated and comes with convincing numerical experiments. +Reviewers agreed that this is a worthy contribution to ICLR with the potential to trigger further research in the interplay between deep learning and physics. ",ICLR2018, +y6Qjo_evtCR,1642700000000.0,1642700000000.0,1,CoMOKHYWf2,CoMOKHYWf2,Paper Decision,Reject,"The work AdaFocal proposes an approach to tune Focal Loss' $\gamma$ hyperparameter to improve the model's calibration, particularly to avoid the occasional underconfidence when using focal loss. This tuning is done not as a learned constant hyperparameter across training but as one that evolves over training. + +The work is both well-motivated and well-written. However, multiple reviewers share the concern (which I agree with) that the method fails on ImageNet experiments, and the method often fails to beat even temperature scaling. The experimental comparison arguing that the approach improves upon many methods pre-temperature scaling is unfair as no other method leverages the validation set. This makes for a fairly deceiving slight of hand if not read carefully. When compared to temperature scaling, which does use the validation set, the performance improvement gap is diminished altogether. + +I recommend the authors use the reviewers' feedback to enhance their preprint should they aim to submit to a later venue. 
In particular, improve the clarity around the experimental validation.",ICLR2022, +XLrHwpUuU5n,1642700000000.0,1642700000000.0,1,F2r3wYar3Py,F2r3wYar3Py,Paper Decision,Reject,"Meta Review of Learning from One and Only One Shot + +The motivation of this work is to address the problem of learning from very few samples, which is of high relevance for many machine learning problems. The paper proposes an (interpretable) approach for one- or few-shot learning, which tries to simulate the human ability to recognize ""distorted"" objects. To achieve few-shot learning, they first model the topological distance to training data points while minimizing the distortions, to find neighbors that are conceptually similar to the input image. Their experimental results show that this simple method can achieve good performance when only very few samples are available and no pre-training is allowed. + +All reviewers, including myself, agree that this paper is well motivated and nicely written, appreciated how the authors connected ideas from neuro-psychology to their ML model, and recognized its novelty. But there are issues with the paper raised by reviewers that prevent it from meeting the bar for me to recommend it for acceptance at ICLR 2022. + +The main issue raised by all reviewers is that the proposed method is only experimentally verified on simple datasets such as MNIST, EMNIST and Omniglot (and, to some extent, Quick, Draw!). The authors (to their credit) noted in the rebuttal that the narrative of the paper is to focus on abstract images, and that the purpose is more scientific investigation (rather than proposing an algorithm that is immediately useful for ML practitioners), and this is a fair point. However, I do believe the issue here goes beyond the simple criticism of ""it works on MNIST, how about ImageNet?"", as I think some reviewers genuinely think there are fundamental aspects of the approach that might prevent it from scaling (as planned in future work, even by the authors in the last section). For instance, as gUCx noted: + +1) The proposed approach is based on topological similarity. It seems that it is only suitable for images with a simple topological structure, such as character images. It may be hard to use it to classify complex natural images, since natural image classification requires more information than topological structure alone. + +2) The authors did not provide an experimental comparison with enough training data, such as the whole training set of MNIST. The reviewer wonders about the ceiling performance of this approach with enough data. + +I tend to agree with these points. Non-topological similarity can be displayed in abstract images / datasets. Even in ""abstract"" images, the paper should describe the limitations of the approach, and whether it breaks down (for example, in the ""abstract"" Quick, Draw! dataset there are different types of distinct ""yoga"" poses in the yoga class; and likewise, in the cat or pig classes, there are animals with only the head, and animals with both the head and the full body). Conveniently, Quick, Draw! was not used in any of their classification experiments [1], only in a simple clustering example. + +And on the other points, reporting the terminal MNIST performance would be useful, even if it doesn't look good, so that readers have an idea of the limitations of the approach, where it is good, where it is not, and what needs to be improved. 
I would love to see improvements (either in the writing or in the experiments) in future work, where the paper can effectively convince readers that the direction has the promise of being able to scale to ""real"" or ""complex"" images. (Perhaps even applying the approach to the output of a self-supervised autoencoder pre-trained on ImageNet, as a way to get ""abstract"" versions of real photos, like a parallel of the giraffe experiment, though this may distract from the narrative of no pre-training.) All in all, I don't want to discourage the authors, as we are all excited about the direction of this work. I hope to see an updated version of this work published in a future venue. Good luck! + +[1] https://www.kaggle.com/c/quickdraw-doodle-recognition",ICLR2022, +IyNIqO03jK,1576800000000.0,1576800000000.0,1,rJeeKTNKDB,rJeeKTNKDB,Paper Decision,Reject,"Two reviewers are negative about this paper, while the other reviewer is slightly positive. Overall, the paper does not meet the bar for ICLR. Rejection is recommended.",ICLR2020, +r1lgmDCllV,1544770000000.0,1545350000000.0,1,SkxXg2C5FX,SkxXg2C5FX,The simplicity is what makes the proposed methods elegant. The empirical results are strong.,Accept (Poster),"This paper presents new generalized methods for representing sentences and measuring their similarities based on word vectors. More specifically, the paper presents Fuzzy Bag-of-Words (FBoW), a generalized approach to composing sentence embeddings by combining word embeddings with different degrees of membership, which generalizes the more commonly used average or max-pooled vector representations. In addition, the paper presents DynaMax, an unsupervised and non-parametric similarity measure that can dynamically extract and max-pool features from a sentence pair. + +Pros: +The proposed methods are natural generalizations of existing average and max-pooled vectors. The proposed methods are elegant, simple, easy to implement, and demonstrate strong performance on STS tasks. + +Cons: +The paper is solid; there is no significant con other than that the proposed methods are not groundbreaking innovations per se. + +Verdict: +The simplicity is what makes the proposed methods elegant. The empirical results are strong. The paper is worthy of acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +HJeW0wtXeN,1544950000000.0,1545350000000.0,1,B1eZRiC9YX,B1eZRiC9YX,An intriguing idea but the exact impact is unclear,Reject,"This paper conducts a study of the adversarial robustness of Bayesian Neural Network models. The reviewers all agree that the paper presents an interesting direction, with sound theoretical backing. However, there are important concerns regarding the significance and clarity of the work. In particular, the paper would greatly benefit from more demonstrated empirical significance, and more polished definitions and theoretical results.",ICLR2019,4: The area chair is confident but not absolutely certain +IdfIbs9vuB8,1610040000000.0,1610470000000.0,1,u_bGm5lrm72,u_bGm5lrm72,Final Decision,Reject,"The manuscript presents a training method for Spiking Neural Networks (SNNs). The method jointly optimizes input spike encoding parameters, spiking neuron parameters (membrane leak and voltage threshold), and weights in an end-to-end fashion using gradient descent. SNNs are very interesting for energy-efficient implementations of neural networks. Their energy efficiency strongly depends on inference latency (SNNs compute in time, unlike feed-forward ANNs) and activation sparsity. 
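As background for the discussion below, here is a toy sketch of a leaky integrate-and-fire step in which the leak and threshold are ordinary trainable parameters (Python/PyTorch; illustrative names and simplified dynamics assumed, not the manuscript's exact formulation).

import torch

def lif_step(v, x, leak, threshold):
    # One leaky integrate-and-fire update: decay the membrane potential,
    # integrate the input current, and spike when the threshold is crossed.
    v = leak * v + x                   # leak in (0, 1); leak and threshold
    spike = (v >= threshold).float()   # can both be torch.nn.Parameters
    v = v * (1.0 - spike)              # hard reset after a spike
    return v, spike

# In training, the non-differentiable spike function would need a surrogate
# gradient for the leak and threshold to receive useful learning signals.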
+ +All reviewers acknowledged that the approach directly improves inference latency and activation sparsity on large convolutional models at very good performance levels. + +The main concern of all reviewers was the limited conceptual novelty. The paper combines some known techniques (hybrid SNN training, direct input encoding, training of neuron parameters like leak time constant and threshold) and scale the setup up to large networks and datasets (e.g. ImageNet). + +In summary, the paper presents impressive results, but the conceptual innovation is missing. + + +",ICLR2021, +dCNLIWRg1Wh,1642700000000.0,1642700000000.0,1,krI-ahhgN2,krI-ahhgN2,Paper Decision,Reject,"This paper modifies the loss of supervised contrastive (SupCon) learning by adding a self-contrastive loss. Utilizing a multi-exit network and contrasting the multiple outputs of this network, the proposed self-contrastive (SelfCon) learning removes the requirement of additional data augmentation samples for creating positive pairs. The proposed SelfCon loss is theoretically connected to the lower bound of a label conditional mutual information between the intermediate and last feature. The paper focuses its study on SupCon & SelfCon-M, which are multi-batch variates that first augment the images, and then contrast the views between both augmented and non-augmented samples of the same class, and on SupCon-S and SelfCon-S, which are single-batch variates that only contrast between the samples of the same class and do not require additional data augmentations. A wide variety of experiments have been done on CIFAR-10, CIFAR-100, TinyImageNet, ImageNet-100, and ImageNet, but mostly with relatively small networks. + +The ratings for the paper were mixed [3,5,5,8 before rebuttal; 5,5,6,8 after rebuttal]. All four reviewers had provided detailed initial reviews, pointing out a long list of issues. The authors had incorporated these reviews to make a large number of improvements to their initial submission. After the author rebuttal period, while one reviewer raised the score from 5 to 6, two reviewers maintained their negative positions: Reviewer ZiPE is clearly concerned about the risk of accepting a method that may break as soon as a slightly larger model (ResNet50 instead 18) is used, the model is trained a bit longer, or the baselines are tuned, while Reviewer MBzi is unsatisfied with how the paper motivates its empirical construction from the perspective of mutual information maximization. + +Given the disagreements between the reviewers, the AC has carefully read the paper to provide an additional review. Some concerning observations of the AC are summarized as follows: + +1. Echoing the concern of Reviewer ZiPE, the performance gain of SelfCon-S over SupCon diminishes in ImageNet with ResNet-18, as shown in Table 13, making it become even more important for the authors to conduct experiments following more standard settings (e.g., ResNet-50 on ImageNet). + +2. The main paper seems to suggest SelfCon-S outperforms SelfCon-M and SupCon outperforms SupCon-S, while Table 13 in the Appendix suggests the opposite. + +3. Table 3 that compares SelfCon-S with SupCon appears very misleading, as SupCon consumes more memory and computation than SelfCon-S simply because it has used data augmentations. If SupCon-S is used, it would take less memory and computation than SelfCon-S. + +4. SelfCon-S adds a subnetwork to the backbone to boost its performance, so technically, it has more parameters than the backbone. 
Comparing it with a baseline that only uses a backbone model does not seem to be that fair. This point has not been discussed in the paper. + +5. Last but not least, echoing the concerns of Reviewer ZiPE and MBzi, the paper seems to try to validate the motivating of the added loss with mutual information maximization. However, establishing the causal relationship between maximizing the mutual information of the intermediate and last layers and the classification performance needs much more than the correlation analysis provided in the paper. + +Given the above-mentioned concerns, the AC does not consider the paper to be ready for publication at its current stage.",ICLR2022, +8Fe8NJZwcf,1610040000000.0,1610470000000.0,1,YgrdmztE4OY,YgrdmztE4OY,Final Decision,Reject,"This paper presents an approach, FedMix, for federated learning using mixture of experts (MoE). The basic idea is to learn an ensemble of models and user-specific combination weights (mixing proportions). + +The reviewers appreciated the MoE formulation for federated learning. However, there were multiple concerns from the reviewers, which include lack of clarity (regarding the variational lower bound that is being used), significant communication cost and privacy concerns (the server can infer the users' label distributions), weak experimental results, and lack of any theoretical support (which isn't that big an issue if the paper were stronger on other aspects). The author feedback was considered and the reviewers engaged in some discussions (also with the authors). In the end, however, the reviewer were still not convinced that the paper is ready to be published in its current state. Based on my own reading of the paper, the reviews, and the author response, I agree with this assessment. + +The authors are advised to take into account the feedback from the reviewers and resubmit to another venue.",ICLR2021, +lxm1cSLL,1576800000000.0,1576800000000.0,1,Byx9p2EtDH,Byx9p2EtDH,Paper Decision,Reject,"The paper considers the case where policies have been learned in several environments - differing only according to their transition functions. The goal is to achieve a policy for another environment on the top of the former policies. The approach is based on learning a state-dependent combination (aggregation) of the former policies, together with a ""residual policy"". On the top of the aggregated + residual policies is defined a Gaussian distribution. The approach is validated in six OpenAI Gym environments. Lesion studies show that both the aggregation of several policies (the more the better, except for the computational cost) and the residual policy are beneficial. + +Quite a few additional experiments have been conducted during the rebuttal period according to the reviewers' demands (impact of the quality of the initial policies; comparing to fine-tuning an existing source policy). + +A key issue raised in the discussion concerns the difference between the sources and the target environment. It is understood that ""even a small difference in the dynamics"" can call for significantly different policies. Still, the point of bridging the reality gap seems to be not as close as the authors think, for training the aggregation and residual modules requires hundreds of thousands of time steps - which is an issue in real-world robotics. 
+ +I encourage the authors to pursue this promising line of research; the paper would be definitely very strong with a proof of concept on the sim-to-real transfer task.",ICLR2020, +K45sX6jgnq,1576800000000.0,1576800000000.0,1,BJx8Fh4KPB,BJx8Fh4KPB,Paper Decision,Reject,"The paper aims to find locally interpretable models, such that the local models are fit (w.r.t. the ground truth) and faithful (w.r.t. the global underlying black box model). +The contribution of the paper is that the local model is trained from a subset of points, selected via an optimized importance weight function. The difference compared to Ren et al. (cited) is that the IW function is non-differentiable and optimized using Reinforcement Learning. + +A first concern (Rev#1, Rev#2) regards the positioning of the paper w.r.t. RL, as the actual optimization method could be any black-box optimization method: one wants to find the IW that maximizes the faithfulness. The rebuttal makes a good job in explaining the impact of using a non-differentiable IW function. + +A second concern (Rev#2) regards the interpretability of the IW underlying the local interpretable model. + +There is no doubt that the paper was considerably improved during the rebuttal period. However, the improvements raise additional questions (e.g. about selecting the IW depending on the distance to the probes). I encourage the authors to continue on this promising line of search. ",ICLR2020, +XKn-jTVObTS,1642700000000.0,1642700000000.0,1,TBpg4PnXhYH,TBpg4PnXhYH,Paper Decision,Accept (Poster),"This paper proposed a self-supervised speech pre-training approach, by the name of SPIRAL, to learning perturbation-invariant representations in a teacher-student setting. The authors introduced a variety of techniques to improve the performance and stabilize the training. Compared to the popular unsupervised learning model wav2vec 2.0, better WERs were reported using SPIRAL with a reduced training cost. All reviewers considered the work solid with sufficient novelty but also raised concerns regarding the generalization under unseen real-world noisy conditions and missing decoding details. The authors responded with new Chime-3 results and updated LM decoding results. The new results show that, after a bug fix, SPIRAL can outperform wav2vec 2.0 when no external LM is used. + +Overall the proposed approach is technically novel. The experiments are extensive and the results are compelling. In addition, the training time can be significantly reduced compared to wav2vec 2.0. All reviewers are supportive. So I would recommend accept.",ICLR2022, +alXlLkfKR_a,1642700000000.0,1642700000000.0,1,Y4cs1Z3HnqL,Y4cs1Z3HnqL,Paper Decision,Accept (Spotlight),"This paper makes significant advances in offline reinforcement learning by proposing a new approach of being pessimistic to deal with uncertainties in the offline data. The proposed approach uses bootstrapped Q-functions to quantify the uncertainty, which by itself is not new, and introduces additional data based on the pseudo-target that is penalized by the uncertainty quantification. The use of such additional data is the first of a kind, and the paper provides theoretical support for the case of linear MDP and empirical support with the D4RL benchmark. The reviewers had originally raised concerns or confusions regarding theoretical analysis and experiments. 
The authors have well responded to them, and no major concerns remain.",ICLR2022, +YtiQ9vnyitz,1642700000000.0,1642700000000.0,1,18Ys0-PzyPI,18Ys0-PzyPI,Paper Decision,Accept (Poster),"The paper presents a method for cooperative ad-hoc collaboration by learning latent representations of the teammates. The method is evaluated in three domains. All the reviewers agree that the method is novel and adds an interesting contribution to the important and difficult problem of the ad-hoc collaboration, making fewer assumptions about the team and the teammates. + +The next version of the paper should comment: + +- On the societal impact of the centralized training. +- Wang et al, CoRL 2020, https://arxiv.org/abs/2003.06906, which addresses the cooperative tasks in the ad-hoc teams without privileged knowledge and assumptions about the teammates.",ICLR2022, +zsqnjEaXybR,1610040000000.0,1610470000000.0,1,wG5XIGi6nrt,wG5XIGi6nrt,Final Decision,Reject,"This paper focuses on a notion of privacy in learning representations. + +One of the primary concerns of the reviewers was clarity of the writing and results. Numerous concerns are mentioned in the reviews, and also more engagement with the fairness literature was desired. One reviewer felt that some of the claims in the paper were unsubstantiated, for example: understanding the sanitization process in a human-understandable visual way"", ""integration of a notion of interpretability"". It was felt that the changes required were more than could be expected for a camera ready version. The authors are recommended to revise the paper with a particular eye for clarity to a new reader. + +The notion and measurement of privacy was also considered to be somewhat shaky. It is understood that the nature of privacy considered in this paper is different from differential privacy. That said, the latter is a rigorous definition, and the one in this paper seems to be rather empirical in nature. There are no formal guarantees in terms of privacy preservation, and it is not clear whether the representations could leak information when evaluated with a different network. As privacy is a mission-critical property, some justification of why the heuristic measurement of privacy is acceptable. + +As a side note, the authors should consider using the \citep command for parenthetical citations in the text.",ICLR2021, +ClMc_4RQjjF,1610040000000.0,1610470000000.0,1,yFJ67zTeI2,yFJ67zTeI2,Final Decision,Accept (Poster),"This paper received three borderline reviews (2+ / 1-) and one positive review. Having read through the reviews and author responses, the AC recommends the paper to be accepted. The method, while simple, is proven experimentally to be effectively and will add to the body of work on key-point localization. The authors are requested to add their additional baselines in the response text to the revision of their paper if it has not already been done. +",ICLR2021, +HJxFR9egeV,1544710000000.0,1545350000000.0,1,BygANjA5FX,BygANjA5FX,Interesting method with limited novelty and requiring better baselines.,Reject,"The method under consideration uses parallel convolutional filter groups per layer, where activations are averaged between the groups, forming ""inner ensembles"". + +Reviewers raised a number of concerns, including the increased computational cost for apparently little performance gain, the choice of base architecture (later addressed with additional experiments using WideResNet and ResNeXt), issues of clarity of presentation (some of which were addressed). 
One reviewer was unconvinced without direct comparison to full ensembles. Another reviewer raised the issue of a missing direct comparison to the most similar method in the literature, maxout (Goodfellow et al, 2013). Authors rebutted this by claiming that maxout is difficult to implement and offering vague arguments for its inferiority to their method. + +The AC agrees that a maxout baseline is important here, as it is extremely close to the proposed method and also trivially implemented, and that in light of maxout (and other related methods) the degree of novelty is limited. The AC also concurs that a full ensemble baseline would strengthen the paper's claims. In the absence of either of these the AC concurs with the reviewers that this work is not suitable for publication at this time.",ICLR2019,4: The area chair is confident but not absolutely certain +rkezR4IWl4,1544800000000.0,1545350000000.0,1,H1lC8o0cKX,H1lC8o0cKX,uncertain link between inferring sensor locations and spatial structure ,Reject,"This paper is borderline for publication for the following reasons: +1) the title is misleading. The majority of the ICLR audience understands by ""spatial structure"" the structure of the external 3D world, as opposed to the position of the sensors in the internal coordinate system of the agent. Though the authors argue that knowing the positions of the sensors eventually leads to learning the 3D world structure, this appears like a leap in the argument. +2) The equation s=\phi(m) described a mapping from robot postures to sensory states. This means the agent should remain within the same scene. The description of this equation in the manuscript as ""The mapping  can be seen as describing how “the world” transforms changes in motor states into changes in sensory states ..."" makes this equation appear more general than what it is. s'=\psi(s,m) would be better described by such sentence. + + +",ICLR2019,2: The area chair is not sure +eFE7wP2s10q,1610040000000.0,1610470000000.0,1,_SKUm2AJpvN,_SKUm2AJpvN,Final Decision,Reject,"This paper introduces ATC, which is a contrastive learning on observations separated in time, to learn representations that do not need to take rewards into consideration. These learned representations allow, for the first time, a real disentanglement between representation learning and control, as the agent can simply load such a representation, “freeze it”, and still recover performance of end-to-end deep reinforcement learning agents. + +Overall, all reviewers agree this is a promising direction. Nevertheless, there has been extensive discussion (with the authors and privately) about the significance of the reported results due to the small number of seeds. On one hand, there’s the argument that there is a wide range of experiments and that should compensate for a small number of seeds in individual experiments. On the other hand, there are experiments with as little as two seeds (e.g., DMControl multi-env) and this can be seen at most as anecdotal evidence. There’s also the argument that we, as a community, should be striving for more reliable and meaningful experiments in reinforcement learning. Moreover, there have been concerns about how “variance” is being reported (max and min performance) and, although the authors replied to that, an alternative plotting was never shown. + +Importantly, at this point it is not clear how many seeds were used in each experiment (Figures 6, 7, 9, 11, 12, 13 do not report the number of seeds used). 
It is said that each curve represents a minimum of 3 random seeds, but that is very informal and not that useful. Exactly stating the number of seeds would be the right thing to do, not to mention that in the rebuttal it is said that 8-game pretraining for Atari multi-env uses 2 seeds, contradicting the original claim. Also, sometimes, different methods, in the same experiment, are “averaged” across different numbers of seeds (“DMLab offline -- ATC is 4 seeds, PC and CPC are 2 seeds each”). This is particularly problematic because of the small number of seeds and potentially high variance. Reporting the max over 4 numbers drawn from a Gaussian distribution is very likely to lead to a larger number than when reporting the max over 2 numbers drawn from the same Gaussian distribution. + +I do acknowledge the effort to increase the number of seeds during the rebuttal phase, but it is hard to accept a paper with unknown results. We have very little evidence to believe that going from 2 seeds to 5 seeds is not going to change the results. The reviewers couldn’t agree on the variance of this process as well. Some say the variance of PPO is low between runs when using the same hyper parameters while others mention papers (e.g., Deep RL that matters) that show how much variance one can have across these methods. Thus, I cannot accept this paper conditioned on more seeds being added to the final version because we don’t know what the results will look like. Since this paper is mostly an empirical study, it should have thorough experiments and a careful analysis of the results, but the small number of seeds prevents that in my opinion. Thus, as difficult as it is given the promising direction of the paper, I’m recommending its rejection. I strongly encourage the authors to increase the number of runs in their experiments and to use a more standard measure of variability (e.g., standard error, standard deviation) when reporting their results. This will then be a very strong submission for a future conference. +",ICLR2021, +AvRHP8n_GZz,1610040000000.0,1610470000000.0,1,8CCwiOHx_17,8CCwiOHx_17,Final Decision,Reject,"This paper considers the problem of agents learning to autonomously navigate the web, specifically by focusing on filling out forms. The focus is on using adversarial environment generation to form a curriculum of training tasks. +Thank you for the revisions to the manuscript, which have particularly improved readability. +The presented problem is really interesting and seems an important real-world problem for RL. +Despite this, as the paper stands the results are not completely convincing. It seems like there is also scope to rigourously analyse the proposed method on other, better known domains to better quantify its limitations.",ICLR2021, +Anao0KhE3id,1610040000000.0,1610470000000.0,1,TSRTzJnuEBS,TSRTzJnuEBS,Final Decision,Accept (Poster),"Many concerns raised by the reviewers have been addressed by the authors, sometimes through additional experiments. The reviewers have updated their scores in response, and all now recommend acceptance. + +Like Reviewer 4, I think that the relation to nested dropout (Rippel et al. 2014) needs to be acknowledged and discussed appropriately, so I encourage the authors to carefully consider the reviewers' most recent comments about this when preparing the final version of the manuscript. 
+ +I disagree somewhat with Reviewer 3 that the motivation provided for this work is insufficient; controlling the quality/speed trade-off at inference time seems like a compelling application. So does progressive generation, as suggested by Reviewers 3 & 4. I appreciate that this is highly subjective, however. Perhaps a few more concrete examples of practical situations where such trade-offs are useful could be mentioned in the introduction.",ICLR2021, +J8RGfMTmBfo,1610040000000.0,1610470000000.0,1,mPmCP2CXc7p,mPmCP2CXc7p,Final Decision,Reject,"The authors propose a methodoloy for dynamic feature selection. They use differentiable gates with +an RNN architecture to select different subsets of features at each time point thus resulting in dynamic selection. +The reviewers agree that the idea is interesting and the method could be useful and I share their opinion. + +The majority vote is towards rejection. The overarching mwssage of the reviews is that the manuscript raises confusion in a number of points. I see this work as one with good potential for impact but its current presentation is confusing. The vivid discussion that it raised is also an indication of it. The authors have done a good job replying to the concerns and the questions raised. However, the reviewers were still unsatisfied with the authors response to their concerns. I recommend rejection at this time, while encouraging the authors to take seriously the reviewers' requests for a clearer presentation of their approach's contribution in order to strengthen their paper for future submission. + +",ICLR2021, +sTV4aNRc4i,1642700000000.0,1642700000000.0,1,DSCsslei9r,DSCsslei9r,Paper Decision,Reject,"While several reviewers acknowledge that the paper contains potentially useful ideas related to multi-modal self-training applied to genomic data, they also point out a number of weaknesses and room for improvement that the discussion with authors did not fully address. This includes in particular the need to better explain the details of what is done in the paper; the choice of experiments which is not relevant (eg, predicting promoter regions) or complete (eg, showing results on only one transcription factor); the lack of comparison with existing methods, etc... We therefore consider that the paper is not ready for publication in its current form, but hope that the reviews will help the authors work on a revision addressing the issues.",ICLR2022, +1g3-nY48oNN,1642700000000.0,1642700000000.0,1,VSu5WrtLK3q,VSu5WrtLK3q,Paper Decision,Reject,"The paper proposes to use covariance of the approximate posterior to induce a metric on the latent space of the VAE and use it to sample from the Riemannian manifold learned by a VAE. Experiments on MNIST and CelebA show the method outperforms vanilla VAE in terms of sample quality (FID and PR scores). It is also shown to work better than baseline VAE models on a medical imaging classification task. While the reviewers have acknowledged the contributions of the paper, the novelty in the contributions and their importance/impact was seen to be rather limited. The main concern from the reviewers is -- while the paper is mainly based on the use of inverse covariance as the metric for manifold, it doesn't give a reasonable theoretical justification on it is a sensible metric that captures the intrinsic geometry of data. 
Authors in their response justify it as -- since the covariance matrices are learned from the data and favor through the posterior sampling some direction in the latent space, it is a natural choice as metric. This is not very convincing. A more technical justification for this will certainly make the paper more convincing. I suggest the authors to look at ""Kumar, Abhishek, and Ben Poole. ""On Implicit Regularization in $ β $-VAEs."" International Conference on Machine Learning. PMLR, 2020"" which theoretically connects inverse covariance and the Riemannian metric in Sec 5.2, and see if it can be adapted in their context.",ICLR2022, +BqprJsanEet,1642700000000.0,1642700000000.0,1,zuqcmNVK4c2,zuqcmNVK4c2,Paper Decision,Accept (Poster),"This paper explores a classification approach based on labeling pairs of inputs concurrently using a single network, rather than singletons. The authors test the approach on adversarial robustness (towards norm bounded perturbations), OOD detection next to basic standard accuracy calculations. + +While the key idea is potentially interesting and the paper has received positive comments from the majority of reviewers, there were also some concerns that need to be addressed in a final manuscript: + +* The paper does not motivate or explain theoretically why the joint classification framework is superior, beyond verbose arguments. These +arguments need to be better clarified and linked with the experimental evaluation. + +* While the empirical results are perceived as positive by the reviewers, one reviewer has raised the concern about the comparisons. The adversarial robustness and OOD comparisons are indeed basic. The adversarial attack used here is quite a weak PGD attack with a small radius and low iteration budget. Possibly include stronger attacks. The OOD comparisons are with standard baselines only. Please include further comparisons. + +In its current form, the paper seems to be acceptable, and I strongly encourage the authors to improve both the theoretical justification, and empirical exploration in the final version.",ICLR2022, +nRY1T3G6gPN,1610040000000.0,1610470000000.0,1,Nq5zyAUD65,Nq5zyAUD65,Final Decision,Reject,"All reviews were negative for this paper, due to various issues. I think the main issue was that the experimental results were too weak to be convincing. For example, the reviewers were not sure if the differences in performance between different activations are significant. The reviewers also required more datasets and more experiments. The authors added std to results, more experiments and argued that the current datasets are sufficient, but the reviewers seemed to remain unconvinced.",ICLR2021, +hC7ff0d2b2_,1610040000000.0,1610470000000.0,1,eJIJF3-LoZO,eJIJF3-LoZO,Final Decision,Accept (Poster),"The paper introduces ""Concept Embeddings"" to Prototypical Network, which are part-based representations and are learnt by a set of independent networks (which can share weights). The method first computes the concept embeddings of an input, and then takes the summation of the distances between those concept embeddings and their corresponding concept prototypes in each class to estimate the class probability. The experiments validates the proposed methods on 4 benchmarks in three different domains, including vision, language and biology. For the biology task, the authors also develop a new benchmark on cross-organ cell type classification. 
The key novel idea of transferable concepts results in significantly improved generalization ability over the existing few-shot learning methods. + +Although some reviewers raised concerns about not using other few-shot image classification datasets such as MiniImageNet these are not appropriate benchmarks, as the method requires the “part-based concepts” to reasonably span the space of all images which is a characteristic of fine-grained image classification problem. Although this does limit the scope of the method, the fact that it is applicable for multiple tasks is a strong counteragument to the claim that it is too limited, so overall I disagree with the assessment of one reviewer that the choice of benchmarks is insufficient. + +",ICLR2021, +GwTubrGbT,1576800000000.0,1576800000000.0,1,rkgfdeBYvH,rkgfdeBYvH,Paper Decision,Accept (Poster),"The article studies the role of the activation function in learning of 2 layer overparaemtrized networks, presenting results on the minimum eigenvalues of the Gram matrix that appears in this type of analysis and which controls the rate of convergence. The article makes numerous observations contributing to the development of principles for the design of activation functions and a better understanding of an active area of investigation as is convergence in overparametrized nets. The reviewers were generally positive about this article. ",ICLR2020, +_195I9wg8lx,1642700000000.0,1642700000000.0,1,3M3t3tUbA2Y,3M3t3tUbA2Y,Paper Decision,Reject,"The authors propose an alteration to Dreamer that incorporates a swav-like objective. The reviewers raised a number of issues with the paper, overall arguing for rejection. In particular, the reviewers felt that the work was not well motivated, weak performance, that a number of baselines were missing, and a lack of analysis of the results, a lack of novelty. While the authors addressed many of these concerns during the rebuttal, the majority of reviewers still felt this was not enough and that the paper did not meet the bar for acceptance. Therefore, I recommend rejection at this stage so that these concerns can be addressed.",ICLR2022, +ehPefeuHtans,1642700000000.0,1642700000000.0,1,MSgB8D4Hy51,MSgB8D4Hy51,Paper Decision,Accept (Poster),The paper presents a new method for detection of backdoor attacks under strong limitations such as the lack of access to training data and the reference benign model. Its main idea is to utilize a new expected transferability statistic that can be used for detection in broad range of application domains. The effectiveness of the proposed approach is demonstrated experimentally.,ICLR2022, +7IDY11_C3i3,1610040000000.0,1610470000000.0,1,Qk-Wq5AIjpq,Qk-Wq5AIjpq,Final Decision,Accept (Poster),"The paper provides a method for constructing PAC confidence scores for pre-trained deep learning classifiers. The reviewers were all positive about the paper. + +Pros: +- Has provable guarantees on the reliability of the prediction. Such guarantees are quite desirable in practice. +- The problem of neural network uncertainty is important and timely problem, especially in safety-critical applications. +- The method is simple and well-motivated. +- Strong empirical performance. +- Interesting applications to fast DNN inference and safe planning. + +Cons: +- Lack of generalization guarantees-- the guarantees in the paper only hold on the training set; but in practice, performance in test is what's important. 
+- Only a handful of baselines were tested against, most of which (if not all) were naive.",ICLR2021,
The authors are encouraged to address the reviewers' questions in future versions of the paper.",ICLR2022,
Experiments on different tasks under various types of adversarial attacks show that the proposed method improves the robustness of the models without sacrificing accuracy on normal inputs. The idea is simple and effective. Some reviewers had concerns about the novelty of the idea and the comparisons with related work, but I think the authors give convincing answers to these questions.",ICLR2020,
+ +If we go back to the MOPO paper, we see that to define an uncertainty-penalized reward, we need an upper bound on the absolute value of G(s, a), which is the difference between the expected value of the value function at the next-state according to the true model and the estimated model. + +If we assume that the value function belongs to the Lipschitz function class w.r.t. a metric d, this upper bound is proportional to the 1-Wasserstein distance between the true next-state distributions and the model's distribution. +If the dynamics is deterministic, 1-Wasserstein distance becomes the $d( T(s, a), \hat{T}(s,a) )$. If the distance d is the Euclidean distance, this becomes the squared error. + +Therefore, the squared error makes sense for deterministic dynamics, and it only provides an upper bound of $|G(s, a)|$. +If the environment is not deterministic, the squared error may not be a reasonable gold standard anymore to compare the correlation of various uncertainty measures with. + +The paper introduces a generic MDP framework, but does not mention anything about its focus on MBRL for deterministic environments until the last sentence of its conclusion. Please clarify this in your camera ready paper. + +(2) The experiments are conducted using 3 or 4 seeds. Although this is the common practice in the deep RL community, it is too small. Standard deviations in Tables 1, 2, ... are computed with 3 seeds, which would be cringeworthy to statisticians and empirical scientists. I encourage the authors to increase the number of independent random experiments to make their results more powerful.",ICLR2022, +#NAME?,1610040000000.0,1610470000000.0,1,B8fp0LVMHa,B8fp0LVMHa,Final Decision,Reject,"**Overview** The paper provides a simplified offline RL algorithm based on BCQ. It analyzes the algorithms using a sampling-based maximization of the Q function over a behavior policy for both Bellman targets and for policy execution -- the EMaQ. Based on this, the paper then proposes to use more expressive autoregressive models (MADE) for learning the behavior policy from replay buffer data. The methods work well for harder tasks in the D4RL benchmark. + +**Pro** +- The method is relatively novel +- Algorithms are simple modifications of existing ones +- Empirical results are strong, matching or exceeding BEAR on D4RL while at the same time matching SAC for online learning +- Work for both and offline +- ablation study on the choice of generative model for μ(a|s) + +**Con** +- The current form of the complexity measure is somewhat not practical. +- Theoretical results are not strong enough +- Algorithmic contributions appear incremental + +**Recommendation** The paper is on the borderline. It contributes simple and nice algorithmic ideas and these ideas work well empirically. These results demonstrate that a good choice of the behavior policy generative model is important for some tasks. At the same time, the reviewers are concerned about the theoretical parts, e.g., issues relates new complexity measure. Overall, the meta-reviewer believes that the paper might not be in a status ready for publication.",ICLR2021, +vFmDH4d6zOL,1610040000000.0,1610470000000.0,1,6IVdytR2W90,6IVdytR2W90,Final Decision,Reject,"This submission proposes an approach for fusing representations at multiple scales to improve object detection systems. Reviewers thought the paper was well-written and showed positive results on COCO, a common object detection benchmark. 
However, reviewers agreed that there was not sufficient methodological novelty or empirical improvement over existing approaches to warrant acceptance at ICLR: several prior works have addressed multiscale fusion, and reviewers did not find the evaluation/ablations sufficient to demonstrate that the approach yields substantial improvements over these existing approaches. I hope the authors will consider resubmitting the paper after refining it based on the reviewers' feedback.",ICLR2021,
An additional review was requested; its score fell between
Further, the two methods are not studied under the same set of assumptions (in Theorem 1, the norm of the gradient is not assumed to be bounded, but in Theorem 2 it is).
The problems are fully formalized and include proofs in the Metamath, Lean and Isabelle theorem provers (as the reviewers pointed out, the support for Isabelle is limited, and that should be made clearer in the abstract). This multi-platform support is the main selling point of the benchmark, because it will make it possible to make direct comparisons among systems targeting different theorem provers. + +The paper also does a good job discussing the benchmark selection and formalization process. This is important since some of the problems were translated from word problems. + +As part of the rebuttal, the authors added extra information on the performance of the baselines and some qualitative details on how they fail. + +Overall, there is agreement among the reviewers that this is a valuable benchmark that will enable comparisons among systems that today are very hard to compare.",ICLR2022, +l-exAdmxgCe,1610040000000.0,1610470000000.0,1,XavM6v_q59q,XavM6v_q59q,Final Decision,Reject,"While there are some potentially interesting aspects to this work, it doesn’t acknowledge a significant amount of relevant literature, and there are some unsupported claims. All reviewers believe the paper is not ready for acceptance. Reviewers provided some good thorough reviews and suggestions, but the authors did not choose to respond or engage in discussions to improve the paper. ",ICLR2021, +ByxBiLFlxV,1544750000000.0,1545350000000.0,1,HJzLdjR9FX,HJzLdjR9FX,"Work would be strengthened by additional analyses, and measuring computational resource reduction after applying technique.",Reject,"The authors propose a framework for compressing neural network models which involves applying a weight distortion function periodically as part of training. The proposed approach is relatively simple to implement, and is shown to work for weight pruning, low-rank compression and quantization, without sacrificing accuracy. +However, the reviewers had a number of concerns about the work. Broadly, the reviewers felt that the work was incremental. Further, if the proposed techniques are important to get the approach to work well in practice, then the paper would be significantly strengthened by further analyses. Finally, the reviewers noted that the paper does not consider whether the specific weight pruning strategies result in a reduction of computational resources beyond potential storage savings, which would be important if this method is to be used in practice. + +Overall, the AC tends to agree with the reviewers criticisms. The authors are encouraged to address some of these issues in future revisions of the work. +",ICLR2019,4: The area chair is confident but not absolutely certain +v8WHG-dZCsqT,1642700000000.0,1642700000000.0,1,G9M4FU8Ggo,G9M4FU8Ggo,Paper Decision,Reject,"All reviewers recommend rejection, and I'm following this recommendation.",ICLR2022, +N5SDEFVOsNp,1610040000000.0,1610470000000.0,1,Ogga20D2HO-,Ogga20D2HO-,Final Decision,Accept (Poster),"The paper proposes to apply Mixup to Federated Learning (FL) for addressing the challenge of non-iid data. The idea is very simple, but seems to work well in empirical evaluation. Some concerns were raised regarding the communication costs and privacy. The authors rebuttal and revised draft provide reasonable answers to these concerns. 
+ +For the final version, it is suggested that the authors can address the following issues: + +1) Improve the writing - especially the formulation of the proposed method + +2) Provide more discussions and experiments on the communication costs. ",ICLR2021, +SJ9WaMU_x,1486400000000.0,1486400000000.0,1,SyW2QSige,SyW2QSige,ICLR committee final decision,Reject,"This paper proposes information gain as an intermediate reward signal to train deep networks to answer questions. The motivation and model are interesting, however the experiments fail to deliver. There is a lack of comparative simple baselines, the performance of the model is not sufficiently analyzed, and the actual tasks proposed are too simple to promise that the results would easily generalize to more useful tasks. This paper has a lot of good directions but definitely requires more work. I encourage the authors to follow the advice of reviewers and explore the various directions proposed so that this work can live up to its potential.",ICLR2017, +e-cDXG8Tv8,1576800000000.0,1576800000000.0,1,HkgqmyrYDH,HkgqmyrYDH,Paper Decision,Reject,"This paper presents a language model for Amharic using HMMs and incorporating POS tags. The paper is very short and lacks essential parts such as describing the exact model and the experimental design and results. The reviewers all rejected this paper, and there was no author rebuttal. This paper is clearly not appropriate for publication at ICLR. ",ICLR2020, +a88DkaHApg,1642700000000.0,1642700000000.0,1,VNdFPD5wqjh,VNdFPD5wqjh,Paper Decision,Reject,"This paper receives recommendations from four experts who are actively working on the Re-ID problem. Two reviewers give more positive comments, while the comments of the remaining reviewers are relatively negative. The paper aims to achieve generalizable Re-ID without demographics (DGWD-ReID). However, two Reviewers, umsc and Cuay, pointed out that, the claimed novelty of such a problem setting is oversold, and some relevant works are not cited and compared in the original submission. More specifically, the concept of DGWD-ReID in the paper is not fully convincing, and reviewers umsc and Cuay both think DGWD-ReID setting seems to be a special case of DG-ReID, and may not need to be considered totally independently. Though the reviewer eLS6 gives more positive recommendation, an issue is still raised by this reviewer - the method is general and does not have much specific design for Re-ID, but the authors only used Re-ID datasets for evaluation. In experiments, the authors used powerful techniques/tricks in their backbone model to improve the performance. This also brings some difficulties for the readers to assess which part is really working in their method. It will be good if authors can further improve the paper.",ICLR2022, +rke2LODPeV,1545200000000.0,1545350000000.0,1,Sye7qoC5FQ,Sye7qoC5FQ,"Novel analysis of an important problem, but needs some improvements",Reject,"The paper provides a novel analysis of the robustness to adversarial attacks in network representation learning. 
It appears to be a useful contribution for an important class of models; however, the detailed reviews (1 and 2) raise some concerns that may require a bit of further work (though partially addressed in the revised version).",ICLR2019,3: The area chair is somewhat confident
While there were many positive points to this work, reviewers believed that it was not yet ready for acceptance.",ICLR2019,5: The area chair is absolutely certain +BJ9QhGIdx,1486400000000.0,1486400000000.0,1,SJvYgH9xe,SJvYgH9xe,ICLR committee final decision,Accept (Poster),"Timely topic (interpretability of neural models for NLP), interesting approach, surprising results.",ICLR2017, +IAJ-4bfgVy,1576800000000.0,1576800000000.0,1,HylsTT4FvB,HylsTT4FvB,Paper Decision,Accept (Poster),All three reviewers agree that the paper provide an interesting study on the ability of generative adversarial networks to model geometric transformations and a simple practical approach to how such ability can be improved. Acceptance as a poster is recommended.,ICLR2020, +r1xHTF-ElV,1544980000000.0,1545350000000.0,1,SJfPFjA9Fm,SJfPFjA9Fm,"Interesting contribution, even if the analysis is mostly standard",Accept (Poster),"The main criticisms were around novelty: that the analysis is rather standard. Given that all the reviewers agreed the paper is well written, I'm inclined to think the paper will be a useful contribution to the literature. The authors also highlight the analysis of the discretization, which seems to be missed by the most critical reviewer. I would suggest to the reviewers that they use the criticisms to rework the paper's introduction, to better explain which parts of the work are novel and which parts are standard. I would also suggest that standard background be moved to the appendix so that it is there for the nonexpert, while making the body of the work more focused on the novel aspects.",ICLR2019,3: The area chair is somewhat confident +#NAME?,1642700000000.0,1642700000000.0,1,12RoR2o32T,12RoR2o32T,Paper Decision,Accept (Poster),"The paper studies how to build predictive models that are robust to nuisance-induced spurious correlations present in the data. It introduces nuisance-randomized distillation (NuRD), constructed by reweighting the observed data, to break the nuisance-label dependence and find the most informative representation to predict the label. Experiments on several datasets show that by using a classifier learned on this representation, NuRD is able to improve the classification performance by limiting the impact of nuisance variables. The main concerns were about the presentation and organization of the paper, which was heavily focused on the theoretical justifications but fell short in explaining the intuitions and implementation details. The revision and rebuttal have addressed some of these concerns and improved the overall exposition of the paper, based on which two reviewers raised their scores to 8. While there is still room to further improve the paper by providing more detailed discussions about the proposed algorithms, the AC considers the paper ready for publication under its current form.",ICLR2022, +rkg-zVtJg4,1544680000000.0,1545350000000.0,1,BJlgNh0qKQ,BJlgNh0qKQ,"Novel, well-founded, and interesting method. Concerns about baseline",Accept (Poster),"This paper proposes a method for unsupervised learning that uses a latent variable generative model for semi-supervised dependency parsing. The key learning method consists of making perturbations to the logits going into a parsing algorithm, to make it possible to sample within the variational auto-encoder framework. Significant gains are found through semi-supervised learning. 
+ +The largest reviewer concern was that the baselines were potentially not strong enough, as significantly better numbers have been reported in previous work, which may have a result of over-stating the perceived utility. + +Overall though it seems that the reviewers appreciated the novel solution to an important problem, and in general would like to see the paper accepted.",ICLR2019,4: The area chair is confident but not absolutely certain +ik5GyWG2HJF,1610040000000.0,1610470000000.0,1,ebS5NUfoMKL,ebS5NUfoMKL,Final Decision,Accept (Poster),"The paper proposes an interesting architecture that dues Graph Neural Networks (GNN) and Gradient Boosting Decision Tree. This new architecture works on graphs with heterogeneous tabular features and BGNNs work well on graphs where the nodes contain heterogeneous tabular data and is optimized end-to-end and seems to obtain great SOTA results. End to end learning is done by iteratively adding trees that fit the GNN gradient updates, allowing the GNN to backpropagate into the GBDT. All reviewers agreed that the idea is interesting, the paper is well-written, and the results found in the paper are impressive. In addition, author response satisfactorily addressed most of the points raised by the reviewers, and most of them increased their original score. Therefore, I recommend acceptance.",ICLR2021, +HylCxh5feV,1544890000000.0,1545350000000.0,1,S1gUVjCqKm,S1gUVjCqKm,Not ready for presentation at ICLR,Reject,"Following the unanimous vote of the submitted reviews, this paper is not ready for publication at ICLR. Among other concerns raised, the experiments need significant work, and the exposition needs clarification.",ICLR2019,5: The area chair is absolutely certain +xBstNk9iTuI,1642700000000.0,1642700000000.0,1,WZeI0Vro15y,WZeI0Vro15y,Paper Decision,Reject,"This article proposes a novel uncertainty quantification method, formulating the problem as a Bayesian inference problem. Instead of training multiple ensemble models through MAP optimisation, as in ensemble methods, the proposed approach tries to learn a mapping function between the prior distribution and the posterior distribution of model parameters. This avoids the complex training of ensemble models and achieves better efficiency. + +The approach is novel, and the problem of importance. The paper however suffers from a number of weaknesses: +* Some theoretical results would need to be made mathematically more rigorous +* The presentation is unclear and confusing in some places +* Empirical results are not reproducible due to the lack of details +Although the authors clarified some of the points raised by reviewers in their response, the paper in its current form is not ready for publication, and I recommend rejection.",ICLR2022, +SJYRByTBM,1517250000000.0,1517260000000.0,685,B1NGT8xCZ,B1NGT8xCZ,ICLR 2018 Conference Acceptance Decision,Reject,"Pros +-- Nice way to formulate domain adaptation in a Bayesian framework that explains why autoencoder and domain difference losses are useful. + +Cons +-- Model closely follows the framework, but the overall strategy is similar to previous models (but with improved rationale). +-- Experimental section can be improved. It would interesting to explore and develop the relationship between the proposed technique and Tzeng et al. + +Given the aforementioned cons, the AC is recommending that the paper be rejected. 
+",ICLR2018, +QRkiyHs61xQ,1642700000000.0,1642700000000.0,1,vdbidlOkeF0,vdbidlOkeF0,Paper Decision,Reject,The paper introduces a technique to improve density ratio estimation. This is an important problem and very relevant to the ICLR conference. The main idea is to consider density ratios with respect to intermediate distributions to “scale” the densities and make the ratios easier to estimate by training a suitable discriminative model (classifier). Reviewers found the idea interesting but there was a consensus the paper is not ready for publication.,ICLR2022, +p-D2KYgIT9P,1642700000000.0,1642700000000.0,1,KSSfF5lMIAg,KSSfF5lMIAg,Paper Decision,Accept (Poster),"The paper studies interpretability in multi instance learning (where model is trained with a label provided for a bag of instances). The author proposes model-agnostic weight-sampling strategy to improve sampling in prior methods such as (SHAP), and evaluate their performance on three datasets (and authors provided results on more datasets during rebuttal). + +All reviewers agree the paper is well written and well motivated. The paper presents a simple but meaningful extensions to existing interpretability study and will be helpful for the community. Reviewers had some concerns with the comprehensiveness of the evaluation, the strength of their proposed results, and the originality/novelty of the paper. The authors have provided further experimental results on new datasets as well as additional baselines. Given the study of MIL setting in interpretability is scarce, I am leaning towards the acceptance.",ICLR2022, +Syx3uOdVxV,1545010000000.0,1545350000000.0,1,rkxtl3C5YX,rkxtl3C5YX,"Contains interesting results, but certain key aspects were criticized and these criticisms were not addressed in the rebuttal.",Reject,"This work examines the AlphaGo Zero algorithm, a self-play reinforcement learning algorithm that has been shown to learn policies with superhuman performance on 2 player perfect information games. The main result of the paper is that the policy learned by AGZ corresponds to a Nash equilibrium, that and that the cross-entropy minimization in the supervised learning-inspired part of the algorithm converges to this Nash equillibrium, proves a bound on the expected returns of two policies under the and introduces a ""robust MDP"" view of a 2 player zero-sum game played between the agent and nature. + +R3 found the paper well-structured and the results presented therein interesting. R2 complained of overly heavy notation and questioned the applicability of the results, as well as the utility of the robust MDP perspective (though did raise their score following revisions). + +The most detailed critique came from R1, who suggested that the bound on the convergence of returns of two policies as the KL divergence between their induced distributions decreases is unsurprising, that using it to argue for AGZ's convergence to the optimal policy ignores the effects introduced by the suboptimality of the MCTS policy (while really interesting part being understanding how AGZ deals with, and whether or not it closes, this gap), and that the ""robust MDP"" view is less novel than the authors claim based on the known relationships between 2 player zero-sum games and minimax robust control. 
+ +I find R1's complaints, in particular with respect to ""robust MDPs"" (a criticism which went completely unaddressed by the authors in their rebuttal), convincing enough that I would narrowly recommend rejection at this time, while also agreeing with R3 that this is an interesting subject and that the results within could serve as the bedrock for a stronger future paper.",ICLR2019,3: The area chair is somewhat confident +HJ633MLug,1486400000000.0,1486400000000.0,1,Bkab5dqxe,Bkab5dqxe,ICLR committee final decision,Accept (Poster),"The paper proposes a Neural Physics Engine for predicting intuitive physics. It is able to model the objects' dynamics and pairwise object interactions in order to build predictive models. This is a very nice direction, as also noted by the reviewers. One reviewer was particularly enthusiastic, while the other two less so. The main concerns were similarities to existing work, which however can be considered as done in parallel. The reviewers also had comments wrt evaluation, which the authors addressed. This is a good paper, and the AC recommends acceptance. The authors are also encouraged to look at ""Learning Multiagent Communication with Backpropagation"" by Sukhbaatar, whose method (albeit applied in a different context) seems relevant to the proposed approach.",ICLR2017, +2uhqwkw724Y,1610040000000.0,1610470000000.0,1,C5th0zC9NPQ,C5th0zC9NPQ,Final Decision,Reject,"All three reviewers recommend rejection, based on multiple (mostly shared) concerns. While the authors address the concerns in their rebuttal, the unanimously negative scores remain. I don't see basis to accept the paper.",ICLR2021, +uceaR6Q9Ety,1610040000000.0,1610470000000.0,1,lgNx56yZh8a,lgNx56yZh8a,Final Decision,Accept (Poster),"The paper presents a Bayesian approach for classification able to adapt to novel classes given only a few labeled examples. The models combines a one-vs-each approximation of the likelihood combined with a Gaussian process. This allows to resort to a data-augmentation scheme based on Polya-gamma random variables. +The paper is clearly written and combines existing techniques in a convincing manner; the experiments demonstrate better accuracy and uncertainty quantification on benchmark datasets. + +I recommend acceptance.",ICLR2021, +cIcDrkpOvM6,1642700000000.0,1642700000000.0,1,yuv0mwPOlz3,yuv0mwPOlz3,Paper Decision,Reject,"This paper considers the problem of active learning (AL) with data drawn from multiple domains. This framing motivates integrating work on domain shift detection and adaptation into standard AL approaches. + +The reviewers agreed that the work reports a robust set of experiments, which is a clear strength. However, they also raised key concerns, namely: (i) The heterogeneous setting considered is not particularly well motivated; (ii) The technical contributions of this work are limited. The latter would not be a major issue if the empirical evaluation addressed a clear open question (since this would constitute a useful contribution in and of itself), but the empirical contribution is somewhat limited given the unique setting considered and the relevant prior work (some of which seems to have been overlooked by the authors).",ICLR2022, +O57CvsvLEQ,1576800000000.0,1576800000000.0,1,rJlf_RVKwr,rJlf_RVKwr,Paper Decision,Reject,"Thanks for your detailed feedback to the reviewers, which clarified us a lot in many respects. +However, there is still room for improvement; for example, convergence to a good solution needs to be further investigated. 
+Given the high competition at ICLR2020, this paper is unfortunately below the bar. +We hope that the reviewers' comments are useful for improving the paper for potential future publication.",ICLR2020, +cWVpj_efPTKi,1642700000000.0,1642700000000.0,1,lbauk6wK2-y,lbauk6wK2-y,Paper Decision,Accept (Poster),"This paper received 4 quality reviews. The reviewers like the problem formulation and various ideas presented to enabling a working pipeline. However, almost all experiments are conducted on synthetic data. Concerns are also raised regarding its usage and application on real-world data. In the updated version, the authors add some results on real-world data, yet without quantitative evaluation. After the rebuttal and discussion, the final rating is 6 from 3 reviewers, and 5 from 1 reviewer (note that reviewer SKNa stated that the rating will be increased to 6 but did not end up changing it in the ""recommendation"" entry). The AC sees both pros and cons of this work. Given that a conference paper does not have to be comprehensive in all frontiers and the novel idea presented in this paper, the AC is leaning toward in favoring of this work and thus recommends acceptance.",ICLR2022, +TjBW-2VoHZ7,1642700000000.0,1642700000000.0,1,pfNyExj7z2,pfNyExj7z2,Paper Decision,Accept (Poster),"This paper adopts ViT in the VQ-GAN framework replacing CNN, and achieves SOTA FID and IS scores. The empirical results are pretty impressive. It could benefit some practical applications. + +The technical novelty is limited, but the tricks such as l2-normalization of codes are interesting.",ICLR2022, +2JXW-fi-CJ,1642700000000.0,1642700000000.0,1,CSfcOznpDY,CSfcOznpDY,Paper Decision,Accept (Poster),"This paper proposes an algorithm for achieving disentangled representations by encouraging low mutual information between features at each layer, rather than only at the encoder output, and proposes a neural architecture for learning. Empirically, the proposed method achieves good disentanglement metric and likelihood (reconstruction error) in comparison to prior methods. The reviewers think that the methodology is natural and novel to their knowledge, and are happy with the detailed execution. The authors are encouraged to improve the presentation of the paper, by providing rigorous formulation of the ""Markov chains"" to avoid confusions, justification of the independence assumptions behind them, and more in-depth discussions of the learning objectives.",ICLR2022, +T9_6-DHaySo,1610040000000.0,1610470000000.0,1,_mQp5cr_iNy,_mQp5cr_iNy,Final Decision,Accept (Poster),"This work tackles to address the sparse reward problem in RL. They augment actor-critic algorithms by adding an adversarial policy. The adversary tries to mimic the actor while the actor itself tries to differentiate itself from the adversary in addition to learning to solve the task. This in a way provides diversity in exploration behavior. Reviewers liked the paper in general but had several clarification questions. The authors provided the rebuttal and addressed some of the concerns. Considering the reviews and rebuttal, AC and reviewers believe that the paper provides insights that are useful to share with the community. That being said, the paper will still immensely benefit with more extensive experimentation on standard benchmark environments like Atari, etc. 
Please refer to the reviews for other feedback and suggestions.",ICLR2021, +S1zS8JpSf,1517250000000.0,1517260000000.0,771,ryepFJbA-,ryepFJbA-,ICLR 2018 Conference Acceptance Decision,Reject,"Pros: +The proposed regularization for GAN training is interesting and simple to implement. + +Cons: +- Reviewers agree that the methodology is incremental over the WGAN with gradient penalty and the modification is not well motivated. +- Experimental results do not clearly demonstrate the benefits of the proposed algorithm and the paper also lacks comparisons with related works. +GIven the pros/cons, the committee feels the paper is not ready for acceptance in its current state.",ICLR2018, +V-A8y1vmbi,1576800000000.0,1576800000000.0,1,HJgcw0Etwr,HJgcw0Etwr,Paper Decision,Reject,"The article studies a student-teacher setting with over-realised student ReLU networks, with results on the types of solutions and dynamics. The reviewers found the line of work interesting, but they also raised concerns about the novelty of the presented results, the description of previous works, settings and claims, and experiments. The revision clarified some of the definitions, the nature of the observations, experiments, and related works, including a change of the title. However, the reviewers still were not convinced, in particular with the interpretation of the results, and keep their original ratings. With many points that were raised in the original reviews, the article would benefit from a more thorough revision. ",ICLR2020, +O8hzint9or,1576800000000.0,1576800000000.0,1,H1l_gA4KvH,H1l_gA4KvH,Paper Decision,Reject,"The paper is not overly well written and motivated. A guiding thread through the paper is often missing. Comparisons with constrained BO methods would have improved the paper as well as a more explicit link to multi-objective BO. It could have been interesting to evaluate the sensitivity w.r.t. the number of samples in the Monte Carlo estimate. What happens if the observations of the function are noisy? Is there a natural way to deal with this? +Given that the paper is 10+ pages long, we expect a higher quality than an 8-pages paper (reviewing and submission guidelines). ",ICLR2020, +HyQMhfLOx,1486400000000.0,1486400000000.0,1,SJCscQcge,SJCscQcge,ICLR committee final decision,Reject,"While this is an interesting topic, both the method description and experimental setup could be improved.",ICLR2017, +25AEPAUMhU,1576800000000.0,1576800000000.0,1,SkxxtgHKPS,SkxxtgHKPS,Paper Decision,Accept (Poster),"The authors provide bounds on the expected generalization error for noisy gradient methods (such as SGLD). They do so using the information theoretic framework initiated by Russo and Zou, where the expected generalization error is controlled by the mutual information between the weights and the training data. The work builds on the approach pioneered by Pensia, Jog, and Loh, who proposed to bound the mutual information for noisy gradient methods in a step wise fashion. + +The main innovation of this work is that they do not implicitly condition on the minibatch sequence when bounding the mutual information. Instead, this uncertainty manifests as a mixture of gaussians. Essentially they avoid the looseness implied by an application of Jensen's inequality that they have shown was unnecessary. + +I think this is an interesting contribution and worth publishing. It contributes to a rapidly progressing literature on generalization bounds for SGLD that are becoming increasingly tight. 
+ +I have one strong request that I will make of the authors, and I'll be quite disappointed if it is not executed faithfully. + +1. The stepsize constraint and its violation in the experimental work is currently buried in the appendix. This fact must be brought into the main paper and made transparent to readers, otherwise it will pervert empirical comparisons and mask progress. + +2. In fact, I would like the authors to re-run their experiments in a way that guarantees that the bounds are applicable. One approach is outline by the authors: the Lipschitz constant can be replaced by a max_i bound on the running squared gradient norms, and then gradient clipping can be used to guarantee that the step-size constraint is met. The authors might compare step sizes, allowing them to use less severe gradient clipping. The point of this exercise is to verify that the learning dynamics don't change when the bound conditions are met. If they change, it may upset the empirical phenomena they are trying to study. If this change does upset the empirical findings, then the authors should present both, and clearly explain that the bound is not strictly speaking known to be valid in one of the cases. It will be a good open problem. + + + + +",ICLR2020, +P0CA5IbrTbu,1642700000000.0,1642700000000.0,1,wMXYbJB-gX,wMXYbJB-gX,Paper Decision,Reject,"The authors proposed a two-stage algorithm for exploiting label smoothing and provided some analysis based on how label smoothing may have reduced the variance in the stochastic gradient. While the authors provided substantial experiments to justify their work (with additional ones during the response stage), none of the reviewers was very excited in the end, for obvious reasons perhaps: (a) the two-stage algorithm is a straightforward combination of existing practices (first run with label smoothing and then run without label smoothing), without any new, interesting insight from the authors' side; (b) the analysis is a direct consequence of the authors' assumptions. Basically, if label smoothing reduces variance, SGD would converge faster and vice versa, which is nothing surprising or insightful. The key is to understand when and how any particular way to smooth the label would lead to significant reduction of the variance, which the authors did not provide any guidance or insight other than offering some empirical results. Overall, we do not believe this work, in its current form, adds significant value to our understanding of label smoothing.",ICLR2022, +2FKyNMplY,1576800000000.0,1576800000000.0,1,rkeZNREFDr,rkeZNREFDr,Paper Decision,Reject,"This paper proposes to learn self-explaining neural networks using a feature leveling idea. Unfortunately, the reviewers have raised several concerns on the paper, including insufficiency of novelty, weakness on experiments, etc. The authors did not provide rebuttal. We hope the authors can improve the paper in future submission based on the comments. +",ICLR2020, +ByT-L1THG,1517250000000.0,1517260000000.0,729,HypkN9yRW,HypkN9yRW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers generally agree that the DDRprog method is both novel and interesting, while also seeing merit in outperformance of related methods in the empirical results. However, There were a lot of complaints about the writing quality, the clarity of the exposition, and unclear motivation of some of the work. The reviewers also noted insufficient comparisons and discussions regarding relevant prior art, including recursive NNs, Tree RNNs, IEP, etc. 
While the authors have made substantial revisions to the manuscript, with several additional pages of exposition, reviewers have not raised their scores or confidence in response.",ICLR2018, +SK9ARl-vUF,1576800000000.0,1576800000000.0,1,S1e_9xrFvS,S1e_9xrFvS,Paper Decision,Accept (Spotlight),"The paper proposes a data-driven approach to learning atomic-resolution energy functions. Experiment results show that the proposed energy function is similar to the state-of-art method (Rosetta) based on physical principles and engineered features. + +The paper addresses an interesting and challenging problem. The results are very promising. It is a good showcase of how ML can be applied to solve an important application problem. + +For the final version, we suggest that the authors can tune down some claims in the paper to fairly reflect the contribution of the work. ",ICLR2020, +RsgOysWLDh,1576800000000.0,1576800000000.0,1,ByxRM0Ntvr,ByxRM0Ntvr,Paper Decision,Accept (Poster),"The paper provides a proof that Transformer networks (a popular deep learning model) are universal approximators for sequence-to-sequence functions. The theorem relies on the idea of contextual mappings (Definition 3.1), which models the attention layers. The results provide an important starting point for understanding a very widely used architecture. + +As with many theoretical papers, the reviewers provided several suggestions as to which are important parts to be presented in the main paper. The authors were very responsive during the discussion period, updating the structure of the paper significantly. This shows nice evidence supporting the need for a long discussion period for ICLR. One reviewer upgraded their score (to 8), which is not reflected in the system. + +This is an excellent paper, providing much needed theoretical analysis of a popular neural architecture. Clear accept. + +",ICLR2020, +gcfrgSYhQ2t,1610040000000.0,1610470000000.0,1,0BaWDGvCa5p,0BaWDGvCa5p,Final Decision,Reject,"All reviewers appreciated the main idea in the paper for solving the nonconvex-nonconcave minimax problems, which is deemed an extremely hard open problem. However, as R1 also pointed out, neither the theoretical nor the experimental results seem particularly strong, given that many variations of GDA and theoretical understanding of different notions of optimality have been recently developed. The paper fails to draw proper comparisons to these existing work. +Unfortunately, the paper is slightly below borderline and cannot be accepted this time. +",ICLR2021, +VHekEGRDxzj,1610040000000.0,1610470000000.0,1,wS0UFjsNYjn,wS0UFjsNYjn,Final Decision,Accept (Spotlight),"This paper addresses a method for unsupervised meta-learning where a VAE with Gaussian mixture prior is used and set-level inference, taking episode-specific dataset as input, is performed to calculate its posterior. In the meta-testing phase, semi-supervised learning with the learned VAE is used to fast adapt to few-show learning. Reviewers are satisfied with the author responses, agreeing that the method is a principled way to tackle unsupervised meta-learning. +",ICLR2021, +DNRVVjHdjt,1576800000000.0,1576800000000.0,1,rklfIeSFwS,rklfIeSFwS,Paper Decision,Reject,"This paper proposes a channel pruning approach based one-shot neural architecture search (NAS). As agreed by all reviewers, it has limited novelty, and the method can be viewed as a straightforward combination of NAS and pruning. Experimental results are not convincing. 
The proposed method is not better than the state of the art in terms of accuracy or number of parameters. The setup is not fair, as the proposed method uses AutoAugment while the other baselines do not. The authors should also compare with related methods such as BayesNAS and other pruning techniques. Finally, the paper is poorly written, and many related works are missing.",ICLR2020,
Reviewers, in general, appreciate the simplicity of the approach as well as its effectiveness. The most acute criticisms concerned several theoretical and technical points, the similarity with [Mirzadeh, 2020], and missing baseline comparisons. The author rebuttal responds to each of these points very clearly and convincingly, and includes new experimental baseline comparisons that clearly demonstrate the effectiveness of the CPR approach. I encourage the authors to include the extensive comparison with [Mirzadeh, 2020] provided in the rebuttal, especially given its similarity to the proposed approach, and to tone down the strong claims of novelty in light of the similarities.
Some typos may exist, but more importantly, doubts remain about the experimental evaluation and results, and thus about the effectiveness of the proposed approach.
The overall model is reasonable and well presented. However, I have the following two concerns:
Regarding comparisons, one reviewer comments: ""I do not agree that the comparison with DARTS is fair because the authors remove the option of retraining in both DARTS and DBSN. The reason DARTS trains using one half of the data and validates on the other is that it includes a retraining phase where all data is used. Therefore, a fair comparison should use the same procedure as DARTS (including a retraining phase). At the very least, to compare methods without retraining, results of DARTS with more data (e.g., 80%) for training should be reported."" The authors are encouraged to continue with this work, carefully accounting for reviewer comments in future revisions.",ICLR2020,
+ +Three reviewers agreed that the manuscript is not ready for publication. +Some of the concerns are conceptual flaws, weak evaluation protocol, and an incorrect interpretation of experiment results. + +There is no author response. +",ICLR2021, +JGy5iT1Z_l,1576800000000.0,1576800000000.0,1,rJxlc0EtDr,rJxlc0EtDr,Paper Decision,Accept (Poster),"The authors introduce a new associative inference task from cognitive psychology, show shortcomings of current memory-augmented architectures, and introduce a new memory architecture that performs better with respect to the task. The reviewers like the motivation and thought the experimental results were strong, although they also initially had several questions and pointed to areas of the paper which lacked clarity. The authors updated the paper in response to the reviewer's questions and increased the clarity of the paper. The reviewers are satisfied and believe the paper should be accepted.",ICLR2020, +BgCdYMCfKVA,1642700000000.0,1642700000000.0,1,g5ynW-jMq4M,g5ynW-jMq4M,Paper Decision,Accept (Spotlight),"The paper provides new insights about how to identify latent variable distributions, making explicit assumptions about invariances. A lot of this is studied in the literature of non-linear ICA, although the emphasis here is on dropping the ""I"". I think more could be said about how allowing for dependencies among latents truly change the nature of the problem since any distribution can be built out of independent latents, by some more explicit contrast against the recent references given by the reviewers. In any case, the role of allowing for dependencies in the context of the invariances adopted is discussed, and despite no experimentation, the theoretical results are of general interest to the ICLR community and a worthwhile contribution to be discussed among researchers in this field.",ICLR2022, +6asfdkLJ22,1576800000000.0,1576800000000.0,1,Hkx3ElHYwS,Hkx3ElHYwS,Paper Decision,Reject,"The paper propose a new quantization-friendly network training algorithm called GQ (or DQ) net. The paper is well-written, and the proposed idea is interesting. Empirical results are also good. However, the major performance improvement comes from the combination of different incremental improvements. Some of these additional steps do seem orthogonal to the proposed idea. Also, it is not clear how robust the method is to the various hyperparameters / schedules. For example, it seems that some of the suggested training options are conflicting each other. More in-depth discussions and analysis on the setting of the regularization parameter and schedule for the loss term blending parameters will be useful.",ICLR2020, +VJI-7RzytUTK,1642700000000.0,1642700000000.0,1,3Od_-TkEdnG,3Od_-TkEdnG,Paper Decision,Reject,"This paper links OOD generalization with adversarial training and argues that adversarial training can help address the problem of OOD generalization. Based on all responses and reviews, there still are novelty concerns in this paper. In the meantime, this paper lacks theoretical justifications. More importantly, DAT only considers very limited situations regarding DG, which also reflects on its experimental results. In the following, I summarize the drawbacks of this paper for the possible revision in the future. + +1. It seems not novel to link AT with OOD generalization since two reviewers show some references related to using AT to address DG. + +2. From Eq. (9), DAT is based on perturbations rather than transformations. 
This means that DAT only considers very limited situations regarding DG. In ordinary DG, source and target domains are drawn from the same meta-distribution, which is clearly a more general case than the one considered in this paper. DAT-based AT might therefore mislead the research direction of DG. It would be better to consider smarter ways to generate adversarial examples, such as in ""PixelDefend: Leveraging generative models to understand and defend against adversarial examples"" (ICLR 2018).
Although the authors addressed all these concerns in detail and agreed to make revisions to the paper, it would be better for the authors to submit the revised version to another venue.",ICLR2020,
I don't think there would be enough benefit to the community (authors, readers and reviewers) to ask for this to go through one more round of submission/revision.",ICLR2022, +ty0RozFeyJz,1642700000000.0,1642700000000.0,1,gtvM-nBZEbc,gtvM-nBZEbc,Paper Decision,Reject,"This paper proposes a framework for novel object captioning by combining BERT and CLIP. The model improves fluency, fidelity, and adequacy of generated captions. However, as reviewers mentioned, the novelty is limited, combining large models and big data to solve a downstream task does not make useful insights at this moment.",ICLR2022, +_jIT4MTusrR,1642700000000.0,1642700000000.0,1,_j4hwbj6Opj,_j4hwbj6Opj,Paper Decision,Reject,"This paper applies a metalearning strategy to point cloud registration, which refines 3D registration networks to improve performance on specific datasets/settings. Reviews for this paper recognized its potential interest but uniformly highlighted that the work is lacking in polish---both from an expository perspective and in terms of experiments. Questions included whether the experiments truly support the claim of generalization, and whether the work would be better considered as a method for scene flow. Authors did not rebut these points, so I am recommending rejection.",ICLR2022, +ByH1aM8ul,1486400000000.0,1486400000000.0,1,B1ewdt9xe,B1ewdt9xe,ICLR committee final decision,Accept (Poster),"This paper proposes an interesting architecture for predicting future frames of videos using end-to-end trained deep predictive coding. + The architecture is well presented and the paper is clearly written. The experiments are extensive and convincing, include ablation analyses, and show that this architecture performs well compared to other current methods. + Overall, this is an interesting, solid contribution.",ICLR2017, +rcRPTcLTD_,1576800000000.0,1576800000000.0,1,HylNWkHtvB,HylNWkHtvB,Paper Decision,Reject,"This paper proposes an adaptive gradient method for optimization in deep learning called AvaGrad. The authors argue that AvaGrad greatly simplifies hyperparameter search (over e.g. ADAM) and demonstrate competitive performance on benchmark image and text problems. In thorough reviews, thorough author response and discussion by the reviewers (which are are all appreciated) a few concerns about the work came to light and were debated. One reviewer was compelled by the author response to raise their recommendation to weak accept. However, none of the reviewers felt strongly enough to champion the paper for acceptance and even the reviewer assigning the highest score had reservations. A major issue of debate was the treatment of hyperparameters, i.e. that the authors tuned hyperparameters on a smaller problem and then assumed these would extrapolate to larger problems. In a largely empirical paper this does seem to be a significant concern. The space of adaptive optimizers for deep learning is a crowded one and thus the empirical (or theoretical) burden of proof of superiority is high. The authors state regarding a concurrent submission: ""when hyperparameters are properly tuned, echoing our results on this matter"", however, it seems that the reviewers disagree that the hyperparameters are indeed properly tuned in this paper. It's due to these remaining reservations that the recommendation is to reject. 
",ICLR2020, +rJZ3jMUux,1486400000000.0,1486400000000.0,1,ryb-q1Olg,ryb-q1Olg,ICLR committee final decision,Reject,"The reviewers pointed out several issues with the paper, and all recommended rejection.",ICLR2017, +3c4p7cm2Xx,1576800000000.0,1576800000000.0,1,r1gdj2EKPB,r1gdj2EKPB,Paper Decision,Accept (Poster),"The submission addresses the problem of continual learning with large numbers of tasks and variable task ordering and proposes a parameter decomposition approach such that part of the parameters are task-adaptive and some are task-shared. The validation is on omniglot and other benchmarks. + +The reviews were mixed on this paper, but most reviewers were favorably impressed with the problem setup, the scalability of the method, and the results. The baselines were limited but acceptable. The recommendation is to accept this paper, but the authors are advised to address all the points in the reviews in their final revision.",ICLR2020, +EZa-YwkGJY,1576800000000.0,1576800000000.0,1,rke5R1SFwS,rke5R1SFwS,Paper Decision,Reject,"The paper addresses the setting of continual learning. Instead of focusing on catastrophic forgetting measured in terms of the output performance of the previous tasks, the authors tackle forgetting that happens at the level of the feature representation via a meta-learning approach. As rightly acknowledged by R2, from a meta-learning perspective the work is quite interesting and demonstrates a number of promising results. +However the reviewers have raised several important concerns that placed this work below the acceptance bar: + (1) the current manuscript lacks convincing empirical evaluations that clearly show the benefits of the proposed approach over SOTA continual learning methods; specifically the generalization of the proposed strategy to more than two sequential tasks is essential; also see R1’s detailed suggestions that would strengthen the contributions of this approach in light of continual learning; +(2) training a meta-learner to predict the weight updates with supervision from a multi-task teacher network as an oracle, albeit nicely motivated, is unrealistic in the continual learning setting -- see R1’s detailed comments on this issue. +(3) R2 and R3 expressed concerns regarding i) stronger baselines that are tuned to take advantage of the meta-learning data and ii) transferability to the different new tasks, i.e. dissimilarity of the meta-train and meta-test settings. Pleased to report that the authors showed and discussed in their response some initial qualitative results regarding these issues. An analysis on the performance of the proposed method when the meta-training and testing datasets are made progressively dissimilar would strengthen the evaluation the proposed meta-learning approach. +There is a reviewer disagreement on this paper. AC can confirm that all three reviewers have read the rebuttal and have contributed to a long discussion. Among the aforementioned concerns, (3) did not have a decisive impact on the decision, but would be helpful to address in a subsequent revision. However, (1) and (2) make it very difficult to assess the benefits of the proposed approach, and were viewed by AC as critical issues. AC suggests, that in its current state the manuscript is not ready for a publication and needs a major revision before submitting for another round of reviews. We hope the reviews are useful for improving and revising the paper. 
+",ICLR2020, +J4t4dbhPNn,1642700000000.0,1642700000000.0,1,KBQP4A_J1K,KBQP4A_J1K,Paper Decision,Accept (Poster),"This work proposes a novel Transformer Control Flow model and achieves near-perfect accuracy on length generalization, simple arithmetic tasks, and computational depth generalization. All reviewers give positive scores. AE agrees that this work is very interesting and has many potentials. It would be exciting if the author could extend this framework to more challenging tasks (e.g. visual reasoning [1. 2]). Given the novelty of the proposed model, AC recommends accepting this paper! + +[1] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. ICCV 2017. + +[2] PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning, NeurIPS 21",ICLR2022, +Uo7hjO4i_dz5,1642700000000.0,1642700000000.0,1,wQStfB93RZZ,wQStfB93RZZ,Paper Decision,Reject,"The paper provides an ""asynchronous"" method for multi-agent actor-critic with macro-actions. A major contribution of this paper is the integration of the macro-action-value from the Q-value-based macro-action MARL method into multi-agent policy gradient. Although it appears an interesting contribution, reviewers found that several parts of the paper were not clear enough and there is a lack of fair comparison with previous works.",ICLR2022, +EbAS8jD-ZHi,1610040000000.0,1610470000000.0,1,uMNWbpIQP26,uMNWbpIQP26,Final Decision,Reject,"The paper shows linear convergence for generalized mirror descent on smooth function under the PL assumption. It extends the result to stochastic generalized mirror descent under an additional assumption on the Jacobian of the mirror map. Reviewers pointed out several technical issues with the submission. While some of the problems have since been resolved in the updated version, the paper still lacks sufficient novelty, and some concerns regarding the correctness/clarity of the claims remain. Unfortunately, I can not recommend acceptance at this time. ",ICLR2021, +#NAME?,1576800000000.0,1576800000000.0,1,SJlNnhVYDr,SJlNnhVYDr,Paper Decision,Reject,"The authors focus on low-resource text classifications tasks augmented with ""rationales"". They propose a new technique that improves performance over existing approaches and that allows human inspection of the learned weights. + +Although the reviewers did not find any major faults with the paper, they were in consensus that the paper should be rejected at this time. Generally, the reviewers' reservations were in terms of novelty and extent of technical contribution. + +Given the large number of submissions this year, I am recommending rejection for this paper. +",ICLR2020, +Q_PRJZj7Ryr,1642700000000.0,1642700000000.0,1,8QE3pwEVc8P,8QE3pwEVc8P,Paper Decision,Reject,"This paper received scores of 5,5,6,8. The reviewer giving a score of 8 stated that they would've given a 7, but that that is not an option in the system. The other reviewer giving an acceptance scores mentioned that they would also be OK with a rejection. The details of the assessment are thus less enthusiastic than could be assumed with an overall average score of 6. I am therefore weakly recommending rejection. + +The main criticisms of the reviewers are lack of novelty, lack of deeper analyses that really provide insights into why zero-cost operation scoring works, and lack of the number of NAS benchmarks tested. 
Of these, I personally would not criticize the lack of novelty, since it is not trivial to put together zero-cost and one-shot methods, and the results appear promising.
Similarly, the structure of the Minecraft task seems to be used heavily to define the hierarchy and meta-planning, so more baselines (with less structured tasks) were requested. + +The method also suffers from scalability issues as the authors acknowledge that if the number of events grows, they would need to downsample the events so as to apply their method. ",ICLR2021, +vEGklJXD5dy,1610040000000.0,1610470000000.0,1,04cII6MumYV,04cII6MumYV,Final Decision,Accept (Poster),"This paper studies the problem of multi-domain few-shot image classification and proposes a Universal Representation Transformer (URT) layer, which leverages universal features by dynamically re-weighting and composing the most appropriate domain-specific representations in a meta-learning way. The paper extends the prior work of SUR [Dvornik et al 2020] by using meta-learning and avoiding additional training during test phase. The experimental results show improvements over SUR in both accuracy (not always significant on some datasets though) and inference efficiency. Overall, the paper is well written with sufficient contributions. After the author's rebuttal and revision, reviewers generally agree the paper can be accepted. I recommend to Accept (Poster). ",ICLR2021, +mLBsEEdiLQ,1610040000000.0,1610470000000.0,1,ijJZbomCJIm,ijJZbomCJIm,Final Decision,Accept (Poster),"The premise of the work is simple enough: investigate if networks that are trained with an adversarial objective end up being more suitable for transfer learning tasks, especially in the context of limited labeled data for the new domain. The work uncovers the fact that shape-biased representations are learned this way and this helps for the tasks they considered. + +There was rather robust back and forth between the authors and the reviewers. The consensus is that this work has merit, has good quality experiments and investigates something with high potential impact (given the importance of transfer learning in general). I hope that most of the back and forth findings are incorporated in the final version of this work (especially the discussion and comparison with Shafahi et al., as well as all the nuances of the shape bias).",ICLR2021, +nIhTlSojRa,1576800000000.0,1576800000000.0,1,HkgaETNtDB,HkgaETNtDB,Paper Decision,Accept (Poster),"This paper presents mixout, a regularization method that stochastically mixes parameters of a pretrained language model and a target language model. Experiments on GLUE show that the proposed technique improves the stability and accuracy of finetuning a pretrained BERT on several downstream tasks. + +The paper is well written and the proposed idea is applicable in many settings. The authors have addressed reviewers concerns' during the rebuttal period and all reviewers are now in agreement that this paper should be accepted. + +I think this paper would be a good addition to ICLR and recommend to accept it. +",ICLR2020, +SJijLkTrz,1517250000000.0,1517260000000.0,860,rkTBjG-AZ,rkTBjG-AZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper introduces a framework for specifying the model search space for exploring over the space of architectures and hyperparameters in deep learning models (often referred to as architecture search). Optimizing over complex architectures is a challenging problem that has received significant attention as deep learning models become more exotic and complex. 
This work helps to develop a methodology for describing and exploring the complex space of architectures, which is a challenging problem. The authors demonstrate that their method helps to structure the search over hyperparameters using sequential model-based optimization and Monte Carlo tree search. + +The paper is well written and easy to follow. However, the level of technical innovation is low, and the experiments don't really demonstrate the merits of the method over existing strategies. One reviewer took issue with the treatment of related work. The underlying idea is compelling and addresses an open question that is of great interest currently. However, without experiments demonstrating that this works better than, e.g., the specification in the hyperopt package, it is difficult to assess the contribution. The authors must do a better job of placing this contribution in the context of existing literature and empirically demonstrating its advantages. The presented experiments show that the method works in a limited setting and don't explore optimization over complex spaces (i.e. over architectures - e.g. number of layers, regularization for each layer, type of each layer, etc.). There's nothing presented empirically that hasn't been possible with standard Bayesian optimization techniques. + +This is a great start, but it needs more justification empirically (or theoretically). + +Pros: +- Addresses an important and pertinent problem - architecture search for deep learning +- Provides an intuitive and interesting solution to specifying the architecture search problem +- Well written and clear + +Cons: +- The empirical analysis does not demonstrate the advantages of this approach over existing literature +- Needs to place itself better in the context of existing literature",ICLR2018, nJ0UtpCVh1,1576800000000.0,1576800000000.0,1,rklxF0NtDr,rklxF0NtDr,Paper Decision,Reject,"This paper was reviewed by 3 experts, who recommend Weak Reject, Weak Reject, and Reject. The reviewers were overall supportive of the work presented in the paper and felt it would have merit for eventual publication. However, the reviewers identified a number of serious concerns about writing quality, missing technical details, experiments, and missing connections to related work. In light of these reviews, and the fact that the authors have not submitted a response to reviews, we are not able to accept the paper. However, given the supportive nature of the reviews, we hope the authors will work to polish the paper and submit to another venue.",ICLR2020, rkxuYr5fx4,1544890000000.0,1545350000000.0,1,SkVRTj0cYQ,SkVRTj0cYQ,Needs significant justification of novelty,Reject,"Following the unanimous vote of the reviewers, this paper is not ready for publication at ICLR. The greatest concern was that the novelty beyond past work has not been sufficiently demonstrated.",ICLR2019,5: The area chair is absolutely certain R1zrQ-ec0Ie,1610040000000.0,1610470000000.0,1,O-6Pm_d_Q-,O-6Pm_d_Q-,Final Decision,Accept (Poster),"This paper introduces the multiple manifold problem - in a simple setting, there are two data manifolds representing the positive and negative samples, and the goal is to train a neural network (or any predictor) that separates these two manifolds. 
The paper shows that this is possible to do with a deep neural network under certain assumptions - notably on the shape of the manifold and also on the ability of the neural network to represent certain functions (which is harder to verify, and only verified for a 1-d case in the paper). The optimization of the neural network falls in the NTK regime but requires new techniques. Overall, the question seems very natural and the results are reasonable first steps. There are some concerns about clarity that the authors should address in the paper.",ICLR2021, PagJTXqs6,1576800000000.0,1576800000000.0,1,ryefE1SYDr,ryefE1SYDr,Paper Decision,Reject,"A nice idea: the latent prior is replaced by a GAN. A general agreement between all four reviewers to reject the submission, based on a not thorough enough description of the approach, and possibly not being novel.",ICLR2020, EqI5vF39_7R,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"The paper suggests that robust overfitting could be viewed as the early part of a double descent phenomenon for adversarial training. The authors identify implicit label noise, i.e. the label distribution mismatch between the true example and the generated adversarial example, as a possible explanation for this phenomenon in adversarial training. This claim is empirically supported by experiments using static adversarial examples. The authors propose a method using temperature scaling and interpolation to mitigate the effects caused by implicit label noise for robust overfitting. This method is evaluated on CIFAR 10/100 and tiny-ImageNet. Concerns have been raised in the reviews about sufficient justification for the claim that implicit label noise leads to adversarial overfitting. The rebuttal answers this question to some extent. Concerns have also been raised about the writing and whether sufficient details of the experimental setup are present in the main paper. While I acknowledge the difficulty of fitting all details within page limits, I would think that these details are crucial given that the primary support for the claims made is from empirical observations.",ICLR2022, ZG27KFOTGM,1576800000000.0,1576800000000.0,1,HkgMxkHtPH,HkgMxkHtPH,Paper Decision,Reject,"This paper proposed to improve the quality of underwater images, specifically color distortion and haze effects, by an unsupervised generative adversarial network (GAN). An end-to-end autoencoder network is used to demonstrate its effectiveness in comparison to existing works, while maintaining scene content structural similarity. Three reviewers unanimously rated weak rejection. The major concerns include unclear differences with respect to the existing works, incremental contribution, low quality of figures, low quality of writing, etc. The authors responded to the reviewers' concerns, but the ratings did not change. The ACs concur with the concerns, and the paper cannot be accepted in its current state.",ICLR2020, ZWr34xEJtSI,1642700000000.0,1642700000000.0,1,Kwm8I7dU-l5,Kwm8I7dU-l5,Paper Decision,Accept (Poster),"The authors introduce a GNN-based method for classifying irregular multivariate time series. +They represent the dependencies among sensors using a graph structure and deploy message passing to +model the effect of one sensor on another. The approach jointly learns embeddings and the dependency graph. + +The manuscript gathered a clear accept (8) and two marginally-below-threshold scores (5). +I want to accept this work, and I explain why. 
+ +The reviews and the ongoing discussion during the rebuttal showed that the work is interesting, +with its main strength being the novel exploration of GNN applications to irregularly sampled multivariate +time series. + +There were many concerns raised by a reviewer regarding important theoretical and methodological issues in the paper. +During the rebuttal phase, the authors clarified and resolved the majority of the concerns, and there was an ongoing discussion between the two sides, authors and reviewer (which, I have to admit, was a pleasure to watch researchers communicating). +The authors took the feedback into account and revised the manuscript accordingly. Having read the edits myself, I believe the submission is substantially improved and addresses the concerns sufficiently. + +I expect that this work will stimulate further research in the community, and I would like to accept it.",ICLR2022, Byf3HkpBf,1517250000000.0,1517260000000.0,651,HJr4QJ26W,HJr4QJ26W,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree that the idea of incorporating humans in the training of generative adversarial networks is interesting and worth exploring. However, they felt that the paper fell short in providing strong support for the approach. The AC agrees. The authors are encouraged to strengthen their work and resubmit to a future venue.",ICLR2018, GMezGXB2EVe0,1642700000000.0,1642700000000.0,1,NudBMY-tzDr,NudBMY-tzDr,Paper Decision,Accept (Oral),"This paper presents a method to interpret neurons in vision neural models by generating a natural language description that specifies the activation selectivity of a given neuron. The proposed method first identifies an exemplar set of input image regions that corresponds to a neuron, then searches for a natural language description by optimizing the pointwise mutual information between descriptions and the exemplar set. + +Strengths: +- Reasonable method design and clear writing +- Important problem and broad applications +- Extensive experiments for evaluation of the proposed method + +Weaknesses: +- Need more discussion on the limitations of the proposed method +- Elaboration on the human inter-annotator agreement +- Analysis of method transferability across tasks.",ICLR2022, QwpN3Lfjxm,1576800000000.0,1576800000000.0,1,S1eik6EtPB,S1eik6EtPB,Paper Decision,Reject,"This submission studies an interesting problem. However, as some of the reviewers point out, the novelty of the proposed contributions is fairly limited.",ICLR2020, HfpnDm-qJe,1642700000000.0,1642700000000.0,1,a3mRgptHKZd,a3mRgptHKZd,Paper Decision,Reject,"This paper builds upon existing works to prove that learning (correlated) equilibria can be fast, i.e., faster than \sqrt{n}, even in extensive-form games. + +Three reviewers are rather lukewarm, and one reviewer is more positive (but seems less confident in his score). The two major criticisms are that this paper is very difficult to read and that the results might seem rather incremental with respect to the literature. + +I tend to agree with both points, but the paper still has merits: the reason is that extensive-form games are intrinsically far harder than normal-form games, and they more or less all carry a burden of notation. We agreed that the authors actually made some effort to make it fit within the page limit, but another conference or a journal would have been better suited than ICLR. 
+ +Our final conclusion is that the result is interesting yet maybe not breathtaking for the ICLR community; we are fairly certain that another venue will be more appropriate for this paper and that it will be accepted in the near future (I can only suggest journals, given the large amount of content and notation, such as OR, MOR, or GEB - yet conferences such as EC would be better scoped too). It does not, unfortunately, reach the ICLR bar.",ICLR2022, 01eERhtP2sZ,1642700000000.0,1642700000000.0,1,bUAdXW8wN6,bUAdXW8wN6,Paper Decision,Reject,"The paper describes an adversarial training approach that, in addition to the commonly used robustness loss, requires the network to extract similar representation distributions for clean and attacked data. The proposed method is inspired by domain adaptation approaches that require a model to extract domain invariant/agnostic features from two domains. Although the experimental results are solid and technically sound, the novelty of the methodology is not enough, as the domain classifier and the gradient reversal layer are the same as in domain adaptation methods such as ""unsupervised domain adaptation by backpropagation"". On the other hand, more recent SOTA methods are missing, and only smaller-scale datasets are used for evaluation. During the discussions, the major concern from three reviewers was novelty. + +I totally agree that the simplicity of the method should be a virtue. However, the idea of domain-invariant representation learning is already well established, and its application to adversarial training is quite intuitive to the community. Also, a similar methodology already exists in domain adaptation. According to the top-tier conference culture in the ML community, what is most valuable is the novelty and insight, not the performance. In the end, I think that this paper may not be ready for publication at ICLR, but the next version must be a strong paper.",ICLR2022, h8K3T8bkpED,1610040000000.0,1610470000000.0,1,jQ0XleVhYuT,jQ0XleVhYuT,Final Decision,Reject,"This paper discusses a conditional independence test using GANs. In the same way as GCIT (Bellot & van der Schaar, 2019), they realize sampling under the null hypothesis by generating samples from P(X|Z) approximately with a GAN. They propose to use a test statistic defined by the maximum of generalized covariance measures (GCM) over random neural networks. They theoretically discuss the advantage of GCM and show the asymptotic results of the proposed test statistic, which demonstrates improved justification over GCIT. Experimental results show favorable performance over existing conditional independence tests. + +The proposed method gives an advance in the methodology of conditional independence tests for continuous domains, which is an important but difficult problem because of the difficulty of obtaining the null distribution. In the line of Bellot & van der Schaar (2019), they solve it using the strong conditional sampling ability of GANs, which is an important research area. The theoretical analysis and experimental results also make good contributions. + +However, there are some weaknesses in the proposed method and in the comparison with existing methods. First, as R4 points out, there are many hyperparameters in the proposed method, and their choice is not easy. 
While the authors addressed some aspects of this issue in their rebuttal and revision, it is still unclear how to justify the choice of B, the functions h_j, and the neural networks of GAN, which should potentially have significant influence on the test performance. Second, the comparison with Bellot and van der Schaar (2019) is not very clear. In the paper, the GCIT has been used with the distance correlation, which is known to be an instance of HSIC (MMD) with a specific choice of positive definite kernel (Sejdinovic et al 2013). The HSIC can be formulated as the maximum of generalized covariance measures over the unit ball of the RKHS. Thus, the difference of GCIT with distance correlation and the proposed methods are essentially the difference of the function classes for the maximum. On the other hand, the experimental results show significant difference in the test performance. I think more elaborate and careful comparison is needed for these two methods. + +Overall, the paper is a good contribution on the topic. However, the evaluation of the reviewers is not high enough to justify the acceptance in the high competition of ICLR. I encourage the authors complete their work by reflecting reviewers’ comments and submit this work to another conference or journal. + +Reference: +Sejdinovic, D., Sriperumbudur, B., Gretton, A., & Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics, 41(5), 2263–2291. +",ICLR2021, +3QAdOGfVH3i,1642700000000.0,1642700000000.0,1,P07dq7iSAGr,P07dq7iSAGr,Paper Decision,Accept (Poster),"This paper has been independently evaluated by four expert reviewers. After discussion with authors, three of them set their recommendations at marginal acceptance, one at straight accept. Perhaps the key criticism involved limited rigor of theoretical justification for the proposed method, but it appears to be applicable in practice as the empirical results suggest. All things considered, I am leaning towards recommending that this paper is accepted for ICLR 2022.",ICLR2022, +ixswxdHvoU,1576800000000.0,1576800000000.0,1,rJebgkSFDB,rJebgkSFDB,Paper Decision,Reject,"The paper looks at meta learning using random Fourier features for kernel approximations. The idea is to learn adaptive kernels by inferring Fourier bases from related tasks that can be used for the new task. A key insight of the paper is to use an LSTM to share knowledge across tasks. + +The paper tackles an interesting problem, and the idea to use a meta learning setting for transfer learning within a kernel setting is quite interesting. It may be worthwhile relating this work to this paper by Titsias et al. (https://arxiv.org/abs/1901.11356), which looks at a slightly different setting (continual learning with Gaussian processes, where information is shared through inducing variables). + +Having read the paper, I have some comments/questions: +1. log-likelihood should be called log-marginal likelihood (wherever the ELBO shows up) +2. The derivation of the ELBO confuses me (section 3.1). First, I don't know whether this ELBO is at training time or at test time. If it was at training time, then I agree with Reviewer #1 in the sense that $p(\omega)$ should not depend on either $x$ or $\mathcal {S}$. If it is at test time, the log-likelihood term should not depend on $\mathcal{S}$ (which is the training set), because $\mathcal S$ is taken care of by $p(\omega|\mathcal S)$. However, critically, $p(\omega|\mathcal S)$ should not depend on $x$. 
I agree with Reviewer #1 that this part is confusing, and the authors' response has not helped me to dispel this confusion (e.g., priors should not be conditioned on any data). +3. The tasks are indirectly represented by a set of basis functions, which are represented by $\omega^t$ for task $t$. In the paper, these tasks are then inferred using variational inference and an LSTM. It may be worthwhile relating this to the latent-variable approach by Saemundsson et al. (http://auai.org/uai2018/proceedings/papers/235.pdf) for meta learning. +4. The expression ""meta ELBO"" is inappropriate. This is a simple ELBO, nothing meta about it. If we think of the tasks as latent variables (which the paper also states), the ELBO in equation (9) is a vanilla ELBO as used in variational inference. +5. For the LSTM, does it make a difference how the tasks are ordered? +6. Experiments: Figure 3 clearly needs error bars, and MSEs need to be reported with error bars as well; +6a) Figures 4 and 5 need error bars. +6b) Error bars should also be based on different random initializations of the learning procedure to evaluate the robustness of the methods (use at least 20 random seeds). I don't think any of the results is based on more than one random seed (at least I could not find any statement regarding this). +7. Tables 1 and 2: The highlighting in bold is unclear. If it is supposed to highlight the best methods, then the highlighting is misleading in the sense that methods that perform similarly are not highlighted. For example, in Table 1, VERSA or MetaVRF (w/o LSTM) could be highlighted for all tasks because the error bars are so huge (similarly in Table 2). +8. One of the things I'm missing completely is a discussion of computational demand: How efficiently can we train the model, and how long does it take to make predictions? It would be great to have some discussion about this in the paper and relate this to other approaches. +9. The paper also evaluates the effect of having an LSTM that correlates tasks in the posterior. The analysis shows that there are some marginal gains, but none of them is statistically significant. I would have liked to see much more analysis of the effect/benefit of the LSTM. + +Summary: The paper addresses an interesting problem. However, I have reservations regarding some theoretical bits and regarding the quality of the evaluation. Given that this paper also exceeds the 8-page (default) limit, we are supposed to apply higher acceptance standards than for an 8-page paper. Hence, putting everything together, I recommend to reject this paper.",ICLR2020, ZvDVfBMj5D7,1610040000000.0,1610470000000.0,1,jWkw45-9AbL,jWkw45-9AbL,Final Decision,Accept (Oral),"The paper studies the problem of being able to control text generated by pre-trained language models. +The problem is timely and important. The paper frames the problem as constraint satisfaction over a probability distribution. Both pointwise and distributional constraints can be imposed. The proposed algorithm, Generation with Distributional Control (GDC), is elegant and is an interesting new addition to this line of work. Overall, the paper brings forth new ideas and could have impact. +",ICLR2021, f6n2YWQKLBL,1642700000000.0,1642700000000.0,1,8hWs60AZcWk,8hWs60AZcWk,Paper Decision,Accept (Poster),"Summary: The authors present an approach to improve the robustness of vision transformers by mapping standard tokens into discrete tokens that are invariant to small perturbations. 
Method is applied to a variety of backbone architectures and evaluated on a range of out of distribution forms of ImageNet test set. Significant performance gains are measured across many of these tasks. + +Pros: +- Novel, simple, effective approach +- General approach applicable across model variants, complimentary to other methods to improve robustness. +- Comprehensive study, evaluated on many ImageNet robustness benchmarks +- Well written overall + +Cons: +- Biggest issue: 3 reviewers point out concerns about validity of claims that ViT architecture is more reliant on local patterns and less on global context. This seems mostly a semantic issue around conjectures about why the method works – it does not invalidate the value of the new approach or its solid results. Authors have responded to reviewer concerns by changing wording in paper to relax the claims, specifying “shape information” rather than “global information”. They have also added experiments to measure shape bias, as defined in prior art, to backup these claims. +- Paper missing baselines of data augmentation strategies. Authors have responded by including such comparative experiments. +- Paper is missing ablation studies on changing the type of codebook. Authors have responded by including multiple variations of codebooks, and varying the codebook size. + +This paper was a close call based on the reviews. However, in AC opinion, the critiques have been adequately addressed by the authors. This is confirmed by adding an extra expert reviewer to the pool, who agreed with some earlier critiques, and was satisfied with the changes and additional experiments presented by the authors. AC recommendation is to accept.",ICLR2022, +H1e-jkTBxV,1545090000000.0,1545350000000.0,1,ryxLG2RcYX,ryxLG2RcYX,"innovative approach and strong results, concerns about comparison to baselines",Reject,"The paper presents a novel approach to exploration in long-horizon / sparse reward RL settings. The approach is based on the notion of abstract states, a space that is lower-dimensional than the original state space, and in which transition dynamics can be learned and exploration is planned. A distributed algorithm is proposed for managing exploration in the abstract space (done by the manager), and learning to navigate between abstract states (workers). Empirical results show strong performance on hard exploration Atari games. + +The paper addresses a key challenge in reinforcement learning - learning and planning in long horizon MDPs. It presents an original approach to this problem, and demonstrates that it can be leveraged to achieve strong empirical results. + +At the same time, the reviewers and AC note several potential weaknesses, the focus here is on the subset that substantially affected the final acceptance decision. First, the paper deviates from the majority of current state of the art deep RL approaches by leveraging prior knowledge in the form of the RAM state. The cause for concern is not so much the use of the RAM information, but the comparison to other prior approaches using ""comparable amounts of prior knowledge"" - an argument that was considered misleading by the reviewers and AC. The reviewers make detailed suggestions on how to address these concerns in a future revision. Despite initially diverging assessments, the final consensus between the reviewers and AC was that the stated concerns would require a thorough revision of the paper and that it should not be accepted in its current stage. 
+ +On a separate note, a lot of the discussion between R1 and the authors centered on whether more comparisons / a larger number of seeds should be run. The authors argued that the requested comparisons would be too costly. A suggestion for a future revision of the paper would be to only run a large number (e.g., 10) of seeds for the first 150M steps of each experiment, and presenting these results separately from the long-running experiments. This should be a cost efficient way to shed light on a particularly important range, and would help validate claims about sample efficiency.",ICLR2019,4: The area chair is confident but not absolutely certain +gxAihoNTWM,1576800000000.0,1576800000000.0,1,H1gcw1HYPr,H1gcw1HYPr,Paper Decision,Reject,"This paper proposes a network architecture which labels object with an identifier that it is trained to retain across subsequent instances of that same object. + +After discussion, the reviewers agree that the approach is interesting, well-motivated and written, and novel. However, there was unanimous concern about the experimental evaluation, so the paper does not appear to be ready for publication just yet, and I am recommending rejection.",ICLR2020, +rJloypYxgE,1544750000000.0,1545350000000.0,1,BJlyznAcFm,BJlyznAcFm,meta-review,Reject,"The paper presents a novel architecture, reminescent of mixtures-of-experts, +composed of a set of advocates networks providing an attention map to a +separate ""judge"" network. Reviewers have several concerns, including lack +of theoretical justification, potential scaling limitations, and weak +experimental results. Authors answered to several of the concerns, which did +not convinced reviewers. The reviewer with the highest score was also the least +confident, so overall I will recommend to reject the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +cDmbvXuFuyE,1642700000000.0,1642700000000.0,1,LtI14EpWKH,LtI14EpWKH,Paper Decision,Reject,"This paper proposes an image tesselation scheme to improve the robustness of image classifiers. The reviewers agree that the method is simple and intuitive, and view this as a positive attribute. At the same time, the reviewers want to see if the method works on higher resolution images. It was also not clear to reviewers how the attacks on the method were constructed, whether they were white box, and whether they were adaptive. Without a rebuttal, these questions remain unanswered.",ICLR2022, +MmkqSEbt0Jb,1610040000000.0,1610470000000.0,1,oLltLS5F9R,oLltLS5F9R,Final Decision,Reject,"Four knowledgeable referees have indicated reject mainly because of limited motivation [R1,R3,R4], limited insights on the proposed approach [R1,R2,R3,R4], and inconclusive results [R1,R2,R3,R4]. The claims of the paper could have been strengthened by e.g. discussing currently missing experimental details [R1], performing statistical significance tests [R2,R4], and including comparisons to baselines/previously introduced normalization strategies [R2,R3]. Unfortunately, there was no rebuttal. The paper is therefore rejected.",ICLR2021, +SJtGTGLOg,1486400000000.0,1486400000000.0,1,B1-Hhnslg,B1-Hhnslg,ICLR committee final decision,Reject,"The program committee appreciates the authors' response to concerns raised in the reviews. Unfortunately, reviews are not leaning sufficiently towards acceptance. 
Reviewers have concerns about the relationships of this work to existing work in literature (both in terms of a discussion to clarify the novelty, and in terms of more complete empirical comparisons). Authors are strongly encouraged to incorporate reviewer feedback in future iterations of the work.",ICLR2017, +BkejS5O-xN,1544810000000.0,1545350000000.0,1,HJldzhA5tQ,HJldzhA5tQ,meta review,Reject,"The paper proposes and approach for model-based reinforcement learning that adds a constraint to encourage the predictions from the model to be consistent with the observations from the environment. The reviewers had substantial concerns about the clarify of the initial submission, which has been significantly improved in revisions of the paper. The experiments have also been improved. +Strengths: The method is simple, the performance is competitive with state-of-the-art approaches, and the experiments are thorough including comparisons on seven different environments. +Weaknesses: The main concern of the reviewers is the lack of concrete discussion about how the method compares to prior work. While the paper cites many different prior methods, the paper would be significantly improved by explicitly comparing and contrasting the ideas presented in this paper and those presented in prior work. A secondary weakness is that, while the results appear to be statistically significant, the improvement over prior methods is still relatively small. +I do not think that this paper meets the bar for publication without an improved discussion of how this work is placed among the existing literature and without more convincing results. + +As a side note, the authors should consider comparing to the below NeurIPS '18 paper, which significantly exceeds the performance of Nagabandi et al '17: https://arxiv.org/abs/1805.12114",ICLR2019,4: The area chair is confident but not absolutely certain +IailUqNOji,1642700000000.0,1642700000000.0,1,RB_2cor6d-w,RB_2cor6d-w,Paper Decision,Reject,"This paper studies physical ""adversarial programs"" that allow an attacker to control a machine learning model by placing transparent patches on top of an image. The reviewers are split on this paper: while some reviewers like the work, others are concerned about the practicality, novelty, or utility of the attack. + +Starting with novelty, reviewers raise valid concerns about how this approach is similar to prior attacks that generate programs. The authors respond here, but the overall question remains unanswered and it is not clear which of the new pieces this paper introduces are responsible for the success. (Would prior techniques have sufficed? If not what part of prior methods makes this not the case?) + +For utility, the paper does not make a clear case of why it would be easier for an adversary to place N~=5 patches on top of an image as compared to other physical attacks (see especially Li et al. 2019 as a paper that deserves more than a sentence of comparison---why is this approach easier?). + +One final comment raised by many reviewers is the fact that the title and setup to this paper heavily lean on the ""physical"" component of the evaluation, and yet the paper does not demonstrate anything physical. 
The authors rebuttal that the word ""towards"" absolves them of responsibility for trying an attack in the real world does not convince me; either the paper should attempt this attack in the physical world (and say if it works or if it doesn't) or make it clear from the top that the attack is going to be digital from the start, but motivated by the physical world. Prior accepted papers that include physical world in the title (e.g., Kurakin et al., Athalye et al., Li et al.) don't solve the problem completely, but at least run experiments in the physical world.",ICLR2022, +ahHW2xn_S,1576800000000.0,1576800000000.0,1,SJgmR0NKPr,SJgmR0NKPr,Paper Decision,Accept (Poster),"The paper proposes an alternative to BPTT for training recurrent neural networks based on an explicit state variable, which is trained to improve both the prediction accuracy and the prediction of the next state. One of the benefits of the methods is that it can be used for online training, where BPTT cannot be used in its exact form. Theoretical analysis is developed to show that the algorithm converges to a fixed point. Overall, the reviewers appreciate the clarity of the paper, and find the theory and the experimental evaluation to be reasonably well balanced. After a round of discussion, the authors improved the paper according to the reviews. The final assessments are overall positive, and I’m therefore recommending accepting this paper.",ICLR2020, +HFo6TkNL6P,1642700000000.0,1642700000000.0,1,qCBmozgVr9r,qCBmozgVr9r,Paper Decision,Reject,"This paper has been reviewed with four expert reviewers. The reviewers have reached the consensus that the paper is not yet ready for publication. The main concerns are related to novelty. All reviewers gave substantial and constructive feedback. Following the recommendation of the reviewers, the meta reviewer recommends rejection.",ICLR2022, +clmQ9gWiKr,1576800000000.0,1576800000000.0,1,rkgOlCVYvB,rkgOlCVYvB,Paper Decision,Accept (Poster),This paper studies the landscape of linear networks and its critical point. The authors utilize geometric properties of determinantal varieties to derive interesting results on the landscape of linear networks. The reviewers raised some concerns about the fact that many of the results stated here can already be achieved using other techniques and therefore had some concerns about the novelty of these results. The authors provided a detailed response addressing these concerns. One reviewer however still had some concerns about the novelty. My own understanding of the paper is that while some of these results can be obtained using other approaches the proof techniques (brining ideas from algebraic geometry) is novel and could be rather useful. While at this point it is not clear that the techniques generalize to the nonlinear case I think algebraic geometry perspective have a good potential and provide some diversity in the theoretical techniques. As a result I recommend acceptance if possible.,ICLR2020, +oYAHAOT0vL-,1642700000000.0,1642700000000.0,1,2g9m74He1Ky,2g9m74He1Ky,Paper Decision,Reject,"The paper proposed to learn a disentangled representation of spatiotemporal mobility data using a VAE-based architecture, in order to separate spatial and temporal dependencies. 
This is an interesting and relevant problem, but the reviewers found the paper to be weak in motivation and empirical evaluations.",ICLR2022, +0rvNn-dybYI,1642700000000.0,1642700000000.0,1,3GHHpYrYils,3GHHpYrYils,Paper Decision,Reject,"The work presented in this submission is focused on a new approach for learning a model that can perform well at any point in time, and called Anytime Learning at Macroscale (ALMA). The algorithm processes data through a series of training batches, each of these processing steps being followed to a model evaluation. The total loss is the average (or sum) of the losses computed at each step. + +Reviewers agreed that the paper is not ready for acceptance at ICLR 2022 as the presentation of the work lacks of clarity, especially w.r.t. to the similarities with online learning and the learning of streams of data, and the fundamental difference between small or moderate batch sizes and very large batches.",ICLR2022, +rJl0SgASgN,1545100000000.0,1545350000000.0,1,rkx0g3R5tX,rkx0g3R5tX,not above threshold,Reject,"All reviewers agree that the paper is not quite ready for publication. +",ICLR2019,4: The area chair is confident but not absolutely certain +ShQbB_VWdht,1610040000000.0,1610470000000.0,1,P63SQE0fVa,P63SQE0fVa,Final Decision,Reject,"This paper proposes a deep reinforcement learning approach for solving minimax multiple TSP problem. Their main algorithmic contribution is to propose a specialized graph neural network to parameterize the policy and used a clipped idea to stabilize the training. Unfortunately, the reviewers remain to be unconvinced by the experiments after the rebuttal and the writing need to be significantly improved. Also, it would be worthwhile to study how the proposed method can generalize to other problems. +",ICLR2021, +HJMaMJarf,1517250000000.0,1517260000000.0,27,SJJinbWRZ,SJJinbWRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers agree that the paper presents nice results on model based RL with an ensemble of models. The limited novelty of the methods is questioned by one reviewer and briefly by the others, but they all agree that this paper's results justify its acceptance.",ICLR2018, +r1xA3ZcpyV,1544560000000.0,1545350000000.0,1,B1gabhRcYX,B1gabhRcYX,Nice work combining Bundle Adjustment and Deep Learning Methods,Accept (Oral),"The first reviewer summarizes the contribution well: This paper combines [a CNN that computes both a multi-scale feature pyramid and a depth prediction, which is expressed as a linear combination of ""depth bases""]. This is used to [define a dense re-projection error over the images, akin to that of dense or semi-dense methods]. [Then, this error is optimized with respect to the camera parameters and depth linear combination coefficients using Levenberg-Marquardt (LM). By unrolling 5 iterations of LM and expressing the dampening parameter lambda as the output of a MLP, the optimization process is made differentiable, allowing back-propagation and thus learning of the networks' parameters.] + +Strengths: +While combining deep learning methods with bundle adjustment is not new, reviewers generally agree that the particular way in which that is achieved in this paper is novel and interesting. The authors accounted for reviewer feedback during the review cycle and improved the manuscript leading to an increased rating. + +Weaknesses: +Weaknesses were addressed during the rebuttal including better evaluation of their predicted lambda and comparison with CodeSLAM. 
+ +Contention: +This paper was not particularly contentious, there was a score upgrade due to the efforts of the authors during the rebuttal period. + +Consensus: +This paper addresses an interesting area of research at the intersection of geometric computer vision and deep learning and should be of considerable interest to many within the ICLR community. The discussion of the paper highlighted some important nuances of terminology regarding the characterization of different methods. This paper was also rated the highest in my batch. As such, I recommend this paper for an oral presentation. ",ICLR2019,4: The area chair is confident but not absolutely certain +SkzznGU_x,1486400000000.0,1486400000000.0,1,H1Gq5Q9el,H1Gq5Q9el,ICLR committee final decision,Reject,"This paper effectively demonstrates that the use of pretraining can improve the performance of seq2seq models for MT and summarization tasks. However, despite these empirical gains, the reviewers were not convinced enough by the novelty of the work itself and did not feel like there were technical contributions to make this a fit for ICLR. + + Pros: + - All reviewers agree that the empirical gains in this paper are convincing and lead to BLEU improvements on a large scale translation and translation like tasks. Reviewer 4 also praises the detailed analysis that demonstrates that these gains come from the pretraining process itself. + - From an impact perspective, the reviewers found the approach clear and implementable. + + Cons: + - Novelty criticisms are that the method is a ""compilation"" of past approaches (although at a larger scale) and therefore primarily experimental, and that the objectives given are ""highly empirical"" and not yet motivated by theory. The authors did respond, but the reviewer did not change their score. + - There are suggestions that this type of work would perhaps be more widely impactful in an NLP venue, where a BLEU improvement of this regard is a strong supporting piece on its own.",ICLR2017, +Sy0lnzIdl,1486400000000.0,1486480000000.0,1,B1vRTeqxg,B1vRTeqxg,ICLR committee final decision,Invite to Workshop Track,"As part of this meta-review, I read the paper and found some surprising claims, such as the somewhat poorly motivated claim that coercing the output of a sub-network be a unit vector my dividing it by its L2 norm is close to layer normalisation which is mathematically almost true, if the mean of of the activations is 0, and we accept a fixed offset in the calculation of the stddev, but conceptually a different form of normalisation. It is also curious that other methods of obtaining stable training in recursive networks, such as TreeLSTM (Zhu et al. 2015, Tai et al. 2015), were not compared to. None of these problems is particularly damning but it is slightly disappointing not to see these issues discussed in the review process. + +Overall, the reviews, which I found superficial in comparison to the other papers I am chairing, found the method proposed here sound, although some details lacked explanation. The consensus was that the general problem being addressed is interesting and timely, given the attention the topics of program induction and interpretation have been receiving in the community recently. There was also consensus that the setting the model was evaluated on was far too simple and unnatural, and that there is need for a more complex, task involving symbolic interpretation to validate the model. 
It is hard to tell, given all the design decisions made (l2-normalisation vs layer norm, softmax not working), whether the end product is tailored to the task at hand, and whether it will tell us something useful about how this approach generalises. I am inclined, on the basis of the reviewer's opinions of the setting and my own concerns outlined above, to recommend redirection to the workshop track.",ICLR2017, +cVg58FZ6jgm,1642700000000.0,1642700000000.0,1,nsjkNB2oKsQ,nsjkNB2oKsQ,Paper Decision,Reject,"The paper introduces an interesting new model for MDPs, where the time is divided into random segments, and at the end of each segment the cumulative reward for the given segment is communicated to the agent. Some theoretical results with a policy improvement algorithm, as well as a more practical algorithm are presented. While the reviewers valued these contributions, they all had issues with the presentation of the paper. + +These presentation issues make the paper extremely hard to follow -- this was a problem for all reviewers, and I also verified it myself. The reviewers also raised issues regarding the experiments, where the algorithms should be tuned properly to be able to draw valid conclusions. + +While unfortunately the above issues prevent me from recommending acceptance of the paper, the authors are strongly encouraged to revise their paper and resubmit to the next venue, with a special emphasis on making the presentation proper. There are several problems/recommendations mentioned in the reviews which will certainly help in this regard (I would also add that special care should be made that everything is defined properly, e.g., the equation for your policy iteration should appear in the main text not in a proof in the appendix, or $\hat{Q}_\phi$ should be defined, etc.).",ICLR2022, +ByvLL1pBM,1517250000000.0,1517260000000.0,789,H1U_af-0-,H1U_af-0-,ICLR 2018 Conference Acceptance Decision,Reject,"This an interesting new contribution to construction of random features for approximating kernel functions. While the empirical results look promising, the reviewers have raised concerns about not having insights into why the approach is more effective; the exposition of the quadrature method is difficult to follow; and the connection between the quadrature rules and the random feature map is never explicitly stated. Some comparisons are missing (e.g., QMC methods). As such the paper will benefit from a revision and is not ready for ICLR-2018 acceptance.",ICLR2018, +z1JJ9TYRvf,1576800000000.0,1576800000000.0,1,H1gX8C4YPr,H1gX8C4YPr,Paper Decision,Accept (Poster),"The authors present and implement a synchronous, distributed RL called Decentralized Distributed Proximal Policy Optimization. The proposed technique was validated for pointgoal visual navigation task on recently introduced Habitat challenge 2019 and got the state of art performance. + +Two reviews recommend this paper for acceptance with only some minor comments, such as revising the title. The Blind Review #2 has several major concerns about the implementation details. In the rebuttal, the authors provided the source code to make the results reproducible. + +Overall, the paper is well written with promising experimental results. I also recommend it for acceptance. +",ICLR2020, +dJPdwYEX6O4n,1642700000000.0,1642700000000.0,1,Vt1lpp5Vebd,Vt1lpp5Vebd,Paper Decision,Reject,"Three experts reviewed this paper and all recommended rejection. The rebuttal did not change the reviewers' recommendations. 
The reviewers was not excited by the proposed probabilistic framework and raised many concerns regarding the comparison with baselines and competing methods, limited size of datasets, and limited scope of one dataset for one task. Considering the reviewers' concerns, we regret that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2022, +y6dQ5xNJKz8,1642700000000.0,1642700000000.0,1,DnG75_KyHjX,DnG75_KyHjX,Paper Decision,Accept (Poster),"A deep Bayesian generative model is presented for multi-omics +integration, using fused Gromov-Wasserstein regularization between +latent representations of the data views. The method removes several +non-trivial and practically important restrictions from an earlier +method BayRel, enabling application in new setups, while still +performing well. + +Reviewers discussed the paper with the authors, resolving +misunderstandings of the differences from earlier work +(esp. BayReL). The authors reported more extensive experiments in the +rebuttal, though not comparisons. The main remaining weakness is that +the contributions are in a very narrow field, or at least aplications +have only been demonstrated in the narrow field of multi-omics data +analysis. And even within that field, only in a narrow subfield. In a +machine learning venue that is restrictive. Another issue is +computational efficiency. The final decision then depends on how much +weight we place on the novel contributions vs these weaknesses.",ICLR2022, +wArIlIZqV,1576800000000.0,1576800000000.0,1,B1xZD1rtPr,B1xZD1rtPr,Paper Decision,Reject,"Main content: + +Blind review #1 summarizes it well: + +This paper introduces a variant of the Information Bottleneck (IB) framework, which consists in permuting the conditional probabilities of y given x and y given \hat{x} in a Kullback-Liebler divergence involved in the IB optimization criterion. + +Interestingly, this change only results in changing an arithmetic mean into a geometric mean in the algorithmic resolution. + +Good properties of the exponential families (existence of non-trivial minimal sufficient statistics) are preserved, and an analysis of the new critical points/information plane induced is carried out. + +-- + +Discussion: + +The reviews generally agree on the elegant mathematical result, but are critical of the fact that the paper lacks any empirical component whatsoever. + +-- + +Recommendation and justification: + +The paper would be good for ICLR if it had any decent empirical component at all; it is a shame that none was presented as this does not seem very difficult.",ICLR2020, +aN2RvgJtU9,1576800000000.0,1576800000000.0,1,Hygv3xrtDr,Hygv3xrtDr,Paper Decision,Reject,"The paper proposes an interesting idea of identifying repeated action sequences, or behavioral motifs, in the context of hierarchical reinforcement learning, using sparsity/compression. While this is a fresh and useful idea, it appears that the paper requires more work, both in terms of presentation/clarity and in terms of stronger empirical results. +",ICLR2020, +rkxpCvDxxE,1544740000000.0,1545350000000.0,1,BkeU5j0ctQ,BkeU5j0ctQ,Simple method. Good results. Limited Novelty.,Accept (Poster),"This paper combines two different types of existing optimization methods, CEM/CMA-ES and DDPG/TD3, for policy optimization. The approach resembles ERL but demonstrates good better performance on a variety of continuous control benchmarks. 
Although I feel the novelty of the paper is limited, the provided promising results may justify the acceptance of the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +B1bdIJaBM,1517250000000.0,1517260000000.0,812,H1DJFybC-,H1DJFybC-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper addresses an interesting problem, is novel and works. +While the paper improved through reviews + rebuttal, the reviewers still find the presentation lacking. ",ICLR2018, +a60Ig2LqYV8,1610040000000.0,1610470000000.0,1,l35SB-_raSQ,l35SB-_raSQ,Final Decision,Accept (Poster),"This paper proposes a method to solve regression without correspondence. The problem is well-motivated, and the proposed method is technically sound. The motivation, organization, and presentation of the paper are very clear. Reviewers’ suggestions to further improve the paper (e.g., clarifications on initialization, comparison and discussion with with EM, AD, etc) were adequately incorporated to the revised manuscript. ",ICLR2021, +SkgPhyk4lV,1544970000000.0,1545350000000.0,1,H1xEwsR9FX,H1xEwsR9FX,Area chair recommendation,Reject,"The authors replace the large filtering step in the permutohedral lattice with a spatially varying convolutional kernel. They show that inference is more efficient and training is easier. + +In practice, the synthetic experiments seem to show a greater improvement than appears in real data. There are concerns about the clarity, lack of theoretical proofs, and at times overstated claims that do not have sufficient support. + +The ratings before the rebuttal and discussion were 7-4-6. After, R1 adjusted their score from 6 to 4. R2 initially gave a 7 but later said ""I think the authors missed an opportunity here. I rated it as an accept, because I saw what it could have been after a good revision. The core idea is good, but fully agree with R1 and R3 that the paper needs work (which the authors were not willing to do). I checked the latest revision (as of Monday morning). None of R3's writing/claims issues are fixed, neither were my additional experimental requests, not even R1's typos."" There is therefore a consensus among reviewers for reject. +",ICLR2019,5: The area chair is absolutely certain +yoFT62NzZl,1576800000000.0,1576800000000.0,1,ryesZANKPB,ryesZANKPB,Paper Decision,Reject,"Despite the new ideas in this paper, reviewers feel that it needs to be revised for clarification, and that experimental results are not convincing. I have down-weighted the criticisms of Reviewer 2 because I agree with the authors' rebuttal. However, there is still not enough support among the remaining reviews to justify acceptance. ",ICLR2020, +Rr7kPxGOE,1576800000000.0,1576800000000.0,1,H1ervR4FwH,H1ervR4FwH,Paper Decision,Reject,"The work addresses the problem of inferring group structure from unstructured data in multi-agent learning settings, proposing a novel approach that has key computational / run time advantages over a prior approach. A key limitation raised by reviewers is the limited quantitative evaluation and comparison to previous approaches, as well as a resulting set of general insights into advantages of the proposed approach compared to prior work (beyond computational benefits). While some of the key limitations were addressed in the rebuttal, the contribution in its current form remains too narrow. 
The paper is not ready for publication at ICLR at this stage.",ICLR2020, +wFHtyBaJxGS,1610040000000.0,1610470000000.0,1,mzfqkPOhVI4,mzfqkPOhVI4,Final Decision,Reject,"The paper proposes a multi-scale spatial-temporal joint graph convolution for spatiotemporal forecastings. Many reviewers have concerns regarding novelty, baseline comparisons, and writing clarity of the draft.",ICLR2021, +HpSJvYKS1HB,1642700000000.0,1642700000000.0,1,FS0XKbpkdOu,FS0XKbpkdOu,Paper Decision,Reject,"In spite of some slightly mixed scores (with one borderline positive review), scores are ultimately lukewarm and tend toward negative (and furthermore, reviews are broadly in agreement as to the issues they raise). Main issues center around low significance of the results, and issues with the presentation that need to be addressed.",ICLR2022, +2Z0NUIViikS,1610040000000.0,1610470000000.0,1,IUaOP8jQfHn,IUaOP8jQfHn,Final Decision,Reject,"This paper received 4 reviews with mixed initial ratings: 7, 5, 4, 5. The main concerns of R1, R2 and R4, who gave unfavorable scores, included: lack of methodological novelty (analysis-only paper), absence of experiments on real data (3 synthetic-only benchmarks), missing baselines and an overall inconclusive discussion. At the same time R5 notes that the offered fair comparison between SOTA methods was indeed ""much needed"", and the paper can ""serve an important role"" in guiding future developments in the community. In response to that, the authors submitted a new revision and provided detailed answers to each of the reviews separately. R1, R2 and R4 did not participate in the discussion, and R5 stayed with the positive rating. +AC agrees with R5 that the provided analysis is insightful, and the effort put into organizing the research community around a single set of benchmarks and metrics is indeed valuable. However, given a simplistic nature of the proposed datasets and lack of other methodological contributions, the submission is not meeting the acceptance bar for ICLR. After discussion with PCs, the final recommendation is to reject.",ICLR2021, +QDKN4JLrMu,1610040000000.0,1610470000000.0,1,kvhzKz-_DMF,kvhzKz-_DMF,Final Decision,Accept (Poster),"The paper shows the success of a relatively simple idea -- fine tune a pretrained BERT Model using Variational Information Bottleneck method of Alemi to improve transfer learning in low resource scenarios. + +I agree with the reviewers that novelty is low -- one would like to use any applicable method for controlling overfitting when doing transfer learning, and of the suite of good candidates, VIB is an obvious one -- but at the same time, I'm moved by the results because of: the improvements and the success on a wide range of tasks and the surprising success of VIB over other alternatives like dropout etc, and hence I'm breaking the tie in the reviews by supporting acceptance. Its a nice trick that the community could use, if the results of the paper are an indication of its potential.",ICLR2021, +HkeoHwvlgV,1544740000000.0,1545350000000.0,1,SkguE30ct7,SkguE30ct7,Improvements needed,Reject,"This paper formulates the recommendation as a model-based reinforcement learning problem. Major concerns of the paper include: paper writing needs improvement; many decisions in experimental design were not justified; lack of sufficient baselines; results not convincing. Overall, this paper cannot be published in its current form. 
+",ICLR2019,5: The area chair is absolutely certain +S1zhjfUdg,1486400000000.0,1486400000000.0,1,r1osyr_xg,r1osyr_xg,ICLR committee final decision,Reject,The reviewers agree that the paper's clarity and experimental evaluation can be improved.,ICLR2017, +6teyj1kvXv,1642700000000.0,1642700000000.0,1,nWlk4jwupZ,nWlk4jwupZ,Paper Decision,Reject,"This paper proposes a deep RL framework for the traditional schedule problem. The proposed algorithm is shown to be effective and has zero-shot generalization abilities. Reviewers are mostly satisfied with the response and the overall evaluation is slightly positive. However, there are some drawbacks of the current paper preventing it from getting a higher evaluation: (1) The reviewers believe that the contribution might be small -- at least for the RL area; the experimental performance for the scheduling problem is also not significantly improved compared to other methods (e.g. the search-based ones). Hence the reviewers believe the contribution of the paper is limited. (2) There is a number of typos and language issues in its present version. The paper may need several rounds of polishment before publication. (3) There is a lack of theoretical justification for the proposed method. In sum, the AC recommends a borderline rejection.",ICLR2022, +ZNbyKNNnE_s,1610040000000.0,1610470000000.0,1,O358nrve1W,O358nrve1W,Final Decision,Reject,"There are some interesting ideas in this paper, but I agree with reviewers that without a comparison to existing work, it is hard to place this work in its proper context. The authors make several arguments in dismissing the need for side-by-side comparisons, but I do not find these arguments convincing. +* First, the authors argue that there are no suitable benchmarks for them to compare, and that in particular SyGuS benchmarks would not be suitable because they are dealing with a different problem. I disagree. There are 2 tracks in SyGuS specifically for programming-by-example problems, one for string manipulations and one for bit-vector programs. I think the string manipulation problems would be a good match for this technique. +* The authors also argue that their technique is so much more general than prior techniques that a side-by-side comparison would be unfair. However, their most complex benchmark, sorting, has been somewhat of a standard benchmark in the program synthesis community for about a decade now. And while a lot of recent synthesis work has focused on domain specific languages, many systems starting with Sketch and continuing with Myrth and Synquid were turning complete. Turing completeness can make a big difference if you are trying to synthesize verified code, but in the context of programming-by-example, turning completeness does not really present any fundamental challenges. + +I am willing to believe that this technique is more scalable than existing techniques, so that while existing techniques may do better than this technique when synthesizing for small languages, this technique would surpass them when applied to a bigger language. But if that's the argument that the authors want to make I would like to see some evidence, and ideally some quantitative data as to how big a language would have to be before this technique wins out.",ICLR2021, +so7ECsR_qW2,1642700000000.0,1642700000000.0,1,vr4Wo33bd1,vr4Wo33bd1,Paper Decision,Reject,"The paper addresses semi-supervised learning with unbalanced class distribution, a.k.a long-tail. 
The main idea is to alternate between learning the representation and learning the classifier.
If the emphasis is on calculating word embeddings just-in-time instead of ahead-of-time, then I would also expect an evaluation of the speed or memory benefits of doing so. Perhaps a better title for the paper would be ""integrating multiple information sources in training of word embeddings"", or perhaps a sexier paraphrase of the same. 
The paper should be accepted.",ICLR2022, +oPQZwXLK0,1576800000000.0,1576800000000.0,1,Skg2pkHFwS,Skg2pkHFwS,Paper Decision,Reject,"This paper presents an ensemble method for reinforcement learning. The method trains an ensemble of transition and reward models. Each element of this ensemble has a different view of the data (for example, ablated observation pixels) and a different latent space for its models. A single (collective) policy is then trained, by learning from trajectories generated from each of the models in the ensemble. The collective policy makes direct use of the latent spaces and models in the ensemble by means of a translator that maps one latent space into all the other latent spaces, and an aggregator that combines all the model outputs. The method is evaluated on the CarRacing and VizDoom environments. + +The reviewers raised several concerns about the paper. The evaluations were not convincing with artificially weak baselines and only worked well in one of the two tested environments (reviewer 2). The paper does not adequately connect to related work on model-based RL (reviewer 1 and 2). The paper does not motivate its artificial setting (reviewer 2 and 1). The paper's presentation lacks clarity from using non-standard terminology and notation without adequate explanation (reviewer 1 and 3). Technical aspects of the translator component were also unclear to multiple reviewers (reviewers 1, 2 and 3). The authors found the review comments to be helpful for future work, but provided no additional clarifications. + +The paper is not ready for publication.",ICLR2020, +E2Fee4sHfK,1576800000000.0,1576800000000.0,1,HkedQp4tPr,HkedQp4tPr,Paper Decision,Reject,"The paper proposes a parallelization approach for speeding up scheduled sampling, and show significant improvement over the original. The approach is simple and a clear improvement over vanilla schedule sampling. However, the reviewers point out that there are more recent methods to compare against or combine with, and that the paper is a bit thin on content and could have addressed this. The proposed approach may well combine well with newer techniques, but I tend to agree that this should be tested.",ICLR2020, +Y0Nh5pxsVF3,1610040000000.0,1610470000000.0,1,7IDIy7Jb00l,7IDIy7Jb00l,Final Decision,Reject,"The paper studies offline meta reinforcement learning. Overall the scope of this contribution seems limited. Reviewers have raised concerns about the significance of the presented results given the assumptions, and that the experimental environments are not extensive and do not fully support the claimed advances. ",ICLR2021, +3fIOx_5ZPl,1610040000000.0,1610470000000.0,1,j9Rv7qdXjd,j9Rv7qdXjd,Final Decision,Accept (Poster),"Most reviewers found the method proposed to be technically sound, well-motivated and particularly interesting due to the interpretability of its results. Indeed, the extraction of interpretable motifs from NAS is a valuable contribution. One of the reviewers was particularly concerned by the lack of guarantees of the proposed method and a perceived failure mode of averaged gradients. We thank both the reviewer and the authors for the detailed discussion on these points. 
Ultimately, the benefits of the method proposed and the magnitude of the contributions in the paper outweigh these concerns.",ICLR2021, +rJBjEy6BM,1517250000000.0,1517260000000.0,424,SJZ2Mf-0-,SJZ2Mf-0-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper presents an interesting model which at the time of submission was still quite confusingly described to the reviewers. +A lot of improvements have been made for which I applaud the authors. +However, at this point, the original 20 babi tasks are not quite that exciting and several other models are able to fully solve them as well. +I would encourage the authors to tackle harder datasets that require reasoning or multitask settings that expand beyond babi. + +",ICLR2018, +txfjA7b-blE,1642700000000.0,1642700000000.0,1,74x5BXs4bWD,74x5BXs4bWD,Paper Decision,Accept (Poster),"This paper introduces a novel quality-diversity algorithm, ""Evolutionary Diversity Optimization with Clustering-based Selection (EDO-CS)"", and applies it to reinforcement learning. A bandit approach (UCB) is used to select which cluster to sample parents from. The QD algorithm can be evaluated on its own, outside of the RL context, and if so it should be compared to the several approaches to niching and other standard diversity preservation approaches in evolutionary computation that rely on clustering. (And the authors should make an effort to connect to the niching literature in particular.) However, the use of the algorithm for RL makes it possible to use behavioral features as the space in which to cluster, separating it from standard diversity preservation methods. The resulting algorithm is relatively simple and the empirical results are good. + +Some of the main concerns for reviewers included the bibliography, which the authors promptly acted on by citing several suggested papers and comparing their approach where relevant. There was also discussion about the exact novelty of the paper, for example as compared to the CVT-MAP-Elites algorithm, but this was clarified by the authors. Reviewers agree that the paper is easily to follow and well-written. + +Based on this, it seems that the paper makes a clear contribution to QD methods for RL, and is worth accepting.",ICLR2022, +rJM4hM8Ox,1486400000000.0,1486400000000.0,1,Hyanrrqlg,Hyanrrqlg,ICLR committee final decision,Reject,"This paper presents some interesting and potentially useful ideas, but multiple reviewers point out that the main appeal of the paper's contributions would be in potential follow-up work and that the paper as-is does not present a compelling use case for the novel ideas. For that reason, the recommendation is to reject the paper. I would encourage the authors to reframe and improve the paper and extend it to cover the more interesting possible empirical investigations brought up in the discussion. Unfortunately, the paper is not sufficiently ground breaking to be a good fit for the workshop track.",ICLR2017, +H1U32G8dg,1486400000000.0,1486400000000.0,1,BydARw9ex,BydARw9ex,ICLR committee final decision,Accept (Poster),"The reviewers all agreed that this paper should appear at the conference. The experiments seem to confirm interesting intuition about the capacity of recurrent nets and how difficult they are to train, and the reviewers appreciated the experimental rigor. This is certainly of interest and useful to the ICLR community and will lead to fruitful discussion. 
The reviewers did request finer details related to the experiments for reproducibility (thank you for adding more detail to the appendix). The authors are recommended to steer clear of making any strong but unsubstantiated references to neuroscience.",ICLR2017,
There is nothing wrong with that, but it would be appropriate for the authors to discuss this prior work a bit more diligently -- currently the relationship to these prior works is not at all apparent from their discussion in the related work section. A more appropriate way to present this would be to begin Section 3.2 by stating that this framework follows prior work -- there is nothing wrong with building on prior work, and the significant and important contribution of this paper is no way diminished by being up-front about which parts are inspired by previous papers.",ICLR2020, +CI-h-uHRASe,1642700000000.0,1642700000000.0,1,YJVMboHZCtW,YJVMboHZCtW,Paper Decision,Reject,"This paper proposes the notation of DB variability, which is essentially prediction variance. It is also closely related to algorithmic stability which is a theoretically more sound notation to derive generalization bounds. The paper is a mixed bag of empirical observations and ""theory"". However, looking at the ""theoretical results"" in the paper, it is clear that the authors lack adequate theoretical background. + +I'd be more positive if the paper has been focused more on the former, and could be judged by the empirical part only. While the reviews were positive, I looked at them and realized that similar to the authors, the reviewers also lack theoretical backgrounds. + +First, the fact large variance implies a generalization lower bound is trivial, to the degree it is not worth stating as a ""result"". Second small variance implies a generalization upper bound isn't true. One can have a predictor that perfectly overfits the training data and predicts class 0 everywhere else. This has small prediction variance but poor generalization. In this context, the upper bound analysis of the paper is clearly misleading. Usually one compares training error to generalization error, where the estimator depends on the training set. In such case, one cannot use the simple argument of the convergence of the empirical mean of sum of independent random variables to the mean due to the dependency of estimator on the training set, e.g. in Thm 3. One needs to use uniform convergence and exponential probability (instead of Chebyshev) inequality to obtain such results. The right hand side of Thm 3 (the theorem itself is also very poorly stated. and shouldn't be allowed to be published) could not be interpreted as training error as should usually be the case for such bounds, but only as validation error. Such a result (comparing validation and test error when distribution isn't changed) has no value. + +I would not elaborate on other similar issues. My recommendation is to focus on the empirical study if the authors are not familiar with theoretical analysis.",ICLR2022, +MX1NRIMyEa,1576800000000.0,1576800000000.0,1,HkgU3xBtDS,HkgU3xBtDS,Paper Decision,Reject,"This paper is a clear reject. The paper is very poorly written and contains zero citations. Also, the reviewers have a hard time understanding what the paper is about.",ICLR2020, +rYENFnv2HQH,1610040000000.0,1610470000000.0,1,Q2iaAc-4I1v,Q2iaAc-4I1v,Final Decision,Reject,"This paper discusses how one can equip reinforcement learning agents with an intrinsic reward function that helps identifying factors of variation within a family of MDPs, effectively allowing agents to do experiments in the environment. This is interpreted as causal factors that control important aspects of the environment dynamics. 
+ +Although this is a very relevant topic and there was extensive discussion during the discussion phase, with reviewers acknowledging that the final version of the submitted manuscript substantially improved over the original submission, most reviewers still recommend the rejection of the paper. This is mainly due to the assessment that there are still several unclear technical aspects related to the paper. Shortly, the reviewers felt that the paper had important clarity issues, that the claims being made were imprecise, and that there was a dearth of details about the empirical results, making them not fully convincing. + +I strongly recommend the authors to take the reviewers suggestions into consideration to have a much stronger submission to future venues. +",ICLR2021, +pzetIbc2_mg,1642700000000.0,1642700000000.0,1,XLxhEjKNbXj,XLxhEjKNbXj,Paper Decision,Accept (Poster),"This paper proposes a labeling trick for subgraph representation learning with GNNs. The proposed method, GLASS, improves on subgraph-level tasks. The topic of subgraph representation learning is relatively new, and this paper makes progress in that community which would be appreciated by other researchers interested in the same problem. + +The paper in the original submission state raised some concerns from the reviewers about unclear writing of the motivation and potential applications, technical novelty, and comparisons with existing approaches (even one that are not specifically designed for subgraph representation learning). It is good that the authors conducted additional experiments to show the effect of SSL (that the approach makes improvements without SSL). This and other clarifications from the authors convinced the reviewers to recommend acceptance.",ICLR2022, +PcKIIay7Cm,1576800000000.0,1576800000000.0,1,rJg7BA4YDr,rJg7BA4YDr,Paper Decision,Reject,"This paper investigates the problem of building a program execution engine with neural networks. While the reviewers find this paper to contain interesting ideas, the technical contributions, scope of experiments, and the presentation of results would need to be significantly improved in order for this work to reach the quality bar of ICLR.",ICLR2020, +H1xywymlxV,1544720000000.0,1545350000000.0,1,rJl8BhRqF7,rJl8BhRqF7,Meta-Review,Reject," The paper presents a new annotation of the CIFAR-10 dataset (the test set) as a distribution over labels as opposed to one-hot annotations. This datasets forms a testbed analysis for assessing the generalization abilities of the state-of-the-art models and their robustness to adversarial attacks. + +All the reviewers and AC acknowledge the contribution of dataset annotation and that the idea of using label distribution for training the models is sound and should improve the generalization performance of the models. +However the reviewers and AC note the following potential weaknesses: (1) the paper requires major improvement in presentation clarity and in-depth investigation and evidence of the benefits of the proposed framework – see detailed comments of R3 on what to address in a subsequent revision; see the suggestions of R2 for improving the scope of the empirical evaluations (e.g. 
distortions of the images, incorporating time limits for doing the classifications) and the requests of R1 for clarifications; (2) the related work is inadequate and should be substantially extended – see the related references suggested by R2; also R1 rightly pointed out that two out of four future extensions of this framework have been addressed already, which questions the significance of the findings in this submission. 
The authors were extremely receptive to these suggestions, which is to be commended and very much appreciated, and in their response state that they are planning to take the time needed to revise this paper before publication.",ICLR2022,
for machine translation, inspired by the idea of word alignment that has been prevalent in machine translation for decades. Later, in the transformer paper, self-attention was suggested as an alternative to recurrent and convolutional models for machine translation (note that self-attention has been used before the transformer paper, see e.g. [1]). While a theoretical connection with kernel regression etc. exists, this was not related to the original motivation of these works. There are many ways of arriving at the same construction! And given the simplicity of attention mechanisms it doesn't surprise me that connections with other lines of research exist. Had they been noticed, they would probably be a parenthesis in the original papers, because attention is derived there in a much more direct way (this doesn't mean that the connections aren't interesting, but that they are not _essencial_ to the construction). + +In their response, the authors dismissed a constructive suggestion from one of the reviewers which in my opinion would have strengthen this paper -- the connection with graph neural networks. If the point of the paper is to point out past research that connects fundamentally to the idea of attention mechanisms, why leaving this out? + +In sum, in my view this paper lacks the rigor, the insight, and the historical perspective that should characterize a strong position paper, and as such I cannot recommend acceptance. I strongly suggest that the authors take into account some of the insightful suggestions given by the reviewers in future iterations of their work. + +[1] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention model. In Empirical Methods in Natural Language Processing, 2016.",ICLR2022, +Ju4-fvGKjOD,1610040000000.0,1610470000000.0,1,MyHwDabUHZm,MyHwDabUHZm,Final Decision,Accept (Poster),"This paper proposes to use high dimensional representation for labels to strengthen the adversarial robustness of deep neural networks. Experimental results demonstrate that the proposed method improve adversarial robustness. All reviewer agree that the authors propose an interesting idea and this direction deserves further exploration. On the other hand, the reviewers also raise a serious question: There is a lack of explanation of why high dimensional representation of labels improve adversarial robustness. Therefore, it is not clear if the proposed method can defend refined attacks tailored to such dimensional label representation. The authors are highly encouraged to conduct deeper analysis, especially on the robustness against finer attacks.",ICLR2021, +yji9OjZbM7V,1642700000000.0,1642700000000.0,1,Az7opqbQE-3,Az7opqbQE-3,Paper Decision,Accept (Poster),"The submission proposes a model to handle uncertainties in an irregularly sampling time series setting (HetVAE), built on the VAE framework and the previous work on mTAN (multi-time attention networks), and introduces components to encode sparsity information and heteroscedastic output uncertainty. The paper is clear, well-motivated, and contains extensive ablation studies showing the effect of eaach added components. +I recommend this submission for acceptance.",ICLR2022, +2D5uje6q8qY,1610040000000.0,1610470000000.0,1,4q8qGBf4Zxb,4q8qGBf4Zxb,Final Decision,Reject,"This work studies an intriguing problem of searching optimal architectures for unsupervised domain adaptation. 
It is based on a two-stage approach: (1) transferable architecture search via DARTS + MK-MMD; (2) transferable feature learning via Backbone + MCD. + +The reviews for this paper are very insightful, constructive and of high quality. While all reviewers acknowledge the contribution of a new research problem, they unanimously have suggestions for further improving the paper: +- The novelty and soundness in the technical method are not fantastic. Authors should consider more elaborated loss designs for the approach, and unify the two stages with the same optimization objective. +- The empirical evaluation is by far not extensive and insightful. More stronger NAS approaches and larger-scale datasets should be included to give more evidence to support the claims that NAS-DA is better. +- A featured analysis of how non-iid architecture (as searched out by this work) differs from iid architecture would make the paper much more interesting. + +Authors did not participate in the rebuttal and discussion phase. + +AC scanned through the paper and believes that this paper studies a promising research direction, but the work cannot be accepted before addressing the reviewers' comments. The weaknesses are quite obvious and will have a high probability of being asked by the reviewers of the next conference. So the authors need to make sure that they substantially revise their work before submitting to yet another top venue.",ICLR2021, +z50w1pibRaz,1610040000000.0,1610470000000.0,1,sgnp-qFYtN,sgnp-qFYtN,Final Decision,Reject,"I have serious concerns about how experiments are reported in this paper. Most methods tried to compare at an iteration complexity of roughly 100 epochs because it is known more computation improves performance very significantly but the computational resources are limited for many researchers, especially in academia. While this convention may not be the ideal way to compare different methods, for fairness, this practice has been followed in most of previous papers. + +Unfortunately this paper disregarded this practice, and on Imagenet the reported results from previous works were mixed at 100 epochs (e.g. STR) and at 500 epochs (rigL — which was explicitly marked to be 5x in the original paper) without any clarification, and the only other method in the table showing comparable performance to the proposed method, LRR, also requires many more than 100 epochs. Moreover, the authors did not explicitly disclose the equivalent epochs of their algorithms in the Imagenet experiments, and this is not acceptable. Based on the information inferred from the current writing, it is extremely likely that significant unfair advantages were given to the proposed algorithms. + +Since the authors did not report experiments appropriately, this paper cannot be accepted in its current form regardless of other potential merits of the proposed methods. I hope the authors view this outcome positively, and proactively fix the problem. If in revised versions, the experiments are reported according to the common practice, I am sure the work would become publishable. +",ICLR2021, +nlvmDI3F39S,1642700000000.0,1642700000000.0,1,Ug-bgjgSlKV,Ug-bgjgSlKV,Paper Decision,Accept (Poster),"The reviewers all agree that this paper proposes a very interesting approach of finding useful information encoded inside a generative model. They show how foreground/background semantics learnt in a generative model are useful for tasks like segmentation. +This is a general approach that can be applied to other models in the future. 
+It is an accept.",ICLR2022, +-9DJ8Apjn,1576800000000.0,1576800000000.0,1,BJg866NFvB,BJg866NFvB,Paper Decision,Accept (Spotlight),Reviewers uniformly suggest acceptance. Please look carefully at reviewer comments and address in the camera-ready. Great work!,ICLR2020, +3OfIe9u43QL,1610040000000.0,1610470000000.0,1,KjeUNkU2d26,KjeUNkU2d26,Final Decision,Reject,"The paper proposes an approach to defining/tackling the question of separating ""style"" and ""content"" of images, and introduces a novel way to learn representation that disentangle these aspects of images. I think it offers some new ideas. The reviewers were split on the evaluation. Among the chief concerns with the initial submission were a problematic formulation of the objective, missing comparisons and analysis, and questions about novelty of the architecture (in particular w.r.t. AdaIn). I think the rebuttal/revision have addressed these fairly well. I do agree with R2 that some flaws remain, in particular the analysis could be more thorough/complete, and the paper could then be stronger. ",ICLR2021, +dW8QtvmB8xy,1610040000000.0,1610470000000.0,1,cU0a02VF8ZG,cU0a02VF8ZG,Final Decision,Reject,"Overall, all reviewers generally agree that the idea of using visual similarity to unsupervised alignment of multiple languages is interesting and the proposed method and dataset are well-designed, while three of them raised some concerns related to the retrieval nature of the method. In particular, discussions about its place as a study of machine translation and comparison with other cross-lingual retrieval baselines were the main issues. Although authors made great effort to address reviewers' concerns points and did clarify some of them, unfortunately the reviewers were not fully convinced by the response, and one reviewer decided to downgrade the initial score. After all, three reviewers rate the paper as 'below the acceptance threshold'. Based on their opinions, I decided to recommend rejection. + +I think the entire picture of the work and the logic flow could be much clearer by discussing in a top down manner why this idea should be implemented with a retrieval-based approach, rather than superficially adding ""using retrieval"" to some sentences. ",ICLR2021, +OwGtF3nhon,1576800000000.0,1576800000000.0,1,rkxawlHKDr,rkxawlHKDr,Paper Decision,Accept (Poster),"The submission presents a differentiable take on classic active contour methods, which used to be popular in computer vision. The method is sensible and the results are strong. After the revision, all reviewers recommend accepting the paper.",ICLR2020, +KpdnWRMhEnN,1610040000000.0,1610470000000.0,1,5PiSFHhRe2C,5PiSFHhRe2C,Final Decision,Reject,"The paper proposes a constituent-based transformer for aspect-based sentiment analysis. The approach allows conducting aspect-based sentiment analysis to leverage the syntactic information without pre-specified dependency parse trees. + +Overall, the idea is interesting. However, all the reviewers shared the following concerns: + +- Paper descriptions of methodology and experiments are not clear and require significant rewriting and reorganization. +- The proposed approach is not well-justified by the empirical study presented in the paper. Especially, a more detailed ablation study is required to justify the design. + +We would suggest the authors addressing the feedback from the reviewers to improve the paper. 
",ICLR2021, +ZVHnIM-8TV,1610040000000.0,1610470000000.0,1,vcopnwZ7bC,vcopnwZ7bC,Final Decision,Accept (Poster),"This paper presents the Order-Memory Policy Network (OMPN), an architecture for modelling a hierarchy of sub-tasks and discovering task decompositions from demonstration data. Results are presented on a compositional grid-world task (Craft) and on a simulated robotics task (Dial). + +The reviewers agree that the proposed method is novel and interesting, that the paper addresses an important problem, and that it is well-written. One main criticism by the reviewers, the lack of experimental evaluation of different hyperparameter choices, such as the depth of the memory stack and the expected number of subtasks, has to a large part already been addressed in the revision by the authors. The total number of hyperparameters that need to be tuned, however, is quite large and the authors are encouraged to revise their claim ""Our central message is that OMPN is a general off-the-shelf model for task decompositions"" in this light. The paper is borderline, and could clearly benefit from a revised, stronger presentation and more extensive experimental evaluation, but I am confident that the authors can use the time until the camera-ready version is due to address some of the remaining feedback by the reviewers, and hence I think that this paper can be accepted. + +The authors are further encouraged to take the following additional reviewer feedback into account, which was brought up during the internal discussion period: +1) The complexity of the proposed method could be better justified by more thoroughly investigating the effectiveness of using a multi-level hierarchy (e.g., by running experiments on more complicated and hierarchical tasks with multiple branches). +2) Further strengthening down-stream performance evaluation, such as in imitation learning (in addition to the already presented behavioral cloning results) and/or reinforcement learning, would further strengthen the paper and demonstrate that the discovered decomposition is indeed useful. +",ICLR2021, +zrftqffDE4,1576800000000.0,1576800000000.0,1,rklPITVKvS,rklPITVKvS,Paper Decision,Reject,"This paper proposes incorporating adversarial training on real images to improve the stability of GAN training. The key idea relies on the observation that GAN training already implicitly does a form of adversarial training on the generated images and so this work proposes adding adversarial training on real images as well. In practice, adversarial training on real images is performed using FGSM and experiments are conducted on CelebA, CiFAR10, and LSUN reporting using standard generative metrics like FID. + +Initially all reviewers were in agreement that this work should not be accepted. However, in response to the discussion with the authors Reviewer 2 updated their score from weak reject to weak accept. The other reviewers recommendation remained unchanged. The core concerns of reviewers 3 and 1 is limited technical contribution and unconvincing experimental evidence. In particular, concerns were raised about the overlap with [1] from CVPR 2019. The authors argue that their work is different due to the focus on the unsupervised setting, however, this application distinction is minor and doesn’t result in any major algorithmic changes. 
With respect to experiments, the authors do provide results across multiple datasets and architectures, which is encouraging; however, to distinguish this work it would have been helpful to provide further study and analysis of the aspects unique to it -- such as the settings and type of adversarial attack (as mentioned by R3) and stability across GAN variants. 
issue not previously recognized in CNNs.",ICLR2020, +sUqMNEuHlP,1576800000000.0,1576800000000.0,1,B1eWOJHKvB,B1eWOJHKvB,Paper Decision,Accept (Poster),"This paper theoretically studied one of the fundamental issue in CycleGAN (recently gained much attention for image-to-image translation). The authors analyze the space of exact and approximated solutions under automorphisms. + +Reviewers mostly agree with theoretical value of the paper. Some concerns on practical values are also raised, e.g., limited or no-surprising experimental results. In overall, I think this is a boarderline paper. But, I am a bit toward acceptance as the theoretical contribution is solid, and potentially beneficial to many future works on unpaired image-to-image translation. + +",ICLR2020, +TqHOhtRYn,1576800000000.0,1576800000000.0,1,rylVHR4FPB,rylVHR4FPB,Paper Decision,Accept (Poster),"This paper proposes Bayesian quantized networks and efficient algorithms for learning and prediction of these networks. The reviewers generally thought that this was a novel and interesting paper. There were a few concerns about the clarity of parts of the paper and the experimental results. These concerns were addressed during the discussion phase, and the reviewers agree that the paper should be accepted.",ICLR2020, +aBtB52DsmK,1610040000000.0,1610470000000.0,1,e8W-hsu_q5,e8W-hsu_q5,Final Decision,Accept (Poster),"This paper provides a natural combination of conditional neural processes with LieConv models. It is a good step forward for stochastic processes with equivariances. While there is still room to improve the experiments, the authors provided a good response to reviewers, and the paper is a nice contribution.",ICLR2021, +l2GKf6cXhy,1576800000000.0,1576800000000.0,1,BklLVAEKvH,BklLVAEKvH,Paper Decision,Reject,"This paper proposes a deep clustering method based on normalized cuts. As the general idea of deep clustering has been investigated a fair bit, the reviewers suggest a more thorough empirical validation. Myself, I would also like further justification of many of the choices within the algorithm, the effect of changing the architecture.",ICLR2020, +3EXKC_mswz,1610040000000.0,1610470000000.0,1,W1uVrPNO8Bw,W1uVrPNO8Bw,Final Decision,Reject,"This paper uses concepts from physics to make predictions about stochastic gradient descent. The reviews point to two issues. Firstly, the paper was not very accessible to those without a relevant background, and this is reflected in the low confidence rating reviewers gave. More importantly, two of the reviewers consistently pointed out 'vague mathematics' and oversimplification in the mathematical arguments. + +The authors' feedback did not successfully address the reviewer's concerns, both R3 and R4 indicated there were outstanding concerns. + +I should note that despite giving low confidence scores and stating that some concepts from physics are beyond their field of expertise, reviewers gave high quality reviews with detailed comments and questions, and subsequently participated in the discussion revisiting their reviews. This suggests that the low confidence is not a symptom of insufficient reviewer effort, but perhaps a consequence of an inaccessible paper.",ICLR2021, +S1VGSJaSM,1517250000000.0,1517260000000.0,518,Hkp3uhxCW,Hkp3uhxCW,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting you paper to ICLR. The revision improved the paper e.g. moving Appendix A3 to the main text has improved clarity, but, like reviewer 3, I still found section 4 hard to follow. 
As the authors suggest, shifting the terminology to ""posterior shifting"" rather than ""sharpening"" would help at a high level, but the design choices should be more carefully explained. The experiments are interesting and promising. The title, although altered, still seems a misnomer given that the experimental evaluation focuses on RNNs. 
+ + The paper has been revised to address some NTM features (sharpening) that were not included in the original version. + The purpose and precise definition of the invNorm have also been fixed.",ICLR2017, +vBx2guS6-hY,1642700000000.0,1642700000000.0,1,iEvAf8i6JjO,iEvAf8i6JjO,Paper Decision,Accept (Spotlight),"The submission addresses the problem of whether or not to update weights for a previous task in continual learning. The approach is to specify a trust region based on task similarity and update weights only in the direction of the tasks that are similar enough to the current one. The paper was on the balance well received (3/4 reviewers recommended acceptance, 2 with scores of 8) and complemented for its simple but effective approach, and good discussion of related literature. The submission attracted a reasonable amount of engagement and discussion between reviewers and authors, which should be taken into account in the final version of the paper.",ICLR2022, +C6UuMIKRNz2m,1642700000000.0,1642700000000.0,1,8uqOMUHgW4M,8uqOMUHgW4M,Paper Decision,Reject,"In this paper, the author present a method for learning a shared latent space between the fMRI activity of multiple individuals processing the same stimulus. The method consists of an auto-encoder with a single encoder and subject-specific decoders which is specifically regularized to decouple common and shared representations. This paper generated a lot of discussion between the reviewers and the authors, as well as between the reviewers. In light of these discussion, I cannot recommend acceptance at this point, as the paper is not ready. The main concerns were (1) about how the results and improvement are evaluated statistically, (2) that the baselines chosen were not strong enough and did not include existing approaches (neural or non-neural) and relatedly (3) that the paper was not framed correctly within the existing literature on finding shared spaces between participants, which would help with determining and understanding the novelty of the proposed approach. Some other smaller points were made by the reviewer can also strengthen the paper for a future submission in a neuroscience or machine learning venue.",ICLR2022, +rJ8CHJpBM,1517250000000.0,1517260000000.0,682,rJL6pz-CZ,rJL6pz-CZ,ICLR 2018 Conference Acceptance Decision,Reject,"Learning identity-preserving transformations from unlabeled data is definitely an important and useful direction. However the paper does not have convincing experiments to establish the effectiveness of the proposed method on real datasets which is a crucial limitation in my view, given that the paper is largely based on an earlier published work by Culpepper and Olshausen (2009). ",ICLR2018, +S1ef02PVxN,1545010000000.0,1546870000000.0,1,HJeRkh05Km,HJeRkh05Km,meta-review,Accept (Poster),"The authors propose an approach for visual navigation that leverages a semantic knowledge graph to ground and inform the policy of an RL agent. The agent uses a graphnet to learn relationships and support the navigation. The empirical protocol is sound and uses best practices, and the authors have added additional experiments during the revision period, in response to the reviewers' requests. However, there were some significant problems with the submission - there were no comparisons to other semantic navigation methods, the approach is somewhat convoluted and will not survive the test of time, and the authors did not conclusively show the value of their approach. 
The reviewers uniformly support the publication of this paper, but with low confidence. ",ICLR2019,4: The area chair is confident but not absolutely certain +4pCMh8kLUMp,1610040000000.0,1610470000000.0,1,NzTU59SYbNq,NzTU59SYbNq,Final Decision,Accept (Oral),"This paper introduces a novel game-theoretic view on PCA which yields an algorithm (EigenGame; Algorithm 2) that allows evaluation of singular vectors in a decentralized manner. The proposed algorithm is significant in its scalability, as demonstrated in the experiment on a large-scale dataset (ResNet-200 activations). This paper is generally clearly written, and in particular Section 2 provides an easy-to-follow reasoning leading to the proposed game-theoretic reformulation of PCA. I felt that the later sections are a bit condensed, including the figures. In the authors' response, major concerns raised by the reviewers have been appropriately addressed. I would thus recommend acceptance of this paper. + +What I found particularly interesting in their game-theoretic reformulation is that in the utility functions shown in (6) the orthogonality constraints $\hat{u}_j^\top\hat{u}_i=0$ have been removed and replaced with the soft constraints represented as the regularizer terms encouraging the orthogonality. Although several alternative forms for the regularizers would be possible, it is this particular form that allows an efficient gradient-ascent algorithm which does not require explicit orthonormalization or matrix inversion and is straightforwardly parallelizable. + +Pros: +- Provides a novel game-theoretic reformulation of PCA. +- Proposes a sequential algorithm and a decentralized algorithm for PCA on the basis of the game-theoretic reformulation. +- Provides a theoretical guarantee for the global convergence of the sequential algorithm. +- Demonstrates that the proposed decentralized algorithm is scalable to large-scale problems. + +Cons: +- The latter statement of Theorem 4.1 requires conditions on the initialization, which are hard to satisfy in high-dimensional settings. +- Significance of the proposed game-theoretic formulation in the context of game theory does not seem to be well explored. +",ICLR2021, +B1ldLzqelV,1544750000000.0,1545350000000.0,1,H1x-x309tm,H1x-x309tm,Meta-Review,Accept (Poster),"This paper analyzes the convergence properties of a family of 'Adam-Type' optimization algorithms, such as Adam, Amsgrad and AdaGrad, in the non-convex setting. The paper provides one of the first comprehensive analyses of such algorithms in the non-convex setting. In addition, the results can help practitioners with monitoring convergence in experiments. Since Adam is a widely used method, the results have a potentially large impact. + +The reviewers agree that the paper is well-written, provides interesting new insights, and that its results are of sufficient interest to the ICLR community to be worthy of publication.",ICLR2019,4: The area chair is confident but not absolutely certain +YWcJxsJx6i,1576800000000.0,1576800000000.0,1,SJx_QJHYDB,SJx_QJHYDB,Paper Decision,Reject,"The paper studies finding winning tickets with limited supervision. The authors consider a variety of different settings. An interesting contribution is to show that findings on small datasets may be misleading. That said, all three reviewers agree that novelty is limited, and some found inconsistencies and passages that were hard to read. Based on this, it seems the paper doesn't quite meet the ICLR bar in its current form. ",ICLR2020, +ohM6wmrrbg3,1610040000000.0,1610470000000.0,1,OthEq8I5v1,OthEq8I5v1,Final Decision,Accept (Spotlight),"The paper introduces MUSIC, a method for unsupervised learning of control policies, which partitions state variables into exogenous and endogenous collections and maximizes mutual information between them. Reviewers were uniformly positive, agreeing that the approach was interesting and well-motivated, and the experiments convincing. Some concerns were raised as to clarity, which were addressed through several revisions of the manuscript. I am happy to recommend acceptance.",ICLR2021, +65qJOnY3zm,1576800000000.0,1576800000000.0,1,SyeRIgBYDB,SyeRIgBYDB,Paper Decision,Reject,"The reviewers unequivocally reject the paper, which is mostly experimental and the results of which are limited. 
The authors did not respond to the reviewers' comments.",ICLR2020, +B1eIzPs-eV,1544820000000.0,1545350000000.0,1,rklz9iAcKQ,rklz9iAcKQ,borderline paper due to remaining concerns about the thoroughness of the experiments,Accept (Poster),"Because of strong support from two of the reviewers, I am recommending accepting this paper. However, I believe reviewer 1's concerns should be taken seriously. Although I disagree with the reviewer that a general ""framework"" method is a bad thing, I agree with them that additional experiments would be valuable.",ICLR2019,3: The area chair is somewhat confident +TOgZq_8Nn7,1642700000000.0,1642700000000.0,1,kHNKTO2sYH,kHNKTO2sYH,Paper Decision,Reject,"This is a borderline paper with two marginally-above and one marginally-below acceptance recommendations. While the authors provided valid responses to some of the criticism, I still find some of the motivation and assumptions not sufficiently clear, theoretical and practical issues are mixed, and the validation on only synthetic data raises practical questions.",ICLR2022, +6FEB9sWzHh,1610040000000.0,1610470000000.0,1,34KAZ9HbJco,34KAZ9HbJco,Final Decision,Reject,"As one of the reviewers commented, the paper presents ""a mix of tricks"" for multilingual speech recognition, which includes 1) the use of a pretrained mBERT, 2) a dual-adapter, and 3) prior adjusting. +First, the relative gains of the pretrained mBERT are marginal (Section 3.3.1). Secondly, using 1) on top of 2) is unnecessary. +This confuses the reader about what the conclusion of the paper is. +It would be better to choose one aspect of the problem and investigate it more deeply. + +The decision is mainly because of the lack of novelty and clarity. ",ICLR2021, +xs9tP994nRj,1610040000000.0,1610470000000.0,1,lvXLfNeCQdK,lvXLfNeCQdK,Final Decision,Reject,"The authors argue that tighter relaxations for certified robustness suffer from a worse loss landscape and thus are outperformed by the much simpler and less tight IBP relaxation, and come up with a new relaxation to overcome this problem. + +After the rebuttal there still remain doubts about the reasoning regarding the loss landscape (even though I acknowledge that the authors have invested a significant amount of work to support their hypothesis). Moreover, the differences from existing certified training methods are small, or the proposed method performs worse while being significantly more expensive (in particular if one takes into account the results reported on the IBP-Crown GitHub page, where the reported numbers are significantly lower than those reported in the present paper), so the benefit is unclear. + +Thus the majority of the reviewers still suggest rejection, and I agree with that, even though I think that the paper has its merits and I encourage the authors to continue this line of work. For a next version, the authors should ideally evaluate all the methods with an exact verification method or, respectively, use the best relaxation for all methods. Otherwise the differences can come just from the weaker relaxation and not from a difference in real robustness. + +",ICLR2021, +JTPUglowtWS,1642700000000.0,1642700000000.0,1,7twQI5VnC8,7twQI5VnC8,Paper Decision,Accept (Spotlight),"This paper proposes a method that uses conditional moment restriction methods to estimate causal parameters in non-parametric instrumental variable settings. This is done by converting to an unconditional moment restriction setting common in the econometrics causal inference literature. 
+ +The paper was reviewed quite favorably, and the authors updated the manuscript to address specific issues raised by reviewers.",ICLR2022, +B1IDXkpSM,1517250000000.0,1517260000000.0,157,Bys4ob-Rb,Bys4ob-Rb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper presents a differentiable upper bound on the performance of a classifier on an adversarially perturbed example (with small perturbation in the L-infinity sense). The paper presents novel ideas, is well-written, and appears technically sound. It will likely be of interest to the ICLR community. + +The only downside of the paper is its limited empirical evaluation: there is evidence suggesting that defenses against adversarial examples that work well on MNIST/CIFAR do not necessarily transfer well to much higher-dimensional datasets, for instance, ImageNet. The paper would, therefore, benefit from empirical evaluations of the defenses on a dataset like ImageNet.",ICLR2018, +_ousLf9zsAR,1610040000000.0,1610470000000.0,1,V6WHleb2nV,V6WHleb2nV,Final Decision,Reject,"While the authors thought that the paper had some strong experimental comparisons, there were serious concerns with novelty and paper claims. For a stronger ML paper the authors would need to either: (a) design a new training methodology beyond pre-training that is better suited for leveraging multiple datasets for Retrosynthesis, (b) design a new model for Retrosynthesis that is better able to leverage multiple datasets, or (c) design new evaluation metrics to describe how well current methods perform in Retrosynthesis and/or metrics that describe how well methods can use data from different sources. That said, if the authors were interested in submitting to non-ML venues, then I agree with R2 that chemistry venues may be better suited to the paper in its current form. ",ICLR2021, +BJxb7yaBf,1517250000000.0,1517260000000.0,70,ry80wMW0W,ry80wMW0W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Overall this paper seems to make an interesting contribution to the problem of subtask discovery, but unfortunately this only works in a tabular setting, which is quite limiting.",ICLR2018, +SJsLQkaSz,1517250000000.0,1517260000000.0,147,ryazCMbR-,ryazCMbR-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper studies trainable deep encoders/decoders in the context of coding theory, based on recurrent neural networks. It presents highly promising results showing that one may be able to use learnt encoders and decoders on channels where no predefined codes are known. + +Besides these encouraging aspects, there are important concerns that the authors are encouraged to address; in particular, reviewers noted that the main contribution of this paper is mostly on the learnt encoding/decoding scheme rather than in the replacement of Viterbi/BCJR. Also, complexity should be taken into account when comparing different decoding schemes. + +Overall, the AC leans towards acceptance, since this paper may trigger further research in this direction. ",ICLR2018, +kWmxSk0YTw,1576800000000.0,1576800000000.0,1,Byl3HxBFwH,Byl3HxBFwH,Paper Decision,Reject,"VAE-based sample selection for training NNs. A well-written experimental paper that is demonstrated through a number of experiments, all of which are minimal and from which generalization is not per se expected. The absence of an underlying theory and the absence of rigorous experimentation lead me to request extending either or, better, both. 
",ICLR2020, +vkITsny4Z8,1642700000000.0,1642700000000.0,1,3XD_rnM97s,3XD_rnM97s,Paper Decision,Reject,"Strengths: +* Theoretical foundation provided to knowledge integration problem +* Findings from the empirical studies are interesting +* Authors dedicated significant time and energy to coordinating with reviewers in the rebuttal period + +Weaknesses: +* It is not clear whether the GCS is a suitable approximation for measuring KI. For example, relation types are not supported in the GCS architecture making it unclear whether GCS adequately approximates knowledge integration. As reviewer 4qCM mentions, (X, born_in, Zurich) is very different knowledge from (X, died_in, Zurich). The current formulation only learns co-occurrence between entities rather than relational knowledge. +* Empirical study is limited to two knowledge integration methods (ERNIE & K-Adapter) and only evaluated on entity typing datasets, which are likely to be well-suited for their method which ignores relation information. +* The presentation and takeaways of the results could be clearer. Authors should explain in-depth why experiments that drop knowledge randomly are not suitable baselines. + +This paper is promising and the topic explored by the authors is interesting. I think it would benefit from integrating the comments from the reviewers and will make for a strong submission at a future venue.",ICLR2022, +r1l7Ia_XxN,1544940000000.0,1545350000000.0,1,B1xOYoA5tQ,B1xOYoA5tQ,Interesting idea but the claims need to be still justified better,Reject,"This paper proposes a method for improving robustness to black-box adversarial attacks by replacing the cross-entropy layer with an output vector encoding scheme. The paper is well-written, and the approach appears to be novel. However, Reviewer 4 raises very relevant concerns regarding the experimental evaluation of the method, including (a) lack of robustness without AT in the whitebox case (which is very relevant as we still lack good understanding of blackbox vs whitebox robustness) (b) comparison with Kannan et al and (c) lack of some common strong attacks. Reviewer 1 echoes many of these concerns.",ICLR2019,5: The area chair is absolutely certain +l2jFTePF3g,1576800000000.0,1576800000000.0,1,B1gzLaNYvr,B1gzLaNYvr,Paper Decision,Reject,"Main content: + +Blind review #2 summarizes it well: + +The aim of this work is to improve interpretability in time series prediction. To do so, they propose to use a relatively post-hoc procedure which learns a sparse representation informed by gradients of the prediction objective under a trained model. In particular, given a trained next-step classifier, they propose to train a sparse autoencoder with a combined objective of reconstruction and classification performance (while keeping the classifier fixed), so as to expose which features are useful for time series prediction. Sparsity, and sparse auto-encoders, have been widely used for the end of interpretability. In this sense, the crux of the approach is very well motivated by the literature. + +-- + +Discussion: + +All reviews had difficulties understanding the significance and novelty, which appears to have in large part arisen from the original submission not having sufficiently contextualized the motivation and strengths of the approach (especially for readers not already specialized in this exact subarea). 
+ +-- + +Recommendation and justification: + +The reviews are uniformly low, probably due to the above factors, and while the authors' revisions during the rebuttal period have improved the objections, there are so many strong submissions that it would be difficult to justify override the very low reviewer scores.",ICLR2020, +aDpW1npAHW,1576800000000.0,1576800000000.0,1,BklBp6EYvB,BklBp6EYvB,Paper Decision,Reject,"The paper is interested in multi-task learning. It introduces a new architecture which condition the model in a particular manner: images features and task ID features are fed to a top-down network which generates task-specific weights, which are then used in a bottom-up network to produce final labels. The paper is experimental, and the contribution rather incremental, considering existing work in the area. Experimental section is currently not convincing enough, given marginal improvements over existing approaches - multiple runs as well as confidence intervals would help in that respect. +",ICLR2020, +Jk7gqtUm-tE7,1642700000000.0,1642700000000.0,1,CAjxVodl_v,CAjxVodl_v,Paper Decision,Accept (Spotlight),"The paper describes a framework that unifies several previous lines under hindsight information matching. Within that framework, the paper also describes variants of the decision transformer (DT) called categorical DT and unsupervised DT. The rebuttal was quite effective and the reviewers confirmed that their concerns are addressed. The revised version of the paper is significantly improved and consists of an important contribution that should interested many researchers. Well done!",ICLR2022, +nhLtfGOGgt,1576800000000.0,1576800000000.0,1,S1gvg0NYvH,S1gvg0NYvH,Paper Decision,Reject,"This paper studies the evolution of the mean field dynamics of a two layer-fully connected and Resnet model. The focus is in a realizable or student/teacher setting where the labels are created according to a planted network. The authors study the stationary distribution of the mean-field method and use this to explain various observations. I think this is an interesting problem to study. However, the reviewers and I concur that the paper falls short in terms of clearly putting the results in the context of existing literature and demonstrating clear novel ideas. With the current writing of the paper is very difficult to surmise what is novel or new. I do agree with the authors' response that clearly they are looking at some novel aspects not studied by the previous work but this was not revised during the discussion period. Therefore, I do not think this paper is ready for publication. I suggest a substantial revision by the authors and recommend submission to future ML venues. ",ICLR2020, +obsI6Hebia,1610040000000.0,1610470000000.0,1,jP1vTH3inC,jP1vTH3inC,Final Decision,Accept (Poster),"This paper deals with a particular model structure selection problem: inferring the order of a given sequence of latent variables. This problem is closely related to the matching problem that involves discrete optimization. The authors propose to cast the problem into a one-step Markov Decision problem and optimize it using the policy gradient. The proposal here is using Variational Order Inference (VOI) using and using a Gumbel-Sinkhorn distribution to construct a proposal over approximate permutations. The approach is mathematically sound and novel. 
+ +Empirical results on image caption and code generation show promising results: method outperforms the previous Transformer-InDIGO and other baselines (Random, L2R, Common, Rare). This paper further analyzes the learned orders globally and locally, and conducts ablations. + +The reviewers were overall very enthusiastic. +",ICLR2021, +M1H8ysZA5,1576800000000.0,1576800000000.0,1,rkenmREFDr,rkenmREFDr,Paper Decision,Accept (Poster),"This paper proposes a new framework for improved nearest neighbor search by learning a space partition of the data, allowing for better scalability in distributed settings and overall better performance over existing benchmarks. + +The two reviewers who were most confident were both positive about the contributions and the revisions. The one reviewer who recommended reject was concerned about the metric used and whether comparison with baselines was fair. In my opinion, the authors seem to have been very receptive to reviewer comments and answered these issues to my satisfaction. After author and reviewer engagement, both R1 and myself are satisfied with the addition of the new baselines and think the authors have sufficiently addressed the major concerns. For the final version of the paper, I’d urge the authors to take seriously R4’s comment regarding clarity and add algorithmic details as per their suggestion. +",ICLR2020, +dswNkjMSvK,1610040000000.0,1610720000000.0,1,WoLQsYU8aZ,WoLQsYU8aZ,Final Decision,Reject,"This paper represents the PettingZoo library of multi-agent environments, providing a common API and benchmark for multi-agent learning. The library has high potential for impact and is likely of interest to a wide range of people in the ICLR community. However, in its current form the paper could be significantly improved by actioning the many pieces of constructive feedback provided by all reviewers. + +We have also been made aware of two highly related papers ""Multiplayer Support for the Arcade Learning Environment"" and ""SuperSuit: Simple Microwrappers for Reinforcement Learning Environments."" Together all three papers could be one comprehensive manuscript, but appear to have been unnecessarily split into three separate short papers.",ICLR2021, +bpzUylPxjq,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"The authors present a study where they investigate whether meta-learning techniques leverage the underlying task distribution. To do so, the authors come up with two conditions, in the first they generate tasks using a grammar and in the second condition, which is the null condition essentially, the tasks have the same statistical properties as the compositional task but they are not derived from a simple grammar. The authors find that while humans are better in the compositional condition, models are better in the null condition. + +All reviewers have been positive with this work, but some concerns were raised regarding clarity around the use of some terms, such as compositionality. The rebuttal period has been very productive and the reviewers have acknowledged the improvements on the manuscript. 
+ +All in all, I think this is a good study to appear on ICLR and I believe researchers would benefit from the design of the study that will perhaps open new opportunities around careful evaluation of meta-learning agents.",ICLR2021, +B1ldLzqelV,1544750000000.0,1545350000000.0,1,H1x-x309tm,H1x-x309tm,Meta-Review,Accept (Poster),"This paper analysis the convergence properties of a family of 'Adam-Type' optimization algorithms, such as Adam, Amsgrad and AdaGrad, in the non-convex setting. The paper provides of the first comprehensive analyses of such algorithms in the non-convex setting. In addition, the results can help practitioners with monitoring convergence in experiments. Since Adam is a widely used method, the results have a potentially large impact. + +The reviewers agree that the paper is well-written, provides interesting new insights, and that is results are of sufficient interest to the ICLR community to be worthy of publication.",ICLR2019,4: The area chair is confident but not absolutely certain +43LPTO5Vy7,1610040000000.0,1610470000000.0,1,jk1094_ZiN,jk1094_ZiN,Final Decision,Reject,"The reviewers overall appreciated the efforts of the authors in making NAS more computationally efficient. The paper could greatly benefit from further editing/restructuring with the goal of improving clarity, as it’s currently hard to navigate and understand in places. Future submissions of this work would benefit from more extensive empirical validation that motif networks mimic the original network. The reviewers also agreed that for the method to be appealing/useful, a general way to generate motif networks is needed. Overall, the outcome was that this is a very interesting idea but needs further development along the directions outlined above.",ICLR2021, +Bke0OEWWg4,1544780000000.0,1545350000000.0,1,Bkl-43C9FQ,Bkl-43C9FQ,decision,Accept (Poster),"The paper presents a simple and effective convolution kernel for CNNs on spherical data (convolution by a linear combination of differential operators). The proposed method is efficient in the number of parameters and achieves strong classification and segmentation performance in several benchmarks. The paper is generally well written but the authors should clarify the details and address reviewer comments (for example, clarity/notations of equations) in the revision. +",ICLR2019,4: The area chair is confident but not absolutely certain +_OGXtLP75E0L,1642700000000.0,1642700000000.0,1,9-Rfew334N,9-Rfew334N,Paper Decision,Accept (Poster),"The paper introduces a method to learn rotations of a quantized embedding end-to-end. The proposed technique seems novel, although the technical/algorithm novelty seems to be somewhat marginal. +The empirical results are promising, although do not quite match some of the claims by the authors. +Hopefully the reviewer feedback would help in producing an even more influential paper.",ICLR2022, +HJlTO5ZIlV,1545110000000.0,1545350000000.0,1,Skl3M20qYQ,Skl3M20qYQ,"Interesting approach, but more work needed on theory and experiments",Reject,"The paper introduces a form of variational auto encoder for learning disentangled representations. The idea is to penalise synergistic mutual information. The introduction of concepts from synergy to the community is appreciated. + +Although the approach appears interesting and forward looking in understanding complex models, at this point the paper does not convince on the theoretical nor on the experimental side. 
The main concepts used in the paper are developed elsewhere, the potential value of synergy is not properly examined. + +The reviewers agree on a not so positive view on this paper, with ratings either ok, but not good enough, or clear rejection. There is a consensus that the paper needs more work. + +",ICLR2019,4: The area chair is confident but not absolutely certain +HYRA5-a9WD,1576800000000.0,1576800000000.0,1,r1erNxBtwr,r1erNxBtwr,Paper Decision,Reject,"The paper investigates graph convolutional filters, and proposes an adaptation of the Fisher score to assess the quality of a convolutional filter. Formally, the defined Graph Filter Discriminant Score assesses how the filter improves the Fisher score attached to a pair of classes (considering the nodes in each class, and their embedding through the filter and the graph structure, as propositional samples), taking into account the class imbalance. + +An analysis is conducted on synthetic graphs to assess how the hyper-parameters (order, normalization strategy) of the filter rule the GFD score depending on the graph and class features. As could have been expected there no single killer filter. + +A finite set of filters, called base filters, being defined by varying the above hyper-parameters, the search space is that of a linear combination of the base filters in each layer. Three losses are considered: with and without graph filter discriminant score, and alternatively optimizing the cross-entropy loss and the GFD; this last option is the best one in the experiments. + +As noted by the reviewers and other public comments, the idea of incorporating LDA ideas into GNN is nice and elegant. The reservations of the reviewers are mostly related to the experimental validation: of course getting the best score on each dataset is not expected; but the set of considered problems is too limited and their diversity is limited too (as demonstrated by the very nice Fig. 5). + +The area chair thus encourages the authors to pursue this very promising line of research and hopes to see a revised version backed up with more experimental evidence. ",ICLR2020, +r1eJeuU2k4,1544480000000.0,1545350000000.0,1,rylbWhC5Ym,rylbWhC5Ym,Issues with motivation and experiments,Reject,"All three reviewers raised the issues that (a) the problem tackled in the paper was insufficiently motivated, (b) the solution strategy was also not sufficiently motivated and (c) the experiments had serious methodological issues.",ICLR2019,5: The area chair is absolutely certain +SyxUeT2bxV,1544830000000.0,1545350000000.0,1,HkxKH2AcFm,HkxKH2AcFm,Tackles an important problem with arguable success,Accept (Poster),"The paper argues for a GAN evaluation metric that needs sufficiently large number of generated samples to evaluate. Authors propose a metric based on existing set of divergences computed with neural net representations. R2 and R3 appreciate the motivation behind the proposed method and the discussion in the paper to that end. The proposed NND based metric has some limitations as pointed out by R2/R3 and also acknowledged by the authors -- being biased towards GANs learned with the same NND metric; challenge in choosing the capacity of the metric neural network; being computationally expensive, etc. However, these points are discussed well in the paper, and R2 and R3 are in favor of accepting the paper (with R3 bumping their score up after the author response). 
+R1's main concern is the lack of rigorous theoretical analysis of the proposed metric, which the AC agrees with, but is willing to overlook, given that it is nontrivial and most existing evaluation metrics in the literature also lack this. +Overall, this is a borderline paper but falling on the accept side according to the AC. ",ICLR2019,4: The area chair is confident but not absolutely certain +YRkUmkR57L,1576800000000.0,1576800000000.0,1,rklv-a4tDB,rklv-a4tDB,Paper Decision,Reject,"This paper proposes modifying the training loss for neural net-based PDE solvers, by adding an L_infty (max) term to the standard L_2 loss. The motivation for this loss is sensible in that it matches the definition of a strong solution, but this is only a heuristic motivation, and is missing a theoretical analysis. + +This paper's lack of novelty and polish, as well as the lack of clarity in the implementation details, makes this a narrow reject.",ICLR2020, +r1ge3N8Bx4,1545070000000.0,1545350000000.0,1,SJfHg2A5tQ,SJfHg2A5tQ, incremental contribution,Reject,"The paper makes two fairly incremental contributions regarding training binarized neural networks: (1) the swish-based STE, and (2) a regularization that pushes weights to take on values in {-1, +1}. Reviewer1 and reviewer2 both pointed out concerns about the incremental contribution, the thoroughness of the evaluation, the poor clarity and consistency of the writing. Reviewer3 was muted during the discussion. Given the valid concerns from reviewer1/2, this paper is recommended for rejection. ",ICLR2019,4: The area chair is confident but not absolutely certain +94Uj1wNDFQm,1642700000000.0,1642700000000.0,1,G0CuTynjgQa,G0CuTynjgQa,Paper Decision,Reject,"This paper proposes to analyze the generalization error of deep learning models and GANs using the Lipschitz coefficient of the model. + +There was significant discrepancies in the evaluation of the paper among reviewers. While all reviewers acknowledged the interesting theoretical approach to understand generalization and the relevance to ICLR of the problem, they disagreed about the readiness level of the paper. Some concerns were expressed in terms of clarity (and the AC agrees with these), but most importantly, reviewer wKt9 pointed an important flaw in the current analysis that was not properly responded to by the reviewers (see below). In discussion, other reviewers were also concerned by this flaw, and so the AC decided to recommend a major revision of the paper taking the reviewers comments in consideration. + +## Important flaw in the paper analysis (from wKt9) + +Basically, Theorem 1 assumes that a loss $f(h,x)$ is $L$-Lipschitz w.r.t. input $x$ in some compact set of diameter $B$ for any $h$. The author shows that the: +$\sup_{h \in H} |E_{P} f(h,X) - E_{\hat{P}} f(h,X)|$ is upper-bounded by $L B + C \sqrt{\text{stuff}/m}$. + +The concern of wKt9 is that the LHS is upper-bounded *trivially* and deterministically by the tighter $L B$ [see proof sketch next] for any distribution $P$ and $\hat{P}$ just because of the compactness of the input set and that $f$ is $L$-Lipschitz; one does not even need to include the number of samples $m$ in the analysis (thanks to the very strong assumption on $f$). 
The reviewer was also concerned that later (Theorem 3), the authors study ways to make $L$ exponentially small (which is interesting), but this has two issues: +1) it tells you nothing about the absolute performance of your network, as this only bounds the variation between any two distributions (indeed including the empirical and true distribution; but the fact that it also contains all distributions should indicate how loose this bound is!), and so perhaps the best empirical error one can obtain is still big +2) the current version of Theorem 3 uses a loose bound with a dependence on $m$ which was not even needed (as per the result above). + +While it's true that empirically one can observe a small empirical error, and thus combining this with a small Lipschitz constant would indicate good absolute performance, the current presentation of the theory is rendered quite problematic by the above refinement and should be corrected in a revision. + +### Proof sketch: +For simplicity, I'll prove it for $P$ being a discrete distribution and $\hat{P}$ being the empirical; but I'm pretty sure you can extend it to continuous distributions as well. + +Note that we have $|f(h,x) - f(h,x')| \leq L B$ for all $x, x'$ in the compact set of diameter $B$ and for all $h$. + +Now $$E_{P} f(h,X) - E_{\hat{P}} f(h,X) = \sum_j \pi_j f(h, x_j') - \frac{1}{m} \sum_i f(h,x_i)$$ + +For each $x_i$, associate several $x_j$'s so that the total sum of their probabilities is $1/m$ (split some $\pi_j$ in multiple pieces if necessary) -- we can augment the index set for these new pieces to obtain new probabilities $\pi_j'$, and call $I_i$ the set of indices associated to $x_i$. We have $\sum_{j \in I_i} \pi'_j = 1/m$. + +We thus have: +$$E_{P} f(h,X) - E_{\hat{P}} f(h,X) = \sum_i \sum_{j \in I_i} \pi'_j \left[ f(h, x'_j) - f(h,x_i) \right]$$ + +Thus: +$$|E_{P} f(h,X) - E_{\hat{P}} f(h,X)| \leq \sum_i \sum_{j \in I_i} \pi'_j \left| f(h, x'_j) - f(h,x_i) \right| \leq L B$$ + +This is true for any $h$, so this is also true for the $\sup$, *deterministically*! QED",ICLR2022, +7cUkGsB1OVr,1642700000000.0,1642700000000.0,1,V70cjLuGACn,V70cjLuGACn,Paper Decision,Reject,"This manuscript studies the problem of continual learning and introduces a reinforcement learning agent to select hyperparameters for replay/training. Ordinarily, replay-based mechanisms for continual learning use settings and hyperparameters that are chosen and fixed through training. 
If it were possible to adjust replay dynamics online (in this case by looking at performance on a held-aside test set), performance might be improved. This is the approach taken by this manuscript. +Reviewers were generally happy with the writing of the paper and presentation of the material. At the same time, more than one reviewer worried about the novelty of the approach. In essence, the proposal amounts to using a black-box optimizer (in this case RL) to adjust online the hyperparameters (e.g. the replay ratio) for continual learning (off-the-shelf ER and SCR). Viewed through this lens, and given that the optimizer in this case was a straightforward application of DQN, this concern is potentially well founded. The primary novelty then is the construction of the reward function to be optimized: in this case defined as the decrease of the CL loss measured on a held-aside test set that is constructed online. Nevertheless, novelty is only part of the equation, and strong empirical results can easily be a deciding factor in readiness for publication. On this front, reviewer GhFg points out that the empirical results and comparisons with baseline methods are not as clear as they need to be. Several issues were raised in discussion: the primary one is around the question of how the authors have allowed task-specific information for the Q functions used by RL, and what the implications of this might be. The baselines compared against do not use any task-specific information, which muddies the waters when trying to understand the comparisons. I agree with the reviewer that the manuscript needs to do a better job of making the empirical setting and comparisons as transparent and fair as possible. Given this, and the fact (raised by several reviewers) that some empirical evidence presented in the manuscript actually points to RL selecting near-static parameters over time, I recommend that the manuscript be rejected. At the same time, I want to encourage the authors to focus on a streamlined version of the manuscript that addresses the issues raised by GhFg, as I believe that if the concerns can be addressed the work is close to making a compelling contribution to the field.",ICLR2022, +SyFf3z8Og,1486400000000.0,1486400000000.0,1,Bkbc-Vqeg,Bkbc-Vqeg,ICLR committee final decision,Reject,"This paper received borderline reviews. All reviewers, as well as the AC, agree that the authors pursue a very interesting and less explored direction. The paper essentially addresses the problem of double grounding; visual information helping to group the acoustic signal into words, and words helping to localize object-like regions in images. While somewhat hidden under the rug, this is what makes the paper different from the authors' previous work. The reviewers mentioned this to be a minor contribution. The AC agrees with the authors that this is an interesting and novel problem worth studying. 
However, the AC also agrees with the reviewers that this major novelty over the previous work is missing technical depth. The AC strongly encourages the authors to improve on this aspect of the paper. A simple intuition would be to look at https://arxiv.org/abs/1609.01704 in order to discover words, and use something along the lines of attention (eg https://arxiv.org/abs/1502.03044) to link with image regions.",ICLR2017, +HyQi8kaHf,1517250000000.0,1517260000000.0,854,By-IifZRW,By-IifZRW,ICLR 2018 Conference Acceptance Decision,Reject,"The authors propose the use of Gaussian processes as the prior over activation functions in deep neural networks. This is a purely mathematical paper in which the authors derive an efficient and scalable approach to their problem. The idea of having flexible distributions over activation functions is interesting and possibly impactful. One reviewer recommended acceptance with low confidence. The other two found the idea interesting and compelling but confidently recommended rejection. These reviewers are concerned that the paper is unnecessarily complex in terms of the mathematical exposition and that it repeats existing derivations without citation. It is very important that the authors acknowledge existing literature for mathematical derivations. Furthermore, the reviewers question the correctness of some of the statements (e.g. is the variational bound preserved?). These reviewers agreed that the paper is incomplete without any empirical validation. + +Pros: +- A compelling and promising idea +- The approach seems to be scalable and highly plausible + +Cons: +- No experiments +- Significant issues with citing of related work +- Significant questions about the novelty of the mathematical work",ICLR2018, +nZ-cM6aGZ3,1576800000000.0,1576800000000.0,1,Bkga90VKDB,Bkga90VKDB,Paper Decision,Reject,"This paper proposes to further distill token embeddings via what is effectively a simple autoencoder with a ReLU activation. All reviewers expressed concerns with the degree of technical contribution of this paper. As Reviewer 3 identifies, there are simple variants (e.g. end-to-end training with the factorized model) and there is no clear intuition for why the proposed method should outperform its variants as well as the other baselines (as noted by Reviewer 1). Reviewer 2 further expresses concerns about the merits of the propose approach over existing approaches, given the apparently small effect size of the improvement (let alone the possibility that the improvement may not in fact be statistically significant). +",ICLR2020, +CxdI0PXXyx,1610040000000.0,1610470000000.0,1,ESVGfJM9a7,ESVGfJM9a7,Final Decision,Reject,"There was a consensus among all the reviewers that the methodological contribution is not significant enough for publication at ICLR. In short, the main contribution of the paper is to include spatial modeling into a deep temporal point process model. However, to do that, they just use a well-known method (KDE) on top of a methodology that is very similar to Du et al., KDD 2016. + +In addition, in the original submission, the specific functional form for KDE was independent on time, as highlighted by the reviewers, which basically separates spatial and temporal modeling. Unfortunately, further experiments performed by the authors failed to show performance benefits on making it dependent on time. 
+ +The authors also claim that an additional contribution is the sampling method; however, this seems too thin a contribution for a full paper.",ICLR2021, +SyehNuRHlN,1545100000000.0,1545350000000.0,1,HyexAiA5Fm,HyexAiA5Fm,An interesting algorithm for unbalanced optimal transport,Accept (Poster),"After revision, all reviewers agree that this paper makes an interesting contribution to ICLR by proposing a new methodology for unbalanced optimal transport using GANs and should be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain +HyxExBm-l4,1544790000000.0,1545350000000.0,1,r1zOg309tX,r1zOg309tX,Interesting theoretical analysis of a compact dual form of the Wasserstein distance which however is not widely used in the literature.,Reject,"The paper investigates problems that can arise for a certain version of the dual form of the Wasserstein distance, which is proved in Appendix I. While the theoretical analysis seems correct, the significance of the contribution is limited by the fact that the specific dual form analysed is not commonly used in other works. Furthermore, the assumption that the optimal function is differentiable is often not fulfilled either. The paper would therefore be significantly strengthened by making clearer to which methods used in practice the insights carry over. +",ICLR2019,4: The area chair is confident but not absolutely certain +nZ-cM6aGZ3,1576800000000.0,1576800000000.0,1,Hyl9ahVFwH,Hyl9ahVFwH,Paper Decision,Reject,"The authors present a Siamese neural net architecture for learning similarities among field data generated by numerical simulations of partial differential equations. The goal would be to find which two field data are more similar to each other. One use case mentioned is the debugging of new numerical simulators, by comparing them with existing ones. + +The reviewers had mixed opinions on the paper. I agree with a negative comment of all three reviewers that the paper lacks a bit on the originality of the technique and the justification of the new loss proposed, as well as the fact that no strong explicit real-world use case was given. I find this problematic especially given that the similarity of solutions to PDEs is not a mainstream topic of the conference. 
Hence, a good real-world example use of the method would be more convincing.",ICLR2020, +Ol2NJGJHy0V,1642700000000.0,1642700000000.0,1,YRDlrT00BP,YRDlrT00BP,Paper Decision,Reject,"The paper studies and compares different notions of robustness. However, reviewers found there are many unjustified claims in the analysis, and the paper does not provide novel findings or useful approaches.",ICLR2022, +ryld-L2bgN,1544830000000.0,1545350000000.0,1,HygjqjR9Km,HygjqjR9Km,metareview,Accept (Poster),"The submission proposes two new things: a repulsive loss for MMD loss optimization and a bounded RBF kernel that stabilizes training of MMD-GAN. The submission has a number of unsupervised image modeling experiments on standard benchmarks and shows reasonable performance. All in all, this is an interesting piece of work that has a number of interesting ideas (e.g. the PICO method, which is useful to know). I agree with R2 that the RBF kernel seems somewhat hacky in its introduction, despite working well in practice. 
+ +That being said, the repulsive loss seems like something the research community would benefit from finding out more about, and I think the experiments and discussion are sufficiently extensive to warrant publication.",ICLR2019,4: The area chair is confident but not absolutely certain +YRkUmkR57L,1576800000000.0,1576800000000.0,1,rklv-a4tDB,rklv-a4tDB,Paper Decision,Reject,"This paper proposes modifying the training loss for neural net-based PDE solvers, by adding an L_infty (max) term to the standard L_2 loss. The motivation for this loss is sensible in that it matches the definition of a strong solution, but this is only a heuristic motivation, and is missing a theoretical analysis. + +This paper's lack of novelty and polish, as well as the lack of clarity in the implementation details, makes this a narrow reject.",ICLR2020, +Pjcq4auELci,1610040000000.0,1610470000000.0,1,rABUmU3ulQh,rABUmU3ulQh,Final Decision,Accept (Poster),"This work proposes a method, inspired by Cellular Automata, to generate 3D objects in voxel space. By *only* using local update rules for each location, the method can probabilistically generate high-resolution models of everyday objects in the dataset. Due to the ability to incrementally generate details, the quality of the samples is seemingly higher than that of traditional approaches using voxel-based GANs. + +Most reviewers and I agree this is a strong and interesting paper that will spark good discussion in the ICLR community. It is also well written and ideas are clearly explained. During the review process, the authors improved the work by conducting additional experiments to analyze the sensitivity of hyperparameters and incorporated various suggestions from the reviewers. 
After the revision, I believe the work to be in good shape to be accepted at ICLR2021, and I will recommend that this paper be accepted (Poster).",ICLR2021, +rJm2nG8Ox,1486400000000.0,1486400000000.0,1,ryT4pvqll,ryT4pvqll,ICLR committee final decision,Accept (Poster),"This paper proposes a nice algorithm for improved exploration in policy search RL settings. The method essentially optimizes a weighted combination of expected reward (essentially the REINFORCE objective without an entropy regularizer) plus a term from the reward augmented maximum likelihood objective (from a recent NIPS paper), and shows that the resulting update can be made with a fairly small modification to the REINFORCE algorithm. The authors show improved performance on several sequential ""program-like"" domains like copying a string, adding, etc. + + I'm recommending this paper for acceptance, as I think the contribution here is a good one, and the basic approach very nicely offers a better exploration policy than the typical Boltzmann policy, using a fairly trivial modification to REINFORCE. But after re-reading I'm less enthusiastic, simply because the delta over previous work (namely the RAML paper) doesn't seem incredibly substantial. None of the domains in the experiments seem substantially challenging, and the fact that it can improve over REINFORCE isn't necessarily amazing. + + Pros: + Well-motivated (and simple) modification to REINFORCE to get better exploration + + Demonstrably better performance with seemingly less hyperparameter tuning + + Cons: + - Delta over RAML work isn't that clear; it is essentially just a weighted combination between REINFORCE and RAML (though in the RL context) + - Experiments are good, but not outstanding relative to simple baselines",ICLR2017, +6NeaFqs9ZO8,1642700000000.0,1642700000000.0,1,KVYq2Ea90PC,KVYq2Ea90PC,Paper Decision,Reject,"This paper received 3 quality reviews, with 2 rated 5 and 1 rated 6. While the reviewers recognize the various contributions and insights made by this work, it was also pointed out that this work lacks technical novelty. The authors agreed with these concerns and argued that this work provides a service to the community, citing the ImageNet and COCO papers. The AC agrees with the contribution and major concerns. Furthermore, the AC would like to point out that in terms of the level of effort, this work might not be on par with ImageNet and COCO. All things considered, the AC believes that this work is not ready for publication in its current form, and hence recommends rejection.",ICLR2022, +hlMH9OLx7,1576800000000.0,1576800000000.0,1,B1gZV1HYvS,B1gZV1HYvS,Paper Decision,Accept (Poster),"The paper proposes an extension to the popular Generative Adversarial Imitation Learning framework that considers multi-agent settings with ""correlated policies"", i.e., where agents' actions influence each other. The proposed approach learns opponent models to consider possible opponent actions during learning. Several questions were raised during the review phase, including clarifying questions about key components of the proposed approach and theoretical contributions, as well as concerns about related work. These were addressed by the authors, and the reviewers are satisfied that the resulting paper provides a valuable contribution. 
I encourage the authors to continue to use the reviewers' feedback to improve the clarity of their manuscript in time for the camera-ready submission.",ICLR2020, +vkITsny4Z8,1642700000000.0,1642700000000.0,1,3XD_rnM97s,3XD_rnM97s,Paper Decision,Reject,"This paper proposes a way to make minibatch Optimal Transport (m-OT) more efficient by computing an optimal assignment (in the OT sense) and using this assignment to compute a hierarchical OT loss (BoMb-OT) that can be used instead of the m-OT loss. The authors discuss how the equivalent OT plan with BoMb-OT is much more sparse, and how the proposed approach is actually not biased when the number of mini-batches $k\rightarrow \infty$. Numerical experiments show that the proposed method allows a gain in performance in applications such as generative modeling, domain adaptation, color transfer and approximate Bayesian computation. + +The paper originally got borderline-negative scores from the reviewers. While the reviewers acknowledged that the idea is interesting, they had some concerns about the strength of the theoretical results and some missing baselines and discussions in the numerical experiments. The authors did a detailed reply that clarified some problems. The new numerical experiments with m-UOT were also greatly appreciated by the reviewers, but they also raised some questions about the paper. Some concerns, detailed below, about the comparison with m-OT appeared during the reviewers' discussion. Despite the new information, the reviewers reached an agreement that this paper is interesting but needs more work and another round of reviews before acceptance. For these reasons the AC recommends a rejection for this paper. + +More details and suggestions below: + +- While it is clearly not the objective of the paper, a discussion about the proximity of the average plan to the exact OT plan would be of interest. Also, a short numerical experiment showing that the BoMb-OT average plan is closer to the exact plan than m-OT's would be a good illustration of the better performance of BoMb-OT. 
This seems more important for the paper than the color transfer experiment, which is kind of a toy problem.

- After checking the definition in the paper and discussion between reviewers, it appeared that the comparison with m-OT is a bit unfair due to the reformulation of the problem in (1). Indeed, in the usual formulation, $k$ pairs of independent minibatches are used and the OT is done on those pairs (a sum of $k$ OT problems), not on all the possible pairwise permutations as in the definition of m-OT in equation (1). In other words, in m-OT the batches are supposed to be independent, which is not the case in the proposed formulation (it is equivalent in the population case though). It means that in practical applications, for the same computational complexity ($k^2$ OT problems computed), m-OT actually uses $k^2m$ independent samples from each distribution whereas BoMb-OT (and the m-OT defined in equation (1)) use $km$ samples. By implementing m-OT as in (1), they actually prevent m-OT from exploring the dataset as its original formulation does. This means that all the experiments should be done with both the original m-OT implementation and the one in (1), in addition to BoMb-OT. The proposed method will probably work better, but the current experiments do not allow this fair comparison.

- The theoretical results need more discussion and justification. For instance, m-OT converges to its population value at a rate $O(m^{1/2}n^{-1/2}+k^{-1/2})$ that is independent of the dimensionality $d$, but the authors prove concentration of BoMb-OT at a rate $O(m^{1/2}n^{-1/d})$, which is clearly a problem for large $d$. Also, the dependence on $k$ of the convergence would be important, since BoMb-OT is well defined only in the population case where $k$ is large. Note that the claim that it is well defined and hence better is also a bit dubious, because it is well defined for $k=\infty$, which is also the case for m-OT when $m=\infty$. Both $m$ and $k$ large lead to impractical optimization problems, so they are comparable, except that m-OT converges to the true OT plan when $m\rightarrow \infty$, which is not the case for BoMb-OT.

- While the contribution of the paper is indeed methodological and does not require being state of the art on all applications, the numerical experiments should be improved. First, as discussed above, the comparison with m-OT is actually unfair and does not correspond to what is done in practice (where all minibatches are independent). m-OT should be implemented with $k^2$ truly independent minibatches.

- Second, the authors use approximate W2 on two of the GAN datasets and FID on the third. This is a problem because approximate W2 is not defined in the paper. FID is the standard performance measure and should be used for all datasets.

- Third, the new comparison experiments also raise a lot of questions. m-UOT is far better than BoMb-OT, suggesting that unbalanced OT can compensate for the limits of m-OT far better than BoMb-OT itself. Yes, there is a slight increase in performance for BoMb-UOT over m-UOT, but it is so small (0.08%) that it is hard to consider it significant, especially since we have no variance estimates. This result, which is provided only for the DA application, actually suggests that the competitor of BoMb-OT is m-UOT and not m-OT, so it should also be part of the comparison in the other experiments.
The authors talk in their reply about the limits of m-UOT, but stating that the experiments are not done in the paper is not an excuse for not evaluating this clear competitor on other problems and showing these limits numerically.

- Finally, the current version of the paper puts a lot of things in the annex, which makes the paper clearly not self-contained. Some experiments, for instance the color transfer, could go in the annex/supplement to make room for more details in the main paper.

Note that it is not any one of the comments above that led to the reject decision, but their sum, which clearly shows that the paper needs more work.",ICLR2022,
MStTs7oGkd,1642700000000.0,1642700000000.0,1,KeBPcg5E3X,KeBPcg5E3X,Paper Decision,Reject,"The paper proposes modifying latent optimization for representation disentanglement using contrastive learning, resulting in improved performance on disentanglement benchmarks. Despite the empirical success, the proposed algorithm has many moving parts and loss functions. Most reviewers agree that given the incremental and complex nature of the proposed technique, the empirical results are not sufficient for acceptance at ICLR, especially since the results do not present additional insights into the inner workings of the method. I encourage the authors to try to simplify the technique, or provide convincing evidence that such complexity is necessary.

PS:
I didn't find much discussion of how the hyper-parameters are chosen (temperature, lambda terms, etc.).
A discussion of recent self-supervised disentanglement methods (e.g., https://arxiv.org/abs/2102.08850 and https://arxiv.org/abs/2007.00810) can be helpful.",ICLR2022,
DAB0YImO8BA,1610040000000.0,1610470000000.0,1,CNA6ZrpNDar,CNA6ZrpNDar,Final Decision,Reject,"This paper studies the decision boundaries of shallow ReLU networks using the formalism of tropical geometry. Its main takeaway is to provide a new interpretation of the lottery ticket hypothesis in terms of network pruning strategies that preserve certain geometric structure.

Reviewers were appreciative of the clarity of the exposition, and the novel perspective on interesting and elusive phenomena such as the lottery ticket hypothesis. On the other hand, they also expressed some doubts about the significance of some aspects of the theory (such as Proposition 1 and Corollary 1), as well as the computational considerations required to elevate the analysis to the large-scale architectures used in applications.

Ultimately, and after taking into consideration all the reviewing discussions, the AC believes that this submission is not yet ready for publication, but it is on a trajectory to become an important piece of work. In particular, the AC encourages delving deeper into tropical network pruning. Additionally, the authors might want to discuss [Breaking the Curse of Dimensionality with Convex Neural Networks, Bach'17] in the related work, since this is the first instance the AC is aware of where the connection between zonotopes and shallow ReLU networks is established.",ICLR2021,
ZstkImK21,1576800000000.0,1576800000000.0,1,Skerzp4KPS,Skerzp4KPS,Paper Decision,Reject,"This paper proposes a data augmentation method based on Generative Adversarial Networks, training several GANs on subsets of the data which are then used to synthesise new training examples in proportion to their estimated quality as measured by the Inception Score.
The reviewers have raised several critical issues with the work, including motivation (it can be harder to train a generative model than a discriminative one), novelty, complexity of the proposed method, and lack of comparison to existing methods. Perhaps the most important one is the inadequate empirical evaluation. The authors didn’t address any of the raised concerns in the rebuttal. I will hence recommend the rejection of this paper.",ICLR2020, +YmxAWekLCEL,1642700000000.0,1642700000000.0,1,27aftiBeius,27aftiBeius,Paper Decision,Reject,"All reviewers agreed on rejection. Unfortunately, there was no author response so there was nothing to drive further discussion on the paper. The reviewers gave very detailed advice on improving the work.",ICLR2022, +kFpzueQ9OSn,1610040000000.0,1610470000000.0,1,Qun8fv4qSby,Qun8fv4qSby,Final Decision,Accept (Poster),"There is a substantial contribution in identifying novel questions/issues, as this paper certainly does. Neither I nor the reviewers have seen this issue of transient non-stationary before, and the authors make a compelling case for it, especially in the supervised setting with the CIFAR experiments. It is less compelling through the RL experiments. As such, this paper is likely to inspire new work within the field. To me, Figure 1 is the most interesting aspect of the whole paper. + +The initial approach by the authors is questionable in its effectiveness, and is likely to be improved by others in the future. Some of the results in Figure 3 are questionable, especially when you look at the individual curves in Figure 8. So overall, this means that the authors have identified a truly novel issue, and proposed an initial method that is just okay. They've done a nice job investigating this in a supervised setting, and need to push further in the RL setting. + +The question is whether the novel contribution of the problem outweighs that the algorithm and its evaluation could use improvement. The reviewers debated this in the discussion, with points on both sides, but the novelty of the question/issue (even if the investigation could use work) is likely to inspire further work in this direction. + +Other notes: +The authors could have evaluated the (impractical) version of their algorithm proposed in the first paragraph of Section 4.2. This would inform 1) whether their parallel training approximation is close to the optimal algorithm, and 2) whether the optimal (impractical) algorithm is capable of improving generalization significantly. If the latter is true, it would leave open a huge avenue of investigation to find better approximate solutions.",ICLR2021, +t-_SpxVzRLz,1642700000000.0,1642700000000.0,1,lrocYB-0ST2,lrocYB-0ST2,Paper Decision,Accept (Poster),"The paper addresses hierarchical kernels and provides an analysis of their RKHS along with generalization bounds and cases where improved generalization can be obtained. The reviewers appreciated the analysis and its implications. There were multiple concerns regarding presentation clarity, which the authors should address in the camera ready version.",ICLR2022, +rJ4iGyTrz,1517250000000.0,1517260000000.0,1,ryQu7f-RZ,ryQu7f-RZ,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"This paper analyzes a problem with the convergence of Adam, and presents a solution. It identifies an error in the convergence proof of Adam (which also applies to related methods such as RMSProp) and gives a simple example where it fails to converge. 
The paper then repairs the algorithm in a way that guarantees convergence without introducing much computational or memory overhead. There ought to be a lot of interest in this paper: Adam is a widely used algorithm, but sometimes underperforms SGD on certain problems, and this could be part of the explanation. The fix is both principled and practical. Overall, this is a strong paper, and I recommend acceptance. +",ICLR2018, +S1l8yJLoyV,1544410000000.0,1545350000000.0,1,HylzTiC5Km,HylzTiC5Km,metareview: significant progress on autoregressive models for image generation,Accept (Oral),"All reviewers recommend acceptance, with two reviewers in agreement that the results represent a significant advance for autoregressive generative models. The AC concurs. +",ICLR2019,4: The area chair is confident but not absolutely certain +grLjfiJ-ep_6,1642700000000.0,1642700000000.0,1,AVShGWiL9z,AVShGWiL9z,Paper Decision,Reject,"Inspired by dendritic nonlinearity, this paper extends previous work on PLRNN/PWL dynamical system modeling by Durstewitz's group. The extension replaces the ReLU nonlinearity with a linear combination of ReLUs. This preserves the theoretical properties of PLRNN, however, the dimensionality of the latent dynamics remains the same, increasing the expressive power of prior PLRNNs. I (area chair) actually read this paper since not all reviewers provided high-quality reviews and one key reviewer is having a personal emergency. Though I appreciate the premise, detailed numerical evaluations, and the inference approach, the novelty is marginal and I do not buy the theoretical advantage of this class of models as presented (see below). Therefore I cannot recommend this paper to appear at ICLR at this time. + +Some additional weaknesses that reviewers did not point out: +1. Dendritic nonlinearity is summarized as a point nonlinearity; It lacks the interesting phenomena of dendrites such as nonlinear summation and calcium spikes with its own internal dynamics. +2. The many analytical properties of PLRNN may sound nice on paper, but very impractical. To search for the fixed points and cycles, the amount of required computation exponentially increases as the number of neurons and cycle length increases. In addition the boundary effects cannot always be ignored. In general detailed analysis can become quite non-trivial quickly, e.g., https://arxiv.org/abs/2109.03198 +3. High-dimensional PLRNN that approximates a low-dimensional dynamical system due to model mismatch won't have the same topological stability structures. Theoretical analysis of higher-dimensional DS may be very misleading.",ICLR2022, +tstH_r8GK,1576800000000.0,1576800000000.0,1,BylWYC4KwH,BylWYC4KwH,Paper Decision,Reject,"This paper introduces an unsupervised concept learning and explanation algorithm, as well as a concept of ""completeness"" for evaluating representations in an unsupervised way. + +There are several valuable contributions here, and the paper improved substantially after the rebuttal. It would not be unreasonable to accept this paper. But after extensive post-review discussion, we decided that the completeness idea was the most valuable contribution, but that it was insufficiently investigated. + +To quote R3, who I agree with: "" I think the paper could be strengthened considerably with a rewrite that focuses first on a shortcoming of existing methods in finding complete solutions. 
I also think their explanations for why PCA is not complete are somewhat speculative and I expect that studying the completeness of activation spaces in invertible networks would lead to some relevant insights"" + +",ICLR2020, +r17RQyaSz,1517250000000.0,1517260000000.0,247,Syg-YfWCW,Syg-YfWCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Good contribution. There was a (heated) debate over this paper but the authors stayed calm and patiently addressed all comments and supplied additional evaluations, etc. +",ICLR2018, +TyBkcfDRc4,1576800000000.0,1576800000000.0,1,HJxRMlrtPH,HJxRMlrtPH,Paper Decision,Reject,"The goal of verification of properties of generative models is very interesting and the contributions of this work seem to make some progress in this context. However, the current state of the paper (particularly, its presentation) makes it difficult to recommend its acceptance.",ICLR2020, +LLMq2MEXTtEq,1642700000000.0,1642700000000.0,1,EFSctTwY4xn,EFSctTwY4xn,Paper Decision,Reject,"This paper approaches personalized federated learning from the perspective of meta-learning and use the mutual information framework developed in a recent work to regularize local model training. All the reviewers consider the writing very poor and hard to understand, and the contributions not sufficient for acceptance.",ICLR2022, +7AvjxyuYIFo,1642700000000.0,1642700000000.0,1,Q42f0dfjECO,Q42f0dfjECO,Paper Decision,Accept (Poster),"Discussions and additional baseline experiments added during the author response period were enough to motivate multiple reviewers to change their recommendation to an accept during the author response. Multiple reviewers felt that the technical novelty of the work was limited, but the rebuttal cleared up their concerns enough to motivate them to switch their assessments to accept. + +The claim of this work is that it provides a simpler, sparser, and faster algorithms for differentially private fine tuning of LLMs, yielding SOTA privacy results vs. utility on a number of standard NLP tasks. The work proposes a meta-framework. + +In the end, all reviewers rated this paper as an accept and the AC also recommends acceptance.",ICLR2022, +10I3AhtDV,1576800000000.0,1576800000000.0,1,BylTy1HFDS,BylTy1HFDS,Paper Decision,Reject,"This paper proposes Restricted AutoEncoders (REAs) for unsupervised feature selection, and applies and evaluates it in applications in biology. The paper was reviewed by three experts. R1 recommends Weak Reject, identifying some specific technical concerns as well as questions about missing and unclear experimental details. R2 recommends Reject, with concerns about limited novelty and unconvincing experimental results. R3 recommends Weak Accept saying that the overall idea is good, but also feels the contribution is ""severely undermined"" by a recently-published paper that proposes a very similar approach. Given that that paper (at ECMLPKDD 2019) was presented just one week before the deadline for ICLR, we would not have expected the authors to cite the paper. Nevertheless, given the concerns expressed by the other reviewers and the lack of an author response to help clarify the novelty, technical concerns, and missing details, we are not able to recommend acceptance. 
We believe the paper does have significant merit and hope that the reviewer comments will help authors in preparing a revision for another venue.",ICLR2020, +TYsKbfFMt3L,1642700000000.0,1642700000000.0,1,5fmBRf5rrC,5fmBRf5rrC,Paper Decision,Reject,"This paper proposes to address the problem of domain adaption using Knothe-Rosenblatt transport withe the method denoted as KRDA . The main idea is to perform density estimation of the different distributions with mixture of Gaussians and then estimate a an explicit mapping between the distribution using Knothe-Rosenblatt. Experiments show that the proposed method works well on toy and real life datasets. + + The paper had low score during the reviews (3,3,3,3). While the reviewers appreciated the idea, they felt that the originality of the method is not well justified compared to a number of existing UDA approaches using OT. Also the reviewers noted several important references missing and that should also be compared during the numerical experiments. A discussion about the limits of the method in high dimension would also be very interesting. + +The authors did not provide a reply to the reviewers' comments so their opinion stayed t same during the discussion. The paper is then rejected and the AC strongly suggests that the authors take into account the numerous comments from the reviewers before re-submitting ton a new venue.",ICLR2022, +rycsof8Og,1486400000000.0,1486400000000.0,1,ByOK0rwlx,ByOK0rwlx,ICLR committee final decision,Reject,"The paper presents a method for quantizing neural network weights and activations. The method is not compared to related state-of-the-art quantization techniques, so in the current form the paper is not ready for acceptance.",ICLR2017, +Vm1b71wUSnR,1642700000000.0,1642700000000.0,1,HG7vlodGGm,HG7vlodGGm,Paper Decision,Reject,"This paper was close and also very polarizing with the reviewers. On the positive side, some reviewers found: +1. the results impressive +2. the proposed method to be novel, interesting, and produce good performance across several settings +3. the paper was well written + +On the other hand, others found: +1. the motivation suspect +2. missing experiments to characterize the sensitivity to numerous hyper-parameters +3. the baselines compared with weak and not representative +4. significant performance drop comparing the results in the original submission and the new ones added during discussion period +5. low number of seeds initially + +In the end, multiple reviewers raised serious issues regarding the motivation for the approach and the quality and ultimately credibility of the results presented. One of the high-scoring reviewers agreed the paper was a bit misleading (limitations relegated to the appendix). Unfortunately, none of the high-scoring reviewers provided counters to this points.",ICLR2022, +BklhDylAJN,1544580000000.0,1545350000000.0,1,S1gBgnR9Y7,S1gBgnR9Y7,"Interesting study applying CNNs to prediction of assays, but work is perhaps more suited for a biomedical imaging journal. ",Reject,"This work studies the performance of several end-to-end CNN architectures for the prediction of biomedical assays in microscopy images. One of the architectures, GAPnet, is a minor modification of existing global average pooling (GAP) networks, involving skip connections and concatenations. 
The technical novelties are low, as outlined by several reviewers and confirmed by the authors, as most of the value of the work lies in the empirical evaluation of existing methods, or minor variants thereof. + +Given the low technical novelty and reviewer consensus, recommend reject, however area chair recognizes that the discovered utility may be of value for the biomedical community. Authors are encouraged to use reviewer feedback to improve the work, and submit to a biomedical imaging venue for dissemination to the appropriate communities. +",ICLR2019,5: The area chair is absolutely certain +HJmdSkpSG,1517250000000.0,1517260000000.0,595,BJDH5M-AW,BJDH5M-AW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper studies the problem of synthesizing adversarial examples that will succeed at fooling a classification system under unknown viewpoint, lighting, etc conditions. For that purpose, the authors propose a data-augmentation technique (called ""EOT"") that makes adversarial examples robust against a predetermined family of transformations. + +Reviewers were mixed in their assessment of this work, on the one hand highlighting the potential practical applications, but on the other hand warning about weak comparisons with existing literature, as well as lack of discussion about how to improve the robustness of the deep neural net against that form of attacks. +The AC thus believes this paper will greatly benefit from a further round of iteration/review, and therefore recommends rejection at this time. ",ICLR2018, +2x8z5kaePIn,1642700000000.0,1642700000000.0,1,zou-Ry64vqx,zou-Ry64vqx,Paper Decision,Reject,"The paper proposes FedMorph to address the communication and computation heterogeneity problem in federated learning. The proposed FedMorph extracts sub-models from the global model and dispatch these to the clients to perform local training. Then, the morphed sub-networks get aggregated into the global model via distillation. + +The paper reports two to three orders of magnitudes savings in communication bandwidth using the proposed method. However, as agreed by all reviewers, the paper has some critical problems as listed below that prevent it being accepted at this point. + +1. The idea of training smaller networks to workaround heterogeneity is not novel, though the authors proposed a formulation that optimizes the subnetwork together with a distillation loss when updating server model parameters. Authors should include in the Related Work and compare against other distillation-related FL work in terms of: (1) communication costs savings, (2) easing the overfit problem, (3) reducing the compute and memory footprints of performing local training. + +2. Optimizing the distillation loss relies on using a validation dataset on the server, and the quality of distillation relies heavily on whether the distribution the validation dataset is close to that of the decentralized training set. This seems to be a rather strong requirement in federated learning where the data is hard to obtain and the distribution may evolve over time. Therefore it makes me question whether the distillation is a realistic proposal in practice. + +3. The test dataset is used as the distillation dataset which is a major experimentation flaw that the pixels from the test set are leaked into the training algorithm. + +4. It may be unrealistic to assume that there exists a representative validation dataset for the global model in FL. 
The proposed method's error (in Theorem 2) depends on the distance between the distillation and the local training datasets, which can be arbitrarily large in practice.",ICLR2022, +plYOp0wfL,1576800000000.0,1576800000000.0,1,rklFh34Kwr,rklFh34Kwr,Paper Decision,Reject,"This paper proposes a variant of Hamiltonian Monte Carlo for Bayesian inference in deep learning. + +Although the reviewers acknowledge the ambition, scope and novelty of the paper they still have a number of reservations regarding experimental results and claims (regarding need for hyperparameter tuning). The overall score consequently falls below acceptance. + +Rejection is recommended. These reservations made by the referees should definitely be addressable before next conference deadline so looking forward to see the paper published asap.",ICLR2020, +G0fxf7YkPlH,1610040000000.0,1610470000000.0,1,Cnon5ezMHtu,Cnon5ezMHtu,Final Decision,Accept (Poster),"The authors propose training-free neural architecture search using two theoretically inspired heuristics: the condition number of the Neural Tangent Kernel (to measure ""trainability"" of the architecture), and the number of linear regions in the input space (to measure ""expressivity""). These two heuristics are negatively and positively correlated with test accuracy, respectively, allowing for fast, training-free Neural Architecture Search. It is certainly not the first training-free NAS proposal, but achieves competitive results with much more expensive NAS methods. + +A few reviewers mentioned limited novelty of the method, a claim with which I agree. The contribution of the paper, however, is something different than how it was presented. The core message seems to be that the two proposed heuristics can greatly speed up NAS, and should be a baseline method against which more expensive methods should test. + +I feel like this is a borderline paper, but may be of interest to researchers in the field.",ICLR2021, +S1lgEzGgeN,1544720000000.0,1545350000000.0,1,B1G5ViAqFm,B1G5ViAqFm,novel approach to interesting and challenging problem with promising results,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +- The paper tackles an interesting and challenging problem with a novel approach. +- The method gives improves improved performance for the surface reconstruction task. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +The paper +- lacks clarity in some areas +- doesn't sufficiently explain the trade-offs between performing all computations in the spectral domain vs the spatial domain. + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +Reviewers had a divergent set of concerns. After the rebuttal, the remaining concerns were: +- the significance of the performance improvements. The AC believes that the quantitative and qualitative results in Table 3 and Figures 5 and 6 show significant improvements with respect to two recent methods. 
+- a feeling that the proposed method could have been more efficient if more computations were done in the spectral domain. This is a fair point but should be considered as suggestions for improvement and future work rather than grounds for rejection in the AC's view. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers did not reach a consensus. The final decision is aligned with the more positive reviewer, AR1, because AR1 was more confident in his/her review and because of the additional reasons stated in the previous section. +",ICLR2019,3: The area chair is somewhat confident +yUm6RCynYrJ,1642700000000.0,1642700000000.0,1,NqDLrS73nG,NqDLrS73nG,Paper Decision,Reject,"The authors show that it is possible to overcome the script barrier in MLLMs by using transliteration. In effect, they show that transliterating all text to a single script improves the performance for low-resource languages. They also provide additional analysis in the form of statistical tests and crosslingual representation analysis to substantiate their claims. + +The main concerns raised by the reviewers are: +(i) lack of novelty: the idea of using transliteration has been extensively studied in the context of NMT, Speech. It has also been studied in the context of MLLMs by some recent work (which can be considered to be contemporary). IMO, this is a concern. +(ii) focus on Indic languages: there are some concerns raised about the broader applicability of the techniques presented in the paper (personally, I disagree with this concern as Indic languages are important - for example, there are numerous papers which only report results on En-De, En-Ru translation) +(iii) limited evaluation: the technique is evaluated only using the ALBERT model and other configurations (such as ROBERTA, XLM, etc) are not considered. IMO, it would have helped if the authors presented results on these models also (at least we would know if transliteration only helps in the case of small/compact models or even in the case of large models) +(iv) missing references: there is a large body of related work on NMT, speech, etch which the authors had missed in their initial draft. This has been rectified in the updated version. + +The reviewers did participate in the discussion with the meta-reviewer (not with the authors though) and even after looking at the revised draft mentioned that the novelty is limited. + +To summarise my views, I think the initial draft of the paper did need improvements and the final draft is a significantly improved version of the initial draft. However, I still feel the novelty is missing. Even the empirical novelty claimed by the authors is ;lacking due to the use of a single model (ALBERT).",ICLR2022, +HJge6z8_x,1486400000000.0,1486400000000.0,1,HJjiFK5gx,HJjiFK5gx,ICLR committee final decision,Accept (Poster),"This paper demonstrates a novel (although somewhat obvious) extension to NPI, namely moving away from training exclusively on full traces (in order to model the nested calling of subprograms) to training, in part on low-level program traces. By exploiting a continuous stack in the vein of Das et al. 1992, Joulin and Mikolov 2015, or Grefenstette et al. 
2015 (any of which could plausibly have been used here, by my reading, contrary to claims made by the authors in discussion), they can model nested calls with a stack-like structure which needs only partial supervision of a few full traces. The claim is that the resulting model is more sample efficient than the NPI of Reed and de Freitas. + + The reviews are mixed. R1 is grumpy about citations, which I see as grounds for rejection, but raises the point that the claims to data efficiency may be overblown since it is only shown that few full traces are needed to train the model, not that it can infer structure without full traces, which would be more impressive. Confusingly, they refer to continuous stacks as probabilistic, which is misleading as while there is (in all variants mentioned above) a possible probabilistic interpretation of non-{0, 1} push/pop operations, the updates to the stack state are deterministic, as are all other aspects of the network except test-time sampling of actions from the multivariate distribution induced by the softmax on output. + + R2 has not understood the paper or background material if they mistakenly believe that the stack-like structure and resulting model are probabilistic, as there is no sampling of actions on the stack, but only deterministic state updates. The superficiality of the review combined with the fairly crucial misunderstanding sadly means I cannot rely on it to make a recommendation. + + R3 is broadly sympathetic to the paper, while recognising that it still requires information about the program structure through partial FULL supervision in order to learn to manipulate the continuous stack (and thus nested function calls). It is disappointing that the authors did not reply to this point in this review. I acknowledge that they address this claim in part in a response to R1, but the reply is mostly that they will rewrite the formulation of their claims. + + Overall, the novelty of this paper lies in the integration of an existing flavour of differentiable data structure, variants of which have recently been presented at NIPS in 2015, into NPI in order to learn program structure. This would have made for an excellent paper if this augmentation had shown that, without partial supervision on the stack operations (thus training solely from low level traces) the model had been able to infer program structures. The need to provide some full traces is disappointing in this respect, although the claim about providing a more data-efficient training regime thanks to the data structure is plausible. + +Stil, overall, the PCs encourage the authors to address the remainig issues in the camera reasy version of their paper and to present this as a poster at the main conference.",ICLR2017, +FFwV3wZenW,1576800000000.0,1576800000000.0,1,SygD31HFvB,SygD31HFvB,Paper Decision,Reject,"The paper considers a lower bound complexity for the convex problems. The reviewers worry about whether the scope of this paper fit in ICLR, the initialization issues, and the novelty and some other problems.",ICLR2020, +r1lbyB8feV,1544870000000.0,1545350000000.0,1,ryewE3R5YX,ryewE3R5YX,meta-review,Reject,"The authors have delivered an extensive examination of deep RL attacks, placing them within a taxonomy, proposing new attacks, and giving empirical evidence to compare the effectiveness of the attacks. The reviewers and AC appreciate the broad effort, comprising 14 different attacks, and the well-written taxonomic discussion. 
However, the reviewers were concerned that the paper had significant problems with clarity of technical presentation and that the attacks were not well grounded in any sort of real world scenario. Although the authors addressed many concerns with their revision and rebuttal, the reviewers were not convinced. The AC believes that R1 ought to have increased their score given their comments and the resulting rebuttal, but the paper remains a borderline reject even with a corrected R1 score.",ICLR2019,5: The area chair is absolutely certain +3l_6DzUI4u,1610040000000.0,1610470000000.0,1,rgFNuJHHXv,rgFNuJHHXv,Final Decision,Accept (Poster),"This paper use Group convolutional neural networks in both generators and discriminator of GANs, and demonstrates advantages of this approach when training with a relatively small sample size. While the novelty is limited in the work as it simply applies G-CNN for GANs , I believe this application is interesting and the authors have applied it to many GAN image synthesis applications (conditional generation , pix2pix) on various benchmarks, which gives evidence of the potential of GCNNs in generative modeling. Accept",ICLR2021, +a09IwQo5a,1576800000000.0,1576800000000.0,1,B1xGGTEtDH,B1xGGTEtDH,Paper Decision,Reject,"This article studies universal approximation with deep narrow networks, targeting the minimum width. The central contribution is described as providing results for general activation functions. The technique is described as straightforward, but robust enough to handle a variety of activation functions. The reviewers found the method elegant. The most positive position was that the article develops non trivial techniques that extend existing universal approximation results for deep narrow networks to essentially all activation functions. However, the reviewers also expressed reservations mentioning that the results could be on the incremental side, with derivations similar to previous works, and possibly of limited interest. In all, the article makes a reasonable theoretical contribution to the analysis of deep narrow neural networks. Although this is a reasonably good article, it is not good enough, given the very high acceptance bar for this year's ICLR. +",ICLR2020, +ek0swoIkqV,1642700000000.0,1642700000000.0,1,fExcSKdDo_,fExcSKdDo_,Paper Decision,Accept (Poster),"The paper proposes a variational dequantization method for categorical data, based on flows with learned truncated support. The problem has been studied before, but the paper makes it clear how the proposed method differs from existing ones. The method is empirically evaluated on a large variety of diverse tasks. + +The reviews were initially borderline. In general, the reviewers did not identify major quality of technical issues with the paper, and appreciated the clarity of writing. On the other hand, the reviewers were not fully convinced by the motivation or the empirical performance of the proposed method. After discussion with the authors, some concerns were allayed (especially regarding motivation) and all three reviewers decided to recommend weak acceptance. 
+ +Seeing as there are no major technical or quality issues with the paper, and the paper is clearly written and well executed, I'm leaning towards recommending acceptance, although some doubts remain about the significance of the contribution.",ICLR2022, +TeJq-LO231,1642700000000.0,1642700000000.0,1,aYAA-XHKyk,aYAA-XHKyk,Paper Decision,Accept (Poster),"This paper received a majority vote for acceptance from reviewers and me. I have read all the materials of this paper including manuscript, appendix, comments and response. Based on collected information from all reviewers and my personal judgement, I can make the recommendation on this paper, *acceptance*. Here are the comments that I summarized, which include my opinion and evidence. + +**Research Motivation and Problem** + +This paper is well motivated by the agnostic of CPE assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution. To tackle this problem, the authors built an auxiliary probability distribution such that the support of the positive data distribution is never contained in the support of the negative data distribution. + +**Technical Contribution** + +The technical part is simple and clear. The regrouping idea is also easy to implement. The theoretical justification is a good complement of the proposed ReCPE algorithm. + +**ReCPE does not affect its base if the assumption already holds** + +The authors employed the synthetic datasets to verify this point. This is a plus. + +**Experimental Results** + +The authors demonstrated their ReCPE algorithm can be used as a booster on seven base PU classifiers. + +**Presentation** + +The presentation has been much improved with the guidance of one reviewer. But I found two extra minor ones. (1) ""PU Learning"" -> ""PU learning"" at the beginning of the second paragraph on Page 1. (2) Two typos in ""9 real word datasets."" on Page 7 -> ""9 real-world dataset."", where the footnote should be placed after the period. + +**Layout** + +(1) Too many lines in Table 1. It is suggested to remove the horizontal lines among the same dataset, (2) Appendix should go after the main manuscript, rather than a separate file. + +No objection from reviewers was raised to again this recommendation.",ICLR2022, +BJfQrJ6Sf,1517250000000.0,1517260000000.0,529,HJjePwx0-,HJjePwx0-,ICLR 2018 Conference Acceptance Decision,Reject,"There are two parts to this paper (1) an efficient procedure for solving trust-region subproblems in second-order optimization of neural nets, and (2) evidence that the proposed trust region method leads to better generalization performance than SGD in the large-batch setting. In both cases, there are some promising leads here. But it feels like two separate papers here, and I'm not sure either individual contribution is well enough supported to merit publication in ICLR. + +For (1), the contribution is novel and potentially useful, to the best of my knowledge. But as there's been a lot of work on trust region solvers and second-order optimization of neural nets more generally, claims about computational efficiency would require comparisons against existing methods. The focus on efficiency also doesn't seem to fit with the experiments section, where the proposed method optimizes less efficiently than SGD and is instead meant to provide a regularization benefit. + +For (2), it's an interesting empirical finding that the method improves generalization, but the explanation for this is very hand-wavy. 
If second-order optimization in general turned out to help with sharp minima, this would be an interesting finding indeed, but it doesn't seem to be supported by other work in the area. The training curves in Table 1 are interesting, but don't really distinguish the claims of Section 4.5 from other possible hypotheses. +",ICLR2018, +fpHaXPDNA,1576800000000.0,1576800000000.0,1,S1e-0kBYPB,S1e-0kBYPB,Paper Decision,Reject,"The paper proposes a framework for generating evaluation tests for feature-based explainers. The framework provides guarantees on the behaviors of each trained model in that non-selected tokens are irrelevant for each prediction, and for each instance in the pruned dataset, one subset of clearly relevant tokens is selected. + +After reading the paper, I think there are a few issues with the current version of the paper: + +(1) the writing can be significantly improved: the motivation is unclear, which makes it difficult for readers to fully appreciate the work. It seems that each part of the paper is written by different persons, so the transition between different parts seems abrupt and the consistency of the texts is poor. For example, the framework is targeted at NLP applications, but in the introduction the texts are more focused on general purpose explainers. The transition from the RCNN approach to the proposed framework is not well thought-out, which makes the readers confused about what exactly is the proposed framework and what is the novelty. + +(2) the claimed properties of the proposed framework are rather straightforward derivations. The technical novelty is not as high as claimed in the paper. + +(3) The experiment results are not fully convincing. + +All the reviewers have read the authors' feedback and responded. It is agreed that the current version of the paper is not ready for publication. +",ICLR2020, +Bkj5oz8_g,1486400000000.0,1486400000000.0,1,BJO-BuT1g,BJO-BuT1g,ICLR committee final decision,Accept (Poster),The reviewers (two of whom stated maximum confidence) are in consensus that this is a high-quality paper. It also attracted some public feedback which was also positive. The authors have already incorporated much of the feedback into their revised paper. This seems to be a clear accept in my opinion.,ICLR2017, +KZUTMHOLy3,1576800000000.0,1576800000000.0,1,SJxyCRVKvB,SJxyCRVKvB,Paper Decision,Reject,"This paper proposes a solution to learn Granger temporal-causal network for multivariate time series by adding attention named prototypical Granger causal attention in LSTM. + +The work aims to address an important problem. The proposed solution seems effective empirically. However, two major issues have not been fully addressed in the current version: (1) the connection between Granger causality and the attention mechanism is not fully justified; (2) the complex design overkills the whole concept of Granger causality (since its popularity is due to the simplicity). + +The paper would be a strong publication in the future if the two issues can be addressed in a satisfactory way. ",ICLR2020, +p_e-_b76UO,1642700000000.0,1642700000000.0,1,keQjAwuC7j-,keQjAwuC7j-,Paper Decision,Reject,"This paper develops a technique to provide both privacy and robustness +at the same time using differential privacy. + +Unfortunately the paper in its current form does not have meaningfully +interpretable security or privacy claims. 
The reviewers point at a number +of these flaws that the authors do not address to the satisfaction of +the reviewers, but there are a few others as well. +- What is actually private, at the end of this whole procedure? If the + actual ""pretrained classifier"" is not made private, then what's the + purpose of the entire privacy setup in this paper? Why does the denoiser + need to be private if the classifier isn't? +- The proof of Lemma 1 appears incorrect. The proof in Appendix E says that + Equation 10 is true, but this sweeps all of the remaining Taylor series + terms under the rug and doesn't deal with them. How are they handled? +- In Figure 4(a), what does it even mean to have a ""FGSM privacy budget + epsilon""? Or a ""MIM privacy budget epsilon""? A privacy budget is almost + always something defined with respect to the *training data privacy*, + how does this relate to the attack in this paper? +- How does this paper compare prior *canonical* defenses, both on the + robustness and privacy side? In particular, comparisons to adversarial + training on the robustness side, and some recent DPSGD result on the + privacy side?",ICLR2022, +asPKLCz-gMR,1610040000000.0,1610470000000.0,1,KpfasTaLUpq,KpfasTaLUpq,Final Decision,Accept (Poster),"This work demonstrates that autoregressive (AR) models for machine translation can can be competitive with their non-autoregressive (NAR) counterparts in terms of practicality. This is a timely observation, given the flurry of recent work on NAR models, whose primary benefit is often cited to be fast inference. + +It was argued that the results are not surprising -- if this is the case, I still think this work merits acceptance because its thesis runs counter to the direction the field as a whole seems to be moving in, and the results are convincing. That said, I agree with the authors that the observation that some encoder and decoder layers are interchangeable, is not self-evident (i.e. it _is_ surprising). This is of course subjective to some degree, so I am making a judgement call here. The work also has value in that it draws attention to some practices regarding evaluation in NAR machine translation literature that could be improved and made more fair (specifically regarding comparison with AR models). + +There were some concerns about whether these models should be evaluated in the small-batch or large-batch setting. The authors have updated their manuscript in response, and it now explicitly discusses both settings. The authors have also run more experiments and added several additional results requested by reviewers to the manuscript. + +All things considered, I am inclined to follow the majority and recommend acceptance.",ICLR2021, +kyhEAum8pVs,1610040000000.0,1610470000000.0,1,heqv8eIweMY,heqv8eIweMY,Final Decision,Reject,"This paper is right at the borderline: the reviewers agree it is well written, proposing a simple but interesting idea. However, there was a feeling among the reviewers (especially reviewer 1) that the paper could be strengthened considerably with a better discussion/some theory on the sufficiency of the calibration vectors, as well as experiments on larger datasets. Doing one of these would have substantially strengthened the paper. 
Due to the remaining shortcomings, the recommendation is not to accept the paper in its present state.",ICLR2021, +uqrYzOqA5GVw,1642700000000.0,1642700000000.0,1,qI4542Y2s1D,qI4542Y2s1D,Paper Decision,Accept (Poster),"This paper develops a modular system named FILM, for egocentric instruction execution task in the ALFRED environment, which uses structured representations that build a semantic map of the scene, perform exploration with a semantic search policy, to achieve the natural language goal. They achieve strong performance while avoiding both expert trajectories and low-level instructions. The reviewers all reasonably liked the paper (all reviewers gave 'marginally above the acceptance threshold' score) and appreciated the planner ideas + strong results; but many of them also had concerns about the use of templated mappings from 7 high-level goal types to low-level instruction sequences, and whether this will make the system specific to ALFRED. The authors did provide some new results in the response period to show that results drop without the templates but not by a large margin. Some reviewers also had concerns about the novelty of the work and said that the semantic map building module and sub-goal deterministic policy are motivated by previous work, but their incorporation into the FILM system is novel. Lastly, there was some concerns/debates on whether the system assumes/uses too much domain knowledge / task type taxonomy which might reduce the ability to generalize to other domains / data types, versus on the other hand the results may also serve to highlight the need for improvements in high-level planning/control in these types of visual language navigation tasks.",ICLR2022, +o3BiJcC10ja,1610040000000.0,1610470000000.0,1,a3wKPZpGtCF,a3wKPZpGtCF,Final Decision,Accept (Poster),"This paper looks at chaos in learning in games, extending a line of work in two players zero-sum games (that I found quite restrictive in the past). It somehow reduces the class of more general games to zero-sum and cooperative games (this decomposition is already known) so that the techniques can be transposed here. + +The paper is interesting, yet sometimes difficult to follow, and I am not certain that it gives many new insights. + +Nonetheless, we believe its quality justify acceptance.",ICLR2021, +37ql7akKchu,1642700000000.0,1642700000000.0,1,fYor2QIp_3,fYor2QIp_3,Paper Decision,Reject,"The paper proposes a method to predict protein functions from Gene Ontology (GO) and protein sequences. The protein sequences are embedded with a pretrained protein language model (SeqVec) and the GO network is modelled with a graph convolutional neural network. + +Reviewers found the paper well-written and structured. At the same time, they found the novelty of the paper limited. Two reviewers pointed out that the paper is very similar to DeepGOA, which the authors cite but don't compare against. Overall, there is consensus among the reviewers that the paper is not suitable for ICLR. + +The authors didn't submit a rebuttal. + +We encourage the authors to take into account reviewer comments to improve the paper. Since it is more on the application side, perhaps a computational biology conference / workshop would be more appropriate for this paper.",ICLR2022, +wnGlpvDOxZ,1610040000000.0,1610470000000.0,1,opHLcXxYTC_,opHLcXxYTC_,Final Decision,Accept (Spotlight),"This paper advances the idea that recent “influence estimation” methods for supervised learning cannot be trivially applied to GANs. 
Based on Hara et al.’s method, the authors propose a novel influence estimation for GANs, and an evaluation scheme based on popular GAN evaluation methods, exploiting the fact that they are differentiable with respect to their input data. The paper demonstrates empirically that the proposed influence estimation method correlates to true influence. It also shows that removing “harmful” instances using the average log-likelihood, Inception Score, and Frechet Inception Distance versions of the proposed metric improves the quality of generated examples. + +All reviewers were positive about the paper. R2 pointed out that it was well-written and appreciated the detailed analysis. They thought it thoroughly explained the similarities between it and the most closely-related recent work (Hara et al. and Koh & Liang). Concerns expressed by the reviewer were: the amount of samples needed to be removed to obtain a statistically significant result, lack of qualitative results, and an outdated baseline for anomaly detection. The reviewer also stated that they had some concerns with practical applicability and would like to see more GAN metrics, like Precision & Recall. The authors added qualitative results to the paper which partially satisfied the reviewer. + +R1 also thought that the paper was well-written and contributed to the interpretability of GAN training. Like R2, they pointed out the lack of visual examples (addressed in rebuttal), and asked for more insight into what kind of characteristics make a data point influential. They also requested that the authors add a metric that trades fidelity and diversity like P&R. The reviewer originally felt that the paper was below the bar, because it was “like a story without a satisfying conclusion”. However, the authors responded with additional analysis which satisfied the reviewer, and they upgraded their score by two points. + +R3 also found the paper well-written and interesting, like the other reviewers. The reviewer raised some similar concerns as the other reviewers (e.g. qualitative results), as well as the scalability of the method to relevant architectures, which I thought was surprising that the other reviewers didn’t mention. The authors responded that they believe their method succeeded in improving diversity of the generated samples but not their visual quality. This is an important point. +The additions in Appendix D have addressed the main concerns of R1 and R2, as well as R3’s concern about lack of visual analysis. R1 seems quite convinced now, and R2, though not changing their score, was already in favour of acceptance. It is an interesting finding that “harmful” instances seem to come from regions of distributional mismatch. + +I would like to see a fidelity-diversity tradeoff like P&R added to a paper, and a discussion of this work in relation to DeVries et. al “Instance Selection” that appears to be similarly motivated though executed differently. I think one major thing holding back this paper is the scale of the experimental analysis (Gaussians & MNIST); I hope the authors can scale the method in future work.",ICLR2021, +SkeVLyTBz,1517250000000.0,1517260000000.0,755,HytSvlWRZ,HytSvlWRZ,ICLR 2018 Conference Acceptance Decision,Reject,"Authors present a method for modeling neurodegenerative diseases using a multitask learning framework that considers ""censored regression"" problems (to model where the outputs have discrete values and ranges). 
Given the pros/cons, the committee feels this paper is not ready for acceptance in its current state. + + +Pro: +- This approach to modeling discrete regression problems is interesting and may hold potential, but the evaluation is not in a state where strong meaningful conclusions can be made. + +Con: +- Reviewers raise multiple concerns regarding evaluation and comparison standards for tasks. While authors have added some model comparisons in response, in other areas comparisons don't appear complete. For example, when using MRI data, networks compared all use features derived from images, rather than systems that may learn from images themselves. Authors claim dataset is too small to learn directly from pixels in this data (in comments), but transfer learning and data augmentation have been successfully applied to learn from datasets of this size. In addition, new multitask techniques in the imaging domain have also been presented that dynamically learn the network structure, rather than relying on a hand-crafted neural network design. How this approach would compare is not addressed. + + +",ICLR2018, +SJIchMIue,1486400000000.0,1486400000000.0,1,ByC7ww9le,ByC7ww9le,ICLR committee final decision,Reject,"Three knowledgable reviewers recommend rejection. While they agree that the paper has interesting aspects, they suggest a more convincing evaluation. The authors did not address some of the reviewer's concerns. The AC strongly encourages the authors to improve their paper and resubmit it to a future conference.",ICLR2017, +xFjm1GjXSi,1576800000000.0,1576800000000.0,1,rkgCJ64tDB,rkgCJ64tDB,Paper Decision,Reject,"This paper presents a CNN architecture equivariant to scaling and translation which is realized by the proposed joint convolution across the space and scaling groups. All reviewers find the theorical side of the paper is sound and interesting. Through the discussion based on authors’ rebuttal, one reviewer decided to update the score to Weak Accept, putting this paper on the borderline. However, some concerns still remain. Some reviewers are still not convinced regarding the novelty of the paper, particularly in terms of the difference from (Chen+,2019). Also, they agree that experiments are still very weak and not convincing enough. Overall, as there was no opinion to champion this paper, I’d like to recommend rejection this time. +I encourage authors to polish the experimentations taking in the reviewers’ suggestions. +",ICLR2020, +J6wCr1MBL9Q,1642700000000.0,1642700000000.0,1,pIjvdJ_QUYv,pIjvdJ_QUYv,Paper Decision,Reject,"A heterogeneous federated learning framework is proposed which does +not require auiliary public data sets, and does not reveal the private +data to the server or answering parties if they operate as +honest-but-curious entities. It builds a new protocol for private +inference, which can run on GPUs, and proposes a dataset expansion +method to not need an auxiliary data set. The paper presents extensive +empirical experiments on the method. + +The paper was extensively discussed with the authors. The concerns +included both technical issues and more general issues on missing DP +guarantees and realisticness of the threat model. Many of the issues +were resolved by the clarifications provided by the authors, and as a +result two reviewers increased their scores. However, all reviewers still +place the paper to the borderline. 
+ +While the paper contains solid work, and improves efficiency compared to +previous models, this is a borderline paper where the final judgement +needs to be based on the importance of the new contributions presented in +advancing the field. The paper may not yet quite reach the bar, but +I believe the reviewer comments have enabled the authors to improve the +paper for further work.",ICLR2022, +HJzTVkTBz,1517250000000.0,1517260000000.0,449,rJ7RBNe0-,rJ7RBNe0-,ICLR 2018 Conference Acceptance Decision,Reject,"The pros and cons of this paper cited by the reviewers (with a small amount of my personal opinion) can be summarized below: + +Pros: +* The method itself seems to be tackling an interesting problem, which is feature matching between encoders within a generative model + +Cons: +* The paper is sloppily written and symbols are not defined clearly +* The paper overclaims its contributions in the introduction, which are not supported by experimental results +* It misrepresents the task of decipherment and fails to cite relevant work +* The experimental setting is not well thought out in many places (see Reviewer 1's comments in particular) + +As a result, I do not think this is up to the standards of ICLR at this time, although it may have potential in the future.",ICLR2018, +SJOs3fL_x,1486400000000.0,1486400000000.0,1,HksioDcxl,HksioDcxl,ICLR committee final decision,Reject,"The paper has some nice ideas, but requires a bit more to push it over the acceptance threshold. 
I agree with the reviewers who ask for comparisons with other rating-review methods, and that other evaluation metrics more appropriate to the recommendation tasks should be reported. More analysis of the model, and the factors that contribute to its performance, would greatly improve the paper.",ICLR2017, +H1_ASJ6Bz,1517250000000.0,1517260000000.0,684,HJaDJZ-0W,HJaDJZ-0W,ICLR 2018 Conference Acceptance Decision,Reject,"Pros +-- Interesting approach to induce sparsity, trains faster than alternative approaches +Cons +-- Fairly complex set of heuristics for pruning weights +-- Han et al. works well, although the authors claim it takes more time to train, which may not not hold for all training sets and doesn’t seem like a strong enough reason to choose an alternative appraoch + +Given these comments, the AC recommends that the paper be rejected. +",ICLR2018, +z0G71rd_c_M,1642700000000.0,1642700000000.0,1,aBsCjcPu_tE,aBsCjcPu_tE,Paper Decision,Accept (Poster),"Thank you for your submission to ICLR. + +This paper presents a technique for image synthesis based on stochastic differential equations and a diffusion model. This looks to be a very nice idea with good results. After discussion, the reviewers converged and all agreed that the paper is ready for publication---the most negative reviewer raised their score after the author rebuttal, from a weak reject to weak accept. The rebuttal clearly and concisely addressed several concerns of the reviewers. + +I'm happy to recommend accepting the paper.",ICLR2022, +4kLBcHijCo,1576800000000.0,1576800000000.0,1,HylpqA4FwS,HylpqA4FwS,Paper Decision,Accept (Poster),"In this paper, the authors propose the incremental RNN, a novel recurrent neural network architecture that resolves the exploding/vanishing gradient problem. While the reviewers initially had various concerns, the paper has been substantially improved during the discussion period and all questions by the reviewers have been resolved. The main idea of the paper is elegant, the theoretical results interesting, and the empirical evaluation extensive. The reviewers and the AC recommend acceptance of this paper to ICLR-2020.",ICLR2020, +FPPF3qenpVU,1642700000000.0,1642700000000.0,1,d20jtFYzyxe,d20jtFYzyxe,Paper Decision,Reject,"Casting domain generalization as a rate-distortion problem and developing an information-theoretic approach to solving it looks like an interesting idea. While the proposed method is technically sound, the assumption made in the proposed method is too strong to hold in real-world applications. Though in the rebuttal the authors provided additional experiments on two benchmark datasets, reviewers' concerns about the strong assumption made in the proposed algorithm still remain. To address this issue, I think besides conducting more extensive experiments, the authors also need to analyze when the assumption does not hold in practice, why the proposed algorithm could still perform well compared with other domain generalization methods. + +In summary, this is a borderline paper below the acceptance bar of ICLR.",ICLR2022, +hBs16wkOeP,1576800000000.0,1576800000000.0,1,HyloPnEKPr,HyloPnEKPr,Paper Decision,Reject,"Main content: + +Blind review #2 summarizes it well: + +This paper extends the neural coreference resolution model in Lee et al. (2018) by 1) introducing an additional mention-level feature (grammatical numbers), and 2) letting the mention/pair scoring functions attend over multiple mention-level features. The proposed model achieves marginal improvement (0.2 avg. 
F1 points) over Lee et al., 2018, on the CoNLL 2012 English test set. + +-- + +Discussion: + +All reviewers rejected. + +-- + +Recommendation and justification: + +The paper must be rejected due to its violation of blind submission (the authors reveal themselves in the Acknowledgments). + +For information, blind review #2 also summarized well the following justifications for rejection: + +I recommend rejection for this paper due to the following reasons: +- The technical contribution is very incremental (introducing one more features, and adding an attention layer over the feature vectors). +- The experiment results aren't strong enough. And the experiments are done on only one dataset. +- I am not convinced that adding the grammatical numbers features and the attention mechanism makes the model more context-aware.",ICLR2020, +dzj1Jf4xmKT,1642700000000.0,1642700000000.0,1,AypVMhFfuc5,AypVMhFfuc5,Paper Decision,Reject,"The authors study a practical problem of selecting/combining existing multi-label classification APIs under a budget constraint for a specific problem instance on hand. The task can be viewed as an (online) integer programming problem when given an accuracy estimator for the combination performance. The authors relax the integer constraints and propose a framework to solve the task in the dual form. They also run experiments to validate that the proposed framework is advantageous (cost or accuracy-wise) over the best single API. + +Most of the reviewers are positive about the practical value and the potential impacts of the work in applications/products/services. There are several disputes between the authors and some reviewers that cannot be fully resolved during the rebuttal. In the end, no reviewers express willingness to strongly champion for the acceptance of the paper, making the paper a borderline case. The decision is based on a careful examination of the current manuscript and every side's opinions. + +* Novelty: Some reviewers question about the novelty of the work. There are two aspects about novelty: one is on whether the problem itself is novel (are the authors trying to propose a new multi-label method?) In this aspect, the authors' response, which states that they are not aiming at proposing a new method, but at solving an automation task for MLaaS users, appears believable. The other aspect is whether the solution technique, namely the relaxed integer programming and other techniques, are sufficiently novel. Some reviewers find the novelty aspect satisfactory, while others believe that the proposed optimization technique have been widely used in machine learning community. The authors did not clarify the similarity/difference of the proposed technique to existing ones during the rebuttal. In this sense, the technical novelty is not well justified. + +* Speed: Some reviewers are concerned about different aspects of the running time and other costs. The authors emphasized the rapid speed in inference phase, particularly in Figure 3. Less is discussed about the time needed for the training phase (although the authors claim to be much smaller than the inference time)---somehow even the most positive reviewers have some questions about this aspect. The authors could add more clarification about the different ""time"" costs to the discussion. One dispute between some reviewers and the authors is about the *complexity* analysis of time, which is indeed missing in the current manuscript and can be a nice-to-have for future todos. 
+ +* Theoretical Guarantee: One major dispute between some reviewers and the authors is on the theoretical guarantee provided. The reviewers suggest a regret-style bound, which compares the solution to the worst-case sequence; the authors provide an optimization-style bound, which compares the solution to the absolute optimal solution. Different bounds have their different roles for supporting the framework. Given that the authors have provided some reasonable bounds, the lack of regret bound is not taken against the authors. + +* Specialty: One concern raised by some reviewers is that the technique does not seem particularly tailored for multi-label classification (except some minor parts). In this sense, it is nice to have for the authors to discuss more on the wider applicability of the technique, and/or include some more specialty of the multi-label classification problem into the technique design. + +After taking all the factors above into account, and calibrating the received scores to the distribution across the papers, it seems that the paper could use some more revision before being mature enough as an impactful work.",ICLR2022, +kdJ0scpdIoG,1610040000000.0,1610470000000.0,1,Y0MgRifqikY,Y0MgRifqikY,Final Decision,Reject,"The paper proposes a method to generate attention masks to interpret the performance of RL agents. Results are presented on a few ATARI games. Reviewers unanimously vote for rejecting the papers. R1, R3 give a score of 5, whereas R4, R5 give a score of 4. Their concerns are best explained in their own words: + +R1 says, ""The use of attention maps to analyze and explain deep neural networks is not new in itself, and learning attention maps to improve vision tasks is not new either."" + +R3 says, ""the analysis of the learned attention masks seems selective. Some automatic metrics or systematic studies of different game categories (shooting, maze-like, and ball-and-paddle) may shed light on the learned attention's general property."" + +R5 says, ""I am still not convinced by the quality of the provided visual explanations nor am I convinced that the attention is well correlated with the current frame (the additional experiments provided do help somewhat in this regard, but are not extensive and reasonably inconclusive"" + +In their rebuttal, to address R1's concern authors suggested that the use of attention on both value and policy networks is novel. This is not sufficient, because it does not show why such attention maps are more useful than ones proposed by prior work. As suggested by reviewers, a systematic study or a human study clearly showing that the proposed method adds more interpretability is critical. However, this is missing. In response to R3, the authors provided experiments on more games. But this is not the point -- because it's not about the number of environments in which experiments are provided, but rather the nature of the analysis that is performed. Finally, R5 comments that it's unclear whether attention actually provided interpretability or not. + +Due to the lack of convincing analysis that demonstrates the utility of the proposed method in advancing the understanding of decisions made by RL agents, I recommend that the paper be rejected. + +",ICLR2021, +jtIkO1-IN,1576800000000.0,1576800000000.0,1,SygLu0VtPH,SygLu0VtPH,Paper Decision,Reject,"This paper is a very borderline case. Mixed reviews. R2 score originally 4, moved to 5 (rounded up to WA 6), but still borderline. R1 was 6 (WA) and R3 was 3 (WR). 
R2 expert on this topic, R1 and R3 less so. AC has carefully read the reviews/rebuttal/comments and looked closely at the paper. AC feels that R2's review is spot on and that the contribution does not quite reach ICLR acceptance level, despite it being interesting work. So the AC feels the paper cannot be accepted at this time. But the work is definitely interesting -- the authors should improve their paper using R2's comments and resubmit. ",ICLR2020, +rkl23V07e4,1544970000000.0,1545350000000.0,1,r1exVhActQ,r1exVhActQ,"A study on sparse properties of L1-regularization in deep neural networks, yet experimental supports seem week.",Reject,"This paper studies the properties of L1 regularization for deep neural network. It contains some interesting results, e.g. the stationary point of an l1 regularized layer has bounded number of non-zero elements. On the other hand, the majority of reviewers has concerns on that experimental supports are weak and suggests rejection. Therefore, a final rejection is proposed.",ICLR2019,3: The area chair is somewhat confident +r1xqT4_AyN,1544620000000.0,1545350000000.0,1,Syx5V2CcFm,Syx5V2CcFm,ICLR 2019 decision,Accept (Poster),"This paper develops a stagewise optimization framework for solving non smooth and non convex problems. The idea is to use standard convex solvers to iteratively optimize a regularized objective with penalty centered at previous iterates - which is standard in many proximal methods. The paper combines this with the analysis for non-smooth functions giving a more general convergence results. Reviewers agree on the usefulness and novelty of the contribution. Initially there were concerns about lack of comparison with current results, but updated version have addressed this issue. The main weakness is that the results only holds for \mu weekly convex functions and the algorithm depends on the knowledge of \mu. Despite this limitations, reviewers believe that the paper has enough new material and I suggest for publication. I suggest authors to address these issues in the final version. ",ICLR2019,4: The area chair is confident but not absolutely certain +yAZrLt9M7d,1576800000000.0,1576800000000.0,1,BklfR3EYDH,BklfR3EYDH,Paper Decision,Reject,"The paper is interesting in video prediction, introducing a hierarchical approach: keyframes are first predicted, then intermediate frames are generated. While it is acknowledge the authors do a step in the right direction, several issues remain: (i) the presentation of the paper could be improved (ii) experiments are not convincing enough (baselines, images not realistic enough, marginal improvements) to validate the viability of the proposed approach over existing ones. +",ICLR2020, +R7GqKNBYNi,1576800000000.0,1576800000000.0,1,BkxFi2VYvS,BkxFi2VYvS,Paper Decision,Reject,"The paper presents a semi-supervised learning approach to handle semantic classification (pixel-level classification). The approach extends Hung et al. 18, using a confidence map generated by an auxiliary network, aimed to improve the identification of small objects. + +The reviews state that the paper novelty is limited compared to the state of the art; the reviewers made several suggestions to improve the processing pipeline (including all images, including the confidence weights). +The reviews also state that the paper needs be carefully polished. + +The area chair hopes that the suggestions about the contents and writing of the paper will help to prepare an improved version of the paper. 
+",ICLR2020, +S1l8GOC-k4,1543790000000.0,1545350000000.0,1,SkVhlh09tX,SkVhlh09tX,Accept,Accept (Oral),"Very solid work, recognized by all reviewers as worthy of acceptance. Additional readers also commented and there is interest in the open source implementation that the authors promise to provide.",ICLR2019,4: The area chair is confident but not absolutely certain +dzvlCiZ90Xt,1610040000000.0,1610470000000.0,1,V6BjBgku7Ro,V6BjBgku7Ro,Final Decision,Accept (Poster),"While many work in the literature (PlaNet (2018), Dreamer (2020), SimPLe (2019), etc.) learn world models to perform well on a particular task at hand, the motivation behind this work is that dynamics models benefit if they are task-agnostic, hence would be able to perform a wider range of tasks, as opposed to just doing one task really well. In order to do this, they propose to learn a latent representation that models inverse dynamics of the system / environment rather than capturing information about the task-specific rewards, and incorporate a planning for solving specific tasks in which they can measure performance. + +To show broad applicability of their method, the authors tested their approach on Atari and DM Control Suite (from pixels), and also simple grid worlds to illustrate the concepts, and demonstrated strong performance over SOTA model-free algorithms (even the ones that do not have open-source implementations). Reviewers and myself agree that the paper is well written, easy to follow, and the approach is well-motivated. + +After the review period, the authors have done work to improve the draft, particularly including ablation studies with and without planning, addition comparisons, and improved visualizations, after taking in the comments and feedback from the reviewers after the initial reviews, which satisfied some of the reviewers. One reviewer asked for a real robotic task, but I feel that while it will help the paper, many existing works focus purely on DM control from pixels, and this work has performed experiments on both DM Control and Atari, two reasonably different domains, and IMO makes up for the lack of real-world robotics experiment. That being said, a discussion on how the proposed method would work in a real-robotic task, as suggested by R4 would be good to have. + +I believe the work in its current state is ready for acceptance for ICLR 2021, and should be a fine contribution to the visual model-based RL works. I'm excited to see this work presented to the community, and I'm going to recommend acceptance (Poster).",ICLR2021, +Wy-im0uf0G7D,1642700000000.0,1642700000000.0,1,z2zmSDKONK,z2zmSDKONK,Paper Decision,Reject,"Description of paper content: + +A mixed theoretical and experimental paper that investigates the robustness of distributional RL to perturbations of state observations as compared to expectation-based value function learning. They provide sufficient conditions for TD’s convergence and prove the Lipschitz continuity of the loss of a histogram-based KL version of distributional RL with respect to the state features, whereas this is not true for expected RL. This continuity indicates a certain robustness of the loss with respect to perturbations of the state. The theory’s tie to experiment is weak in the sense that it is not predictive of the actual performance of any algorithm. The theoretical methods are based on a previously published paper SA-MDP. 
+ +Summary of paper discussion: +The reviewers raised concerns about the statistical significance of the experimental results, the clarity and organization of the writing, the novelty of the theoretical setting, and its usefulness for describing a real problem setting. The majority of reviewers rejected the paper and did not lift the scores after the rebuttal. + +(I personally wonder if the community would not benefit from conducting some of these kinds of theoretical analyses and experiments on LQR systems rather than Atari (etc.) environments.)",ICLR2022, +Cg4MtuQDAG8,1642700000000.0,1642700000000.0,1,viWF5cyz6i,viWF5cyz6i,Paper Decision,Reject,"The reviewers recommended rejection. There was no reply from the authors. The main weaknesses are: +- No experiment on real-life dataset (only simulated) +- Unsubstantiated claims about the literature +- No discussion on the time complexity +- Incremental contribution",ICLR2022, +VrdXmrpI5n,1576800000000.0,1576800000000.0,1,ryx6WgStPB,ryx6WgStPB,Paper Decision,Accept (Poster),"This paper considers ensemble of deep learning models in order to quantify their epistemic uncertainty and use this for exploration in RL. The authors first show that limiting the ensemble to a small number of models, which is typically done for computational reasons, can severely limit the approximation of the posterior, which can translate into poor learning behaviours (e.g. over-exploitation). Instead, they propose a general approach based on hypermodels which can achieve the benefits of a large ensemble of models without the computational issues. They perform experiments in the bandit setting supporting their claim. They also provide a theoretical contribution, proving that an arbitrary distribution over functions can be represented by a linear hypermodel. + +The decision boundary for this paper is unclear given the confidence of reviewers and their scores. However, the tackled problem is important, and the proposed approach is sound and backed up by experiments. Most of reviewers concerns seemed to be addressed by the rebuttal, with the exception of few missing references which the authors should really consider adding. I would therefore recommend acceptance.",ICLR2020, +nijibd3tgr2,1642700000000.0,1642700000000.0,1,fR-EnKWL_Zb,fR-EnKWL_Zb,Paper Decision,Accept (Poster),"The paper proposes an efficient attention variant inspired by quadtrees, for use in vision transformers. When applied to several vision tasks, the approach leads to better results and/or less compute. + +The reviews are all positive about the paper, after taking into account the authors' feedback (one reviewer forgot to update their official rating, apparently). They point out that the idea is reasonable and the empirical evaluation is thorough and convincing, with good gains on several tasks and datasets. + +Overall, I recommend acceptance.",ICLR2022, +Syebneh1l4,1544700000000.0,1545350000000.0,1,HyleYiC9FX,HyleYiC9FX,Serious evaluation issues,Reject,"I have to agree with the reviewers here and unfortunately recommend a rejection. + +The methodology and task are not clear. Authors have reformulated QA in SQUAD as as ranking and never compared the results of the proposed model with other QA systems. 
If authors want to solve a pure ranking problem why they do not compare their methods with other ranking methods/datasets.",ICLR2019,4: The area chair is confident but not absolutely certain +0zV62G_Z4s,1610040000000.0,1611060000000.0,1,meG3o0ttiAD,meG3o0ttiAD,Final Decision,Reject,"This paper introduces two new quantum neural networks with specific structures: TT-QNNs and SC-QNNs. The main contribution of this work is to show a theoretical lower bound that the gradient of the two neural networks (at random initialization) with respect to certain training objectives is well lower bounded by 2^{-2 L}, where L is the number of layers in the network. Previously, the known work only manage to prove this lower bound with less-realistic QNNs with 2-design, or prove an 2^{-poly(n)} lower bounds for random QNNs, where the input of the neural network is an n-qubit. This paper makes a first step towards solving the vanishing gradient problem of QNNs at random initialization. + + + +The major concern of the paper is the usefulness of these QNNs with proposed architectures: The proposed QNNs might be theoretically easier to train, but what if they can only learn a significantly smaller class of functions? In classical world, such phenomenons are very common: Linear classifiers (or even linear functions over prescribed feature mappings) are much easier to train and have much better theoretical properties, but they fail short in terms of representation power comparing to real neural networks. + + + +In this paper, on the theory side, there is no argument about the representation power of these QNNs: It is unclear which set of functions they can represent efficiently, which limits their theoretical interests to machine learning committee. On the empirical side, the reviewers all agree that the empirical results are weak at this point: The proposed new QNNs did not show significant advantages over random QNNs (especially with early stopping), and other types of QNNs were not compared. Moreover, there seems to be some efficiency issue regarding implementing these QNNs -- More convincing empirical evidence or theoretical evidence about the power of these QNNs need to be addressed. ",ICLR2021, +E7s56FBrAS,1576800000000.0,1576800000000.0,1,BylRkAEKDH,BylRkAEKDH,Paper Decision,Reject,"This paper constitutes interesting progress on an important topic; the reviewers identify certain improvements and directions for future work, and I urge the authors to continue to develop refinements and extensions.",ICLR2020, +SOOwYIQXMF,1576800000000.0,1576800000000.0,1,SJecKyrKPH,SJecKyrKPH,Paper Decision,Reject,"This paper proposes a CNN that is invariant to input transformation, by making two modifications on top of the TI-pooling architecture: the input-dependent convolutional filters, and a decoder network to ensure fully transformation invariant. Reviewer #1 concerns the limited novelty, unconvincing experimental results. Reviewer #2 praises the paper being well written, but is not convinced by the significance of the contributions. The authors respond to Reviewer #2 but did not change the rating. Reviewer #3 especially concerns that the paper is not well positioned with respect to the related prior work. 
Given these concerns and overall negative rating (two weak reject and one reject), the AC recommends reject.",ICLR2020, +V81Ypcrrt60,1610040000000.0,1610470000000.0,1,kDnal_bbb-E,kDnal_bbb-E,Final Decision,Accept (Poster),"In the context of constructing negotiation dialogue strategies/policies, the authors explore the use of graph attention networks (GATs) for determining the sequence of negotiation dialogue acts -- specifically leading to a (1) hierarchical dialogue encoder via pooled BERT + GRU encoding -> (2) GAT over dialogue strategies/acts (many technical details around graph usage) -> (3) GRU decoder. While a relatively straightforward replacement relative to similar architectures with other 'structural' encoders, they provide a sound end-to-end training strategy that is shown to perform well on the buyer-seller negotiation task via CraigslistBargain dataset where they demonstrate SoTA performance. + +== Pros == ++ Studying the pragmatics component of negotiation dialogue strategies has received recent interest and this seems a good milepost that demonstrates mainstream methodological approaches for this task (i.e., this is a good baseline for future innovations) ++ The paper is well-written in that it is easy to understand intuitively while having sufficient detail to understand the details. ++ The empirical results appear promising and meet the standard within this sub-community -- showing improvements with automatic and human evaluation. + +== Cons == +- This builds on existing datasets, which are known to have undesirable properties (e.g., automatic evaluation, small number of dialogue datasets, use of explicit dialogues acts, etc.) While it still meets the standards of this sub-community, it still isn't a completely convincing task. +- While the use of GATs is novel in this setting and they get it to work within the overall architecture, this is something that many people are likely trying at this time -- so there isn't an exciting 'disruptive' step here. +- The empirical results, while satisfactory from a quantitative perspective, even in reading the Appendices, it isn't clear that these are significantly better from a planning perspective or if it is just 'pattern recognition' gains. + +Evaluating along the requested dimensions: +- Quality: The underlying method is fairly straightforward and the authors incorporate up-to-date GAT-related methods to get this to work in this setting. The empirical results are sound if predicated on the general quality in this sub-community where you have the standard machine translation evaluation problem for meaning vs. lexical closeness. To mitigate, they use BERTScore and human evaluation -- which is at the higher end of what can be reasonably expected. +- Clarity: The paper is written clearly overall, especially if considering the appendices where there is significant detail. Related to empirical evaluation, it isn't easy to intuitively interpret the results, but this is again par for the course. Additionally, I believe the authors did a good job responding to reviewer concerns. +- Originality: While all of the reviewers agreed that the approach was novel in this setting, one of the reviewers explicitly pointed out that using GATs in negotiation dialogues isn't that exciting -- and I mostly agree. I view this as something that somebody would have done and will serve as a good baseline; although I think this sub-field is going to need more datasets to continue progressing. 
+- Significance: As stated above, it is a good baseline that I think many are likely thinking of (as the TOD community has been doing this for a bit now). However, it is done well. + +Honestly, I agree with the reviewers that this is a somewhat borderline paper -- mostly due to it being a fairly 'obvious' idea and the nature of the subfield making it not entirely clear if the improvements are due to knowing the target performance while training or due to the methodological advance. Personally, I am convinced, but it isn't totally clear. That being said, it is a well-written paper and I think the reviewer issues were sufficiently addressed. Thus, I would prefer to see it accepted as I think it will be a strong methodological baseline for this problem (which hopefully will accumulate more convincing datasets and standard evaluation). ",ICLR2021, +neEBpcWuzuB,1610040000000.0,1610470000000.0,1,QM4_h99pjCE,QM4_h99pjCE,Final Decision,Reject,"The paper offers a direction for mult-agent RL that builds on results for actor-critic methods [Zhang, ICML 2018], extending that work to address deterministic policies. The authors establish convergence under a number of assumptions. Both on-policy setting and off-policy settings are treated. The reviewers point out several concerns and the consensus seems to be that, while the direction looks promising, the paper deserves further work. +",ICLR2021, +KhPwIpqPJ20,1610040000000.0,1610470000000.0,1,5fJ0qcwBNr0,5fJ0qcwBNr0,Final Decision,Reject,"This paper proposes a new criterion for neural architecture search that does not require the expensive step of training the model. The reviewers found the proposed approach of relying on gradient statistics promising. However, the reviewers found that the clarity of the paper needs to be improved and that the empirical evidence is too limited to support some of the claims.",ICLR2021, +rCDocmc_3r,1576800000000.0,1576800000000.0,1,BkgXHTNtvS,BkgXHTNtvS,Paper Decision,Accept (Poster),"This article investigates the optimization landscape of shallow ReLU networks, showing that for sufficiently narrow networks there are data sets for which there is no descent paths to the global minimiser. The topic and the nature of the results is very interesting. The reviewers found that this article makes important contributions in a relevant line of investigation and had generally positive ratings. The authors' responses addressed questions from the initial reviews, and the discussion helped identifying questions for future study departing from the present contribution. ",ICLR2020, +b0zz6TgSrY1,1610040000000.0,1610470000000.0,1,FP9kKyNWwwE,FP9kKyNWwwE,Final Decision,Reject,"The paper provides a transfer learning approach to HPO. It builds and improves upon existing methods of zero-shot HPO where the high level idea is to use the outcomes of hyper-parameters on an offline collection of datasets in order to speed up HPO on a new dataset. On the plus side, the methods provided seem to be novel, and the results seem to be promising. The main issue is the writing and clarity of the paper, making it hard to be certain of the good qualities of the paper. Aggregating the reviews, the details are too spread out between the appendix and main body, the techniques require more motivation behind them, and important details of the experiment are somewhat vague. The authors provided a modified version which is definitely a step in the right direction, however, it does not seem to be enough. I think this is a solid paper based on a promising idea. 
However, given the almost unanimous agreement about that crucial gap in clarity even after the modified version was uploaded, I recommend rejecting the paper. ",ICLR2021, +rdclXO67Yn,1576800000000.0,1576800000000.0,1,B1xtFpVtvB,B1xtFpVtvB,Paper Decision,Reject,All the reviewers recommend rejecting the submission. There is no basis for acceptance.,ICLR2020, +HJxbLZhZx4,1544830000000.0,1545350000000.0,1,H1gL-2A9Ym,H1gL-2A9Ym,borderline paper,Accept (Poster),"There were several ambivalent reviews for this submission and one favorable one. Although this is a difficult case, I am recommending accepting the paper. + +There were two main questions in my mind. +1. Did the authors justify that the limited neighborhood problem they try to fix with their method is a real problem and that they fixed it? If so, accept. + +Here I believe evidence has been presented, but the case remains undecided. + +2. If they have not, is the method/experiments sufficiently useful to be interesting anyway? + +This question I would lean towards answering in the affirmative. + +I believe the paper as a whole is sufficiently interesting and executed sufficiently well to be accepted, although I was not convinced of the first point (1) above. One review voting to reject did not find the conceptual contribution very valuable but still thought the paper was not severely flawed. I am partly down-weighting the conceptual criticism they made. I am more concerned with experimental issues. However, I did not see sufficiently severe issues raised by the reviewers to justify rejection. + +Ultimately, I could go either way on this case, but I think some members of the community will benefit from reading this work enough that it should be accepted.",ICLR2019,3: The area chair is somewhat confident +ynHfzoRQu1,1610040000000.0,1610470000000.0,1,_QnwcbR-GG,_QnwcbR-GG,Final Decision,Reject,"The paper proposes how weight-encoded neural implicit can be strong 3D shape representations. A neural network is trained such that it overfits over a single shape, and the weights of such network is a great representation for the 3D shape. Results are shown on signed distance field (SDF) generation from meshes. + +Strengths: +- an interesting idea for generating compact representations of 3D shapes +- Will further foster several conversations within the deep learning community + +Weaknesses: +- Very limited evaluation to support the authors claims, particularly against other traditional learnable 3D representations",ICLR2021, +Syxx81TBG,1517250000000.0,1517260000000.0,704,H1bhRHeA-,H1bhRHeA-,ICLR 2018 Conference Acceptance Decision,Reject,"The key concern from the reviewers that was not addressed is that none of the experimental results illustrate convergence vs. time instead of convergence vs. number of iterations. While the authors point out that their method is O(ND) instead of O(KND), the reviewers really wanted to see graphs demonstrating this, given that the implicit SGD method requires an iterative solver. The revised paper is otherwise much improved from the original submission, but falls a bit short of ICLR acceptance because of the lack of a measurement of convergence vs. time. + +Pros: ++ Promising unbiased algorithms for optimizing the log-likelihood of a model using a softmax without having to repeatedly compute the normalizing factor. + +Cons: +- The key concern from the reviewers that was not addressed is that none of the experimental results illustrate convergence vs. time instead of convergence vs. number of iterations. 
+",ICLR2018, +zTbz6C38EXa,1642700000000.0,1642700000000.0,1,m8bypnj7Yl5,m8bypnj7Yl5,Paper Decision,Accept (Poster),"The authors propose a novel hypersolver framework for solving numerical optimal control problems, learning a low order ODE and a neural network based residual dynamics. They compare their framework with traditional optimal control solvers on a number of control tasks and demonstrate superior performance. + +The reviewers are in consensus that the paper makes significant contributions that are validated by the experimental results. The only concern was that the experiments are largely on low dimensional systems, but the reviewers agreed that the results are still worthy of acceptance.",ICLR2022, +SP4XjGwqz_d,1642700000000.0,1642700000000.0,1,GdPZJxjk46V,GdPZJxjk46V,Paper Decision,Reject,"Unfortunately, reviewers unanimously agreed that this paper does not meet the ICLR acceptance standards, citing generally unpolished experiments. I would recommend substantially expanding the experimental results in the future.",ICLR2022, +5XNRtVCmqy,1576800000000.0,1576800000000.0,1,HJxR7R4FvS,HJxR7R4FvS,Paper Decision,Accept (Poster),"The reviewers generally agreed that the application and method are interesting and relevant, and the paper should be accepted. + +I would encourage the authors to carefully go through the reviewers' suggestions and address them in the final.",ICLR2020, +Sy92LJaBG,1517250000000.0,1517260000000.0,873,S1NHaMW0b,S1NHaMW0b,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a regularisation technique based on Shake-Shake which leads to the state of the art performance on the CIFAR-10 and CIFAR-100 dataset. Despite good results on CIFAR, the novelty of the method is low, justification for the method is not provided, and the impact of the method on tasks beyond CIFAR classification is unclear.",ICLR2018, +bzmnCeltOR96,1642700000000.0,1642700000000.0,1,dg79moSRqIo,dg79moSRqIo,Paper Decision,Accept (Poster),"This paper is about an unsupervised method to learn new skills in non-stationary environments by maximizing an intrinsic reward function. Experimental evaluations on OpenAI gym environments show that the proposed approach improves the diversity of the learned skills and is able to adapt to continuously changing environments. + +This paper is borderline. After reading each other's reviews and the authors' feedback, the reviewers discussed the pros and cons of this work. Even if the reviewers have pointed out that the paper has some limitations, they agree that the paper represents a valuable contribution and have appreciated the improvements implemented by the authors during the rebuttal, thus reaching a consensus towards acceptance. +The authors need to update their paper according to what they have proposed in their response and they have to take into serious considerations all the reviewers' suggestions while they will prepare the camera-ready version of their paper.",ICLR2022, +GM9PXdJnYf1,1610040000000.0,1610470000000.0,1,nXSDybDWV3,nXSDybDWV3,Final Decision,Reject,All reviewers have carefully reviewed and discussed this paper. They are in consensus that this manuscript merits a strong revision. I encourage the authors to take these experts' thoughts into consideration in revising their manuscript.,ICLR2021, +cGpT-tKTQRs,1642700000000.0,1642700000000.0,1,w-CPUXXrAj,w-CPUXXrAj,Paper Decision,Accept (Poster),"This paper provides an investigation into the quality of generations made by multimodal VAEs. 
All reviewers were in favor of accepting the paper, and there was quite a bit of detailed discussion and clarifications in the revised version of the paper which led two reviewers to raise their ratings. Overall this is an interesting contribution to the area and is an excellent fit for ICLR.",ICLR2022, +SJevy0zbeV,1544790000000.0,1545350000000.0,1,HkgYmhR9KX,HkgYmhR9KX,adversarial learning for active visual tracking with interesting components,Accept (Poster),"The paper presents an adversarial learning framework for active visual tracking, a tracking setup where the tracker has camera control in order to follow a target object. The paper builds upon Luo et al. 2018 and proposes jointly learning tracker and target policies (as opposed to tracker policy alone). This automatically creates a curriculum of target trajectory difficulty, as opposed to the engineer designing the target trajectories. The paper further proposes a method for preventing the target to fast outperform the tracker and thus cause his policy to plateau. Experiments presented justify the problem formulation and design choices, and outperform Luo et al. . The task considered is very important, active surveillance with drones is just one sue case. + +A downside of the paper is that certain sentences have English mistakes, such as this one: ""The authors learn a policy that maps raw-pixel observation to control signal straightly with a Conv-LSTM network. Not only can it save +the effort in tuning an extra camera controller, but also does it outperform the..."" However, overall the manuscript is well written, well structured, and easy to follow. The authors are encouraged to correct any remaining English mistakes in the manuscript. ",ICLR2019,5: The area chair is absolutely certain +#NAME?,1576800000000.0,1576800000000.0,1,ryloogSKDS,ryloogSKDS,Paper Decision,Accept (Poster),"This paper considers the problem of reasoning about uncertain poses of objects in images. The reviewers agree that this is an interesting direction, and that the paper has interesting technical merit. ",ICLR2020, +PIyIlCDwJr,1610040000000.0,1610470000000.0,1,px0-N3_KjA,px0-N3_KjA,Final Decision,Reject,"This paper proposes benchmark tasks for offline reinforcement learning. The paper has major strength and weakness, and it has resulted in very active discussion among reviewers, authors, and other participants. + +The major strength includes the following: +- The proposed benchmark is already heavily used in the community +- Offline reinforcement learning is very important to solve reinforcement learning tasks in the real world +- The paper covers a range of tasks and provides through evaluation of existing methods to be used as baselines + +The major weakness is that it is not sufficiently convincing that the methods that perform well in the proposed benchmark tasks will perform well in the offline reinforcement learning tasks in the real world. + +This is partly due to the nature of the benchmark tasks of offline reinforcement learning, which require simulators to evaluate the policies learned with offline reinforcement learning. This means that one cannot simply collect datasets from real world tasks and provide them as benchmark datasets. + +Although one cannot do much about simulators, benchmark tasks for offline reinforcement learning still have many design choices. In particular, how should the datasets in the benchmark be collected (i.e., behavior policies)? 
+ +While the datasets in the proposed benchmark are collected with various behavior policies including humans, it is not necessarily convincing that the resulting benchmark tasks are good for the purpose of evaluating offline reinforcement learning to be used in the real world. + +In addition to the suggestions given by the reviewers, a possible direction to improve the paper is to focus on the choice of behavior policies used to generate the datasets in the proposed benchmark. One might then be able to provide some convincing arguments as to why performing well in the benchmark might imply good performance in the real world by relating it to the choice of behavior policies.",ICLR2021, +rkgKFN6ZeV,1544830000000.0,1545350000000.0,1,ByxAcjCqt7,ByxAcjCqt7,Paper decision,Reject,"Reviewers mostly recommended to reject after engaging with the authors, however since not all author answers have been acknowledged by reviewers, I am not sure if there are any remaining issues with the submission. I thus lean to recommend to reject and resubmit. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit. +",ICLR2019,2: The area chair is not sure +0bRJ-H0LvG,1610040000000.0,1610470000000.0,1,pHXfe1cOmA,pHXfe1cOmA,Final Decision,Accept (Poster),"This paper proposes ""HyperDynamics"" a framework that takes into account the history of an agents recent interactions with the environment to predict physical parameters such as mass and friction. These parameters are fed into a forward dynamics model, represented as a neural network, that is used for control. + +Pros: +- addresses an important problem (adapting dynamics models to ""new"" environments) and provides strong baselines +- well written and authors have improved clarity even further based on reviewers comments + +Cons: +- I agree with the reviewer that it is currently unclear how well this will transfer to the real world +- The idea of predicting physical parameters from a history of environment interactions is not not novel in itself (although the proposed framework is, as far as I know). The authors should include related work along the lines of (1) (this is just one paper that comes to mind, others exist) + +(1) Preparing for the Unknown: Learning a Universal Policy with Online System Identification",ICLR2021, +0HpAtHFe03,1576800000000.0,1576800000000.0,1,HJx-akSKPS,HJx-akSKPS,Paper Decision,Reject,"This paper proposes a method called Dynamic Intermedium Attention Memory Network (DIAMNet) to learn the subgraph isomorphism counting for a given pattern graph P and target graph G. However, the reviewers think the experimental comparisons are insufficient. Furthermore, the evaluation is only for synthetic dataset for which generating process is designed by the authors. If possible, evaluation on benchmark graph datasets would be convincing though creating the ground truth might be difficult for larger graphs. +",ICLR2020, +ByWmpfLug,1486400000000.0,1486400000000.0,1,rkuDV6iex,rkuDV6iex,ICLR committee final decision,Reject,"The paper proposes an empirical investigation of the energy landscape of deep neural networks using several stochastic optimization algorithms. + + The extensive experiments conducted by the authors are interesting and inspiring. However, several reviewers expressed major concerns pointing out the limitations of a experimental investigation on real-world datasets. The paper would benefit from additional experiments on simulated datasets. 
This would allow to complement the experimental analysis. Furthermore, theoretical analysis and analytical derivations, consistent with the simulated datasets, could shed light on the experimental results and allow to make more precise claims. + + A revised version, following the reviewers' suggestions, will result in a stronger submission to a future venue.",ICLR2017, +H1lIONLrxN,1545070000000.0,1545350000000.0,1,S1lTEh09FQ,S1lTEh09FQ,"Good paper, accept.",Accept (Poster),The paper provides a novel attack method and contributes to evaluating the robustness of neural networks with recently proposed defenses. The evaluation is convincing overall and the authors have answered most questions from the reviewers. We recommend acceptance. ,ICLR2019,4: The area chair is confident but not absolutely certain +MB8AaJRk-_u,1610040000000.0,1610470000000.0,1,R5M7Mxl1xZ,R5M7Mxl1xZ,Final Decision,Reject,"This paper deals with unsupervised image-to-image translation and proposed a geometric constrains for better structural similarity between the source and the target. Experiments are done using multiple GAN frameworks and demonstrate reduction in distortions in the generated images. + +The reviewers appreciated the contributions, but were overall not very enthusiastic about the paper, with two rejection recommendations. In particular, the criticism regarded +- limited applicability; shape similarity does not always translate into a good visual result; scenes with multiple similar objects might be severely distorted +- some results show a strange mixture of styles +- missing implementation details +- only small quantitative improvement +- similarity to prior works on perceptual loss +- lack of clarity about the use of mutual information for geometry preservation, and implementation details +- unconvincing baselines + +The authors provided an extensive rebuttal addressing some of the above comments. However, many of the doubts remained because of which we believe the paper cannot be accepted. ",ICLR2021, +HJVI3zIug,1486400000000.0,1486400000000.0,1,rJbbOLcex,rJbbOLcex,ICLR committee final decision,Accept (Poster),"Though the have been attempts to incorporate both ""topic-like"" and ""sequence-like"" methods in the past (e.g, the work of Hanna Wallach, Amit Gruber and other), they were quite computationally expensive, especially when high-order ngrams are incorporated. This is a modern take on this challenge: using RNNs and the VAE / inference network framework. The results are quite convincing, and the paper is well written. + + Pros: + -- clean and simple model + -- sufficiently convincing experimentation + + Cons: + -- other ways to model interaction between RNN and topic representation could be considered (see comments of R2 and R1)",ICLR2017, +HJUyL16rz,1517250000000.0,1517260000000.0,696,H1-oTz-Cb,H1-oTz-Cb,ICLR 2018 Conference Acceptance Decision,Reject,The experiments are not sufficient to support the claim. The authors plan to improve it for future publication.,ICLR2018, +WcDEMY3vSu6,1610040000000.0,1610470000000.0,1,Tp7kI90Htd,Tp7kI90Htd,Final Decision,Accept (Spotlight),"This paper has received four positive reviews. The main intellectual contribution of the paper is the introduction of a novel readout mechanism that allows models to be shared fully across neurons which in turn helps transfer learning across neurons and even across animals. The reviewers commented on the technical strength of the paper. 
At the same time, the main contribution remains relatively incremental from a technical standpoint, and while the approach may be of value to future work, the impact of the current study on neuroscience (which is the target here) is quite limited. Nonetheless, there seems to be sufficient enthusiasm from the reviewers to recommend this paper be accepted.",ICLR2021, +0vWK7djX8k,1576800000000.0,1576800000000.0,1,HylvleBtPB,HylvleBtPB,Paper Decision,Reject,"The paper proposes a method to learn cross-lingual representations by aligning monolingual models with the help of a parallel corpus using a three-step process: transform, extract, and reorder. Experiments on XNLI show that the proposed method is able to perform zero-shot cross-lingual transfer, although its overall performance is still below state-of-the-art jointly trained method XLM. + +All three reviewers suggested that the proposed method needs to be evaluated more thoroughly (more datasets and languages). R2 and R4 raise some concerns around the complexity of the proposed method (possibly could be simplified further). R3 suggests a more thorough investigation on why the model saturates at 250,000 parallel sentences, among others. + +The authors acknowledged reviewers' concerns in their response and will incorporate them in future work. + +I recommend rejecting this paper for ICLR.",ICLR2020, +EwPfvVf0rno,1610040000000.0,1610470000000.0,1,0vO-u0sucRF,0vO-u0sucRF,Final Decision,Reject,"Information bottleneck is a well-known principle that is used for clustering, dimensionality reduction, and recently deep learning. It finds a compressed representation of input X while retaining most information on the response Y. This paper addresses an attempt to interpret the meta-learning using the information bottleneck. In addition, a GP-based meta-learning method is also proposed. +The topic itself is interesting without any doubt. However, most of reviewers have serious concerns about this work, which is summarized below. First of all, two components of this paper (IB and GP-based meta-learning) do not provide a coherent message. While the IB interpretation is emphasized in the beginning of this paper, the main point seems to that GP-based methods can be more data efficient than gradient-based meta learning. There does not much point to GP+MAML or IB interpretation of MAML. Experiments are not strong enough, although a few ones are added during the author responses. During the discussion with reviewers, no one support this work, so I do not have choice but to suggest rejection. +",ICLR2021, +qjHcmfbTZm,1576800000000.0,1576800000000.0,1,BJeKwTNFvB,BJeKwTNFvB,Paper Decision,Accept (Poster),"The submission presents an approach to estimating physical parameters from video. The approach is sensible and is presented fairly well. The main criticism is that the approach is only demonstrated in simplistic ""toy"" settings. Nevertheless, the reviewers recommend (weakly) accepting the paper and the AC concurs.",ICLR2020, +F2WRIjPXMzF,1642700000000.0,1642700000000.0,1,dDjSKKA5TP1,dDjSKKA5TP1,Paper Decision,Accept (Poster),"While the reviewers were somewhat split on this paper, they all found some strengths, and pointed out some weaknesses. Among these the main seems to be the somewhat incremental nature of the work, in particular with respect to PCL. As the authors point out, the differences w.r.t. PCL are meaningful and include the main thrust of the paper (removal of false negatives), and the results do indicate usefulness of the proposed approach. 
Given the wide interest in self-supervision I think the paper is above bar for acceptance.",ICLR2022, +H1xXh4_HxN,1545070000000.0,1545350000000.0,1,ByfbnsA9Km,ByfbnsA9Km,Insufficient novelty,Reject,"The paper challenges claims about cross-entropy loss attaining max margin when applied to linear classifier and linearly separable data. This is important in moving forward with the development of better loss functions. + +The main criticism of the paper is that the results are incremental and can be easily obtained from previous work. + +The authors expressed certain concerns about the reviewing process. In the interest of dissipating any doubts, we collected two additional referee reports. + +Although one referee is positive about the paper, four other referees agree that the paper is not strong enough. + + + +",ICLR2019,4: The area chair is confident but not absolutely certain +NvTA0bVzkm9,1642700000000.0,1642700000000.0,1,9q3g_5gQbbA,9q3g_5gQbbA,Paper Decision,Reject,"The reviews are of adequate quality. The responses by the authors are commendable, but ICLR is selective and reviewers continue to believe that more experiments and more rigorous analysis are needed.",ICLR2022, +HJexcGhRJV,1544630000000.0,1545350000000.0,1,H1ldNoC9tX,H1ldNoC9tX,Limited practical value,Reject,"The paper proposes an algorithm for semi-supervised learning, which incorporate biased negative data into the existing PU learning framework. + +The reviewers and AC commonly note the critical limitation of practical value of the paper and results are rather straightforward. + +AC decided the paper might not be ready to publish as other contributions are not enough to compensate the issue.",ICLR2019,4: The area chair is confident but not absolutely certain +9AvyA73Dd,1576800000000.0,1576800000000.0,1,B1e-kxSKDH,B1e-kxSKDH,Paper Decision,Accept (Poster),"The paper presents a method for modeling videos with object-centric structured representations. The paper is well written and clearly motivated. Using a Graph Neural Network for modeling latent physics is a sensible idea and can be beneficial for planning/control. Experimental results show improved performance over the baselines. After the rebuttal, many questions/concerns from the reviewers were addressed, and all reviewers recommend weak acceptance.",ICLR2020, +U2RHxKNBKTT,1642700000000.0,1642700000000.0,1,U1edbV4kNu_,U1edbV4kNu_,Paper Decision,Reject,"Overall, the reviewers thought this paper suggested an important problem. However, there were many concens. Particularly, the multiple reviewers felt it was unclear when the new approach is better than prior work. The reviewers had difficulty connecting the experiments to the paper's main claims.",ICLR2022, +S1xxqCHPlV,1545200000000.0,1545350000000.0,1,HyeVtoRqtQ,HyeVtoRqtQ,An interesting novel approach combining advantages of truncated RNNs and temporal convnets,Accept (Poster),"The paper proposes a novel network architecture for sequential learning, called trellis networks, which generalizes truncated RNNs and also links them to temporal convnets. The advantages of both types of nets are used to design trellis networks which appear to outperform state of art on several datasets. The paper is well-written and the results are convincing.",ICLR2019,4: The area chair is confident but not absolutely certain +S1xDKvwxgE,1544740000000.0,1545350000000.0,1,B1VWtsA5tQ,B1VWtsA5tQ,Improvement needed,Reject,"This paper proposes to improve the exploration in the PPO algorithm by applying CMA-ES. 
Major concerns about the paper include the following: the editing of the paper can be improved; the choice of baselines may not be reasonable; and there are flaws in the comparisons with the state of the art. It is also not quite clear why CMA-ES can improve exploration, and further justification is required. Overall, this paper cannot be published in its current form.",ICLR2019,4: The area chair is confident but not absolutely certain
Upon reading the authors' rebuttal, I believe these concerns to be largely addressed, or at least addressed as realistically as one can expect in a single paper. Therefore I recommend acceptance.",ICLR2020,
The other review raised significant concerns about novelty while claiming high confidence. During the discussion, one of the high-scoring reviewers lowered his/her score. A rejection is therefore recommended.",ICLR2020,
",ICLR2018, +enSC6Eok6n,1576800000000.0,1576800000000.0,1,rylXBkrYDS,rylXBkrYDS,Paper Decision,Accept (Poster),"This paper introduces a simple baseline for few-shot image classification in the transductive setting, which includes a standard cross-entropy loss on the labeled support samples and a conditional entropy loss on the unlabeled query samples. + +Both losses are known in the literature (the seminal work of entropy minimization by Bengio should be cited properly). However, reviewers are positive about this paper, acknowledging the significant contributions of a novel few-shot baseline that establishes a new state-of-the-art on well-known public few-shot datasets as well as on the introduced large-scale benchmark ImageNet21K. The comprehensive study of the methods and datasets in this domain will benefit the research practices in this area. + +Therefore, I make an acceptance recommendation.",ICLR2020, +r16rnf8ug,1486400000000.0,1486400000000.0,1,r1G4z8cge,r1G4z8cge,ICLR committee final decision,Accept (Poster),"The paper presents a nice idea for using a sequence of progressively more expressive neural networks to train a model. Experiments are shown on CIFAR10, parity, language modeling to show that the methods performs well on these tasks. + However, as noted by the reviewers, the experiments do not do a convincing enough job. For example, the point of the model is to show that optimization can be made easier by their concept, however, results are presented on depths that are considered shallow these days. The results on PTB are also very far from SOTA. However, because of the novelty of the idea, and because of the authors ratings, I'm giving the paper a pass. I strongly encourage the authors to revise the paper accordingly for the camera ready version. + + Pros: + - interesting new idea + - shows gains over simple baselines. + Cons: + - not a very easy read, I think the paper was unnecessarily dense exposition of a relatively simple idea. + - experiments are not very convincing for the specific type of problem being addressed.",ICLR2017, +r1xWbGC01N,1544640000000.0,1545350000000.0,1,Hyx6Bi0qYm,Hyx6Bi0qYm,nice,Accept (Poster),"BMIs need per-patient and per-session calibration, and this paper seeks to amend that. Using VAEs and RNNs, it relates sEEG to sEMG, in principle a ten-year old approach, but do so using a novel adversarial approach that seems to work. + +The reviewers agree the approach is nice, the statements in the paper are too strong, but publication is recommended. Clinical evaluation is an important next step.",ICLR2019,4: The area chair is confident but not absolutely certain +ZevoAHi5Aw,1576800000000.0,1576800000000.0,1,Hklcm0VYDS,Hklcm0VYDS,Paper Decision,Reject,"The study of the impact of the noise on the Hessian is interesting and I commend the authors for attacking this difficult problem. After the rebuttal and discussion, the reviewers had two concerns: +- The strength of the assumptions of the theorem +- Assuming the assumptions are reasonable, the conclusions to draw given the current weak link between Hessian and generalization. + +I'm confident the authors will be able to address these issues for a later submission.",ICLR2020, +OZat6ivaA5p,1642700000000.0,1642700000000.0,1,GU11Lbci5J,GU11Lbci5J,Paper Decision,Reject,"I agree that reviewer Vxer was confrontational and abusive, especially in the response to the author's rebuttal, and believe that some form of sanction or reprimand is appropriate. 
That said, I do think that ""performance"" should be evaluated on both convergence rate and generalization. Figure 3 does suggest some improvement in generalization for deep versions of resnet without batch normalization.
To tackle this challenging problem, the paper designs a CrossMatch approach that improves the ability of SDG methods to identify unknown classes by leveraging a multi-binary classifier. CrossMatch generates auxiliary samples outside the source label space using an adversarial data augmentation strategy. The paper then proposes a cross-classifier consistency regularization that minimizes the discrepancy between the multi-binary classifier's outputs and the one-vs-all multi-class classifier's outputs.
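To make the regularizer concrete, here is a minimal sketch of such a cross-classifier consistency term (my own illustration, not the authors' code; the function name, the choice of MSE, and the probability mappings are all assumptions):

```python
import torch
import torch.nn.functional as F

def consistency_loss(multi_binary_logits, multi_class_logits):
    """Illustrative cross-classifier consistency term.

    multi_binary_logits: (N, C) one logit per class from the C binary heads.
    multi_class_logits:  (N, C) logits from the one-vs-all multi-class head.
    Encourages the two classifiers to agree on the known classes.
    """
    p_binary = torch.sigmoid(multi_binary_logits)    # per-class "known" probability
    p_multi = F.softmax(multi_class_logits, dim=-1)  # class distribution
    # Penalize disagreement between the two predictive distributions.
    return F.mse_loss(p_binary, p_multi)
```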
The experiments include a promising result on the CIFAR dataset. The reviewers brought up several concerns regarding the description of the method, the generality of the method (e.g., the requirement for a class hierarchy), the validity and description of the comparisons, and the lack of experiments on domains with much more complex hierarchies. None of these concerns were addressed in revisions to the paper. Hence, the paper in its current state does not meet the bar for publication.",ICLR2019,4: The area chair is confident but not absolutely certain
The contribution is real, but the gap between the proposed scenarios and real training scenarios diminishes its importance.",ICLR2017, +DCC3FVzwnE,1576800000000.0,1576800000000.0,1,SklD9yrFPS,SklD9yrFPS,Paper Decision,Accept (Spotlight),"This paper presents a software library for dealing with neural networks either in the (usual) finite limit or in the infinite limit. The latter is obtained by using the Neural Tangent Kernel theory. + +There is variance in the reviewers' scores, however there has also been quite a lot of discussion, which has been facilitated by the authors' elaborate rebuttal. The main points in favor and against are clear: on the positive side, the library is demonstrated well (especially after rebuttal) and is equipped with desirable properties such as usage of GPU/TPU, scalability etc. On the other hand, a lot of the key insights build heavily on prior work of Lee et al, 2019. However, judging novelty when it comes to a software paper is more tricky to do, especially given that not many such papers appear in ICLR and therefore calibration is difficult. This has been discussed among reviewers. + +It would help if some further theoretical insights were included in this paper; these insights could come by working backwards from the implementation (i.e. what more can we learn about infinite width networks now that we can experiment easily with them?). + +Overall, this paper should still be of interest to the ICLR community. +",ICLR2020, +S1MJXJaHf,1517250000000.0,1517260000000.0,43,r11Q2SlRW,r11Q2SlRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes a real-time method for synthesizing human motion of highly complex styles. The key concern raised by R2 was that the method did not depart greatly from a standard LSTM: parts of the generated sequences are conditioned on generated data as opposed to ground truth data. However, the reviewer thought the idea was sensible and the results were very good in practice. R1 also agreed that the results were very good and asked for a more detailed analysis of conditioning length and some clarification. R3 brought up similarities to Professor Forcing (Goyal et al. 2016) -- also noted by R2 -- and Learning Human Motion Models for Long-term Predictions (Ghosh et al. 2017) -- noting not peer-reviewed. R3 also raised the open issue of how to best evaluate sequence prediction models like these. They brought up an interesting point, which was that the synthesized motions were low quality compared to recent works by Holden et al., however, they acknowledged that by rendering the characters this exposed the motion flaws. The authors responded to all of the reviews, committing to a comparison to Scheduled Sampling, though a comparison to Professor Forcing was proving difficult in the review timeline. While this paper may not receive the highest novelty score, I agree with the reviewers that it has merit. It is well written, has clear and reasonably thorough experiments, and the results are indeed good.",ICLR2018, +Cz0iRDKaaz,1576800000000.0,1576800000000.0,1,HkxDheHFDr,HkxDheHFDr,Paper Decision,Reject,"This paper presents a VAE approach where the model learns representation while disentangling the location and appearance information. The reviewers found issues with the experimental evaluation of the paper, and have given many useful feedback. None of the reviewers were willing to change their score during the discussion period. 
With the current score, the paper does not make the cut for ICLR, and I recommend rejecting this paper.",ICLR2020,
The key idea is to combine early exit with filter selection. The filter-selection stage predicts the top-k classes for the input and, based on that prediction, activates only the filters most relevant to those classes (as identified with DeepLIFT) to refine the result.
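As a rough illustration of this inference path (my own sketch, not the authors' implementation; the routine names, the channel-indexing convention, and the confidence threshold are all assumptions):

```python
import torch

def infer_with_early_exit(x, backbone_early, exit_head, refine_head,
                          filters_for_class, k=3, confidence_threshold=0.9):
    """Illustrative early-exit + filter-selection inference.

    filters_for_class maps a class index to the subset of feature channels
    deemed most relevant for it (e.g., precomputed offline from DeepLIFT
    attribution scores).
    """
    feats = backbone_early(x)
    probs = torch.softmax(exit_head(feats), dim=-1)
    top_p, top_classes = probs.topk(k, dim=-1)
    if top_p[:, 0].min() >= confidence_threshold:
        return top_classes[:, 0]  # early exit: cheap path suffices
    # Otherwise refine using only the channels relevant to the top-k classes.
    selected = sorted({f for c in top_classes.flatten().tolist()
                       for f in filters_for_class[c]})
    return refine_head(feats[:, selected]).argmax(dim=-1)
```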
Namely, that the intuition behind the update rule is quite clear, and many other reasonable variants were ablated (in Appendix A.4.4). Furthermore, the empirical evidence shows that the method improves generalization. + +Reviewer NSqH points out that while SAGE improves the model’s generalization performance for lightly compressed models, its performance becomes more susceptible to pruning when the model is compressed heavier. While the authors responded with good points, the AC encourages them to follow the reviewer’s advice and incorporate further experiments studying this issue (e.g. other datasets). + +In sum, the paper proposes a simple and effective method that is able to improve generalization of large scale models. All four reviewers recommend accepting the paper. The AC agrees and encourages the authors to incorporate the requests mentioned above.",ICLR2022, +TPyOEsajYos,1642700000000.0,1642700000000.0,1,3FvF1db-bKT,3FvF1db-bKT,Paper Decision,Reject,"The paper presents a new algorithm for data augmentation in graph neural networks. The algorithm works by learning a conditional model of a node's neighbor features, and augment the neighborhood representation using the generative model. + + In response to the reviews, the authors provided long answers and clarified much of the text. Nonetheless, after the discussion, two main concerns remained. First, the presention still felt subpar, too notationally heavy for what was presented. Second, the gains with respect to the baselines were assessed as not sufficiently significant to justify the approach which is substantially more complex than a baseline such as GRAND.",ICLR2022, +Gcu279gYLMh,1610040000000.0,1610470000000.0,1,PBfaUXYZzU,PBfaUXYZzU,Final Decision,Reject,"This paper proposes a weighted balanced accuracy metric to evaluate the performance of imbalanced multiclass classification. The metric is based on a one-versus-all decomposition from multi-class to binary, and then aggregating the metrics on the binary classification sub-problems in a weighted manner. The authors hope to argue that the new metric is more flexible for evaluating classifiers in the imbalanced and importance-varying setting. + +The reviewers agree that the proposed framework is simple and applicable to an important problem. Somehow the novelty and significance of the work is pretty limited, as many related metrics (e.g. micro/macro-averaged metrics) exist in the literature. The authors are encouraged to think about stronger reasoning on how useful the ""new"" metric is. The experiments are also not convincing nor complete enough to verify the benefits of the proposed metric. +",ICLR2021, +By2qozIue,1486400000000.0,1486400000000.0,1,rkGabzZgl,rkGabzZgl,ICLR committee final decision,Accept (Poster),"This paper presents a theoretical underpinning of dropout, and uses this derivation to both characterize its properties and to extend the method. A solid contribution. I am surprised that none of the reviewers mentioned that this work is closely related to the uncited 2015 paper ""Variational Dropout and the Local Reparameterization Trick"" by Diederik P. Kingma, Tim Salimans, Max Welling.",ICLR2017, +r1vinMIOg,1486400000000.0,1486400000000.0,1,Sk8csP5ex,Sk8csP5ex,ICLR committee final decision,Reject,"The paper presents an analysis of residual networks and argues that the residual networks behave as ensembles of shallow networks, whose depths are dynamic. The authors argue that their model provides a concrete explanation to the effectiveness of resnets. 
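For readers unfamiliar with this family of update rules, a minimal sketch of a sensitivity-scaled per-parameter step (my own illustration; the paper's exact sensitivity proxy and scaling may differ):

```python
import torch

def sensitivity_scaled_step(params, lr=1e-3, eps=1e-8):
    """Illustrative update: boost under-trained parameters, damp well-fitted ones.

    Uses |w * grad| as a simple sensitivity proxy (an assumption here);
    low-sensitivity parameters receive a larger effective learning rate.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            sensitivity = (p * p.grad).abs()
            scale = (sensitivity.mean() + eps) / (sensitivity + eps)
            scale = scale.clamp(max=10.0)  # keep the boost bounded
            p -= lr * scale * p.grad
```

The clamp is one simple way to prevent near-zero-sensitivity parameters from taking arbitrarily large steps.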
However, I have to agree with reviewer 1 that the assumption of path independence is deeply flawed. In my opinion, it was also flawed in the original paper. Using that as a justification to continue this line of research is not the right approach. We cannot construct a single practical scenario in which path independence can be expected to hold, so we should not be encouraging papers to continue this line of flawed reasoning.
The paper takes the view of metric learning as learning over positive and negative pairs (those belonging to the same/different classes) and uses this to develop a fairly general metric-mixup formulation. To measure the effectiveness of the approach for metric learning, the paper introduces a new measure called utilization that looks at the distance of a query point to its nearest training point in embedding space. + +The reviewers (5 of them) all favour acceptance on the grounds of novelty, and the performance of the method. During the discussion, some issues were raised around whether utilization is a useful measure, improvements to the paper clarity, whether the clean loss in eq. 10 is necessary, and potential limitations on the generality of the approach. However, additional experiments and clarification during the discussion period has resolved these issues to their satisfaction.",ICLR2022, +SCQPqWO9XA3,1610040000000.0,1610470000000.0,1,UV9kN3S4uTZ,UV9kN3S4uTZ,Final Decision,Reject,"This paper presents a method for relational inference in multi-agent/multi-object trajectory prediction tasks. Different from the neural relational inference (NRI) model [1], the presented method is able to model time-varying relations. Experimental results on physics simulations and sports games (basketball) show benefits over variants of the NRI model. + +The reviewers agree that the presented method is mostly solid, that the experiments are insightful, and that this is generally a well-written paper. The authors, however, have apparently overlooked recent related work [2] (dNRI) that proposes a very similar model. In the light of dNRI, it is difficult to argue for the novelty of the presented approach, and the paper needs to undergo a revision in order to more clearly differentiate it from the dNRI model, and to resolve the other concerns raised by the reviewers. + +[1] Kipf et al., Neural Relational Inference for Interacting Systems (ICML 2018) +[2] Graber et al., Dynamic Neural Relational Inference (CVPR 2020)",ICLR2021, +431GWQk5GMD,1610040000000.0,1610470000000.0,1,1eKz1kjHO1,1eKz1kjHO1,Final Decision,Reject,"The reviewers were split (with all scores hovering around borderline) and I found it difficult to reach a conclusion. I like the paper, and agree with the authors that it may offer an interesting ""middle ground"" between bottom-up and top-down approaches. On the other hand, I was concerned with some of the execution flaws that were brought up in the reviews, in particular, insufficient comparisons to other embedding methods, lack of results on COCO, and to a significant degree, lack of focus in presentation. I think this could be a much stronger paper, and it will benefit from additional time to line up those missing components. +To clarify the concern re: experiments on COCO, since the authors bring up computational constraints: I agree that running these experiments during the rebuttal period is not a reasonable expectation. But the conclusion is that these results should have been in the original submission! Semantic/panoptic segmentation is now a very mature area, and COCO (along with CityScapes) is one of the standard benchmarks. (BTW, ""different methods often perform similarly"" on COCO and CityScapes, but not always -- partially due to the significant differences in the statistics of the two datasets). 
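As a concrete reading of the measure, a small sketch of how such a utilization score could be computed (my own illustration; the paper's exact definition may aggregate or normalize differently):

```python
import torch

def utilization(query_embeddings, train_embeddings):
    """Illustrative utilization measure: mean distance from each query
    embedding to its nearest training embedding. Lower values suggest
    queries land close to regions of the space the training data occupies."""
    dists = torch.cdist(query_embeddings, train_embeddings)  # (Q, N) pairwise
    nearest, _ = dists.min(dim=1)                            # nearest train point
    return nearest.mean()
```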
I don't think it's reasonable to have a submission in this area which does not include results on it, since it makes it very hard to assess how much empirical progress is being made.",ICLR2021, +xltOPg-8OTH,1610040000000.0,1610470000000.0,1,m1CD7tPubNy,m1CD7tPubNy,Final Decision,Accept (Spotlight),"This submission explores how certain common padding choices can induce spatial biases in convolutional networks. It looks into alternative padding schemes which mitigate these issues and demonstrates significant performance improvements in widely used convnets. Reviewers generally agreed that this is an important point that should be more widely understood in the community, and that the proposed changes are relatively simple to adopt, so this work is likely to be impactful. Most reviewers thought the paper was well-written, describing the problem well, and the analysis well-executed. Most reviewers acknowledged that most of the weaknesses described in their initial reviews were well-addressed by the authors' responses and manuscript updates. Given the strength of the analysis and the impact for many practitioners, I recommend the submission be accepted with a spotlight presentation.",ICLR2021, +rkU5r1TSz,1517250000000.0,1517260000000.0,626,HJ8W1Q-0Z,HJ8W1Q-0Z,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree that while the presented result looks interesting, it is but one result. Further, one of the reviewer finds this to be a weak comparison as well. +The novelty of the approach over the paper by Ba et. al. also is in question -- good results on multiple tasks might have made it worth exploring, but the authors did not establish this to be the case convincingly.",ICLR2018, +p4bJY7nmsy,1642700000000.0,1642700000000.0,1,wkMG8cdvh7-,wkMG8cdvh7-,Paper Decision,Accept (Poster),"The reviewers agree that this paper studies an important problem, provides theoretically analysis to understand graph injection attack. +The authors propose a new regularizer to improve the attack success. Extensive experimental results also show the effectiveness of the proposed method.",ICLR2022, +CdVLFxjlSj,1576800000000.0,1576800000000.0,1,S1eYchEtwH,S1eYchEtwH,Paper Decision,Reject,"The paper proposes hierarchical Bayesian optimization (HiBO) for learning control policies from a small number of environment interaction and applies it to the postural control of a humanoid. Both reviewers raised issues with the clarity of presentation, as well as contribution and overall fit to this venue. The authors’ response helped to clarify these issues only marginally. Therefore, primarily due to lack of clarity, I recommend rejecting this paper, but encourage the authors to improve the presentation as per the reviewers’ suggestions and resubmitting.",ICLR2020, +28kwLPRzOM,1576800000000.0,1576800000000.0,1,Byg9bxrtwS,Byg9bxrtwS,Paper Decision,Reject,"The paper studies how the size of the initialization of neural network weights affects whether the resulting training puts the network in a ""kernel regime"" or a ""rich regime"". Using a two-layer model they show, theoretically and practically, the transition between kernel and rich regimes. Further experiments are provided for more complex settings. + +The scores of the reviewers were widely spread, with a high score (8) from a low confidence reviewer with a very short review. While the authors responded to the reviewer comments, two of the reviewers (importantly including the one recommending reject) did not further engage. 
+ +Overall, the paper studies an important problem, and provides insight into how weight initialization size can affect the final network. Unfortunately, there are many strong submissions to ICLR this year, and the submission in its current state is not yet suitable for publication.",ICLR2020, +SkeLgP9egN,1544750000000.0,1545350000000.0,1,rygjN3C9F7,rygjN3C9F7,"Valid theory, but quite close to existing work",Reject,"Strengths: The paper presents an alternative regularized training objective for supervised learning that has a reasonable theoretical justification. It also has a simple computational formula. + +Weaknesses: +The experiments are minimal proofs of concept on MNIST and fashion MNIST, and the authors didn't find an example where this formulation makes a large difference. The resulting formula is very close to existing methods. Finally the paper is a bit dense and the intuitions we should gain from this theory aren't made clear. + +Points of contention: +One reviewer pointed out the close connection of the new objective to IWAE, and the authors added a discussion of the relation and showed that they're not mathematically equivalent. However, as far as I can tell they're almost identical in purpose: As k -> \infty in IWAE, the encoder ceases to matter. And as M -> \infty in VDB, we take the max over all encoders. Could the method proposed in this paper lead to an alternative to IWAE in the VAE setting? + +Consensus: +Consensus wasn't reached, but the ""7"" reviewer did not appear to have put much though into their review.",ICLR2019,3: The area chair is somewhat confident +PqqcEiCKne,1576800000000.0,1576800000000.0,1,H1emfT4twB,H1emfT4twB,Paper Decision,Accept (Poster),"This paper proposes a meta-learning approach for few-shot text classification. The main idea is to use an attention mechanism over the distributional signatures of the inputs to weight word importance. Experiments on text classification datasets show that the proposed method improves over baselines in 1-shot and 5-shot settings. + +The paper addresses an important problem of learning from a few labeled examples. The proposed approach makes sense and the results clearly show the strength of the proposed approach. + +R1 had some questions regarding the proposed method and experimental details. I believe this have been addressed by the authors in their rebuttal. + +R2 suggested that the authors clarified their experimental setup with respect to prior work and improved the clarity of their paper. The authors have made some adjustments based on this feedback, including adding new sections in the appendix. + +R3 had concerns regarding the contribution of the approach and whether it trades variance for bias. The authors have addressed most of these concerns and R3 has updated their review accordingly. + +I think all the reviewers gave valuable feedbacks that have been incorporated by the authors to improve their paper. While the overall scores remain low, I believe that they would have been increased had R1 and R2 reassessed the revised submission. I recommend to accept this paper. +",ICLR2020, +qfn9Msr6s0,1576800000000.0,1576800000000.0,1,S1exA2NtDB,S1exA2NtDB,Paper Decision,Accept (Poster),"This paper introduces an evolution strategy for solving the MAML problem. Following up on some other evolutionary methods as alternatives for RL algorithms, this ES-MAML algorithm appears to be quite stable and efficient. The idea makes sense, and the experiments appear strong. 
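For intuition, a minimal sketch of the kind of initialization-scale knob studied here (my own illustration; `alpha` is an assumed scaling parameter, and the comment reflects the standard lazy-vs-rich picture rather than the paper's exact setup):

```python
import torch
import torch.nn as nn

def scaled_init_mlp(width=512, alpha=1.0):
    """Two-layer network whose weights are scaled by alpha at initialization.

    In this line of work, large alpha tends to push training toward the
    lazy/kernel regime, while small alpha permits richer feature learning."""
    net = nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, 1))
    with torch.no_grad():
        for p in net.parameters():
            p.mul_(alpha)
    return net
```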
The review scores showed a lot of variance: 1, 6, and 8. I therefore asked a fourth reviewer for a tie-breaking review, and he/she gave another 8. The rejecting reviewer mainly objected to the fact that learning rates / step sizes were not tuned consistently, which can easily change the relative ranking of different ES algorithms. Here, I agree with the authors' rebuttal: the fact that even a simple ES algorithm performs well is very promising, and further tuning would only strengthen that result. Nevertheless, it would be useful to assess the algorithm's sensitivity w.r.t. its learning rate / step size.
I believe this work will be very well received by the ICLR community.",ICLR2020, +cXLDedg8GM,1610040000000.0,1610470000000.0,1,ascdLuNQY4J,ascdLuNQY4J,Final Decision,Reject,"Motivated by the possibility of Neural Architecture Search on domains beyond computer vision, this paper introduces a new search space and search method to improve neural operators. It applies the technique to problems in vision and text. + +Reviewer 1 found the paper interesting and liked the motivation of considering different tasks in NAS. However, they found some aspects of the paper confusing and, like other reviewers, thought that the baselines were weak. The authors clarified some points, and R1 said that some, but not all, concerns were resolved. The reviewer improved their score by a point but still was not in favour of acceptance. + +Reviewer 2 thought the paper was interesting but questioned its main claim: that it was proposing a search space over novel operators. They argued that what was discovered was similar to convolution and therefore not much had been gained. They questioned the significance of the ablation studies: there were a lot of them, but they focused on relatively simple tasks like MNIST and CIFAR-10. They also asked some clarifying questions which were answered by the authors. Pushing back on the point about the smaller scale of the experiments (in a general reply to all reviews), the authors said that the goal of their work was not advancing computer vision, but to push NAS beyond computer vision and simple search spaces to new application domains. + +Reviewer 3 liked that the paper gave a good overview of the NAS problem and thought that it was ambitious. They also thought the approach was novel and promising. Like R2’s comment, R3 seemed disappointed that the search was over “reparameterized convolutions”. In fact, they thought that the paper was overselling its contribution. They pointed out that performance was still far from state-of-the art on the various benchmarks. The authors argued against this view of “reparameterizing convolutions” and claimed that the search space was, in fact, much larger than that of DARTS. R3 read and responded to the rebuttal, appreciating the response but ultimately thought that the search space wasn’t clear and comparisons fell short. +Reviewer 4 shared similar pros & cons as have been pointed out by the other reviewers. They thought that operator search was limiting and that the paper should also consider topology. The authors responded to this, saying that they intentionally fixed the topology. Searching beyond operators was out of scope. R4 responded to the rebuttal though still had some remaining concerns both in terms of motivation and execution. + +Multiple reviewers said that they would have considered the paper more favourably had an updated paper been submitted, addressing some of the original concerns. As it stands, all of the reviewers think that the paper has some merits but none believe after considering the author response, that the paper is ready for acceptance. I see no reason to overrule the consensus.",ICLR2021, +H1goFVggx4,1544710000000.0,1545350000000.0,1,HJe62s09tX,HJe62s09tX,"Simple and effective method, accuracy worse but speed better than contemporaneous work.",Accept (Poster),"This paper provides a simple and intuitive method for learning multilingual word embeddings that makes it possible to softly encourage the model to align the spaces of non-English language pairs. 
The results are better than learning just pairwise embeddings with English.
+
+The main remaining concern (in my mind) after the author response is that the method is less accurate empirically than Chen and Cardie (2018). I think, however, that given that these two works are largely contemporaneous, the methods are appreciably different, and the proposed method also has advantages with respect to speed, the paper here is still a reasonable candidate for acceptance at ICLR.
+
+However, I would like to request that in the final version the authors feature Chen and Cardie (2018) more prominently in the introduction and discuss the theoretical and empirical differences between the two methods. This will make sure that readers get the full picture of the two works and understand their relative differences and advantages/disadvantages.",ICLR2019,2: The area chair is not sure
+S1ejr8gWx4,1544780000000.0,1545350000000.0,1,BJe1hsCcYQ,BJe1hsCcYQ,A well-written paper that is a bit lacking in novelty,Reject,"Dear authors,
+
+The reviewers all appreciated the treatment of the topic and the quality of the writing. It is rare for all reviewers to agree on this, congratulations.
+
+However, all reviewers also felt that the paper could have gone further in its analysis. In particular, they noticed that quite a few points were either mentioned in recent papers or lacked an experimental validation.
+
+Given the reviews, I strongly encourage the authors to expand on their findings and submit the improved work to a future conference.",ICLR2019,3: The area chair is somewhat confident
+1FM3Nvoufmq,1642700000000.0,1642700000000.0,1,ajIC9wlTd52,ajIC9wlTd52,Paper Decision,Reject,"The authors attempt to tackle the problem of compositional generalization, i.e., the problem of generalizing to novel combinations of familiar words or structures. The authors propose a transfer learning strategy based on pretraining language models. The idea is to introduce a pre-finetuning task where a model is first trained on compositional train-test splits from other datasets, before transferring to fine-tuning on the training data from the target dataset. Although the technique brings some improvements, and the authors do their best to address the reviewers' questions, it is still unclear:
+
+a) Why the method should work in principle, whether there is a theoretical backing, and how it formally relates to meta-learning
+b) How the approach compares to data augmentation methods, since pre-finetuning requires more data, albeit from a different dataset. See for example: https://openreview.net/forum?id=PS3IMnScugk
+c) The whole approach would be more convincing if the authors could articulate *how* their method renders a model more robust to distribution shifts (e.g., based on COGS results it does not help structural generalization; do the gains come from lexical generalization?)
+d) It would also be interesting to see whether this method works on larger-scale or more realistic datasets like CFQ, ATIS or machine translation
+https://arxiv.org/pdf/1912.09713.pdf
+https://arxiv.org/abs/2010.11818",ICLR2022,
+55mpXrHq9D,1642700000000.0,1642700000000.0,1,R-piejobttn,R-piejobttn,Paper Decision,Reject,"This paper proposes cpl-mixVAE, a method for fitting discrete-continuous latent variable models based on mixture representations and a novel consensus clustering constraint. After extensive discussion, no one was willing to argue in favor of acceptance, and a majority of the reviewers felt another round of revision is needed. 
Ultimately, I concur that while the ideas are novel and potentially interesting, more effort is needed to convincingly demonstrate the efficacy of the method. Valid concerns were also raised regarding the claimed ""unsupervised"" nature of the proposed method, a claim which at the very least requires some additional context. At this point, these outstanding issues require an additional round of revision.",ICLR2022,
+rJxxPJ6BM,1517250000000.0,1517260000000.0,919,S1xDcSR6W,S1xDcSR6W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper does not meet the acceptance bar this year, and thus I must recommend it for rejection.",ICLR2018,
+CGt2tNRqMz3,1642700000000.0,1642700000000.0,1,NgmcJ66xQz_,NgmcJ66xQz_,Paper Decision,Reject,"The paper proposes a strategy for multiple learning agents to explore a large RL problem's state space, via the divide-and-conquer principle. It prescribes a design for each agent's reward function, which when optimized enables the agents to 'carve out' and cover different parts of the state space, yielding efficient exploratory behavior. The argument for the efficacy of the proposed method is purely experimental, with numerical benchmarking on complex simulated environments.
+
+The reviewers have raised several concerns that persist even after receiving detailed responses from the author(s). These include the lack of discussion about comparisons with seemingly closely related and applicable work, the perception that the comparisons of this method with others are not fair (""not apples to apples""), and the assessment that the ablation studies and investigation of the sensitivity to hyperparameters may not be comprehensive enough to make a compelling argument. Thus, keeping in mind the unanimous impression of the reviewers, I am of the view that while the paper contributes an interesting principle, more work is needed to argue for its acceptance in a clear way.",ICLR2022,
+GOXrTep2ho,1576800000000.0,1576800000000.0,1,BklIxyHKDr,BklIxyHKDr,Paper Decision,Reject,"The paper proposes and analyzes a k-NN method for identifying corrupted labels for training deep neural networks.
+
+Although a reviewer pointed out that the noisy k-NN contribution is interesting, I think the paper can be much improved further for the following reasons:
+
+(a) Lack of state-of-the-art baselines to compare.
+(b) Lack of important recent related work, i.e., ""Robust Inference via Generative Classifiers for Handling Noisy Labels"" from ICML 2019 (see https://arxiv.org/abs/1901.11300). The paper also runs a clustering-like algorithm for handling noisy labels, and the authors should compare and discuss why the proposed method is superior.
+(c) Poor write-up, e.g., address what is missing in existing methods from many different perspectives, as this is a quite well-studied popular problem.
+
+Hence, I recommend rejection.",ICLR2020,
+sjJmO5Bl1Up,1610040000000.0,1610470000000.0,1,X6YPReSv5CX,X6YPReSv5CX,Final Decision,Reject,"This paper extends Bootstrap DQN with a multi-step TD target. The initial submission had missing details, communication problems, and results lacking rigor. The authors made a clear effort to address the reviewers' concerns.
+
+This paper's contribution is supported primarily by the empirical results, which need major work. The lack of statistical significance in the key results is a major problem. The new 5-run results (originally only 3 runs) show no clear evidence of improving over the baseline. 
Additionally, one must either justify the use of so few runs by investigating the distributions and using the proper statistical tools (Colas et al. [2]) or simply do more runs. Regardless, statistical significance in the precise sense is a requirement.
+
+In addition, other adjustments to the paper would strengthen it significantly: (1) the qualitative results like state visitations can be interpreted either in favour of the method or not; this could be improved with discussion or omitted---see [1]; (2) the discussion of heterogeneity was informal; (3) discussion of the impact and sensitivity of hyper-parameters should be included---this includes addressing the concern that the performance of the baseline was as strong as it could be; (4) the current results do not clearly separate whether the improvement in performance (if it can be shown to be significant) is due to improvements in the representation via an auxiliary-task effect or due to the multi-step return---the reviewer has made a nice suggestion for an experiment here.
+
+In summary, the reviewers did not find the text and examples in the paper convincing as to why the proposed method should be better than Bootstrap DQN, and the results are not significant and need more work.
+
+References that may be helpful:
+[1] https://openreview.net/forum?id=rkl3m1BFDB&utm_campaign=RL%20Weekly&utm_medium=email&utm_source=Revue%20newsletter
+[2] https://arxiv.org/abs/1806.08295",ICLR2021,
+5MQBEkutpAr,1642700000000.0,1642700000000.0,1,5Qkd7-bZfI,5Qkd7-bZfI,Paper Decision,Accept (Poster),"The authors investigate the claim that agents in emergent communication games will converge to a symmetric homogeneous state. In particular, the authors show/argue for diversity in the population to close the gap between observed trends in neural agents and those expected when studying natural languages (e.g. around structure). Reviewers were generally positive, though requested a number of rhetorical changes and additional literature. These have been addressed.",ICLR2022,
+75BbZDVTonU,1642700000000.0,1642700000000.0,1,_LNdXw0BSx,_LNdXw0BSx,Paper Decision,Reject,"This work analyzes the ability of pre-trained language models to maintain entity coherence and consistency in long narrative generation. Along with new automatic metrics for analyzing narrative generation, it proposes a memory-augmented model that allows tracking entities to improve narrative generation. Although all the reviewers appreciated the importance of the problem, the novelty of the proposed approach, as well as empirical improvements in a subset of experiments, they also acknowledge several major weaknesses, including the lack of rigor in defining the method, the lack of clarity in writing (especially in the experiments section), insufficiently strong baselines, and an issue of reproducibility since the code cannot be released. These concerns were in part addressed during rebuttal, but not enough to accept the paper.",ICLR2022,
+VBG346qyt7t,1610040000000.0,1610470000000.0,1,Jr8XGtK04Pw,Jr8XGtK04Pw,Final Decision,Reject,"This paper analyses a recurrent neural network model trained to perform a simple maze task, and reports that the network exhibits multiple hallmarks of neural selectivity reported in neurophysiological recordings from the hippocampus; in particular, they find place cells which are also tuned to task-relevant locations, cells which anticipate possible future paths, and a high proportion of neurons tuned to task variables. 
+
+The reviewers appreciated the interesting empirical analysis, and the demonstration that multiple such features could arise in the same neural network; to the best of my knowledge, this had not been demonstrated explicitly before. However, there were also multiple concerns, which led to this paper being discussed extensively and controversially. In particular, it is not clear which features arise from which learning objective: for example, for place cells to arise, do we just need sensory prediction, or do we need Q-learning? In addition, there were some points in which the tightness of the analogy between model and biology is questionable; in particular, this refers to the comparison between hippocampal recordings and the evaluation of the network. Finally, it is also clear that some of these observations reported in the paper are, indeed, empirical observations rather than explanations. Because of these shortcomings, there was neither consensus nor strong support from the reviewers for acceptance of the paper.
+
+After extensive discussion between the reviewers, the AC, and the program chair, the final decision was to not accept the paper. We do hope that the reviews will help you in improving the study and its presentation. It clearly has potential to be a valuable contribution to the literature.
+",ICLR2021,
+1B84faemIIq,1642700000000.0,1642700000000.0,1,i7O3VGpb7qZ,i7O3VGpb7qZ,Paper Decision,Reject,"This paper proposes learning to make stylistic code edits (semantics remains similar) based on information from a few exemplars instead of one. The proposed method first parses the code into abstract syntax trees and then uses a multi-extent similarity ensemble. This was compared to a Graph2Edit baseline on C# fixer and pyfixer, which are datasets generated by rule-based transcompilers. The proposed method got around 10% accuracy improvement due to a combination of the method and using more than one exemplar.
+
+The reviewers found any improvement due to more exemplars to be expected, and suggested 1) that one carefully chosen exemplar is enough, 2) that the need for multiple exemplars means more practical difficulties in providing them in an application, and 3) that the targets are all generated by rule-based methods and the benefits may not extend to a realistic case where the edits are not so clear; the reviewers also wondered about the application value and the potential need for human evaluations. The authors argued that it is unexpectedly difficult to expand the base method to multiple exemplars and that users should be able to provide exemplars in an application. The authors further provided additional results that addressed some of the reviewers' concerns, but the reviewers did not change their evaluation.
+
+Rejection is recommended based on reviewer consensus.",ICLR2022,
+WybA5-my0c,1576800000000.0,1576800000000.0,1,HyxjNyrtPr,HyxjNyrtPr,Paper Decision,Accept (Poster),"The paper initially received mixed reviews, with two reviewers being weakly positive and one being negative. Following the authors' revision, however, the negative reviewer was satisfied with the changes, and one of the positive reviewers increased the score as well.
+
+In general, the reviewers agree that the paper contains a simple and well-executed idea for recovering geometry in an unsupervised way with generative modeling from a collection of 2D images, even though the results are a bit underwhelming. 
The authors are encouraged to expand the related work section in the revision and to follow the suggestions of the reviewers.",ICLR2020,
+VUY3JzOcsCA,1610040000000.0,1610470000000.0,1,LnVNgfvrQjC,LnVNgfvrQjC,Final Decision,Reject,"The paper introduces the new task of few-shot semantic edge detection by adapting existing datasets. It proposes a new method which is compared to a baseline.
+
+Pros:
+- Clear writing.
+- Extensive ablation experiments.
+- Good architectural choices.
+
+Mixed:
+- The value of the new task raises a mix of opinions. For example, R1 sees it as a ""relevant problem, and is well suited for few shot tasks"", but R2 finds it very similar to few-shot segmentation. I think a more interesting version of the problem (that would also create more separation from few-shot segmentation) would be to also consider internal edges, not just ""semantic boundaries"". For example, the original BSDS dataset has pure edge annotations.
+- Besides the task, another novelty of the paper is the proposed multi-split matching technique, but while it is well demonstrated empirically (as backed by additional results given by the authors in rebuttal), R3 would like to have seen ""theoretical or analytical reasoning"" and R1 says it is an ""ad-hoc technique"".
+
+Cons:
+- the PANet+Sobel baseline. All 4 reviewers are unhappy with this baseline: 3 of them find it unfair because of the non-standard edge thickening employed, and 2 think there would be more recent and better baselines. The authors provided a rebuttal arguing that their GT edges are ""not too thick to be unfair"", but two of the reviewers mentioned they remained unconvinced -- R1 hopes ""the authors will work on cleaner evaluation of the baseline"" and R4 finds the baseline ""still unconvincing in the revised version"".
+
+Overall, the paper would benefit from one more iteration focusing on the evaluation procedure to be convincing and impactful.
+
+",ICLR2021,
+5C4cMWL6ve,1576800000000.0,1576800000000.0,1,rkxdexBYPB,rkxdexBYPB,Paper Decision,Reject,"This paper proposes using a lightweight alternative to Transformer self-attention called Group-Transformer. This is proposed in order to overcome difficulties in modelling long-distance dependencies in character-level language modelling. They take inspiration from work on group convolutions. They experiment on two large-scale char-level LM datasets, which show positive results, but experiments on word-level tasks fail to show benefits. I think that this work, though promising, is still somewhat incremental and has not been shown to be widely applicable, and therefore I recommend that it is not accepted. ",ICLR2020,
+8OxLtVvNiXi,1610040000000.0,1610470000000.0,1,TTUVg6vkNjK,TTUVg6vkNjK,Final Decision,Accept (Poster),"The paper proposes a two-level hierarchical algorithm for efficient and scalable multi-agent learning, where the high-level policy decides a reduced space for the low-level policy to explore in. All the reviewers liked the premise and the experimental evaluation. Reviewers had some clarification questions, which were answered in the authors' rebuttal. After discussing the rebuttal, the AC as well as the reviewers believe that the paper provides insights that will be useful for the multi-agent learning community and recommend acceptance.",ICLR2021,
+qtIQopvp-31,1642700000000.0,1642700000000.0,1,wTTjnvGphYj,wTTjnvGphYj,Paper Decision,Accept (Poster),"This work adds positional encodings (akin to those in transformers, but adapted) to GNNs.
+In their reviews, reviewers raised a number of concerns about this work, in particular: lack of novelty, lack of ablations to demonstrate the claims of the paper, lack of comparison to previous work (e.g., position-aware GNNs, Graphormer, and GraphiT, which would appear very related to this work), lack of motivation (e.g., the introduced positional loss does not actually improve performance), and whether the experimental results were really significant.
+During the rebuttal, the authors replied to the reviews to address the concerns that they could. Of the reviewers, unfortunately only one reviewer elected to respond to the authors. It is disappointing that the four other reviewers did not respond and that overall the reviewers did not discuss this paper further.
+
+The authors chose to highlight privately to the AC that two reviewers who scored the paper unfavourably did not respond. The authors then argued this should be taken into account in the score (presumably to make acceptance more likely)--however, two favourable reviewers also did not respond (not highlighted by the authors). I understand this kind of private request to the AC to dismiss unfavourable reviews (especially if they do not respond) is becoming common--I find it unhelpful--I can see who has and who has not responded.
+
+Nonetheless, looking at the responses to the original concerns of the reviewers highlighted above, I believe the authors have adequately addressed the concerns of the reviewers. Therefore I recommend acceptance, but only as a poster.",ICLR2022,
+w4IJAHxNo9-,1642700000000.0,1642700000000.0,1,tFQyjbOz34,tFQyjbOz34,Paper Decision,Reject,"The paper studies whether DNNs are modular and proposes statistical methods to quantify modularity. 
+
+The proposed methods cluster the neurons of the network using spectral clustering applied to a graph that is weighted by similarity between the neurons.
+
+While the reviewers find the question of modularity relevant, they raise the issue that the results are inconclusive regarding the main stated contribution of the paper (i.e., whether modularity is appropriately measured). After discussion, some concerns are answered. However, the main problem of inconclusive results stands. Therefore, this borderline paper is rejected.",ICLR2022,
+weHzFx0JGh,1576800000000.0,1576800000000.0,1,B1lPaCNtPB,B1lPaCNtPB,Paper Decision,Accept (Spotlight),"The paper proposes a novel GAN formulation where the discriminator outputs discrete distributions instead of a scalar. The objective uses two ""anchor"" distributions that correspond to real and fake data. There were some concerns about the choice of these distributions, but the authors have addressed them in their response. The empirical results are impressive and the method will be of interest to the wider generative models community. ",ICLR2020,
+BkeeQlUelV,1544740000000.0,1545350000000.0,1,r1lyTjAqYX,r1lyTjAqYX,Valuable insights on training reinforcement learning with recurrent neural networks at scale,Accept (Poster),"The paper proposes a new distributed DQN algorithm that combines recurrent neural networks with distributed prioritized replay memory. The authors systematically compare three types of initialization strategies for training the recurrent models. The thorough investigation is cited as a valuable contribution by all reviewers, with reviewer 1 noting that the study would be of interest to ""anyone using recurrent networks on RL tasks"". Empirical results on Atari and DMLab are impressive.
+
+The reviewers noted several weaknesses in their original reviews. These included issues of clarity, a need for more detailed ablation studies, and a need to more carefully document the empirical setup. A further question was raised on whether the empirical results could be complemented with theoretical or conceptual insights.
+
+The authors carefully addressed all concerns raised during the reviewing and rebuttal period. They took exceptional care to clarify their writing and document experiment details, and ran a large set of additional experiments as suggested by the reviewers. The AC feels that the review period for the paper was particularly productive and would like to thank the reviewers and authors.
+
+The reviewers and AC agree that the paper makes a significant contribution to the field and should be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain
+r1xw2nollN,1544760000000.0,1545350000000.0,1,rkemqsC9Fm,rkemqsC9Fm,Skirts close to previous work but ultimately novel,Accept (Poster),"Strengths: This paper gives a detailed treatment of the connections between rate-distortion theory and variational lower bounds, culminating in a practical diagnostic tool. The paper is well-written.
+
+Weaknesses: Many of the theoretical results existed in older work.
+
+Points of contention: Most of the discussion was about the novelty of the lower bound. 
+
+Consensus: R3 and R2 both appear to recommend acceptance (R2 in a comment), and have both clearly given the paper detailed thought.",ICLR2019,3: The area chair is somewhat confident
+Zv8FD8ENYvg,1642700000000.0,1642700000000.0,1,TXsjU8BaibT,TXsjU8BaibT,Paper Decision,Accept (Poster),"This paper proposes a diversity loss and a topological prior to not only increase the chances of finding the appropriate triggers but also improve the quality of the found triggers. These loss terms significantly improve the efficiency in finding trojaned triggers. The experimental results show that the proposed method performs substantially better than the baselines on the Trojaned-MNIST/CIFAR10 and TrojAI datasets. The paper presents a detailed ablation study with strong empirical performance. Some reviewers have doubts about the experimental comparisons and some of the assumptions made in the algorithm. Overall, the thorough experimental investigation of the proposed method makes this paper worthy of publication and of being widely shared.",ICLR2022,
+BBNminBT2d0,1642700000000.0,1642700000000.0,1,Z7Lk2cQEG8a,Z7Lk2cQEG8a,Paper Decision,Accept (Oral),A conceptually and technically highly innovative paper which reinforces an existing powerful connection between the critical set of two-layer ReLU networks and suitable convex programs with cone constraints. The reviewers are in strong consensus that the paper is sound and has merits for publication.,ICLR2022,
+KdLV66rsud-,1610040000000.0,1610470000000.0,1,gIHd-5X324,gIHd-5X324,Final Decision,Accept (Poster),"The paper investigates the effect of soft labels in knowledge distillation from the perspective of a sample-wise bias-variance tradeoff. They observe that during training the bias-variance tradeoff varies sample-wise, and that under the same distillation temperature setting, distillation performance is negatively associated with the number of regularization samples. But removing them altogether hurts the performance (the authors show empirical evidence of this). Based on some observations about regularization samples, the authors propose weighted soft labels to handle the tradeoff. Experiments on standard datasets show that the proposed method can improve standard knowledge distillation.
+
+pros.
+-the paper is written clearly.
+-throughout the review period the authors added additional experiments suggested by the reviewers and enhanced the experimental results. The experimental results are convincing and the authors have now added explanations on hyperparameter choices.
+-the mathematical setting is now clear after incorporating the reviewers' comments.
+-the missing related work as suggested by reviewers is added.
+
+cons.
+-comparison with the results of Zitong Yang et al. 2020 [1] is missing.
+
+I thank the authors for incorporating the changes requested by reviewers. Please add a comparison with the results of [1] in the final version.
+
+[1] Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
+Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma
+",ICLR2021,
+d2KRXpvndz,1576800000000.0,1576800000000.0,1,SyxM51BYPB,SyxM51BYPB,Paper Decision,Reject,"The paper proposes a curriculum learning approach to training generative models like GANs. The reviewers had a number of questions and concerns related to specific details in the paper and experimental results. 
While the authors were able to address some of these concerns, the reviewers believe that further refinement is necessary before the paper is ready for publication.",ICLR2020,
+ryxGa3UkTm,1541530000000.0,1545350000000.0,1,r1fO8oC9Y7,r1fO8oC9Y7,Two stage approach for semantic parsing leveraging cross domain schemas,Reject,"Interesting approach aiming to leverage cross-domain schemas and generic semantic parsing (based on a meaning representation language, MRL) for language understanding. Experiments have been performed on the recently released SNIPS corpus and comparisons have been made with multiple recent multi-task learning approaches. Unfortunately, the proposed approach falls short in comparison to the slot-gated attention work by Goo et al.
+
+The motivation and description of the cross-domain schemas can be improved in the paper, and for replication of the experiments it would be useful to include how the annotations are extended for this purpose.
+
+Experimental results could be extended to the other available corpora mentioned in the paper (ATIS and GEO).
+",ICLR2019,4: The area chair is confident but not absolutely certain
+BJelFq6bgV,1544830000000.0,1545350000000.0,1,rJgz8sA5F7,rJgz8sA5F7,metareview,Reject,"This work is effectively an extension of progressive nets, where the task ID is not given at test time. There were several concerns about the novelty of this work and the evaluation being insufficient. There was a reasonable back and forth between the reviewers and authors, and the reviewers are all aligned with the idea that this work would need a substantial rewrite in order to be accepted at ICLR.",ICLR2019,5: The area chair is absolutely certain
+Sk1fH16SG,1517250000000.0,1517260000000.0,514,SyW4Gjg0W,SyW4Gjg0W,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers were unanimous in their assessment that the paper was not ready for publication in ICLR. Their concerns included:
+ - lack of novelty over Niepert, Ahmed, Kutzkov, ICML 2016
+ - the approach learns combinations of graph kernels and its expressive capacity is thus limited
+ - the results are close to the state of the art and it is not clear whether any improvement is statistically significant.
+
+The authors have not provided a response to these concerns.",ICLR2018,
+ueJfykS85b,1576800000000.0,1576800000000.0,1,SyxM51BYPB,SyxM51BYPB,Paper Decision,Reject,"In this paper, the authors draw upon online convex optimization in order to derive a different interpretation of Adam-type algorithms, allowing them to identify the functionality of each part of Adam. Based on these observations, the authors derive a new Adam-type algorithm, AdamAL, and test it on 2 computer vision datasets using 3 CNN architectures. The main concern shared by all reviewers is the lack of novelty, but also of rigor in both the experimental and theoretical justification provided by the authors. After having carefully read the reviews and the main points of the paper, I side with the reviewers, and thus do not recommend acceptance of this paper. ",ICLR2020,
+By2SSy6SG,1517250000000.0,1517260000000.0,564,BJcAWaeCW,BJcAWaeCW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers present strong concerns regarding the presentation of the paper. The approach appears overly complex, some design choices are not clear, and the experiments are not conducted properly. 
I recommend that the authors carefully go through the reviews.",ICLR2018,
+HkbyHyaBM,1517250000000.0,1517260000000.0,474,SJQO7UJCW,SJQO7UJCW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a reasonable idea, probably an improved version of a method (a combination of GAN and SSL for semantic segmentation) over the existing works. Novelty is not ground-breaking (e.g., a discriminator network taking only pixel-labeling predictions, application of self-training for semantic segmentation---each of these components is not highly novel by itself). It looks like a well-engineered model that manages to get a small improvement in a semi-supervised learning setting. However, given that the focus of the paper is on semi-supervised learning, the improvement from the proposed loss (L_semi) is fairly small (0.4-0.8%).",ICLR2018,
+g0T-7bVtWH,1576800000000.0,1576800000000.0,1,Syl5o2EFPB,Syl5o2EFPB,Paper Decision,Reject,"The paper proposes a refined AIRL method to deal with the reward ambiguity problem in image captioning, wherein the main idea is to refine the loss function at the word level instead of the sentence level, and to introduce a conditional term in the loss function to mitigate the mode collapse problem. The results show that the proposed method improves performance and achieves state-of-the-art results. However, there are concerns from the reviewers that the motivation of the work was not well explained and that some imprecise parts exist in the paper. The concept of the ""reward ambiguity problem"" is not properly addressed according to the opinion of Reviewer 2. I would like to see these concerns well addressed before the paper can be accepted. ",ICLR2020,
+7ZV_sq3mdBX,1642700000000.0,1642700000000.0,1,pQ02Y-onvZA,pQ02Y-onvZA,Paper Decision,Reject,"Although the reviewers found the idea of the work interesting, they all think it is not ready for publication. The experiments do not properly support the claims. Discussion of the connection to some related work is missing. Moreover, the proposed method is not well motivated. I suggest that the authors take the reviewers' comments into account, revise their work, and prepare it for future venues.",ICLR2022,
+3Ax5mID5vhy,1642700000000.0,1642700000000.0,1,gdegUuC_fxR,gdegUuC_fxR,Paper Decision,Reject,"The paper considers the high-resolution continuous limit of Nesterov's Accelerated Gradient (NAG) algorithm and its connections to sampling (MCMC methods). The paper develops a Hessian-Free High Resolution (HFHR) ODE and injects noise into it to obtain an accelerated sampling algorithm. Further, the paper provides a discrete-time variant of the algorithm by appropriately discretizing HFHR using simple discretization schemes. For strongly log-concave potential functions (log-densities), the paper proves convergence of the order $\tilde{O}(\sqrt{d}/\epsilon)$ in Wasserstein-2 distance. In the asymptotic sense, the result matches the convergence of the underdamped Langevin algorithm; however, the paper argues that the constants in the proposed algorithm are smaller and empirically shows that the proposed algorithm is faster in practice. The main contributions of the paper are theoretical; however, the theoretical results are supplemented by numerical experiments.
+
+Overall, the reviewers found the contributions interesting and the theoretical contributions of the paper technically sound. The main concerns that were not completely addressed were related to the presentation of the results and the reproducibility of some of the numerical experiments. 
While both seem minor and possible to address, ultimately there was not enough support to recommend acceptance. However, the paper is solid and merits acceptance after suitable revisions. Thus, the authors are encouraged to revise the paper and resubmit it to one of the conferences in the equivalence class of ICLR.",ICLR2022,
+HyZe8yTHG,1517250000000.0,1517260000000.0,705,HyKZyYlRZ,HyKZyYlRZ,ICLR 2018 Conference Acceptance Decision,Reject,"
+Pros:
++ Interesting and promising approach to multi-domain, multi-task learning.
++ Paper is clearly written.
+
+Cons:
+- Reads more like a technical report than a research paper: more space should be devoted to explaining the design decisions behind the model and the challenges involved, as this will help others tackle similar problems.
+
+This paper had extensive discussion between the reviewers and authors, and between the reviewers. In the end, the reviewers want more insight into the architectural choices made, either via ablation studies or via a series of experiments in which tasks or components are added one at a time. The consensus is that this would give readers a lot more insight into the challenges involved in tackling multiple domains and multiple tasks in a single model and a lot more guidance on how to do it.
+",ICLR2018,
+zBA5rLkAZnfH,1642700000000.0,1642700000000.0,1,S0NsaRIxvQ,S0NsaRIxvQ,Paper Decision,Reject,"I thank the authors for their submission and active participation in the discussions. This paper is borderline, with reviewers WXXr and eK4b leaning towards acceptance and reviewers f6jT and FV5x leaning towards rejection. On the positive side, reviewers remarked that the paper is interesting [FV5x] and novel [FV5x,f6jT,eK4b,WXXr]. However, all reviewers found some flaws with respect to the execution and empirical validation [FV5x], specifically around lacking baselines [FV5x,WXXr] and some ablations [f6jT,WXXr]. I side with the comment made by reviewers FV5x as well as WXXr that a comparison to stronger baselines (UCB-DrAC) is warranted. Therefore, I conclude that this paper is not ready for publication at this point and that it will benefit greatly from another iteration with stronger empirical results. I want to very strongly encourage the authors to further improve their paper based on the reviewer feedback.",ICLR2022,
+U614f8IR6sC,1642700000000.0,1642700000000.0,1,ZDaSIkWT-AP,ZDaSIkWT-AP,Paper Decision,Accept (Poster),This paper describes how to apply a combination of case-based reasoning and RL methods to improve the performance of agents in text-adventure games. The reviewers unanimously recommend acceptance. This work is both insightful and practical. This is a valuable contribution. Well done!,ICLR2022,
+oydovObXGUu,1610040000000.0,1610470000000.0,1,w6Vm1Vob0-X,w6Vm1Vob0-X,Final Decision,Reject,"This paper proposes a GNN that uses global attention based on the graph wavelet transform for more flexible and data-dependent GNN feature aggregation without the assumption of local homophily.
+
+Three reviewers gave conflicting opinions on this paper. The reviewer arguing for rejection questioned the novelty of the paper and the complexity of the global attention mentioned in the paper. Even after the authors' responses and subsequent private discussions, concerns about complexity and novelty were not completely resolved.
+
+
+Considering the authors' claim that the core contribution of this paper is to design fully learnable spectral filters without compromising computational efficiency, it is necessary to consider why it is meaningful to perform global attention based on the graph wavelet transform in the first place. In terms of complexity, although the wavelet coefficient can be efficiently calculated using the Chebyshev polynomials mentioned by the authors, in the attention sparsification part, n log n is required **for each node** in sorting, resulting in complexity of n^2 or more. There may still be a complexity advantage over using global attention in a message-passing architecture, but it will be necessary to clarify and verify this, given that the proposed method uses an approximation that limits global attention to within K hops.
+
+Also, this paper modifies the graph wavelet transform from graph theory, which requires a deeper discussion. For example, as the authors mentioned, the original wavelet coefficient psi_uv can be interpreted as the amount of energy that node v has received from node u in its local neighborhood. The psi_uv defined by the learnable filter as shown in Equation 3 has a different meaning from the original wavelet coefficient. There is insufficient insight as to whether it is justifiable to use this value as an attention coefficient.
+
+Overall, the paper proposes potentially interesting ideas, but it seems to require further development for publication.",ICLR2021,
+2ZzD4djzx,1576800000000.0,1576800000000.0,1,rylvYaNYDH,rylvYaNYDH,Paper Decision,Accept (Poster),"This paper proposes a tool for visualizing the behaviour of deep RL agents, for example to observe the behaviour of an agent in critical scenarios. The idea is to learn a generative model of the environment and use it to artificially generate novel states in order to induce specific agent actions. States can then be generated so as to optimize a given target function, for example states where the agent takes a specific action or states which have high/low reward. They evaluate the proposed visualization on Atari games and on a driving simulation environment, where the authors use their approach to investigate the behaviour of different deep RL agents such as DQN.
+
+The paper is very controversial. On the one hand, as far as we know, this is the first approach that explicitly generates states that are meant to induce specific agent behaviour, although one could relate this to adversarial sample generation. Interpretability in deep RL is a known problem and this work could bring an interesting tool to the community. 
However, the proposed approach lacks theoretical foundations and thus feels quite ad hoc, and the results are limited to a qualitative, visual evaluation. At the same time, one could say that the approach is no more ad hoc than other gradient saliency visualization approaches, and one could argue that the lack of theoretical soundness is due to the difficulty of defining good measures of interpretability that apply well to image-based environments.
+
+Nonetheless, this paper is a step in the right direction in a field that could really benefit from it. ",ICLR2020,
+QScSo_rdNA,1576800000000.0,1576800000000.0,1,Hkl4EANFDH,Hkl4EANFDH,Paper Decision,Reject,"The submission proposes a 'co-natural' gradient update rule to precondition the optimization trajectory using a Fisher information estimate acquired from previous experience. This results in reduced sensitivity and forgetting when new tasks are learned.
+
+The reviews were mixed on this paper, and unfortunately not all reviewers had enough expertise in the field. After reading the paper carefully, I believe that the paper has significance and relevance to the field of continual learning; however, it will benefit from more careful positioning with respect to other work as well as more empirical support. The application to the low-data regime is interesting and could be expanded and refined in a future submission.
+
+The recommendation is for rejection.",ICLR2020,
+5ACBXd9y-SZ,1610040000000.0,1610470000000.0,1,Eql5b1_hTE4,Eql5b1_hTE4,Final Decision,Accept (Poster),"The paper presents a novel method for learning with noisy labels based on an interesting insight into the learning dynamics of deep neural networks.
+
+Reviewers unanimously vote for acceptance. I agree with their assessment, and it is my pleasure to recommend the paper for acceptance.
+
+If I can draw attention to one comment, I strongly agree with R1 that the criterion in Eq. 
(3) is somewhat poorly motivated. I believe the paper would benefit from a clearer exposition of this part.
+
+Please make sure to address all reviewers' remarks in the camera-ready version. Thank you for submitting your work to ICLR.",ICLR2021,
+18_TpzDjus,1610040000000.0,1610470000000.0,1,ZKyd0bkFmom,ZKyd0bkFmom,Final Decision,Reject,"Summary: The authors build on existing work on GP vine copula models. Some modifications are made to conditional marginals and mixing. Applications to mutual information estimation are discussed and evaluated, and the approach is applied to joint neural/behavioral data.
+
+
+Discussion:
+Strengths mentioned in the reviews are that the application is (from a neuroscience perspective) interesting, that estimating mutual information is an important problem, and that the paper is very well-written. Weaknesses are the limited novelty (from a machine learning perspective) and weak empirical validation.
+
+The authors have responded in detail, and were able to clarify a number of unclear points. Clearly, however, the main criticisms noted above are hard to address in discussion.
+
+Despite the paper being overall clearly written, I agree with reviewers that it is hard to tell from the abstract and introduction where the paper is going (even after modifications made by the authors in the course of the discussion); of the fairly long abstract, just about half a sentence relates to where the proposed model differs from previous work.
+
+
+Recommendation:
+I recommend rejection. Despite some clearly positive aspects, the two main criticisms voiced by reviewers are serious: weak validation and minimal novelty from a machine learning perspective. I agree that the neuroscience application may be interesting, but it requires more validation.
+
+If the authors want to pursue this work further, I would suggest first considering where to position the paper's focus. Estimation of mutual information is a problem that is both hard and important. Any progress here would be welcome, and simple usefulness could offset any lack of model novelty, but it would have to be carefully and comprehensively evaluated. On the other hand, a focus on neuroscience applications would require more emphasis on, and presumably more space in the paper for, relevant experiments.
+",ICLR2021,
+ezqNAcSPtBi,1642700000000.0,1642700000000.0,1,JBAZe2yN6Ub,JBAZe2yN6Ub,Paper Decision,Accept (Poster),"This paper introduces a first-occupancy representation for reinforcement learning problems, with potential benefits on problems with non-stationary rewards. The representation is defined analogously to the successor representation, but captures the expected discounted time to first arrive at a state instead of measuring discounted visitations. The paper develops the idea and illustrates some uses for exploration, unsupervised RL, and non-stationary reward functions (for example when food rewards are consumed).
+
+The reviews brought forward a number of related older ideas in the literature, where several aspects of the method have been previously developed. These include dynamic goal learning, option conditional predictions, general value functions, dynamical distance learning, and temporal difference models. However, from the author response and ensuing discussion, the exact form of the proposed representation has not been studied for the purposes presented in this paper. 
The reviewers appreciated the utility of this representation for problems with non-Markovian rewards, in particular that “the use of the first-occupancy values as an exploration bonus results in much more efficient exploration”. Multiple reviewers commented on the desire for a stronger empirical evaluation, but they were satisfied with the contribution of the paper.
+
+The reviewers arrived at a consensus that the paper contributes a new representation for RL problems with non-stationary rewards, with two reviewers strongly convinced and none opposed. The paper is therefore accepted.",ICLR2022,
+bDmE-zpu-1F,1610040000000.0,1610470000000.0,1,MxaY4FzOTa,MxaY4FzOTa,Final Decision,Accept (Poster),"## Summary
+The paper advances the state of the art in training binary neural networks, coming in at first place on ImageNet with a controlled computation budget. While any paper setting a new record on ImageNet would be a serious candidate for acceptance, it is positive that this one achieves the goal by putting to work the mechanism of conditional computation, innovative for binary networks, and studying in a systematic and clear way how the network width and configuration can be varied while maintaining the computation budget.
+
+## Review Process and Decision
+
+The paper was thoroughly discussed by the reviewers from different aspects. Several weaker spots have been identified (see below and the final reviews), but no critical issues that would indicate the necessity of a major revision. In the end, reviewers agreed on acceptance, although in some cases they decided to keep their original <=5 ranking to reflect the scientific value to them from a more global perspective.
+I think it is an example of well-done modeling and experimental work: the work is very clear, uses sound methods, and the experimental results are systematic and give interpretable evidence, which in my experience is rather exceptional for the overall very empirical binary NNs field. I estimate high interest because of the concept of conditional computation put to work here and because of the new record on ImageNet.
+
+## Details
+
+* Computation Cost
+
+If such networks are to be deployed in low-power devices, the computation cost might need to be estimated more accurately. An example of such estimation is the work by Ding et al. (2019), Regularizing Activation Distribution for Training Binarized Deep Networks, where the energy and area are estimated using information from a semiconductor process design kit.
+
+There is indeed a number of floating-point operations around the binary convolutions: first and last layers, experts, skip connections with scale factors, and non-linearities. The latency and cost of these operations may not be negligible on target devices. In particular, Ding et al. (2019) argue that the XNOR-Net architecture is 3 times more costly because of floating-point scale factors.
+However, the paper does a fair job of comparing in operation counts, which is a good proxy for many devices. The floating-point computations needed in various places can indeed be further reduced to lower bit width; the research on quantization techniques shows this is possible and orthogonal to the contribution.
+
+* Novelty of grouped convolutions design and search
+
+The work of Phan et al. (CVPR 2020), Binarizing MobileNet via Evolution-based Searching, also proposed to search for the best grouped convolution under computation budget constraints (evolutionary search method). 
+The strict budget constraint and the merging of results from different groups are somewhat novel, and the prior work can objectively be considered contemporaneous.
+
+* Clarity
+
+Clarity of the paper has been improved by the revision. One remaining mystery is the gradient estimator for the experts. The paper states: ""we wish to back-propagate gradients for the non-selected experts"", ""allows meaningful gradients to flow to all experts during training"". The problem is that since $\varphi(z)$ is binary one-hot on the forward pass, the gradient of the scalar product with $\varphi$ in (2) means that in the backward pass only the selected expert receives the training signal, and by no means all of them. This is regardless of how the gradient is propagated through $\varphi$. Maybe something is missing? I hope the authors can clarify in the final version. I do not consider it a serious flaw, since this training scheme is not claimed as a contribution in any case.
+
+One more point on the clarity: the paper claims that using experts increases the network representation power / capacity. While this seems logical, and follows the preceding work on real-valued NNs, the paper could provide additional evidence in terms of the training performance of these models. Since the teacher with 76% accuracy is used in the distillation, I assume the training never reaches 100% training accuracy in any of the settings. Does the training accuracy improve with experts? This would be helpful evidence for further work.
+
+* Search method
+
+The paper was further criticized because the manual search of the architecture is a step back from automated search methods (NAS, BATS). However, these methods are themselves a relaxation of discrete choices (experts, if you like) that need to keep all possible configurations at the same time, which may be less stable and too costly for real architectures and datasets. The principles of gradient-based architecture search are not entirely clear, and the resulting models coming out of these methods typically give no insights regarding good (intelligent) design choices. At present, the systematic exploration with the analysis of tradeoffs conducted here is seen to have advantages.
+
+
+",ICLR2021,
+A4ETzzXzTHH,1610040000000.0,1610470000000.0,1,AVKFuhH1Fo4,AVKFuhH1Fo4,Final Decision,Reject,"Reviewers have different views on the paper, and after going through the reviews, replies, and the paper, we believe that there is room for improvement here.
+
+While the part related to indefinite symmetric kernels and general similarity functions seems to be well covered, as well as the part on Transformers, the relation between learning in RKBS and Transformers is far from clear, and Reviewer 4 makes a strong point on this. For instance:
+
+* What is the goal of Section 5 and Definition 1? Indeed, it is not clear here whether the point of the authors is to learn the kernel parameters in Equation 9 or to learn to predict the output of a transformer. If it is the latter, the connection with the first part is unclear.
+
+* In Equation 11, I can understand that x and y are the sequences t and s, but what is z_ij and how is it obtained? So again, the learning problem drops in without justification and it is not explained how it can be solved. The theoretical results involving the representer theorem are nice though. 
+
+* The experiment does not seem very related to the learning problem in Equation 11 introduced by the authors. It seems to me that they are just trying different kernels on top of the dot product.
+",ICLR2021,
+H1xAY5F1xE,1544690000000.0,1545350000000.0,1,HyxzRsR9Y7,HyxzRsR9Y7,meta review,Accept (Poster),"This paper proposes a reinforcement learning approach that better handles sparse reward environments, by using previously-experienced roll-outs that achieve high reward. The approach is intuitive, and the results in the paper are convincing. The authors addressed nearly all of the reviewers' concerns. The reviewers all agree that the paper should be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain
+0bx5FsGtRJ_,1610040000000.0,1610470000000.0,1,foNTMJHXHXC,foNTMJHXHXC,Final Decision,Reject,"The paper proposes Risk Extrapolation (REX) as a domain generalization algorithm. The authors extend distributionally robust learning from convex mixtures of distributions to affine mixtures. The authors later use variances instead of this extension and demonstrate various empirical and theoretical properties. The paper was reviewed by four expert reviewers, and the reviewers did not reach a consensus. Hence, I also read the paper in detail and reviewed it. In summary, the reviewers argue the following:
+
+- R#2: The main argument is the lack of justification of the claim ""Rex could deal with both covariate and concept shift together"". The authors try to address this in their response. Moreover, the reviewer also argues in the private discussion that the manuscript was not updated and the authors did not address any of the issues during the discussion period.
+- R#3: Argues that (similar to R#2) dealing with covariate shift is not explained properly. The reviewer is not persuaded that REX results in invariant prediction.
+- R#1 and R#4: Largely positive about the paper. At the same time, they argue that the organization of the paper is lacking and that some of the material in the supplement is relevant and should be moved to the main text. R#1 decreased their score due to the lack of re-organization during the discussion.
+
+The value of the paper is clear to me: the joint treatment of the minimax perspective, domain generalization, and invariances is definitely interesting and valuable. Hence, the paper has merit to be published. However, the presentation is lacking significantly. The main contribution of the paper lies in Table 1, but the invariant prediction property is not justified at all in the main text. Hence, Table 1 is not justified properly. The authors discuss Thms 1 & 2 in their response, but they are both in the supplement. From reading only the main text, the confusion of the reviewers is well justified. The ICLR guidelines clearly state that ""...Note that reviewers are encouraged, but not required to review supplementary material during the review process..."" It is the authors' responsibility to make the main paper self-contained. Even more worrisome is the fact that the authors dismiss this concern in their response to R#1, which eventually led to R#1 decreasing their score. Hence, I decided to reject the paper since the presentation is subpar and the authors did not persuade the reviewers that they could fix this presentation issue by the camera-ready deadline. On the other hand, I think the paper could be really influential if it were written clearly. 
I suggest that the authors revise the claims more precisely, extend the discussion of the claims, and move the theorems to the main paper.",ICLR2021,
+S1beVyarG,1517250000000.0,1517260000000.0,274,rkrC3GbRW,rkrC3GbRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Viewing the problem of determining the validity of high-dimensional discrete sequences as a sequential decision problem, the authors propose learning a Q function that indicates whether the current sequence prefix can lead to a valid sequence. The paper is fairly well written and contains several interesting ideas. The experimental results appear promising but would be considerably more informative if more baselines were included. In particular, it would be good to compare the proposed approach (both conceptually and empirically) to learning a generative model of sequences. Also, given that your method is based on learning a Q function, you need to explain its exact relationship to classic Q-learning, which would also make for a good baseline.",ICLR2018,
+SkxkMjiyeN,1544690000000.0,1545350000000.0,1,r1eiqi09K7,r1eiqi09K7,An interesting analysis and algorithm,Accept (Poster),"Dear authors,
+
+All reviewers agreed that your work sheds new light on a popular class of algorithms and should thus be presented at ICLR.
+
+Please make sure to implement all their comments in the final version.",ICLR2019,4: The area chair is confident but not absolutely certain
+VDlasw_u1F,1610040000000.0,1610470000000.0,1,tnSo6VRLmT,tnSo6VRLmT,Final Decision,Accept (Poster),"This paper presents an approach for conformal prediction where, in its standard paradigm, a set of prediction candidates is identified as opposed to a single one. The authors advance the CP framework by presenting a rigorous method that allows for a smaller set of admissible predictions with a coverage guarantee. Their further contribution is a methodology based on cascading that filters out non-promising candidates.
+
+After the discussion period, _all_ the reviewers are in favour of accepting the manuscript, with the average being marginally above acceptance. My recommendation is therefore to accept the paper.
+
+Strong points:
+The advance of a smaller set of admissible predictions in the CP framework is quite useful, especially in scenarios where the set can grow (expensively) large.
+Thorough experimental analysis with good presentation of performance gain and usefulness on real-world data.
+
+Weak points:
+Lack of novelty in the techniques makes the work a weaker candidate compared to the rest of the submissions. ",ICLR2021,
+GvdWlnXlUr,1642700000000.0,1642700000000.0,1,Mo9R9oqzPo,Mo9R9oqzPo,Paper Decision,Reject,"This submission tackles the problem of model explainability from the perspective of masking-based saliency methods.
+Several metrics are proposed for evaluating saliency methods, including a new « soundness » concept.
+Experiments using a consistency score to simultaneously evaluate completeness and soundness are provided.
+
+Most of the reviewers were not convinced by the approach and have raised several issues.
+
+After the rebuttal, and despite the interest in the introduction of the concept of « soundness » to better explain model decisions, the current proposition needs to be improved. In particular, the benefit of the soundness concept does not come through, many details of the method are not clear enough, and the effectiveness of the proposed measure is still questionable. 
It would be worthwhile for the authors to consider the reviewers' comments, such as those calling for additional experiments, to demonstrate the relevance of their contribution.",ICLR2022,
+UK2zLS1Z3C,1642700000000.0,1642700000000.0,1,SVcEx6SC_NL,SVcEx6SC_NL,Paper Decision,Reject,"The paper looks at the favorable properties of feature representations of an adversarially robust model, which are interesting but not surprising, especially given that much existing literature has discussed this. All reviewers gave negative scores. The main issues are: 1) The paper only provides an experimental demonstration of this phenomenon without going into a more detailed explanation of the phenomenon. This is not enough when the observations in question are not very novel and have already been explored in various forms in past published literature. 2) Limited novelty, since the current submission does not introduce a new approach, algorithm, or theoretical results. The paper also lacks comparison/discussion of recent works. 
Thus, I cannot recommend accepting the paper to ICLR.",ICLR2022, +BygXAwjzgE,1544890000000.0,1545350000000.0,1,BkG8sjR5Km,BkG8sjR5Km,An interesting new task to study learning cooperation between agents,Accept (Poster),"The paper studies population-based training for MARL with co-play, in MuJoCo (continuous control) soccer. It shows that (long term) cooperative behaviors can emerge from simple rewards, shaped but not towards cooperation. + +The paper is overall well written and includes a thorough study/ablation. The weaknesses are the lack of strong comparisons (or at least easy to grasp baselines) on a new task, and the lack of some of the experimental details (about reward shaping, about hyperparameters). + +The reviewers reached an agreement. This paper is welcomed to be published at ICLR.",ICLR2019,3: The area chair is somewhat confident +5WBMTct6Ho,1576800000000.0,1576800000000.0,1,r1eyceSYPr,r1eyceSYPr,Paper Decision,Accept (Spotlight),"Main content: + +Blind review #1 summarizes it well: + +The paper proposes an algorithmic improvement that significantly simplifies training of energy-based models, such as the Restricted Boltzmann Machine. The key issue in training such models is computing the gradient of the log partition function, which can be framed as computing the expected value of f(x) = dE(x; theta) / d theta over the model distribution p(x). The canonical algorithm for this problem is Contrastive Divergence which approximates x ~ p(x) with k steps of Gibbs sampling, resulting in biased gradients. In this paper, the authors apply the recently introduced unbiased MCMC framework of Jacob et al. to completely remove the bias. The key idea is to (1) rewrite the expectation as a limit of a telescopic sum: E f(x_0) + \sum_t E f(x_t) - E f(x_{t-1}); (2) run two coupled MCMC chains, one for the “positive” part of the telescopic sum and one for the “negative” part until they converge. After convergence, all remaining terms of the sum are zero and we can stop iterating. However, the number of time steps until convergence is now random. + +Other contributions of the paper are: +1. Proof that Bernoulli RBMs and other models satisfying certain conditions have finite expected number of steps and finite variance of the unbiased gradient estimator. +2. A shared random variables method for the coupled Gibbs chains that should result in faster convergence of the chains. +3. Verification of the proposed method on two synthetic datasets and a subset of MNIST, demonstrating more stable training compared to contrastive divergence and persistent contrastive divergence. + +-- + +Discussion: + +The main objection in reviews was to have meaningful empirical validation of the strong theoretical aspect of the paper, which the authors did during the rebuttal period to the satisfaction of reviewers. + +-- + +Recommendation and justification: + +As review #1 said, ""I am very excited about this paper and strongly support its acceptance, since the proposed method should revitalize research in energy-based models.""",ICLR2020, +FPYfXekenb,1576800000000.0,1576800000000.0,1,HkeeITEYDr,HkeeITEYDr,Paper Decision,Reject,"This paper studies the robust reinforcement learning problem in which the constraint on model uncertainty is captured by the Wasserstein distance. The reviewers expressed concerns regarding novelty with respect to prior work, the presentation or the results, and unconvincing experiments. 
In its current form the paper is not ready for acceptance to ICLR-2020.",ICLR2020, +8OxLtVvNiXi,1610040000000.0,1610470000000.0,1,TTUVg6vkNjK,TTUVg6vkNjK,Final Decision,Accept (Poster),"The paper proposes a two-level hierarchical algorithm for efficient and scalable multi-agent learning where the high-level policy decides a reduced space for low-level to explore in. All the reviewers liked the premise and the experimental evaluation. Reviewers had some clarification questions which were answered in the authors' rebuttal. After discussing the rebuttal, AC as well as reviewers believe that the paper provides insights that will be useful for the multi-agent learning community and recommend acceptance.",ICLR2021, +qtIQopvp-31,1642700000000.0,1642700000000.0,1,wTTjnvGphYj,wTTjnvGphYj,Paper Decision,Accept (Poster),"This work adds the positional encoding (akin to those in transformers, but adapted) to GNNs. +In their reviews, reviewers raised a number of concerns about this work, in particular, lack of novelty, lack of ablations to demonstrate the claims of the paper, lack of comparison to previous work (e.g., position-aware GNNS, Graphormer and GraphiT which would appear very related to this work), lack of motivation (e.g., the introduced positional loss do not actually improve performance), and whether the experimental results were really significant. +During the rebuttal, the authors replied to the reviews, to address. the concerns that they could. Of the reviewers, unfortunately only one reviewer elected to respond to the authors. It is disappointing that the four other reviewers did not respond and overall the reviewers did not discuss this paper further. + +The authors chose to highlight privately to the AC that two reviewers who scored the paper unfavourably did not respond. The authors then argued this should be taken into account in the score (presumably to make acceptance more likely)--however, two favourable reviewers also did not respond (not highlighted by the authors). I understand this kind of private request to the AC to dismiss unfavourable reviews (especially if they do not respond) is becoming common--I find it unhelpful--I can see who and who has not responded. + +Nonetheless, looking at the responses to the original concerns of the reviewers highlighted above, I believe the authors have adequately addressed the concerns of the reviewers. Therefore i recommend acceptance but only as a poster.",ICLR2022, +cQL0CNpLeP,1576800000000.0,1576800000000.0,1,SJxpsxrYPS,SJxpsxrYPS,Paper Decision,Accept (Spotlight),"This paper proposes a novel way to learn hierarchical disentangled latent representations by building on the previously published Variational Ladder AutoEncoder (VLAE) work. The proposed extension involves learning disentangled representations in a progressive manner, from the most abstract to the more detailed. While at first the reviewers expressed some concerns about the paper, in terms of its main focus (whether it was the disentanglement or the hierarchical aspect of the learnt representation), connections to past work, and experimental results, these concerns were fully alleviated during the discussion period. All of the reviewers now agree that this is a valuable contribution to the field and should be accepted to ICLR. 
Hence, I am happy to recommend this paper for acceptance as an oral.",ICLR2020, +HkpDnfI_g,1486400000000.0,1486400000000.0,1,rkpdnIqlx,rkpdnIqlx,ICLR committee final decision,Reject,"This paper proposes an algorithm for training undirected probabilistic graphical models. However, there are technical concerns of correctness that haven't been responded to. It also wasn't felt the method was evaluated appropriately.",ICLR2017, +ByqSIkpHM,1517250000000.0,1517260000000.0,778,HJw8fAgA-,HJw8fAgA-,ICLR 2018 Conference Acceptance Decision,Reject,"There was quite a bit of discussion about this paper but in the end the majority felt that, though the paper is interesting, the results are too limited and more needs to be done for publication. + +PROS: +1. Good comparison of state space model variations +2. Good writing (perhaps a bit dense in places) +3. Promising results, especially concerning speedup + +CONS: +1. The evaluation is quite limited + +",ICLR2018, +TA7hSp5sEb_,1610040000000.0,1610470000000.0,1,WPO0vDYLXem,WPO0vDYLXem,Final Decision,Reject,"The paper has been actively discussed, both during and after the rebuttal phase. I enjoyed, and I am thankful for, the active communication that took place between the authors and the reviewers. + +On the one hand, the reviewers agreed on several pros of the paper, e.g., +* Clear, well presented manuscript +* The presentation of practically-relevant setting +* A work that fosters reproducible research (both BO data and algorithms are made available) +* Careful experiments + +On the other hand, several important weaknesses were also outlined, e.g., +* _Novelty_: While the authors claim they “introduce a practically relevant and fundamentally novel research problem”, existing commercial HPO solutions already mention, and propose solutions for, the very same problem, e.g., [AWS](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-warm-start.html) (section “Types of Warm Start Tuning Jobs”) and [Google cloud](https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-on-google-cloud-platform-is-now-faster-and-smarter) (section “Learning from previous trials”). The reviewers all agreed on the fact that this down-weights the novelty aspect (claimed many times in the rebuttal and the manuscript): The paper formalizes an already existing framework rather than introducing it. +* In the light of the weakened ""novelty"" contribution (see above), the reviewers regretted the absence of a novel transfer method _tailored to HT-AA_, which would have certainly strengthened the submission. +* _“Dynamic range” of the benchmark_: It is difficult to evaluate the capacity of the benchmark to discriminate between different approaches (e.g., see new Fig. 3 showing the violin plot with all three methods for transfer, as suggested by Reviewer 1: the improvements over ""best first"" seem marginal at best). To better understand the benchmark, it would be nice to illustrate its “dynamic range” by exhibiting a more powerful method that would substantially improve over “best first”. + +As illustrated by its scores, the paper is extremely borderline. Given the mixed perspectives of pros and cons, we decided with the reviewers to recommend the rejection of the paper. 
",ICLR2021, +EbSHJ5LuEnZ,1610040000000.0,1610470000000.0,1,ATp1nW2FuZL,ATp1nW2FuZL,Final Decision,Accept (Poster),"The reviewers are enthusiastic about this work, and the few comments that they had were appropriately addressed by the reviewers.",ICLR2021, +HU6BjGlvDl,1576800000000.0,1576800000000.0,1,rkglZyHtvH,rkglZyHtvH,Paper Decision,Reject,"In this paper a method for refining the variational approximation is proposed. + +The reviewers liked the contribution but a number reservations such as missing reference made the paper drop below the acceptance threshold. The authors are encouraged to modify paper and send to next conference. + +Reject. ",ICLR2020, +SJwAVyTrz,1517250000000.0,1517260000000.0,466,B1NOXfWR-,B1NOXfWR-,ICLR 2018 Conference Acceptance Decision,Reject,"Paper presents and interesting new direction, but the evaluation leaves many questions open, and situation with respect to state of the art is lacking",ICLR2018, +AZ_pu0JgN_,1576800000000.0,1576800000000.0,1,S1g2skStPB,S1g2skStPB,Paper Decision,Accept (Talk),"This paper proposes an RL-based structure search method for causal discovery. The reviewers and AC think that the idea of applying reinforcement learning to causal structure discovery is novel and intriguing. While there were initially some concerns regarding presentation of the results, these have been taken care of during the discussion period. The reviewers agree that this is a very good submission, which merits acceptance to ICLR-2020.",ICLR2020, +r1lb5fiByE,1544040000000.0,1545350000000.0,1,B1lnjo05Km,B1lnjo05Km,Structural regularizations imposed on layers. ,Reject,"The work presents a method of imposing harmonic structural regularizations to layers of a neural network. While the idea is interesting, the reviewers point out multiple issues. + +Pros: ++ Interesting method ++ Hidden layer coherence tends to improve + +Cons: +- Deficient comparisons to baselines or context with other works. +- Insufficient assessment of impact to model performance. +- Lack of strategy to select regularizers +- Lack of evaluation on more realistic datasets",ICLR2019,4: The area chair is confident but not absolutely certain +4xT9h4qpyGN,1610040000000.0,1610470000000.0,1,eYgI3cTPTq9,eYgI3cTPTq9,Final Decision,Reject,"The paper describes very interesting work that advances the state of the art in Zork by going beyond an important state bottleneck. While there is an important engineering contribution, the reviewers raised important concerns regarding the novelty of the question-answering approach to construct knowledge graphs, the clarity of the backtracking heuristic and the extent to which the proposed approach outperforms previous work. I also read the paper and I agree with the concerns of the reviewers. In particular, I encourage the authors to provide more details about the backtracking procedure, a formal description of the algorithm and its assumptions to help readers apply the approach in other domains, as well as a formal analysis to better understand when it will or will not pass a bottleneck state.",ICLR2021, +riZnDqu_jI,1576800000000.0,1576800000000.0,1,HJlA0C4tPS,HJlA0C4tPS,Paper Decision,Accept (Spotlight),"This paper proposes an unsupervised text style transfer model which combines a language model prior with an encoder-decoder transducer. They use a deep generative model which hypothesises a latent sequence which generates the observed sequences. 
It is trained on non-parallel data and they report good results on unsupervised sentiment transfer, formality transfer, word decipherment, author imitation, and machine translation. The authors responded in depth to reviewer comments, and the reviewers took this into consideration. This is a well written paper, with an elegant model and I would like to see it accepted at ICLR. ",ICLR2020, +AcwZd9VHlQ,1610040000000.0,1610470000000.0,1,h8q8iZi-ks,h8q8iZi-ks,Final Decision,Reject,"The paper proposes to address the out-of-distribution generalization problem by means of conditional computation in form of a feature modulating module. +While the approach is interesting and brings a new take on how to perform feature modulation (although initially felt too similar to Conditional Batch Normalization) some major concerns about the experiments and validation of the approach are raised by all reviewers. Some of the hypothesis made are also challenged due to lack of proper validation. +Although the discussion clarified some points I am afraid many open questions are left unanswered and would require a more work to be fully addressed before acceptance.",ICLR2021, +ft2jRGTdVV,1642700000000.0,1642700000000.0,1,WnOLO1f50MH,WnOLO1f50MH,Paper Decision,Reject,"The authors study separable convolutions in the group-convolutional setting, and describe experiments showing them to be more computationally efficient without loss of performance in the setting of some group-augmented MNISTs, and show some promising results on un-augmented CIFAR10, CIFAR100, and Galaxy10. The reviewers are mixed; some of the reviewers have concerns about the completeness of the experiments and the novelty of the work, and in particular to what extent the experiments support the specific novelties claimed. The authors have made some updates to address this in the revision, but my opinion is that the authors should resubmit to the next venue after further experiments and exposition to clarify.",ICLR2022, +nZ8L3Xg9NG-,1642700000000.0,1642700000000.0,1,uPv9Y3gmAI5,uPv9Y3gmAI5,Paper Decision,Accept (Poster),"The paper studies the problem of task-specific model compression obtained from fine-tuning large pre-trained language models. The work follows the line of research in which model size is reduced by decomposing the matrices in the model into smaller factors. Two-step approaches apply SVD and then fine-tuned the model on task specific data. The present work makes the observation that after the first step (the SVD compression) the model can dramatically lose its performance, due to the mismatched optimization objectives between the low-rank approximation and the target task. The work provides evidence backing this claim. The paper proposes to address this problem by weighting the importance of parameters for the factorization according to the Fisher information. Experimental evaluation shows that the proposed method can achieve better results than variants that use truncated SVD of the weight matrices. + +The paper is well written and easy to read. The method is simple and effective and can be applied to in a wide range of settings. The authors provided a thorough response which clarified several points. This led Reviewer Kuwu to increase the score to 6. + +All three reviewers agree that the main observation in the work is interesting and informative for researchers and practitioners working on the problem. 
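To make the weighting mechanism concrete, here is a minimal numpy sketch of a Fisher-weighted truncated SVD of a single weight matrix. The function name, the row-wise aggregation, and the use of accumulated squared gradients as a diagonal Fisher stand-in are illustrative assumptions on my part, not the authors' code.

```python
import numpy as np

def fisher_weighted_svd(W, fisher, rank):
    # Aggregate the (diagonal) Fisher information of each row into a row weight.
    f = np.sqrt(fisher.sum(axis=1, keepdims=True)) + 1e-8   # shape (out, 1)
    # Factor the reweighted matrix, so important rows are reconstructed first.
    U, s, Vt = np.linalg.svd(f * W, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    A = (U * s) / f     # undo the row scaling so that A @ B approximates W
    B = Vt
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
fisher = rng.uniform(size=W.shape)   # stand-in for accumulated squared gradients
A, B = fisher_weighted_svd(W, fisher, rank=8)
err = np.linalg.norm((A @ B - W) * np.sqrt(fisher))   # weighted reconstruction error
```

Plain truncated SVD is recovered by setting all Fisher entries equal, which makes the contrast with the proposed task-aware weighting easy to probe.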
Reviewer jnTC points out that the paper would have been stronger if it had included a theoretical exploration of the reasons behind the ""importance of low SVs"" phenomenon. Reviewer Kuwu and Reviewer jnTC consider the results marginally novel. Reviewer Kuwu considered the significance of the reported results to be limited, and put the work marginally above the acceptance threshold. Reviewer jnTC disagrees with this view, and appreciates the generality of the method and the fact that it can work well even for compressed models, while improving in accuracy by a few percent over competing approaches with similar parameter counts. The AC agrees with Reviewer jnTC. Overall, all reviewers consider the paper borderline but recommend accepting it. The AC likewise finds the topic important (reducing the footprint of language models) and the method simple and well motivated. The empirical evaluation is very thorough and shows clear gains across a large number of settings.",ICLR2022, +HJ4981TBG,1517250000000.0,1517260000000.0,841,rkeZRGbRW,rkeZRGbRW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers found a number of shortcomings in this work that would prevent it from being accepted at ICLR in its current form: the writing (not specifying the loss function), experiments that are too limited, and inconclusive comparisons with existing regularization techniques. 
I recommend the authors take into account the feedback from reviewers in any follow-up submissions.",ICLR2018, +VVjyDGexwfp,1610040000000.0,1610470000000.0,1,MCe-j2-mVnA,MCe-j2-mVnA,Final Decision,Reject,"The paper aims to address several challenges in learning neural network-based optimization algorithms by increasing the #unrolled steps, increasing the #training tasks, and exploring new parameterizations for the learning optimizer. The authors demonstrated the effectiveness of applying persisted Evolution Stratergies and backdrop through over 10,000 inner-loop steps can improve the performance of the learned optimizer. Empirical experiments showcased incorporating LSTM to the previous state-of-the-art improve their training performance. + +There are a lot of interesting ideas in the paper. However, packaging them together and only glance over each idea briefly unfortunately dilutes the contribution and the novelty of the work. There are still some major concerns echoed among the reviewer: + +1) The proposed hierarchical optimizer seems interesting. It is one of the major contributions of the paper. But, its architecture was only briefly mentioned in Sec 3.3. Its motivation, implementation and the corresponding engineering choices remain unclear by just reading the main text. Some of the details were discussed in the appendix but it would be of great interest if authors could give some intuition on which subset of the tasks the proposed architecture gives the most improvement / failure among the 6000 tasks. + +2) Training the optimizer on a diverse set of tasks is crucial for the learned optimizer to generalize. One of the paper's contributions is to further expand the task dataset from the prior work Metz et al., (2020). The authors have conducted very thorough experiments on this new dataset, which is amazing. I would argue there are even enough results for another standalone paper. However, there is surprisingly little detail on how the newly proposed dataset differs from the prior TaskSet dataset. What are the new optimization problems? How are they different from the family of tasks in TaskSet? A TSNE plot of the tasks similar to Figure 1 from Metz et al. (2020) could provide more intuition for the reader and highlight the contribution. + +Overall, if the authors could provide more insight into their experiments and the proposed methods, it would help the readers greatly to see the novelty and the contribution of the paper. The current version of the paper will need additional development and non-trivial modifications to be broadly appreciated by the community. + +",ICLR2021, +fXYn1oD1qZ_,1610040000000.0,1610470000000.0,1,EZ8aZaCt9k,EZ8aZaCt9k,Final Decision,Reject,"This paper studies an interesting problem: the landscape of neural networks. I agree with the authors' comment that this work improves our understanding of one aspect of neural networks, and I do find the result of this paper is of interest to some extent. Reviewer 5 pointed out the technique used in the paper is interesting, and Reviewer 3 has shown interest in the techniques (and indicated the possibility of increasing the score). Nevertheless, a few reviewers questioned the requirement of the large width; I do not think having a large width itself is necessarily an issue (even in the presence of convergence results on NTK), but it is necessary to clearly explain the context and the relation/differences with closely related works in the literature. 
In the current form, the paper probably has not reached the bar of acceptance, thus I recommend reject. ",ICLR2021, +H1eKXHHlgN,1544730000000.0,1545350000000.0,1,rygnfn0qF7,rygnfn0qF7,"Reasonable results, limited novelty",Reject,"This paper proposes to pre-train hierarchical document representations for use in downstream tasks. All reviewers agreed that the results were reasonable. + +However, the methodological novelty is limited. While I believe there is a place for solid empirical results, even if not incredibly novel, there is also little qualitative or quantitative analysis to shed additional insights. + +Given the high quality bar for ICLR, I can't recommend the paper for acceptance at this time.",ICLR2019,4: The area chair is confident but not absolutely certain +Syg-Y_e4eE,1544980000000.0,1545350000000.0,1,HyEl3o05Fm,HyEl3o05Fm,Significance of applying a GAN - VAE combination to video is too limited,Reject,"This paper shows that combining GAN and VAE for video prediction allows to trade off diversity and realism. The paper is well-written and the experimentation is careful, as noted by reviewers. However, reviewers agree that this combination is of limited novelty (having been used for images before). Reviewers also note that the empirical performance is not very much stronger than baselines. Overall, the novelty is too slight and the empirical results are not strong enough compared to baselines to justify acceptance based solely on empirical results.",ICLR2019,4: The area chair is confident but not absolutely certain +NligvqpEYK,1576800000000.0,1576800000000.0,1,SyxD7lrFPH,SyxD7lrFPH,Paper Decision,Reject,"This submission has been assessed by three reviewers and scored 3/6/1. The reviewers also have not increased their scores after the rebuttal. Two reviewers pointed to poor experimental results that do not fully support what is claimed in contributions and conclusions. Theoretical support for the reconstruction criterion was considered weak. Finally, the paer is pointed to be a special case of (Zhang 2019). While the paper has some merits, all reviewers had a large number of unresolved criticism. Thus, this paper cannot be accepted by ICLR2020. +",ICLR2020, +dVh8sdY8Xlv,1642700000000.0,1642700000000.0,1,xnYACQquaGV,xnYACQquaGV,Paper Decision,Accept (Poster),"This paper tackles the neural contextual bandit problem, for which existing approaches consists rely on bandit algorithms based on deep neural networks to learn reward functions. In these existing strategies, exploration takes place over the entire network parameter space, which can be inefficient for the large-size networks typically used in NTK-based approaches. In this work, the authors address this by building on an existing technique of shallow exploration, which consists in exploring over the final layer of the network only, allowing to decouple the deep neural network feature representation learning from most of the exploration of the network parameters. More specifically, they propose a simple and effective UCB-based strategy using this shallow exploration scheme, for which they provide a theoretical analysis. The proposed approach builds on several ideas for previous works, including borrowing proof techniques and theoretical arguments. Although this limits the novelty of the work, connecting these ideas together is not obvious and constitutes a significant contribution. 
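For readers unfamiliar with the idea, a generic last-layer (shallow-exploration) LinUCB sketch is given below; it is not the paper's algorithm verbatim, and the class and parameter names are my own. The naive version here still inverts the design matrix at every step, which is exactly the cost the decoupled scheme is meant to tame.

```python
import numpy as np

class ShallowNeuralUCB:
    # LinUCB-style exploration confined to the network's final layer:
    # the deep feature map phi(x) is treated as frozen.
    def __init__(self, feat_fn, dim, alpha=1.0, lam=1.0):
        self.feat_fn = feat_fn
        self.A = lam * np.eye(dim)   # regularized design matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def choose(self, contexts):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        scores = []
        for x in contexts:
            phi = self.feat_fn(x)
            ucb = phi @ theta + self.alpha * np.sqrt(phi @ A_inv @ phi)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, x, reward):
        phi = self.feat_fn(x)
        self.A += np.outer(phi, phi)
        self.b += reward * phi
```

Because exploration lives only in the final-layer parameters, the dimension of A is the feature width rather than the full network parameter count.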
Moreover, the proposed approach fixes an important known issue due to the matrix inversion in LinUCB, which could have a strong impact on the bandit community.",ICLR2022, +#NAME?,1610040000000.0,1610470000000.0,1,xCy9thPPTb_,xCy9thPPTb_,Final Decision,Reject,"The paper presents an approach that supports better performance when out of distribution cases occur, by letting neurons be of only compact support and thus if the input is out of distribution (OOD). + +Pros: +- The proposed strategy is interesting and may be useful. + +Cons: +- The choice of the parameter alpha, whose value is crucial to the success in experiments, is left murky. The approach suggested by the authors was not validated experimentally. +- There is insufficient comparison to recent works.",ICLR2021, +-85RM5mGddT,1610040000000.0,1610470000000.0,1,hsFN92eQEla,hsFN92eQEla,Final Decision,Accept (Poster),"The paper questions the use of cross-entropy loss for classification tasks and shows that using squared error loss can work just as well for deep neural networks. The authors conduct extensive experiments across ASR, NLP, and CV tasks. Comparing cross-entropy to squared error loss is certainly not novel, but the conclusions of the paper, backed by a lot of experimental evidence, are certainly thought-provoking. + +I would have liked to see a bit more analysis into the results of the paper, and perhaps a bit more theoretical justification. That said, the paper will be of interest to the community, given the ubiquity of classification tasks. +",ICLR2021, +ZnsDK63P4h,1576800000000.0,1576800000000.0,1,H1eKT1SFvH,H1eKT1SFvH,Paper Decision,Reject,"This works presents a method for inferring the optimal bit allocation for quantization of weights and activations in CNNs. The formulation is sound and the experiments are complete. However, the main concern is that the paper is very similar to a recent work by the authors, which is not cited.",ICLR2020, +Pnak9yMhNtb,1642700000000.0,1642700000000.0,1,cOtBRgsf2fO,cOtBRgsf2fO,Paper Decision,Accept (Poster),"All reviewers concur on the fact that the paper contains solid ideas. The discussion helped clarify the case of class-imbalance and no major concerns remained after discussion phase. I thank the authors for the additional details on execution time / complexity. + +On a separate note and perhaps to dig further in the paper's ideas, + +1- the validity of the Gaussian assumption carried in the paper was raised (e.g. ercK), but I would like to point out that Theorem 2 can also be derived for general exponential families given the objective in (2), with perhaps a reformulation of the trace constraint (still, this would imply the knowledge of the exponential family for the KL divergence to simplify). + +2- when it comes to protecting labels, the authors might want to have a look at the rich literature on learning from label proportions, which shows that the knowledge of the class is not necessary to learn a supervised model (see for example Patrini et al, NeurIPS / NIPS 2014). Thus, protecting the class could in fact be more achievable than by just considering that learning “needs observed classes”.",ICLR2022, +mJuBOabWNeY,1610040000000.0,1610470000000.0,1,MP0LhG4YiiC,MP0LhG4YiiC,Final Decision,Reject,"The authors propose a new dataset and compositional task based on the EPIC Kitchens dataset. The goal is to test novel compositions and to build a transformer based network specifically for this inference (by analogy). 
Specifically, the analogy here references the use of nearest neighbors in the dataset. There are a lot of concerns raised by reviewers which require a large number of changes to the presentation of the manuscript and they are not at present convinced by the current setup or experiments. Explicitly motivating which pretraining methods do or do not violate which aspects of composition and what role other factors like synonymy play in generalization is necessary. Several aspects of the claims made in the paper and in the discussion are big claims that require substantial discussion and analysis (e.g. the surprising weakness of pretrained models) which the reviewers do not feel can be so easily explained away (e.g. by domain shift).",ICLR2021, +S1LszJpSz,1517250000000.0,1517260000000.0,3,Hk2aImxAb,Hk2aImxAb,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"As stated by reviewer 3 ""This paper introduces a new model to perform image classification with limited computational resources at test time. The model is based on a multi-scale convolutional neural network similar to the neural fabric (Saxena and Verbeek 2016), but with dense connections (Huang et al., 2017) and with a classifier at each layer."" +As stated by reviewer 2 ""My only major concern is the degree of technical novelty with respect to the original DenseNet paper of Huang et al. (2017). "". The authors assert novelty in the sense that they provide a solution to improve computational efficiency and focus on this aspect of the problem. Overall, the technical innovation is not huge, but I think this could be a very useful idea in practice. +",ICLR2018, +lpJWcFOdJr,1576800000000.0,1576800000000.0,1,SylkYeHtwr,SylkYeHtwr,Paper Decision,Accept (Spotlight),"The paper proposes a new way to train latent variable models. The standard way of training using the ELBO produces biased estimates for many quantities of interest. The authors introduce an unbiased estimate for the log marginal probability and its derivative to address this. The new estimator is based on the importance weighted autoencoder, correcting the remaining bias using russian roulette sampling. The model is empirically shown to give better test set likelihood, and can be used in tasks where unbiased estimates are needed. + +All reviewers are positive about the paper. Support for the main claims is provided through empirical and theoretical results. The reviewers had some minor comments, especially about the theory, which the authors have addressed with additional clarification, which was appreciated by the reviewers. + +The paper was deemed to be well organized. There were some unclarities about variance issues and bias from gradient clipping, which have been addressed by the authors in additional explanation as well as an additional plot. + +The approach is novel and addresses a very relevant problem for the ICLR community: optimizing latent variable models, especially in situations where unbiased estimates are required. The method results in marginally better optimization compared to IWAE with much smaller average number of samples. The method was deemed by the reviewers to open up new possibilities such as entropy minimization. ",ICLR2020, +zmGDKVCQHnM,1642700000000.0,1642700000000.0,1,_QLmakITKg,_QLmakITKg,Paper Decision,Accept (Poster),"This paper proposes a federated learning (FL) scheme that is suitable for clients/devices with heterogeneous resources. 
The scheme, Split-Mix, trains multiple models of different sizes and adversarial-robustness levels, tailored to the budgets of the individual devices. Empirical results are encouraging. It is clear that FL will have to work with clients with diverse resources, a point that is appreciated. Indeed, it is anticipated that widely-dispersed inference will have to deal with a highly heterogeneous mix of clients. The study is quite thorough. One aspect of the experiments that is not convincing is the budgets being exponentially distributed: a strong concentration around a mean (with something like a Gaussian tail), or a power-law distribution, would be more suitable.",ICLR2022, +aR2YYoPRmVyb,1642700000000.0,1642700000000.0,1,SF9o3-yP1WR,SF9o3-yP1WR,Paper Decision,Reject,"The paper describes a novel setting in Federated Learning and argues that personalization methods may cause the personalized models to overfit on spurious features, thereby increasing the accuracy disparity compared to the global model. To this end the authors propose a debiasing strategy using a global model and adversarial transferability. 
There were some positive opinions about the problem being interesting. However, reviewers had several concerns about the validity of the assumptions and the hand-wavy arguments used to establish the existence of adversarial transferability. Overall, the setting and the need for removing personalization bias need to be validated more convincingly and rigorously, with concrete real-world scenarios and experiments.",ICLR2022, +bBrYTIKIN3d1,1642700000000.0,1642700000000.0,1,2_vhkAMARk,2_vhkAMARk,Paper Decision,Accept (Spotlight),"The paper considers the saddle point problem of finding non-convex/non-concave minimax solutions. Building on EG+ of Diakonikolas et al., 2021, which works under weak MVI conditions, the work presents a new algorithm, CurvatureEG+, that works for a larger range of the weak MVI condition compared to previous work and also handles the constrained and composite cases. 
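As background for readers, a minimal extragradient step on a toy bilinear problem is sketched below; this is plain EG with a gamma knob gesturing at the EG+ family, not the paper's CurvatureEG+, and the names are assumptions of mine.

```python
import numpy as np

def extragradient_step(x, y, grad_x, grad_y, step=0.1, gamma=1.0):
    # Extrapolation (look-ahead) point.
    x_half = x - step * grad_x(x, y)
    y_half = y + step * grad_y(x, y)
    # Update from the original iterate using the look-ahead gradients;
    # EG+-style variants scale the update relative to the extrapolation via gamma.
    x_new = x - gamma * step * grad_x(x_half, y_half)
    y_new = y + gamma * step * grad_y(x_half, y_half)
    return x_new, y_new

# Bilinear toy problem min_x max_y x*y, on which plain descent-ascent cycles.
gx = lambda x, y: y
gy = lambda x, y: x
x, y = 1.0, 1.0
for _ in range(200):
    x, y = extragradient_step(x, y, gx, gy)
# (x, y) spirals into the saddle point at the origin.
```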
The authors show cases where this algorithm converges while the previous algorithms can be shown to reach limit cycles. Overall, this theoretical work seems strong. Most reviewers agree that the contribution is good enough for publication. Compared to EG+, the additional contribution is to expand the admissible range of the weak MVI condition. While this may seem like a slight improvement, looking beyond just the final convergence rate, the paper has some nice insights that provide a unifying view capturing past algorithms (with EG+ as a special case). I recommend acceptance.",ICLR2022, +dzHgYLX-BphA,1642700000000.0,1642700000000.0,1,xWRX16GCugt,xWRX16GCugt,Paper Decision,Reject,"The manuscript introduces a taxonomy for organizing continual learning research settings and a software framework that realizes this taxonomy. Each continual learning setting is represented as a set of shared assumptions (e.g., are task IDs observed or not) arranged in a hierarchy, and the software is introduced with the hope of unifying continual learning research. The manuscript identifies a clear issue in the field: settings and methods for continual learning have proliferated so that there is little coherence in benchmarks, making progress difficult to judge. Reviewers generally agreed that the motivation of building software to help unify continual learning research was a positive. However, reviewers also pointed to many concerns with the manuscript and the software package (Sequoia) that comprises its main contribution. In particular, there is concern that the software is at an early stage of development and makes heavy use of existing libraries to function (e.g., Avalanche and Continuum). This makes it unclear what Sequoia offers over using its dependencies directly. As well, there is concern that multiple standard benchmark tasks and common methods are missing from the implementation, particularly for large-scale experiments with, e.g., ImageNet-1k. In theory, the library allows extension and these might be implemented by others in the community. 
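To illustrate the hierarchy-of-assumptions idea in code, here is a tiny sketch; the field names and class names are hypothetical, chosen only to show how a setting can be a bundle of assumptions that subclasses relax.

```python
from dataclasses import dataclass

@dataclass
class Setting:
    # Each setting is a bundle of assumptions; subclassing removes assumptions.
    task_ids_at_train: bool = True
    task_ids_at_test: bool = True
    stationary_context: bool = True

@dataclass
class TaskIncremental(Setting):
    pass   # keeps task labels at both train and test time

@dataclass
class ClassIncremental(TaskIncremental):
    task_ids_at_test: bool = False   # harder: no task labels at test time

def applicable(method_requirements, setting):
    # A method built for a more general setting applies to any more specific one.
    return all(getattr(setting, k) == v for k, v in method_requirements.items())
```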
However, this would require that the original manuscript+software are strong enough to draw buy in from other researchers. +In sum, the manuscript+software does not yet offer a convincing starting point for researchers looking for a starting point to begin their continual learning research.",ICLR2022, +AJ0KGaEK1NC,1610040000000.0,1610470000000.0,1,RepN5K31PT3,RepN5K31PT3,Final Decision,Reject,"All the reviewers questioned the significance of the result, in the sense that the qualitatively it is not clear how much of an improvement it is to replace ""min(S_T,C_T) with Lipschitz assumption"" by ""min(S_T,C_T,G_T)"". The authors' response on this point did not convince the reviewers. If the authors were to resubmit this work to a future conference, we encourage them to significantly expand on this point.",ICLR2021, +4mcTeRkuY0,1610040000000.0,1610470000000.0,1,L-88RyVtXGr,L-88RyVtXGr,Final Decision,Reject,"I agree with the concerns raised by the reviewers. In particular, the issues of novelty and experimental evaluation (mentioned in the revision summary) remain the major weak points of the paper. My impression is that the changes made in the revision represent a significant experimental addition to the paper, one which might merit a full pass through peer review, and one which in any event did not alter the reviewers' scores. While I think this paper has something to contribute (and the empirical results suggest the method may outperform competitors), I think it would be improved by a rewrite (and possibly a restructure) that makes the part that is your contribution much more clear. For example, in the abstract, it's only in the sentence ""We show both theoretically +and empirically that potential vanishing/exploding gradients problems can be mitigated by enforcing orthogonality to the shared filter bases"" that we actually get to the part that is really novel about this contribution (the ""enforcing orthogonality""): that would ideally be much earlier in the abstract.",ICLR2021, +HydZnG8Og,1486400000000.0,1486400000000.0,1,HyM25Mqel,HyM25Mqel,ICLR committee final decision,Accept (Poster),"pros: + - set of contributions leading to SOTA for sample complexity wrt Atari (discrete) and continuous domain problems + - significant experimental analysis + - long all-in-one paper + + cons: + - builds on existing ideas, although ablation analysis shows each to be essential + - long paper + +The PCs believe this paper will be a good contribution to the conference track.",ICLR2017, +fOo9MWSmX4,1576800000000.0,1576800000000.0,1,S1xXiREKDB,S1xXiREKDB,Paper Decision,Reject,"This paper proposes to use the GAN (i.e., minimax) framework for adversarial training, where another neural network was introduced to generate the most effective adversarial perturbation by finding the weakness of the classifier. The rebuttal was not fully convincing on why the proposed method should be superior to existing attacks.",ICLR2020, +StDfOFB9_7W,1610040000000.0,1610470000000.0,1,g0a-XYjpQ7r,g0a-XYjpQ7r,Final Decision,Reject,"The paper studies personalized federated learning, mixing a global model with locally trained models. Reviewers agreed on the relevance of the problem and that the work contains valuable contributions, such as the generalization bounds. +After discussion, unfortunately consensus remained that the paper remains narrowly below the bar in the current form. 
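For concreteness, the local/global interpolation at the heart of such personalization schemes can be sketched in a few lines; this generic convex-combination version is illustrative and is not the submission's Mapper algorithm, and the helper names are mine.

```python
import numpy as np

def personalized_model(w_local, w_global, alpha):
    # Per-client interpolation; alpha=0 recovers the shared global model,
    # alpha=1 a purely local one.
    return {k: alpha * w_local[k] + (1 - alpha) * w_global[k] for k in w_global}

def tune_alpha(w_local, w_global, val_loss, grid=np.linspace(0, 1, 11)):
    # Pick the mixing parameter on held-out client data; val_loss maps a
    # parameter dict to a scalar validation loss for that client.
    losses = [val_loss(personalized_model(w_local, w_global, a)) for a in grid]
    return float(grid[int(np.argmin(losses))])
```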
+Concerns remained on novelty over the Mapper optimization algorithm which also has adaptivity to the local/global combination of models, the dependence of the generalization bound on the mixing parameter as it converges to the global model, +as well as on the strength of the experimental findings compared to well-known FedAvg and related method in a realistic benchmark environment (such as e.g. Leaf), since the dataset choice (and even more its partition among clients) is a crucial aspect for measuring personalization in a fair way. We hope the feedback helps to strengthen the paper for a future occasion.",ICLR2021, +vC_M5UCkccZ,1610040000000.0,1610470000000.0,1,xYJpCgSZff,xYJpCgSZff,Final Decision,Reject,"This paper introduces an architecture based on structured causal model for long-tailed IE tasks. It incorporates the dependency tree structure of the sentence using a GCN for learning the representations. The key idea is to use counterfactual reasoning to help with the inference in attempt to reduce the impact of spurious relations. There are some concerns about the presentation of this paper. While the high level idea is reasonably clear and well motivated, the paper is quite messy with the notations and technical details. +How to use the causal effect estimation for the final prediction is not explicitly explained except for in Figure 1. +For the experiments on ED and NER, it is unclear if they assume the trigger or span is given. The method seems to need the span information to make the prediction. If span is given, this is a different set up that is much simpler compared to traditional ED and NER where the span or trigger needs to be detected as well. +There are also some question regarding the difference between this work and the prior work on using causal reasoning for improving prediction (the TDE work). One difference is the additional term in equation 8(updated version), which appears to be useful empirically, but the motivation is rather hand wavy and needs more clarification. +Overall, there are some useful ideas, but the overall novelty does not particularly stands out, and the presentation of the paper made somewhat straight forward ideas more convoluted than necessary. +",ICLR2021, +MYQvPh7Uno,1576800000000.0,1576800000000.0,1,H1e_cC4twS,H1e_cC4twS,Paper Decision,Accept (Poster),"(Please note that I am basing the meta-review on two reviews plus my own thorough read of the paper) +This paper proposes an interesting adaptation of the non-autoregressive neural encoder-decoder models previously proposed for machine translation to dialog state tracking. Experimental results demonstrate state-of-the-art for the MultiWOZ, multi-domain dialog corpus. The reviewers suggest that while the NA approach is not novel, author's adaptation of the approach to dialog state tracking and detailed experimental analysis are interesting and convincing. Hence I suggest accepting the paper as a poster presentation.",ICLR2020, +ryxjRj-Zx4,1544780000000.0,1545350000000.0,1,ryzfcoR5YQ,ryzfcoR5YQ,lack of clarity and precision in writing,Reject,"The paper proposes an interesting neural architecture for traffic flow forecasting, which is tested on a number of datasets. Unfortunately, the lack of clarity as well as precision in writing appears to be a big issue for this paper, which prevents it from being accepted for publication in its current form. However, the reviewers did provide valuable feedback regarding writing, explanation, presentation and structure, that the paper would benefit from. 
+",ICLR2019,5: The area chair is absolutely certain +l9ip7FxyGV0,1642700000000.0,1642700000000.0,1,14kbUbOaZUc,14kbUbOaZUc,Paper Decision,Reject,"The paper proposes a new method for representation learning of time-varying graphs which uses a streaming-snapshot model to describe graphs on different time scales and meta-learning for adaption to unseen graphs. Reviewers highlighted as strengths that the paper proposes an interesting approach for modeling temporal dynamics in graphs which of interest to the ICLR community. However, reviewers raised also concerns regarding the novelty of contributions, the empirical evaluation (also with regard to related work), as well as the clarity of presentation. In addition there was no author response. All reviewers and the AC agree therefore that the paper is not yet ready for publication at ICLR at this point.",ICLR2022, +SJlBJw6bxN,1544830000000.0,1545350000000.0,1,SJMBM2RqKQ,SJMBM2RqKQ,Paper decision,Reject,"Reviewers are in a consensus and recommended to reject. However, the reviewers did not engage at all with the authors, and did not acknowledge whether their concerns have been answered. I therefore lean to reject, and would recommend the authors to resubmit. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit. + +",ICLR2019,2: The area chair is not sure +0WaOgwe3N_,1576800000000.0,1576800000000.0,1,BJe4V1HFPr,BJe4V1HFPr,Paper Decision,Reject,"This paper proposes a two-stage adversarial training approach for learning a disentangled representation of style and content of anime images. Unlike the previous style transfer work, here style is defined as the identity of a particular anime artist, rather than a set of uninterpretable style features. This allows the trained network to generate new anime images which have a particular content and are drawn in the style of a particular artist. While the approach works well, the reviewers voiced concerns about the method (overly complicated and somewhat incremental) and the quality of the experimental section (lack of good baselines and quantitative comparisons at least in terms of the disentanglement quality). It was also mentioned that releasing the code and the dataset would strengthen the appeal of the paper. While the authors have addressed some of the reviewers’ concerns, unfortunately it was not enough to persuade the reviewers to change their marks. Hence, I have to recommend a rejection.",ICLR2020, +0SX78josca,1610040000000.0,1610470000000.0,1,D9pSaTGUemb,D9pSaTGUemb,Final Decision,Reject,"This paper studies the implicit acceleration of gradient flow in over-parameterized two-layer linear models. The authors show that the amount of acceleration depends on the spectrum of the data without assuming small, balanced, or spectral initialization for the weights, and establish interesting connections between matrix factorization and Riccati differential equations. While this paper provides some interesting results regarding implicit acceleration in training linear neural networks, the reviewers raised quite a few questions and concerns about some claims made in the paper, as well as an inadequate comparison with previous work. Even after the author's response and reviewer discussion, the reviewers' doubts are still not completely cleared away. 
I feel the current form of the paper is slightly below the bar of acceptance, and encourage the authors to carefully address reviewers' comments in the revision.",ICLR2021, +nEseTYD8ZJj,1610040000000.0,1610470000000.0,1,DGttsPh502x,DGttsPh502x,Final Decision,Reject,"This paper proposes a simple method to discover latent manipulations in trained text VAEs. Compared to random and coordinate directions, the authors found that by performing PCA on the latent code to find directions that maximize variance, more interpretable text manipulations can be achieved. + +This paper receives 4 reject recommendations with an average score of 3.75. The reviewers have raised many concerns regarding the paper. (i) The idea is straightforward with limited novelty. (ii) There are only mostly qualitative results presented. More in-depth analysis and more solid evaluations are needed. (iii) Human evaluation is too small to draw any reliable conclusion. (iv) The proposed method is only tested on one text VAE, how well it can be generalized to other models remains unclear. + +The rebuttal unfortunately did not address the reviewers' main concerns. Therefore, the AC regrets that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere. ",ICLR2021, +KDtuQwA8N,1576800000000.0,1576800000000.0,1,HJx4PAEYDH,HJx4PAEYDH,Paper Decision,Reject,"The submission proposes a variant of a Transformer architecture that does not use positional embeddings to model local structural patterns but instead adds a recurrent layer before each attention layer to maintain local context. The approach is empirically verified on a number of domains. + +The reviewers had concerns with the paper, most notably that the architectural modification is not sufficiently novel or significant to warrant publication, that appropriate ablations and baselines were not done to convincingly show the benefit of the approach, that the speed tradeoff was not adequately discussed, and that the results were not compared to actual SOTA results. + +For these reasons, the recommendation is to reject the paper.",ICLR2020, +1cok-WiQq8Jn,1642700000000.0,1642700000000.0,1,dSw0QtRMJkO,dSw0QtRMJkO,Paper Decision,Accept (Poster),"The paper provides a high probability analysis for Adagrad for smooth non-convex optimization and shows its rate of convergence to critical points. Both rates for deterministic optimization and for stochastic optimization are provided. The main contribution of the paper is that unlike for SGD they don’t require knowledge of smoothness parameter in advance and second, they prove high probability results. + +The reviewers lean positively towards the paper. One of the reviewers comments about the comparison with SGD which has some merit. The main comparison of this paper is w.r.t. ward et al 2019 and Zhou et al 2018 both of which prove high probability results. However, both these works require prior knowledge of smoothness parameter. The other axis of comparison is w.r.t algorithms like spider by Fang et al 2018 which uses variance reduction type techniques to obtain the optimal rate for critical point (here it is 1/sqrt{T} for norm square which is T^{-1/4} for norm and spider is T^{-1/3} for norm). Of course, an argument can be made for the fact that the algorithm here is closer to what is used in practice and more importantly, the assumptions there are somewhat stronger. 
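A minimal AdaGrad-norm loop of the kind analyzed in this literature is sketched below; it is a generic sketch in the style of Ward et al.-type analyses, not the submission's exact method, and the toy objective is my own.

```python
import numpy as np

def adagrad_norm(grad_fn, x0, eta=1.0, b0=1e-2, steps=1000):
    # The scalar step size adapts to accumulated gradient norms, so no
    # smoothness constant has to be supplied in advance.
    x, b2 = np.array(x0, dtype=float), b0 ** 2
    for _ in range(steps):
        g = grad_fn(x)
        b2 += float(g @ g)
        x -= eta / np.sqrt(b2) * g
    return x

# Smooth non-convex toy objective f(x) = sum(x**2 + cos(3*x)); its gradient:
grad = lambda x: 2 * x - 3 * np.sin(3 * x)
x_star = adagrad_norm(grad, x0=[2.0, -1.5])
```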
+ +In any case, the paper still has interesting results and I am leaning towards an accept.",ICLR2022, +_8RzzyrrT8,1576800000000.0,1576800000000.0,1,BJepcaEtwB,BJepcaEtwB,Paper Decision,Reject,"This paper presents a new link prediction framework in the case of small amount labels using meta learning methods. The reviewers think the problem is important, and the proposed approach is a modification of meta learning to this case. However, the method is not compared to other knowledge graph completion methods such as TransE, RotaE, Neural Tensor Factorization in benchmark dataset such as Fb15k and freebase. Adding these comparisons can make the paper more convincing. ",ICLR2020, +0gTMKuMawnM,1642700000000.0,1642700000000.0,1,xVlPHwnNKv,xVlPHwnNKv,Paper Decision,Reject,"This paper aims to make Stackelberg Deep Deterministic Policy Gradients practical and efficient. The main contributions are an analysis which suggests terms involving the Hessian can be dropped and a block-diagonal approximation to an expensive matrix inversion. + +Several reviewers who voted for rejection expressed concerns about the soundness of the theoretical arguments. The response provided by the authors did help alleviate some of the reviewers’ concerns but still left significant doubts. While some of the remaining concerns could be due to a misunderstanding of the deterministic setting it is up to the authors to convince the reviewers that their arguments are sound. Given the current scores and the low confidence of the reviewer voting for acceptance, I recommend rejecting the paper in its current form.",ICLR2022, +ZkeS2qYF5C,1642700000000.0,1642700000000.0,1,rRg0ghtqRw2,rRg0ghtqRw2,Paper Decision,Reject,"This paper tackles the problem of Unsupervised Environment Design to train more robust agents. The proposed method trains RL agents by generating a curriculum of training tasks to enable agents to generalize to many tasks. The key contribution is an algorithm to generate this curriculum by incremental edits of the grid world environments. The reviewers all agreed that the paper is well-written and the method is intuitive. However, the weakness of this work is also obvious: the proposed method is only evaluated in grid worlds, and it's unclear how the editing approach can be easily generalized to more complex environments. This submission would benefit from more comprehensive evaluation in non-grid world environments, especially given that the compared baselines have results in other environments too.",ICLR2022, +t-q-4uv0Bki,1610040000000.0,1610470000000.0,1,TJzkxFw-mGm,TJzkxFw-mGm,Final Decision,Reject,"The paper shows (nearly) matching upper and lower bounds on dynamic regret for non-stationary finite-horizon reinforcement learning problems. The paper studies an important problem and the results are interesting. Some reviewers are concerned that there is not enough algorithmic and theoretical innovations in light of prior results. Authors need to improve the presentation and add a more detailed discussion on related works, the novelty and the originality of the paper, and the new algorithmic and theoretical contributions. Finally, authors can improve the submission by implementing the proposed method and adding experiments.",ICLR2021, +9PXvytFAJ,1576800000000.0,1576800000000.0,1,HJghoa4YDB,HJghoa4YDB,Paper Decision,Reject,"This paper provides convergence results for Non-linear TD under lazy training. + +This paper tackles the important and challenging task of improving our theoretical understanding of deep RL. 
We have lots of empirical evidence Q-learning and TD can work with NNs, and even empirical work that attempts to characterize when we should expect it to fail. Such empirical work is always limited, and we need theory to supplement our empirical knowledge. This paper attempts to extend recent theoretical work on the convergence of supervised training of NNs to the policy evaluation setting with TD. + +The main issue revolves around the presentation of the work. The reviewers found the paper difficult to read (which is acceptable for theory work). But the paper did not clearly discuss and characterize the significance of the work: how limited is the lazy training regime, and when would it be useful? Now that we have this result, do we have any more insights for algorithm design (improving nonlinear TD), or comments about when we should expect NN policy evaluation to work? + +This all suggests that the paper needs a better introduction and a discussion of the implications and limitations of the results, and indeed this is what the reviewers were looking for. Unfortunately, the author response and the submitted paper were lacking in this respect. Even the strongest advocates of the work found it severely lacking in explanation and discussion. They felt that the paper could be accepted, but only after extensive revision. + +The direction of the work is important. The work is novel, and not a small undertaking. However, to be published, the authors should spend more time explaining the framework, the results, and the limitations to the reader. + +",ICLR2020, +iFpAc-yVKO,1576800000000.0,1576800000000.0,1,rkeO-lrYwr,rkeO-lrYwr,Paper Decision,Reject,"This paper investigates theories related to network sparsification, related to mode connectivity and the so-called lottery ticket hypothesis. The paper is interesting and has merit, but on balance I find the contributions not sufficiently clear to warrant acceptance. The authors made substantial changes to the paper, which are admirable and which bring it to borderline status. +",ICLR2020, +ntH_yB3ApOm,1642700000000.0,1642700000000.0,1,AEa_UepnMDX,AEa_UepnMDX,Paper Decision,Reject,"This paper aims to address the problem of cross-modal semi-supervised few-shot learning with noisy data, and proposes a robust cross-modal semi-supervised few-shot learning (RCFSL) method based on Bayesian deep learning. The approach combines several existing techniques to tackle a new problem in a non-trivial way. Empirical results demonstrate the effectiveness of the proposed method to some extent. + +While the proposed integrated approach seems to be novel in this unique setting, there are some major concerns from the reviewers. One concern is the lack of clear justification of the technical contributions of the proposed methodology in this complex setting. In particular, the paper lacks comprehensive ablation studies for analyzing and understanding the source of the gains of the proposed complex method, and the baselines in the experiments also do not look strong enough. In addition, many aspects of the paper's writing and presentation are not satisfactory (e.g., the math formulation in Section 2 is densely presented, making it difficult to follow). + +Overall, this is a borderline case, where the paper did contribute a new method for the interesting cross-modal semi-supervised few-shot learning task, but some major concerns about the weaknesses remain in its current form. Therefore, it cannot be recommended for acceptance. 
Nonetheless, I hope authors can improve the paper by fully addressing these issues and hope to see it accepted in the near future.",ICLR2022, +S1x3mHh-gE,1544830000000.0,1545350000000.0,1,Byxpfh0cFm,Byxpfh0cFm,Useful contributions to practice,Accept (Poster),"The paper proposes several subsampling policies to achieve a clear reduction in the size of augmented data while maintaining the accuracy of using a standard data augmentation method. The paper in general is clearly written and easy to follow, and provides sufficiently convincing experimental results to support the claim. After reading the authors' response and revision, the reviewers have reached a general consensus that the paper is above the acceptance bar. ",ICLR2019,4: The area chair is confident but not absolutely certain +Ii7NTl3DvZLk,1642700000000.0,1642700000000.0,1,lnEaqbTJIRz,lnEaqbTJIRz,Paper Decision,Accept (Spotlight),"This paper presents a novel framing of what's at stake when selecting/segmenting text for use in language model pretraining. Four reviewers with experience working with these models agreed that the conceptual and theoretical work here is insightful and worth sharing. The empirical work is fairly small-scale and does not yet support broad conclusions, but reviewers did not see such conclusions as necessary for the paper to be valuable.",ICLR2022, +ntH_yB3ApOm,1642700000000.0,1642700000000.0,1,AEa_UepnMDX,AEa_UepnMDX,Paper Decision,Reject,"The authors have addressed several of the issues raised by the reviewers, and they are strongly encouraged in include the additional experiments, and sections, that they propose, in a revised submission. The reviewers also recognized the novelty and extend of applications the proposed methodology has. Nevertheless, the paper would significantly benefit from a rigorous and thorough comparison to related work, placing it well within the context of the literature brought up by reviewers. Experimental comparisons to competitors, even if the latter address more restrictive settings, would strengthen the paper. Most importantly, the authors should consider including a comprehensive related work section, that convincingly discusses and compares to related/adjacent methods.",ICLR2022, +BJeZ-M3We4,1544830000000.0,1545350000000.0,1,S1xjdoC9Fm,S1xjdoC9Fm,Paper decision,Reject,"Reviewers are in a consensus and recommended to reject after engaging with the authors. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit. +",ICLR2019,4: The area chair is confident but not absolutely certain +3ea6YtnwKl1,1642700000000.0,1642700000000.0,1,WTXMNULQ3Uu,WTXMNULQ3Uu,Paper Decision,Reject,"This paper proposes a VAE-based hierarchical generative model (Latent Object Model) to model scenes with multiple objects. +The paper would benefit from a substantial revision to improve text quality and clarity. +The experiments lack proper quantitative baselines and imputations; and the overall results are quite underwhelming relative to existing models.",ICLR2022, +AwnkKwrqTHD,1642700000000.0,1642700000000.0,1,dIVrWHP9_1i,dIVrWHP9_1i,Paper Decision,Reject,"This work tries to extend mixup to graph structured data, where graphs can differ in the number of nodes, and the space is not Euclidean. This is achieved by G-Mixup, which interpolates the generator (graphon) of different classes of graphs through the latent Euclidean space. Experimental results show some promise. 
+ +Several concerns have been raised by the reviewers, and although the rebuttal helped, some concerns remain. For example, how to confirm that the graphon can be accurate estimated. Several weakness in experiment is also raised, and a revision is needed before the paper can be published.",ICLR2022, +qYwMaqcyMx4,1642700000000.0,1642700000000.0,1,v3aeIsY_vVX,v3aeIsY_vVX,Paper Decision,Accept (Poster),"This work proposes a hybrid autoregressive and adversarial model for sound synthesis (including but not limited to speech), conditioned on various types of control signals. Although recent adversarial approaches have gained favor over previously popular autoregressive approaches in this domain, because of their ability to produce audio signals much more quickly, the authors argue that these models tend to introduce certain types of artifacts which stem from an inability to learn accurate pitch and periodicity. They propose to address this by reintroducing some degree of autoregression, without compromising too much on inference speed. + +Reviewers praised the presentation of this work, the thoroughness of the experimental evaluation, and the audio examples provided. A few concerns were also raised regarding related work and the clarity of some parts of the paper, which the authors have taken the time to address. After the discussion phase, all reviewers chose to recommend acceptance, and I will follow their recommendation.",ICLR2022, +ryOnMk6rM,1517250000000.0,1517260000000.0,19,Hk99zCeAb,Hk99zCeAb,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"The main contribution of the paper is a technique for training GANs which consists in progressively increasing the resolution of generated images by gradually enabling layers in the generator and the discriminator. The method is novel, and outperforms the state of the art in adversarial image generation both quantitatively and qualitatively. The evaluation is carried out on several datasets; it also contains an ablation study showing the effect of contributions (I recommend that the authors follow the suggestions of AnonReviewer2 and further improve it). Finally, the source code is released which should facilitate the reproducibility of the results and further progress in the field. + +AnonReviewer1 has noted that the authors have revealed their names through GitHub, thus violating the double-blind submission requirement of ICLR; if not for this issue, the reviewer’s rating would have been 8. While these concerns should be taken very seriously, I believe that in this particular case the paper should still be accepted for the following reasons: +1) the double blind rule is new for ICLR this year, and posting the paper on arxiv is allowed; +2) the author list has been revealed through the supplementary material (Github page) rather than the paper itself; +3) all reviewers agree on the high impact of the paper, so having it presented and discussed at the conference would be very useful for the community.",ICLR2018, +9Tpd15lBXLP,1610040000000.0,1610470000000.0,1,zleOqnAUZzl,zleOqnAUZzl,Final Decision,Reject,"**Problem Significance** This paper introduces an interesting taxonomy of OODs and proposed an integrated approach to detect different types of OODs. The AC agrees on the importance of a fine-grained characterization of outliers given the large OOD uncertainty space. + +**Technical contribution** The key idea of the paper is to combine the predictions from multiple existing OOD detection methods. 
While the AC recognizes the effort made by the authors to address the review comments, reviewers have several major standing concerns regarding limited contributions, insufficient analysis, and lack of clarity. The AC agrees with reviewers that the paper is not ready yet for ICLR publication, and can be further strengthened by: + +- (R1) reporting the computational cost for the integrated approach. The inference time for approaches such as Mahalanobis is typically a few times more expensive than the MSP baseline. The cumulative time for calculating all four scores may be non-negligible. Authors are encouraged to analyze the performance tradeoff in a future revision. +- (R2 & R3) discussing the effect of hyper-parameters tuning, which be overly complicated in practice as it involves combinations of multiple methods that each have multiple parameters to tune. +- (R3) comparing with more recent development on OOD detection and move the new results to the main paper. The AC also thinks it's worth discussing the connection and comparison to methods on quantifying uncertainty via Bayesian probabilistic approaches. +- (R2 & R4) more rigorous analysis of the benefits of the proposed integrated approach, both empirically and theoretically. Based on Table 7, the performance of using Mahalanobis alone is almost competitive as the proposed approach (except for the CIFAR10-CIFAR100 pair). This may deem further careful examination to understand what value other components are adding, and in what circumstance. +- (R2, R3 & R4) More discussion on the implication of the taxonomy and algorithm in the high-dimensional space would be valuable. The 2D toy dataset might be too simple to reflect the decision boundary as well as uncertainty space learned by NNs. Moreover, it's important to justify further how aleatoric and epistemic uncertainty manifests in the current experiments using NNs. For example, epistemic uncertainty can arise due to the use of limited samples or due to the model uncertainty associated with the model regularization. + +Recent work by Hsu et al. [2] also attempt to define a taxonomy of OOD inputs (based on semantic shift and domain shift), which can be relevant for the authors. + +**Recommendation** Three knowledgeable reviewers have indicated rejection. The AC discounted R4's rating due to the less familiarity in this area and lack of participation in the post-rebuttal discussion. + +[1] Richard Harang, Ethan M. Rudd. Towards Principled Uncertainty Estimation for Deep Neural Networks +[2] Hsu et al. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data +",ICLR2021, +HJlMRkclxE,1544750000000.0,1545350000000.0,1,rkglvsC9Ym,rkglvsC9Ym,"Well-written and promising method, but some remaining issues around theoretical justification and experimental metrics.",Reject,"The reviewers agree that the paper is well-written, and they all seem to like the general idea. One of the earlier criticisms was that you did not compare against other robust loss functions, but you have partially rectified that by comparing to L1 in the appendix. As per the request of reviewer 2 I would also compare to the Huber loss. + +One remaining concern is the lack of theoretical justification, which could help address the comment of reviewer 3 regarding blurry images from location uncertainty. The other concern is that you should compare your method using FID scores from a standard implementation so that your numbers are comparable to other papers. 
Some of the reviewers were impressed, but confused by your relatively low scores.",ICLR2019,5: The area chair is absolutely certain +Tex4z872T,1576800000000.0,1576800000000.0,1,BkglSTNFDB,BkglSTNFDB,Paper Decision,Accept (Poster),"In this paper, the authors extended Q-learning with UCB exploration bonus by Jin et al. to infinite-horizon MDP with discounted rewards without accessing a generative model, and proved nearly optimal regret bound for finite-horizon episodic MDP. The authors also proved PAC-type sample complexity of exploration, which matches the lower bound up to logarithmic factors. Overall this is a solid theoretical reinforcement learning work. After author response, we reached a unanimous agreement to accept this paper.",ICLR2020, +UXo1N-nb5p7,1642700000000.0,1642700000000.0,1,hC474P6AqN-,hC474P6AqN-,Paper Decision,Reject,"This work proposes a continuous disentanglement variational autoencoder. The approach is a direct extension of Sha & Lukasiewcz (2021). The proposed method appears effective in learning disentangled factors on synthetic data. However, the approach is a minor change to Sha & Lukasiewcz (2021) that samples a weighted sum over all style values. This limits the novelty of the paper. Additionally, evaluation is only on small synthetic datasets that was created for this paper. The lack of evaluation on standard datasets such as an emotion dataset as motivated in the paper, means the results may be due to data selection rather than a superior method. This raises doubts as to whether the approach would generalize to other datasets. In the rebuttal the authors state they wanted to focus on a synthetic dataset since various metrics are easily method but additional real-world/standard dataset results can be added while keeping the synthetic results.",ICLR2022, +3EFgwCIwIxo,1642700000000.0,1642700000000.0,1,sTNHCrIKDQc,sTNHCrIKDQc,Paper Decision,Accept (Poster),"This paper presents a new method for clustering multiple graphs, without vertex correspondence, by combing existing approaches on graphon estimation and spectral clustering. All reviewers agree that this is a neat paper with new theoretical and empirical results. The main concerns were also properly addressed during rebutal. Overall, it is a good paper.",ICLR2022, +ByxJjnFelE,1544750000000.0,1545350000000.0,1,SklckhR5Ym,SklckhR5Ym,meta-review,Reject,"The paper proposes an additional module to train language models, adding +a new loss that tries to predict the previous token given the next one, thus +enforcing the model to remember the past. Two out of 3 reviewers recommend to +accept the paper; the third one said it was misleading to claim SOTA since +authors didn't try the mixture-of-softmax model that is actually currently SOTA. +The authors acknowledged and modified the paper accordingly, and added a few +more experiments. The reviewer still thinks the improvements are +not important enough to claim significant novelty. Overall, I think the idea is simple and +adds some structure to language modeling, but I also concur with the reviewer about +limited improvements, which makes it a borderline paper. 
When +calibrating with other area chairs, I decided to recommend to reject the paper.",ICLR2019,3: The area chair is somewhat confident +S1eRZV3xx4,1544760000000.0,1545350000000.0,1,H1gNHs05FX,H1gNHs05FX,Meta-Review for Clinical Risk paper,Reject,"There was discussion of this paper, and the accept reviewer was not willing to argue for acceptance of this paper, while the reject reviewers, specifically pointing to the clarity of the work, argued for rejection. There appear to be many good ideas related to wavelets, and hopefully the authors can work on polishing the paper and resubmitting.",ICLR2019,3: The area chair is somewhat confident +5UyWnRz28x,1576800000000.0,1576800000000.0,1,Hyg5TRNtDH,Hyg5TRNtDH,Paper Decision,Reject,"The paper proposes a method called unsupervised temperature scaling (UTS) for improving calibration under domain shift. + +The reviewers agree that this is an interesting research question, but raised concerns about clarity of the text, depth of the empirical evaluation, and validity of some of the assumptions. While the author rebuttal addressed some of these concerns, the reviewers felt that the current version of the paper is not ready for publication. + +I encourage the authors to revise and resubmit to a different venue.",ICLR2020, +hNuSZbLrOyHy,1642700000000.0,1642700000000.0,1,9kBDWEmA6i,9kBDWEmA6i,Paper Decision,Reject,"This submission has been withdrawn. + +The reviews are of good quality. The authors should consider writing two separate papers: one about the problem and solution from an ML perspective, and the other about the application to radiology. Papers that provide a new method in the context of a single application domain run the risk of making a contribution to neither, and of being evaluated by reviewers who are not experts in both.",ICLR2022, +FNpR7ygHS,1576800000000.0,1576800000000.0,1,r1ecqn4YwB,r1ecqn4YwB,Paper Decision,Accept (Poster),The paper received positive recommendation from all reviewers. Accept.,ICLR2020, +rkmqVJpHM,1517250000000.0,1517260000000.0,409,H18WqugAb,H18WqugAb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Reviewers were somewhat lukewarm about this paper, which seeks to present an analysis of the limitations of sequence models when it comes to understanding compositionality. Somewhat synthetic experiments show that such models generalise poorly on patterns not attested during training, even if the information required to interpret such patterns is present in the training data when combined with knowledge of the compositional structure of the language. This conclusion seems as unsurprising to me as it does to some of the reviewers, so I would be inclined to agree with the moderate enthusiasm two out of three reviewers have for the paper, and suggest that it be redirected to the workshop track. + +Other criticisms found in the review have to do with the lack of any discussion on the topic of how to address these limitations, or what message to take home from these empirical observations. It would be good for the authors to consider how to evaluate their claims against ""real"" data, to avoid the accusation that the conclusion is trivial from the task set up. + +Therefore, while well written, it is not clear that the paper is ready for the main conference. 
It could potentially generate interesting discussion, so I am happy for it to be invited to the workshop track, or failing that, to suggest that further work on this topic be done before the paper is accepted somewhere.",ICLR2018, +DiqylQ9dzpa,1610040000000.0,1610470000000.0,1,hb1sDDSLbV,hb1sDDSLbV,Final Decision,Accept (Poster),"The paper shows that hat if the goal is to find invariant mechanisms in the data, these can be identified by finding explanations (e.g. model parameters) that are hard to vary across examples. To find those ""explanations"" it then proposes to combine gradients across examples in a ""logical AND"" fashion, i.e., pooling gradients sing a geometric mean with a logical AND masking. All reviewers agree that the direction is very interesting. While indeed mentioning sum and products of experts might be good, the overall idea is still very much interesting, also to the ICRL community, since it paves the way to apply this to larger set of machine learning methods, as actually shown in the experimental evaluation. Still, the authors should make the link to causality more obvious from the very beginning. This should also involve clarifying that ""explanations"" here do not refer to ""explanations"" as used in Explainable AI. Overall, this is an interesting and simple (in a positive sense) contribution to the question of getting at least ""more"" causal models. ",ICLR2021, +PGis0keWptB,1610040000000.0,1610470000000.0,1,v5gjXpmR8J,v5gjXpmR8J,Final Decision,Accept (Poster),"There is some positive consensus on this paper, which improved somewhat after +the very detailed rebuttal comments by the authors. The use of limited amounts of OOD data is interesting and novel. There were some experimental design problems, but these were well-addressed in rebuttal. + +A reviewer points out that +anomaly/outlier detection does not explicitly assume that there is only one +class within the normal class (or in-distribution data). The one-class +assumption is mainly made in some popular anomaly detection methods, such as +one-class classification-based approaches for anomaly detection. The authors +should take this into careful consideration when preparing a final version of +this work. +",ICLR2021, +8l9X5fc2So,1610040000000.0,1610470000000.0,1,bQNosljkHj,bQNosljkHj,Final Decision,Reject,"All reviewers express concerns, such as about the presentation, the situation of the paper w.r.t. prior work, the experimental evaluation etc., and recommend rejection.",ICLR2021, +2W5kBraSVLH,1642700000000.0,1642700000000.0,1,KntaNRo6R48,KntaNRo6R48,Paper Decision,Accept (Poster),"Canonical correlation analysis is a method for studying associations between two sets of variables. However these methods lose their effectiveness when the number of variables is larger than the number of samples. This paper proposes a method, based on stochastic gating, for solving a $\ell_0$-CCA problem where the goal is to learn correlated representations based on sparse subsets of variables. Essentially, this paper combines ideas from Yamada et al. and Suo et al. who introduced Gaussian-based relaxations of Bernoulli random variables, and sparse CCA respectively. They also extend their methods to work with nonlinear functions by integrating deep neural networks into the $\ell_0$-CCA model. They gave experimental results on various synthetic and real examples, including to feature selection on biological data. 
The author response addressed a number of the reviewers' concerns, including by providing additional experiments and analyzing the genes selected by their model on the METABRIC dataset. Overall this is a solid contribution both from a theoretical and experimental standpoint.",ICLR2022, +HyefcXpegE,1544770000000.0,1545350000000.0,1,H1lFZnR5YX,H1lFZnR5YX,Needs a stronger motivation and updated baselines,Reject,"While the idea of revisiting regression-via-classification is interesting, the reviewers all agree that the paper lacks a proper motivating story for why this perspective is important. Furthermore, the baselines are weak, and there is additional relevant work that should be considered and discussed.",ICLR2019,5: The area chair is absolutely certain +srWRKFK85zU,1610040000000.0,1610470000000.0,1,24-DxeAe2af,24-DxeAe2af,Final Decision,Reject,"Four knowledgeable referees have indicated reject. I agree with the most critical reviewer R4 that the model design lacks a clear and transparent motivation and that the experimental setup is not convincing, and so must reject.",ICLR2021, +HZFvq5hS1Hd,1642700000000.0,1642700000000.0,1,JedTK_aOaRa,JedTK_aOaRa,Paper Decision,Reject,"This is an interesting paper discussing differential privacy for multi-label classification. The initial reviews rated the paper with rather extreme scores, therefore I have invited an additional reviewer. This review did not clarify the issues raised by the most critical reviewer, but pointed out that the goal of showing how DP can be enforced in MLC is not fully obtained as there is a lack of the discussion concerning the MLC performance. This is also a problem raised in my comments. Taking this into account, I need to state that the paper is not ready for publication.",ICLR2022, +HJAojGLdl,1486400000000.0,1486400000000.0,1,r1y1aawlg,r1y1aawlg,ICLR committee final decision,Reject,"The paper is an interesting first step, but the reviewers found flaws in the overall argument. Further, the method is not contextualized well enough in relation to prior work.",ICLR2017, +S1IT2zUue,1486400000000.0,1486400000000.0,1,Hy0L4t5el,Hy0L4t5el,ICLR committee final decision,Reject,"This paper is clearly written, but the method isn't demonstrated to solve any problem much better than simpler approaches. To quote one reviewer, ""the work may well be significant in the future, but is currently somewhat preliminary, lacks motivation, chooses a tree structured encoder without particular motivation, and is lacking in wider comparisons.""",ICLR2017, +ryqim1aSz,1517250000000.0,1517260000000.0,211,rJ33wwxRb,rJ33wwxRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This is a high quality paper, clearly written, highly original, and clearly significant. The paper gives a complete analysis of SGD in a two layer network where the second layer does not undergo training and the data are linearly separable. Experimental results confirm the theoretical suggestion that the second layer can be trained provided the weights don't change sign and remain bounded. The authors address the major concerns of the reviewers (namely, whether these results are indicative given the assumptions). This line of work seems very promising.",ICLR2018, +6u-0JHnvhX,1610040000000.0,1610470000000.0,1,nsZGadY22N4,nsZGadY22N4,Final Decision,Reject,"The paper is acknowledged by all the reviewers as making a novel contribution -- the proposal to reweight state-action pairs depending on the variation in their Q-value estimates during learning. 
However, despite its extensive reporting of numerical experiments, its arguments in favor of the proposed approach are found to be wanting on both empirical and theoretical fronts. Reviewer 3 points out (correctly, in my opinion) that 5 (or even 10 in the updated version) independent trials are not sufficient to establish the validity of the approach up to statistical significance, and that even a well-reasoned heuristic explanation of why reweighting is expected to work in terms of reducing Q-value estimation error is missing. I agree with this point, which also struck me when reading the submission myself, that at the very least, the submission ought to contain a basic (and not necessarily rigorous) argument as to why the variance reduction ostensibly achieved due to reweighting should lead the estimation algorithm to the right Q-function in a general function approximation setting. For instance, even in the simplest multi-armed bandit setting, it is of interest to ask why this procedure should perform consistently without introducing unwanted bias in an unforeseen sense, and a clear explanation offered for this would be interesting. Another important concern that most reviewers are left with is about the lack of sufficient insight into the action of the UCB mechanism against the backdrop of the reweighting procedure (reviewers 1, 2, 3). I hope that the author(s) assimilate the feedback to strengthen the paper's main pitch further and make a strong case in the near future. ",ICLR2021, +A3ZRB1ImTbw,1642700000000.0,1642700000000.0,1,1kqWZlj4QYJ,1kqWZlj4QYJ,Paper Decision,Reject,"The authors did not respond to the concerns raised by all the reviewers. As the recommendation were on the edge, this lack of engagement seems odd, and it left the reviewers with little material to discuss and revise their recommendation. We recommend the authors carefully consider the reviews if they plan to resubmit.",ICLR2022, +Byx2ZfF2y4,1544490000000.0,1545350000000.0,1,SygJSiA5YQ,SygJSiA5YQ,ICLR 2019 decision,Reject,This paper proposes an optimization algorithm based on 'weak contraction mapping'. The paper is written poorly without clear definitions and mathematical rigor. Reviewers doubt both the correctness and the usefulness of the proposed method. I strongly suggest authors to rewrite the paper addressing all the reviews before submitting to a different venue. ,ICLR2019,5: The area chair is absolutely certain +YFirzxynQ_s,1642700000000.0,1642700000000.0,1,XVPqLyNxSyh,XVPqLyNxSyh,Paper Decision,Accept (Poster),"The paper tackles the important problem of spurious feature detection in deep neural networks. Specifically, it proposes a framework to identify core and spurious features by investigating the activation maps with human supervision. Then, it produces an annotated version of the ImageNet dataset with core and spurious features, called Salient ImageNet, which is then used to empirically assess the robustness of the method against spurious training signals in comparison with current SOTA models. + +As pointed out by the reviewers, this work is not about causality and the definitions of causal and spurious features were originally vague and inaccurate. During the revision and discussion, the authors changed the terms ""causal"" features/accuracy to ""core"" features/accuracy. They also called the provided dataset ""Salient Imagenet"", instead of ""Causal Imagenet"", and changed the title to ""Salient ImageNet: How to Discover Spurious Features in Deep Learning?"". 
Following the prior discussion, we strongly recommend the authors discard any discussion about causality in the camera-ready version of the paper to avoid confusion. Further, we encourage the authors to consider the reviewers’ thoughts and comments in preparing the camera-ready version of their manuscript.",ICLR2022, +O_X3TrTzQu,1576800000000.0,1576800000000.0,1,BJx3_0VKPB,BJx3_0VKPB,Paper Decision,Reject,"The reviewers had a hard time fully identifying the intended contribution behind this paper, and raised concerns that suggest that the experimental results are not sufficient to justify any substantial contribution with the level of certainty that would warrant publication at a top venue. The authors have not responded, and the concerns are serious, so I have no choice but to reject this paper despite its potentially valuable topic.",ICLR2020, +HJg1wdCIxN,1545160000000.0,1545350000000.0,1,HJg6e2CcK7,HJg6e2CcK7,"interesting idea, good execution, but just below threshold",Reject,"The present work proposes to improve backdoor poisoning attacks by only using ""clean-label"" images (images whose label would be judged correct by a human), with the motivation that this would make them harder to detect. It considers two approaches to this, one based on GANs and one based on adversarial examples, and shows that the latter works better (and is in general quite effective). It also identifies an interesting phenomenon---that simply using existing back-door attacks with clean labels is substantially less effective than with incorrect labels, because the network does not need to modify itself to accommodate these additional correctly-labeled examples. + +The strengths of this paper are that it has a detailed empirical evaluation with multiple interesting insights (described above). It also considers efficacy against some basic defense measures based on random pre-processing. + +A weakness of the paper is that the justification for clean-label attacks is somewhat heuristic, based on the claim that dirty-label attacks can be recognized by hand. There is additional justification that dirty labels tend to be correlated with low confidence, but this correlation (as shown in Figure 2) is actually quite weak. On the other hand, natural defense strategies against the adversarial examples based attack (such as detecting and removing points with large loss at intermediate stages of training) are not considered. This might be fine, as we often assume that the attacker can react to the defender, but it is unclear why we should reject dirty-label attacks on the basis that they can be recognized by one detection mechanism but not give the defender the benefit of other simple detection mechanisms for clean-label attacks. + +A separate concern was brought up that the attack is too similar to that of Guo et al., and that the method was not run on large-scale datasets. The Guo et al. paper does somewhat diminish the novelty of the present work, but not in a way that I consider problematic; there are definitely new results in this paper, especially the interesting empirical finding that the Guo et al. attack crucially relies on dirty labels. I do not agree with the criticism about large-scale datasets; in general, not all authors have the resources to test on ImageNet, and it is not clear why this should be required unless there is a specific hypothesis that running on ImageNet would test. 
It is true that the GAN-based method might work more poorly on ImageNet than on CIFAR, but the adversarial attack method (which is in any case the stronger method) seems unlikely to run into scaling issues. + +Overall, this paper is right on the borderline of acceptance. There are interesting results, and none of the weaknesses are critical. It was unfortunately the case that there wasn't room in the program this year, so the paper was ultimately rejected. However, I think this could be a strong piece of work (and a clear accept) with some additional development. Here are some ideas that might help: + +(1) Further investigate the phenomenon that adding data points that are too easy to fit do not succeed in data poisoning. This is a fairly interesting point but is not emphasized in the paper. +(2) Investigate natural defense mechanisms in the clean-label setting (such as filtering by loss or other such strategies). I do not think it is crucial that the clean-label attack bypasses every simple defense, but considering such defenses can provide more insight into how the attack works--e.g., does it in fact lead to substantially higher loss during training? And if so, at what stage does this occur? If not, how does it succeed in altering the model without inducing high loss?",ICLR2019,2: The area chair is not sure +pm33gxoegu4,1642700000000.0,1642700000000.0,1,BrPdX1bDZkQ,BrPdX1bDZkQ,Paper Decision,Accept (Poster),"The paper presents a method for learning sequential decision making policies from a mix of demonstrations of varying quality. The reviewers agree, and I concur, that the method is relevant to the ICLR community. It is non-trivial, the empirical evaluations and theoretical analysis are rigorous, resulting in a novel method that produces near optimal policies from more readily available demonstrations. The authors revised the manuscript to reflect the reviewers' comments.",ICLR2022, +BKDt9UP9Kqm,1610040000000.0,1610470000000.0,1,ghjxvfgv9ht,ghjxvfgv9ht,Final Decision,Reject,"All reviewers agreed on the major shortcomings of this submission, the most important of which is that the contributions are insufficiently evaluated. There was no author response. ",ICLR2021, +bDQsC3MOQPH,1610040000000.0,1610470000000.0,1,g21u6nlbPzn,g21u6nlbPzn,Final Decision,Accept (Poster),This paper presents work on efficient video analysis. The reviewers appreciated the clear formulation and effective methodology. Concerns were raised over empirical validation. The authors' responses added additional material that assisted in clarifying these points. After the discussion the reviewers converged on a unanimous accept rating. The paper contains solid advances in efficient inference for video analysis and is ready for publication.,ICLR2021, +ryxw5m4GgV,1544860000000.0,1545350000000.0,1,HkMlGnC9KQ,HkMlGnC9KQ,interesting perspective but insufficient contribution,Reject,"Reviewers generally found the RKHS perspective interesting, but did not feel that the results in the work (many of which were already known or follow easily from known theory) are sufficient to form a complete paper. 
Authors are encouraged to read the detailed reviewer comments which contain a number of critiques and suggestions for improvement.",ICLR2019,5: The area chair is absolutely certain +u4eaOzSAxnA,1610040000000.0,1610470000000.0,1,P3WG6p6Jnb,P3WG6p6Jnb,Final Decision,Reject,The reviewers are unanimous that the submission does not clear the bar for ICLR.,ICLR2021, +pL6JZOKsA1A,1610040000000.0,1610470000000.0,1,zElset1Klrp,zElset1Klrp,Final Decision,Accept (Poster),"This paper proposes a new sparsity-inducing activation function, and demonstrates its benefits on continual learning and reinforcement learning tasks. + +After the discussion period, all reviewers agree that this is a solid paper, and so do I. I am thus recommending it for acceptance as a poster. Hopefully, such visibility (combined with the open source release of the code) will encourage other researchers to try this new technique, and we will see more evidence confirming its usefulness in more varied settings and versus stronger baselines (that remain somewhat limited in the current work: this is the main weakness of the paper).",ICLR2021, +Vtf6emHHVv,1576800000000.0,1576800000000.0,1,BygSXCNFDB,BygSXCNFDB,Paper Decision,Reject,"The paper applies the Go-Explore algorithm to text-based games and shows that it is able to solve text-based game with better sample efficiency and generalization than some alternatives. The Go-Explore algorithm is used to extract high reward trajectories that can be used to train a policy using a seq2seq model that maps observations to actions. + +Paper received 1 weak accept and 2 weak rejects. Initially the paper received three weak rejects, with the author response and revision convincing one reviewer to increase their score to a weak accept. + +Overall, the authors liked the paper and thought that it was well-written with good experiments. +However, there is concern that the paper lacks technical novelty and would not be of interest to the broader ICLR community (beyond those that are interested in text-based games). Another concern reviewers expressed was that the proposed method was only compared against baselines with simple exploration strategies and that baselines with more advanced exploration strategies should be included. + +The AC agrees with above concerns and encourage the authors to improve their paper based on the reviewer feedback, and to consider resubmitting to a venue that is more focused on text-based games (perhaps an NLP conference).",ICLR2020, +ZkfLnoKAp,1576800000000.0,1576800000000.0,1,H1MOqeHYvB,H1MOqeHYvB,Paper Decision,Reject,The paper shows an automatic piano fingering algorithm. The idea is good. But the reviewers find that the novelty is limited and it is an incremental work. All the reivewers agree to reject.,ICLR2020, +oTwFIDP58NF,1610040000000.0,1610470000000.0,1,W75l6XMzLq,W75l6XMzLq,Final Decision,Reject,"This paper extends the idea of hindsight experience replay (HER) to learn Q functions with relative goals by constructing a distribution over relative goals sampled from a replay buffer using a clustering algorithm. This approach is evaluated on three multi-goal RL environments and is shown to learn faster than baselines. + +${\bf Pros}$: +1. Faster convergence as compared to baselines +2. Interesting use of clustering in the context of HER but this choice is made without strong justifications or formal arguments + +${\bf Cons}$: +1. Some of the key choices made in this paper are not justified or explained property, e.g. 
- the goal sampling strategy, choices made in the clustering algorithm and associated heuristics, and implicit assumptions (e.g., R1 raised the question of using the L2 distance for measuring metrics between two states) +2. There are several choices made without sufficient formal arguments, verification, or guarantees. + +The paper studies an interesting problem but could be made stronger by incorporating the feedback received during the discussion period. ",ICLR2021, +0jdLRprDHGf,1610040000000.0,1610470000000.0,1,xW9zZm9qK0_,xW9zZm9qK0_,Final Decision,Reject,"The paper's stated contributions are: + +(1) a new perspective on learning with label noise, which reduces the problem to a similarity learning (i.e., pairwise classification) task + +(2) a technique leveraging the above to learn from noisy similarity labels, and a theoretical analysis of the same + +(3) empirical demonstration that the proposed technique surpasses baselines on real-world benchmarks + +Reviewers agreed that (1) is an interesting new perspective that is worthy of study. In the initial set of reviews, there were concerns about (2) and (3); for example, there were questions on whether the theoretical analysis studies the ""right"" quantity (pointwise vs pairwise loss), and a number of questions on the experimental setup and results (e.g., the computational complexity of the technique). Following a lengthy discussion, the authors clarified some of these points, and updated the paper accordingly. + +At the conclusion of the discussion, three reviewers continued to express concerns on the following points: + +- *Theoretical justification*. Following Theorem 3, the authors assert that their results ""theoretically justifies why the proposed method works well"". The analysis indeed provides some interesting properties of the reduction, such as the fact that it preserves learnability (Appendix F), and that the ""total noise"" is reduced (Theorem 2). However, a complete theoretical justification would involve guaranteeing that the quantity of interest (i.e., the clean pointwise classification risk) is small under the proposed technique. Such a guarantee is lacking. +- This is not to suggest that such a guarantee is easy -- as the authors note, this might involve a bound that relates pointwise and pairwise classification in multi-class settings, and such bounds have only recently been shown for binary problems -- or necessary for their method being practically useful (per the discussion following Theorem 3). Nonetheless, without such a bound, there are limits to what the current theory justifies about the technique's performance in terms of the final metric of interest. + +- *Comparison to SOTA*. Reviewers noted that the gains of the proposed technique are often modest, with the exception of CIFAR-100 with high noise. Further, the best performing results are significantly worse than those reported in two recent works, namely Iterative-CV and DivideMix. The authors responded to the former in the discussion, and suggested that they might be able to combine results with the latter. While plausible, given that the latter sees significant gains (e.g., >40% on CIFAR-100), concrete demonstration of this point is advisable: it is not immediately apparent to what extent the gains of the proposed technique seen on ""simple"" methods (e.g., Forward) would translate to more ""complex"" ones (e.g., DivideMix). +- In the response, the authors also mentioned that (at least the initial batch of) the experiments are intended to be a proof-of-concept. 
This would be perfectly acceptable for a work with a strong theoretical justification. However, per above, this point is not definitive. + +- *Creation of Clothing1M*. The authors construct a variant of Clothing1M which merges the classes 3 and 5. Given that prior work compares methods on the original data, and that this potentially reflects noise one may encounter in some settings, it is advisable to at least report results on the original, unmodified version. + +- *Issues with clarity*. There are some grammatical issues (Eg, ""is exact the""), typos (Eg, ""over 3 trails""), notational inconsistencies (Eg, use of C for # of classes in Sec 2, but then c in Sec 3.1), and imprecision in explanation (Eg, Sec 3.2 could be clearer what precise relationships are used from [Hsu et al. 2019]). + - These are minor but ought to be fixed with a careful proof-read. + +Cumulatively, these points suggest that the work would be served by further revision and review. The authors are encouraged to incorporate the reviewers' detailed comments.",ICLR2021, +BkxqorRZy4,1543790000000.0,1545350000000.0,1,rJg_NjCqtX,rJg_NjCqtX,Reject,Reject,"The area chair agrees with reviewer 1 and 2 that this paper does not have sufficient machine learning novelty for ICLR. This is competent work and the problem is interesting, but ICLR is not the right venue since the main contributions are on defining the task. All the models that are then applied are standard.",ICLR2019,4: The area chair is confident but not absolutely certain +yYanK0bHId,1642700000000.0,1642700000000.0,1,81e1aeOt-sd,81e1aeOt-sd,Paper Decision,Accept (Poster),"This paper presents a study of on-policy data in the context of model-based reinforcement learning and proposes a way to ameliorate the resulting model errors. + +This is a timely and interesting contribution, and all reviewers agree on the quality of the manuscript. +Please incorporate all the remaining feedback from the reviewers. + +Minor comment: There might be interesting points of contact between this work and the concept of objective mismatch (https://arxiv.org/abs/2002.04523)",ICLR2022, +r7aRk9U3rQj,1642700000000.0,1642700000000.0,1,rxF4IN3R2ml,rxF4IN3R2ml,Paper Decision,Reject,"This paper proposes a number of improvements to the previously-published transformer-based MQ-forecaster model for multi-horizon forecasts on time series data. They show strong empirical improvements in terms of accuracy and excess forecast variability on a large proprietary dataset, as well as four public datasets. Concerns were raised about the relatively incremental changes to the MQ-forecaster model this work is based on, lack of ablations on public data and, relatedly, inability to reproduce results on the proprietary data.",ICLR2022, +rF3us7ESVAx,1610040000000.0,1610470000000.0,1,yHeg4PbFHh,yHeg4PbFHh,Final Decision,Accept (Spotlight),The reviewers have supported the acceptance of this paper (R3 and R5 were particularly excited) so I recommend to accept this paper.,ICLR2021, +96LOw_pB2qQ,1610040000000.0,1610470000000.0,1,GEpTemgn7cq,GEpTemgn7cq,Final Decision,Reject,"In this paper, the authors study how to incorporate experimental data with interventions into existing pipelines for DAG learning. Mixing observational and experimental data is a well-studied problem, and it is well-known how to incorporate interventions into e.g. the likelihood function, along with theoretical guarantees and identifiability. 
Ultimately there was a general consensus amongst the reviewers that without additional theoretical results to advance the state of the art, the contribution of this work is limited.",ICLR2021, +SkxZ4XiVx4,1545020000000.0,1545350000000.0,1,ByxF-nAqYX,ByxF-nAqYX,Unconvincing novelty and empirical results,Reject,"This paper presents an LLE-based unsupervised feature selection approach. While one of the reviewers has acknowledged that the paper is well-written with clear mathematical explanations of the key ideas, it also lacks a sufficiently strong theoretical foundation as the authors have acknowledged in their responses; as well as novelty in its tight connection to LLE. When theoretical backbone is weak, the role of empirical results is paramount, but the paper is not convincing in that regard.",ICLR2019,5: The area chair is absolutely certain +U3sPdqwc7,1576800000000.0,1576800000000.0,1,SkeNlJSKvS,SkeNlJSKvS,Paper Decision,Reject,"This paper provides an interesting insight into the fitting of variational autoencoders. While much of the recent literature focuses on training ever more expressive models, the authors demonstrate that learning a flexible prior can provide an equally strong model. Unfortunately one review is somewhat terse. Among the other reviews, one reviewer found the paper very interesting and compelling but did not feel comfortable raising their score to ""accept"" in the discussion phase citing a lack of compelling empirical results in compared to baselines. Both reviewers were concerned about novelty in light of Huang et al., in which a RealNVP prior is also learned in a VAE. AnonReviewer3 also felt that the experiments were not thorough enough to back up the claims in the paper. Unfortunately, for these reasons the recommendation is to reject. More compelling empirical results with carefully chosen baselines to back up the claims of the paper and comparison to existing literature (Huang et al) would make this paper much stronger. + +",ICLR2020, +rylaJFB4xV,1545000000000.0,1545350000000.0,1,HJlYzhR9tm,HJlYzhR9tm,interesting direction but not ready for publication,Reject,"Though the overall direction is interesting, the reviewers are in consensus that the work is not ready for publication (better / larger scale evaluation is needed, comparison with other non-autoregressive architectures should be provided, esp Transformer as there is a close relation between the methods). ",ICLR2019,5: The area chair is absolutely certain +Bk9u3fUue,1486400000000.0,1486400000000.0,1,rJxDkvqee,rJxDkvqee,ICLR committee final decision,Accept (Poster),"The paper explores a model that performs joint embedding of acoustic sequences and character sequences. The reviewers agree the paper is well-written, the proposed loss function is interesting, and the experimental evaluation is sufficient. Having said that, there are also concerns that the proxy tasks used in the experiments are somewhat artificial.",ICLR2017, +Y7j1xjiMRqE,1642700000000.0,1642700000000.0,1,JZrETJlgyq,JZrETJlgyq,Paper Decision,Reject,"This paper received the initial scores with large variance. During the intensive discussion (Number of Forum replies is up to 60), the opinions reached the consensus. I have read all the materials of this paper including manuscript, appendix, comments and response. Based on collected information from all reviewers and my personal judgement, I can make the recommendation on this paper, *rejection*. Here are the comments that I summarized, which include my opinion and evidence. 
+ +**Research Problem and Motivation** + +(1) It seems that the authors aimed to address the question of whether negative examples are necessary for deep clustering. This research problem has already been posed and addressed by BYOL and SimSiam (if I remember correctly, one of the reviewers pointed this out). What the authors actually did is add two more components, a positive sampling strategy and a prototypical contrastive loss, on top of BYOL. In my eyes, this amounts to putting two patches on BYOL, one of which does not work (I will explain later). + +(2) Moreover, the authors failed to clearly identify the drawback of BYOL. In the last sentence of the third paragraph of the Introduction, the authors state that “BYOL only optimize the alignment term, leading to unstable training and suffering from the representation collapse.” This claim is too general and lacks strong motivation. + +Therefore, the research problem addressed here is incremental over BYOL, rather than one that brings new insights to the contrastive learning community. + +**Philosophy** + +Without a clear motivation, it is difficult to grasp the philosophy of this paper, i.e., how the proposed components tackle BYOL’s drawbacks. Moreover, the relationship between the two components is also unclear. + +**Novelty** + +I believe Reviewer Na7g has given a thorough analysis of the novelty of this paper, so I will not go into details here. Being different does not by itself mean being novel. + +**Technique** + +The positive sampling strategy does not work. If we take a closer look at Table 4, at the rows for BYOL and NCC with PS, there is no significant performance gain. The p-values of t-tests on the ACC results on CIFAR-10 and CIFAR-20 are 0.92 and 0.32, respectively. Actually, the prototypical contrastive loss is the key element that boosts performance over BYOL. + +**Misleading Title** + +Based on the above point, the title is misleading. Although no negative sample pairs are used in training, contrastiveness at the cluster level should still fall within the scope of contrastive learning. + +**Experiments** + +(1) In the Introduction, the authors mention that SimSiam is in the same non-contrastive category as this paper. However, SimSiam is not included in the comparison. + +(2) The competing methods in Tables 2 and 3 are not consistent. The authors did not even report the performance of BYOL on ImageNet-1K. + +(3) The positive sampling strategy does not work; see the Technique point above. + +(4) The authors only reported the running time on CIFAR-10 and CIFAR-20. + +Therefore, the experimental results are not very convincing or solid to me. + +**Presentation** + +I believe the presentation also needs considerable effort to smooth the logic. For example: “Even though Grill et al. (2020); Richemond et al. (2020) have proposed to use some tricks such as SyncBN (Ioffe & Szegedy, 2015) and weight normalization (Qiao et al., 2019) to alleviate this issue, the additional computation cost is significant.” Actually, Grill et al. (2020) is BYOL, the very method on top of which the authors add their components, so the computational cost of the proposed method should be heavier than that of BYOL. 
+ +Based on the above points, this paper suffers from several severe issues, which makes it not self-standing.",ICLR2022, +lpKtMT65CNS,1610040000000.0,1610470000000.0,1,ZAfeFYKUek5,ZAfeFYKUek5,Final Decision,Reject,"Originality: The paper can be developed into a very nice contribution, if the value of the newly introduced optimization variance is evaluated more thoroughly (e.g., through simple theory, or through more rigorous experiments). + +Main pros: +- One of the early works studying epoch-wise double descent +- Optimization variance is an interesting concept and might very well be useful for finding good early stopping points. + +Main cons: +- Findings about epoch-wise double descent remain inconclusive +- optimization variance is not sufficiently evaluated to judge its usefulness: theoretical justification for why is this an important quantity and when does it arise naturally is something missing at this point. + +Overall: there was a consensus that the paper focuses and provides an interesting story, with new ideas; however, paper's conclusions are not strongly supported by experiments; more experiments are needed to make arguments conclusive. ",ICLR2021, +HkestptgxE,1544750000000.0,1545350000000.0,1,H1g4k309F7,H1g4k309F7,meta-review,Accept (Poster),"The paper proposes a novel way to ensemble multi-class or multi-label models +based on a Wasserstein barycenter approach. The approach is theoretically +justified and obtains good results. Reviewers were concerned with time +complexity, and authors provided a clear breakdown of the complexity. +Overall, all reviewers were positives in their scores, and I recommend accepting the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +8nH6Cxet0gC,1610040000000.0,1610470000000.0,1,SHvF5xaueVn,SHvF5xaueVn,Final Decision,Accept (Poster),"Summary of discussion: Three reviewers rated the paper Good (7) while Reviewer2 disagreed. R2's criticism was focussed on how this work is placed within existing/related literature, and no technical problem was identified. The authors have addressed some of R2's comments/concerns, R2 has not participated in the discussion. + +Novelty and contributions: Overall the reviews seem consistent with an incremental paper which is technically valid, improves the state of the art on a reasonably difficult task. However, it does not appear from the reviews that the paper substantially advances our understanding of machine learning more broadly beyond this specific application. + +Experiments: There is some disagreement among reviewers on the adequacy of the experiments, with at least two reviewers calling for experiments involving 'natural photos'. I believe the author's responses adequately address these concerns: they pointed out that the key selling point of their paper is the ability to model structured noise which is less relevant in natural photos. + +On the balance of things, I think this paper should be accepted, but I wouldn't argue if it did not make the cut due to its narrow scope. For this reason, I recommended poster presentation.",ICLR2021, +M3UDqlbzv3TA,1642700000000.0,1642700000000.0,1,fPhKeld3Okz,fPhKeld3Okz,Paper Decision,Accept (Poster),"The paper proposes a plug-and-play method for solving imaging problems. Plug-and-play methods use a denoiser to solve linear inverse problems. The paper proposes a plug-and-play method and uses convex optimization tools from analyzing proximal gradient methods to provide convergence guarantees. 
The algorithm is applied to a variety of inverse problems showing that the method works well. + +After the discussion period, all four reviewers recommend acceptance. +- Reviewer QQES provided a detailed review and raised a few concerns including a clear motivation for and description of the denoiser, and unsupported claims, in particular related to a proof in the paper. The authors revised the paper and responded in length to the claims, in particular they detailed steps and assumptions related to the theorem in their response. As a response, the reviewer changed their score to accept. +- Reviewer xYLt strongly supports acceptance based on the strong theoretical results and a very good exposition. +- Reviewer GZzY likes the overall idea of the paper and raised a few minor concerns and questions, which were addressed by the authors. +- Finally, reviewer E8QG also appreciates the method, convergence analysis, and extensive validation. The reviewer also raised a few minor concerns and asked for clarification, and the response of the authors resolved those concerns. + +Based on my own reading and based on the reviews, I recommend acceptance. The paper provides a variant of a plug-and-play method, proves interesting convergence results for the method, and has a strong experimental evaluation of the method. I encourage the authors to take the feedback of the reviewers into account, which they have done for the most part already, and it would also be interesting to see the performance of the method for compressive sensing problems.",ICLR2022, +Ba1h0ZlPAZ0,1642700000000.0,1642700000000.0,1,kezNJydWvE,kezNJydWvE,Paper Decision,Accept (Poster),"The paper introduces an idea that was found interesting by all reviewers (including Gxxe who recommends a marginal reject). A majority of the reviewers also point out a few weaknesses of the paper, notably in terms of clarity of several statements that were found to be hand-wavy (see the reviews of Gxxe and oSPE for more precise details). The area chair agrees with those statements, but overall, the originality of the idea introduced in this paper outweighs these weaknesses, and the experimental study is conducted in a reasonably convincing manner. + +Even though there is room for improvements, the area chair is happy to recommend an accept, but encourages the authors to follow the constructive feedback provided by the reviewers for the camera-ready version.",ICLR2022, +rySUnGLdx,1486400000000.0,1486400000000.0,1,BydrOIcle,BydrOIcle,ICLR committee final decision,Accept (Poster),"The main idea of this paper is clearly presented, and its empirical claims well-demonstrated. It improves our understanding of how to train GANs.",ICLR2017, +haDMmb_GW,1576800000000.0,1576800000000.0,1,H1g79ySYvB,H1g79ySYvB,Paper Decision,Reject,"This paper proposes an extension of Gradient Episodic Memory (GEM) namely support examples, soft gradient constraints, and positive backward transfer. The authors argue that experiments on MNIST and CIFAR show that the proposed method consistently improves over the original GEM. + +All three reviewers are not convinced with experiments in the paper. R1 and R3 mentioned that the improvements over GEM appear to be small. R2 and R3 also have some concerns without results with multiple runs. R3 has questions about hyperparameter tuning. The authors also appears to be missing recent developments in this area (e.g., A-GEM). The authors did not provide a rebuttal to these concerns. 
+ +I agree with the reviewers and recommend rejecting this paper.",ICLR2020, +sYZWm-ZVxDg,1642700000000.0,1642700000000.0,1,CO0ZuH5vaMu,CO0ZuH5vaMu,Paper Decision,Reject,"Motivated by addressing the problem of lacking parallel training data for supervised code translation, this paper proposed to construct noisy parallel source code datasets using a document similarity based approach, and empirically evaluated its effectiveness for code translation tasks. + +The paper is in general well-written, easy to follow, and the method is simple and empirical results look positive. Some major concern by reviewers is that while the proposed method is simple and may be easy to use, the overall technical novelty/contribution is limited, e.g., there generally lacks of more thorough discussions on how to deal with the critical noise issue in a more robust or sophisticated way. In addition, there were also other concerns about the experimental issues, such as datasets, metrics, ablation analysis, usability, etc. + +Overall, the paper presents some preliminary positive results for an interesting research problem, but the overall technical novelty and contributions are incremental and the paper is not strong enough for the acceptance by this conference. Nonetheless, this work could be potentially valuable for the niche area of code translation research, and authors are encouraged to continue to improve this research with more thorough investigation for a future venue.",ICLR2022, +QW6pSTI5ByH,1642700000000.0,1642700000000.0,1,Jjcv9MTqhcq,Jjcv9MTqhcq,Paper Decision,Accept (Poster),"The paper introduces a simple yet effective technique for supervised pre-training based on kNN lookup from a MoCo memory queue . Initially, the reviewers raised concerns about limited novelty with respect to neighborhood component analysis, baseline results lower than the original papers, and several other questions such as how many positive samples fall in and out of kNN. The author response was strong, adequately addressing the reviewer’s comments with additional experiments and clarifications. After the discussion period, three reviewers recommended borderline acceptance. One reviewer maintained score 5, suggesting a more exhaustive search for hyper-parameters, but indicated he/she was on the fence and would be ok if the paper is accepted. The AC considers the response of the authors regarding hyper-parameter search (and the small gap from other reported results) is reasonable, and agrees with the majority that the paper passes the acceptance bar of ICLR.",ICLR2022, +pJHb_jg7spu,1642700000000.0,1642700000000.0,1,V7eSbSAz-O8,V7eSbSAz-O8,Paper Decision,Reject,"This paper presents a framework to test the accuracy and robustness of different machine learning algorithms in classifying the COVID-19 spike sequences. After reading the paper and taking into consideration the reviewing process, here are my comments: +- The work is aligned to the efforts on understanding the COVID-19 pandemic. +- Many concepts are not novel. +- Sequences errors are not modeled in a realistic way. +- The benchmark is very limited and nonlinear machine learning approaches are presented. +- Many typos are presented. +From the above, the paper is not suitable for aacceptance in ICLR.",ICLR2022, +HyEoiMLdl,1486400000000.0,1486400000000.0,1,SJqaCVLxx,SJqaCVLxx,ICLR committee final decision,Reject,"This paper is unfortunately quite unclear and unreadable and nowhere near ready for any conference. 
I would advise the authors to 1) restructure their paper to first present some context and identify the problem they are trying to solve, 2) explain what novel method they propose to solve the identified problem, why this method is promising, and how it relates to existing methods, 3) explain what their experiments are trying to do and what the results of the experiments are, and 4) enlist someone fluent in English to help with writing and proofreading.
 A way to do this is to find a set of well-cited papers in the same domain with similar ideas, see how they are structured, and then try to follow a similar outline.",ICLR2017,
+rySUnGLdx,1486400000000.0,1486400000000.0,1,BydrOIcle,BydrOIcle,ICLR committee final decision,Accept (Poster),"The main idea of this paper is clearly presented, and its empirical claims are well demonstrated. It improves our understanding of how to train GANs.",ICLR2017,
+rJ2H3zI_x,1486400000000.0,1486400000000.0,1,S1QefL5ge,S1QefL5ge,ICLR committee final decision,Invite to Workshop Track,"The authors present a framework for online structure learning of sum-product networks. They overcome challenges such as being able to learn a valid sum-product network and to have an online learning mechanism. 
Based on the extensive discussions presented by the reviewers, our recommendation is to accept this paper for a workshop.",ICLR2017, +SyID3G8Og,1486400000000.0,1486400000000.0,1,ryPx38qge,ryPx38qge,ICLR committee final decision,Reject,"The program committee appreciates the authors' response to concerns raised in the reviews. Reviewers are generally excited about the combination of predefined representations with CNN architectures, allowing the model to generalize better in the low data regime. This was an extremely borderline paper, and the PCs have determined that the paper would have needed to be further revised and should be rejected.",ICLR2017, +rklsL14-x4,1544790000000.0,1545350000000.0,1,S1fQSiCcYm,S1fQSiCcYm,Strong paper,Accept (Poster),The reviewers have reached a consensus that this paper is very interesting and add insights into interpolation in autoencoders.,ICLR2019,4: The area chair is confident but not absolutely certain +GRqFJUWdovz,1610040000000.0,1610470000000.0,1,6UdQLhqJyFD,6UdQLhqJyFD,Final Decision,Accept (Poster),"Three out of four reviewers are positive about the paper after the author response and during the discussion. + +Strengths include +* The proposed method for parameter reduction in transformers allows end-2-end learning cross-modal representations especially on long videos, which has not been possible before +* Good performance on audio and video understanding +* Extensive set of ablations + +Concerns include a somewhat incremental nature of the paper and the still large computational resources to run the experiments. +I think, both, the ideas and results are interesting to the community and recommend accept.",ICLR2021, +B1lXfSzegV,1544720000000.0,1545350000000.0,1,B1fA3oActQ,B1fA3oActQ,"Interesting method, if somewhat incremental. Experiments are reasonable but variables potentially not controlled.",Reject,"This paper proposes a new method for graph representation in sequence-to-sequence models and validates its results on several tasks. The overall results are relatively strong. + +Overall, the reviewers thought this was a reasonable contribution if somewhat incremental. In addition, while the experimental comparison has greatly improved from the original version, there are still a couple of less satisfying points: notably the size of the training data is somewhat small. In addition, as far as I can tell all comparisons with other graph-based baselines actually aren't implemented in the same toolkit with the same hyperparameters, so it's a bit difficult to tell whether the gains are coming from the proposed method itself or from other auxiliary differences. + +I think this paper is very reasonable, and definitely on the borderline for acceptance, but given the limited number of slots available at ICLR this year I am leaning in favor of the other very good papers in my area.",ICLR2019,2: The area chair is not sure +azIzKSq5FKv,1642700000000.0,1642700000000.0,1,q1QmAqT_4Zh,q1QmAqT_4Zh,Paper Decision,Reject,"This paper proposes to improve offline RL by a data augmentation technique that exploits the symmetry of the dynamics using Koopman operator. The idea is interesting but the draft at its current form has several weaknesses as pointed out by the reviewers. The scores are borderlines at this point. I read the paper and find myself agree with reviewer ohJ3 in both the lack of clarity and the gap in theory and empirical results. The math presentation still a careful check and improvement. 
Eqs. (1)-(4) are already fairly confusing (should $Q_i$ and $\pi_i$ be replaced by $Q$ and $\pi$ in Eqs. (1)-(4), and $\hat Q$ by $\hat Q_i$ in Eqs. (2)-(3)?). I would like to suggest that the authors add a self-contained algorithm box for the practical procedure. Do readers really need to understand the full Koopman theory (Section 3.1) before understanding the algorithm? The authors could consider whether it is better to present the practical algorithm first with minimal math, and then analyze the properties of the algorithm using the mathematical tools (and, in that case, make it clear exactly what theoretical guarantees are obtained). I think making the paper more accessible would help it gain more traction among ML readers.",ICLR2022,
+SkxBHdNSx4,1545060000000.0,1545350000000.0,1,SyxYEoA5FX,SyxYEoA5FX,meta-review,Reject,"The main strength of the paper is to provide a clear mathematical characterization of invertible neural networks. The reviewers and the AC also note potential weaknesses, including 1) the exposition of the paper can be much improved; 2) it is unclear how these analyses can help improve the training algorithm or architecture design, since these characterizations are likely not computable; and 3) the novelty compared to the previous work of Carlsson et al. (2017) may not be enough for ICLR acceptance. These weaknesses are considered critical issues by the AC in the decision. 
,ICLR2019,5: The area chair is absolutely certain +Oq1YFGg3FBQ,1642700000000.0,1642700000000.0,1,jgAl403zfau,jgAl403zfau,Paper Decision,Reject,"This paper proposes a hardware-aware pruning method which structurally prunes the given deep neural networks to retain their accuracy while satisfying the latency constraints. Specifically, the authors formulate the latency-constrained pruning problem as a combinatorial optimization problem to find the optimal combination of neurons to maximize the sum of the importance scores, and propose an augmented knapsack solver to solve it, as well as a neuron grouping technique to speed up the training. The proposed method is validated for its classification tasks on two devices, namely Titan V and Jetson TX2, and for object detection performance on Titan V, and is shown to achieve superior accuracy/latency tradeoff compared to existing pruning methods, including latency-aware ones. + +The paper received split reviews initially, and the following is the summary of the pros and cons mentioned by the reviewers. + +Pros +- The proposed formulation of the latency-constrained pruning problem as a constrained knapsack problem is novel. +- The method achieves competitive performance against existing latency-constrained pruning methods. +- The paper is written well, with clear motivation and descriptions of the proposed method. + +Cons +- The idea is not very exciting since posing pruning as a combinatorial optimization problem, or a knapsack problem is not new, and the proposed method only adds in additional latency constraints. +- The title “hardware-aware” is vague and misleading since what the authors do are latency-constrained pruning. +- The experimental validation is only done on two devices, which makes the method less convincing as a “hardware-aware” method and how it generalizes to other devices (e.g. CPU, FPGA) +- Use of lookup tables to obtain the latency constraints is not novel, has a limited scalability, and is inefficient. +- Missing discussion of design choices. + +During the discussion period, the authors cleared away some of the concerns, which resulted in two of the reviewers increasing their scores. However, one reviewer maintained the negative rating of 5, and the positive reviewers were still concerned with limited novelty. + +I believe that this is a good paper that proposes a neat solution for latency pruning, which may have some practical impact. However, the novelty of the idea is limited, as pointed out by the reviewers. The use of lookup tables also does not seem to be an efficient solution for adapting to edge devices for which the collection of latency measurements could be slow. The experimental validation on only two devices of the same type (GPU) also seems insufficient, as how the method generalizes to diverse devices is uncertain. It would be worthwhile to consider using a latency predictor (e.g. BRP-NAS [Dudziak et al. 20]), and perform experimental validation on diverse hardware platforms (e.g. CPU and FPGA). Comparing against recently proposed hardware-aware NAS methods could be also interesting, as there has been a rapid progress on the topic recently. + +Thus, despite the overall practicality and the quality of the paper, the paper may benefit from another round of revision, since both the method and the experimental validation part could be improved. + +[Dudziak et al. 
20] BRP-NAS: Prediction-based NAS using GCNs, NeurIPS 2020",ICLR2022, +N95ksqYW5Ux,1642700000000.0,1642700000000.0,1,NoE4RfaOOa,NoE4RfaOOa,Paper Decision,Reject,"There was consensus that though the paper introduces an interesting question, but not enough exploration has been made. The reviews point out several mathematical in-accuracies, and points out several issues including that the delta criterion needs to be examined.",ICLR2022, +iOf4TftVuCQ,1642700000000.0,1642700000000.0,1,Ehhk6jyas6v,Ehhk6jyas6v,Paper Decision,Reject,"This paper considers the question of whether recent concept-based learning algorithms, as well disentangled representation learning algorithms, result in high-quality representations. In particular, the authors consider what high-quality should mean in terms of the relationship with ground truth concepts and the ability to make accurate predictions for a downstream task. To this end, they propose two main metrics for representations that are explicitly or implicitly encouraged to encode concepts. While the premise of this paper has been appreciated by the reviewers, some concerns about the details of the metrics proposed and experimental results which have been raised by the reviewers remain post rebuttal. Given this, we are unable to recommend the acceptance of the paper at this time. We hope the authors find the reviewer feedback useful.",ICLR2022, +Syx7d_H1Tm,1541520000000.0,1545350000000.0,1,B1xY-hRctX,B1xY-hRctX,Interesting forward chaining approach to neural deduction,Accept (Poster)," +pros: +- The paper presents an interesting forward chaining model which makes use of meta-level expansions and reductions on predicate arguments in a neat way to reduce complexity. As Reviewer 3 points out, there are a number of other papers from the neuro-symbolic community that learn relations (logic tensor networks is one good reference there). However using these meta-rules you can mix predicates of different arities in a principled way in the construction of the rules, which is something I haven't seen. +- The paper is reasonably well written (see cons for specific issues) +- There is quite a broad evaluation across a number of different tasks. I appreciated that you integrated this into an RL setting for tasks like blocks world. +- The results are good on small datasets and generalize well + +cons: +- (scalability) As both Reviewers 1 and 3 point out, there are scalability issues as a function of the predicate arity in computing the set of permutations for the output predicate computation. +- (interpretability) As Reviewer 2 notes, unlike del-ILP, it is not obvious how symbolic rules can be extracted. This is an important point to address up front in the text. +- (clarity) The paper is confusing or ambiguous in places: + +-Initially I read the 1,2,3 sequence at the top of 3 to be a deduction (and was confused) rather than three applications of the meta-rules. Maybe instead of calling that section ""primitive logic rules"" you can call them ""logical meta-rules"". + +-Another confusion, also mentioned by reviewer 3 is that you are assuming that free variables (e.g. the ""x"" in the expression ""Clear(x)"") are implicitly considered universally quantified in your examples but you don't say this anywhere. If I have the fact ""Clear(x)"" as an input fact, then presumably you will interpret this as ""for all x Clear(x)"" and provide an input tensor to the first layer which will have all 1.0's along the ""Clear"" relation dimension, right? 
+ +-It seems that you are making the assumption that you will never need to apply a predicate to the same object in multiple arguments? If not, I don't see why you say that the shape of the tensor will be m x (m-1) instead of m^2. You need to be able to do this to get reflexivity for example: ""a <= a"". + +-I think you are implicitly making the closed world assumption (CWA) and should say so. + +-On pg. 4 you say ""The facts are tensors that encode relations among multiple objectives, as described in Sec. 2.2."". What do you mean by ""objectives""? I would say the facts are tensors that encode relations among multiple objects. + +-On pg. 5 you say ""We finish this subsection, continuing with the blocks world to illustrate the forward +propagation in NLM"". I see no mention of blocks world in this paragraph. It just seems like a description of what happens at one block, generically. + +-In many places you say that this model can compute deduction on first-order predicate calculus (FOPC) but it seems to me that you are limited to horn logic (rule logic) in which there is at most one positive literal per clause (i.e. rules of the form: b1 AND b2 AND ... AND bn => h). From what I can tell you cannot handle deduction on clauses such as b1 AND b2 => h1 or (h2 and h3). + +-There is not enough description of the exact setup for each experiment. For example in blocks world, how do you choose predicates for each layer? How many exactly for each experiment? You make it seem on p3 that you can handle recursive predicates but this seems to not have been worked out completely in the appendix. You should make this clear. + +-In figure 1 you list Move as if its a predicate like On but it's a very different thing. On is predicate describing a relation in one state. Move is an action which updates a state by changing the values of predicates. They should not be presented in the same way. + +-You use ""min"" and ""max"" for ""and"" and ""or"" respectively. Other approaches have found that using the product t-norm t-norm(x,y) = x * y helps with gradient propagation. del-ILP discusses this in more detail on p 19. Did you try these variations? + +-I think it would be helpful to somewhere explicitly describe the actual MLP model you use for deduction including layer sizes and activation functions. + +-p. 5. typo: ""Such a parameter sharing mechanism is crucial to the generalization ability of NLM to +problems ov varying sizes."" (""ov"" -> ""of"") + +-p. 6. sec 3.1 typo: ""For ∂ILP, the set of pre-conditions of the symbols is used direclty as input of the system."" (""direclty"" -> ""directly"") + +I think this is a valuable contribution and novel in the particulars of the architecture (eg. expand/reduce) and am recommending acceptance. But I would like to see a real effort made to sharpen the writing and make the exposition crystal clear. Please in particular pay attention to Reviewer 3's comments. + +",ICLR2019,5: The area chair is absolutely certain +P5iXz_p9iYc,1642700000000.0,1642700000000.0,1,1-lFH8oYTI,1-lFH8oYTI,Paper Decision,Reject,"Thank you for your submission to ICLR. The paper proposes a simple method for improving calibration performance using a loss based upon a Dirichlet KDE. The method is appealing in its simplicity, but several reviewers (and myself) have concerns simply about the fact that the method ultimately seemed to give rather marginal improvement over the standard cross-entropy baseline. 
The authors attempted to address this point in the rebuttal, with their additional example on the Kather domain. And while this is a nice addition, I'm still not fully convinced that the improvement here is _that_ significant, to the point where I think it would be important to consider much broader sweeps of hyperparameters, etc, for all methods (which I believe should be reasonable here given the data set sizes). I believe this has the potential to be a nice contribution, and its simplicity can be a positive, but ultimately I think a bit of additional effort is required to show the full empirical advantages of the method.",ICLR2022, +HJxjiCyleV,1544710000000.0,1545350000000.0,1,r1NJqsRctX,r1NJqsRctX,"Interesting and novel paper, but experimental results could be more convincing ",Accept (Poster),"The reviewers all argued for acceptance citing the novelty and potential of the work as strengths. They all found the experiments a little underwhelming and asked for more exciting empirical evaluation. The authors have addressed this somewhat by including multi-modal experiments in the discussion period. The paper would be more impactful if the authors could demonstrate significant improvements on really challenging problems where MCMC is currently prohibitively expensive, such as improving over HMC for highly parameterized deep neural networks. Overall, however, this is a very nice paper and warrants acceptance to the conference.",ICLR2019,5: The area chair is absolutely certain +BdYG9tZBDR,1576800000000.0,1576800000000.0,1,BklmtJBKDB,BklmtJBKDB,Paper Decision,Reject,"The novelty of the proposed work is a very weak factor, the idea has been explored in various forms in previous work.",ICLR2020, +i18cduOMgEL,1610040000000.0,1610470000000.0,1,eEeyRrKVfbL,eEeyRrKVfbL,Final Decision,Reject,"This paper considers the problem of pruning deep neural networks (DNNs) during training. The key idea is to include DNN elements only if they improve the predictive mean of the saliency (efficiency of the DNN elements in terms of minimizing the loss function). The objective of early pruning is to preserve the sub-network that can maximize saliency. This optimization problem is NP-hard, and even approximation is very expensive. The paper proves that one can simplify the approximation by ranking the network element by predictive mean of the saliency function. + +The proposed approach is novel as most of the prior work on pruning has focused on either (i) pruning on network initialization or (ii) pruning after the network has been fully trained. + +Couple of issues with the paper are: +1. Current approach is somewhat complicated with many hyper-parameters +2. Experimental results are not very compelling when compared to pruning on network initialization + +Overall, my assessment is that the paper takes a new research direction and has the potential to inspire the community, and followup work may be able to overcome the above two issues in future. However, due to the remaining shortcomings, the paper is not judged ready for publication in its present form. I strongly encourage to resubmit the paper after addressing the above two concerns.",ICLR2021, +26sJLzrNS4,1610040000000.0,1610470000000.0,1,tV6oBfuyLTQ,tV6oBfuyLTQ,Final Decision,Accept (Poster),"The reviewers generally found the idea interesting and the contribution of the paper significant. I agree, I think this is quite a neat idea to investigate, and the paper is written well and is engaging to read. 
+ +I would encourage the authors to take into account all of the reviewer suggestions when preparing the camera-ready version. Of particular importance is the name: I think it's bad form to appropriate a name already used in other prior work (proto-value functions, which are very well known in the RL community), so I think it is very important for the final to change the name to something that does not conflict with an existing technique. Obviously this does not affect my evaluation of the paper, but I trust that the authors will address this feedback (I will check the camera-ready).",ICLR2021, +H1tREk6BM,1517250000000.0,1517260000000.0,468,By3VrbbAb,By3VrbbAb,ICLR 2018 Conference Acceptance Decision,Reject,This paper has some interesting ideas that have been implemented in a rather ad hoc way; the presentation focuses perhaps too much on engineering aspects.,ICLR2018, +hmXv-Fp-FBH,1642700000000.0,1642700000000.0,1,pMQwKL1yctf,pMQwKL1yctf,Paper Decision,Accept (Oral),"All reviewers found that the proposed LM with Brownian motion is interesting and novel. Several reviewers raised (minor) concerns about experiments, but have been generally resolved by the authors.",ICLR2022, +S1eWHdJZxN,1544780000000.0,1545350000000.0,1,rkxhX209FX,rkxhX209FX,The paper can be improved,Reject,"The paper addresses sample-efficient robust policy search borrowing ideas from active learning. The reviews raised important concerns regarding (1) the complexity of the proposed technique, which combines many separate pieces and (2) the significance of experimental results. The empirical setup adopted is not standard in RL, and a clear comparison against EPOpt is lacking. I appreciate the changes made to address the comment, and I encourage the authors to continue improving the paper by simplifying the model and including a few baseline comparisons in the experiments.",ICLR2019,4: The area chair is confident but not absolutely certain +jc-2SeNzu3A,1642700000000.0,1642700000000.0,1,JXhROKNZzOc,JXhROKNZzOc,Paper Decision,Accept (Poster),"The authors propose a data-free quantization method that can be applied post-training quantization without backpropagation. The method takes advantage of approximate Hessian information in a certain scalable approximate way. Based on the assumptions and deductions in the paper, SQuant tries to optimize constrained absolute sum of error (CASE) instead of MSE. There are good empirical results showing the effectiveness of the method, and the paper is well written, and the method should be of broad interest.",ICLR2022, +WQFrnhfJvZv,1610040000000.0,1610470000000.0,1,cT0jK5VvFuS,cT0jK5VvFuS,Final Decision,Reject,"This paper analyzes some design choices for neural processes, paying particular attention to their small-data performance, uncertainty, and posterior contraction. This is certainly a worthwhile project, and R3 found the analysis interesting, giving the paper a score of 8. However, R1, R2, and R4 found the experimental validation to be incomplete and insufficient to support the paper's broader recommendations. 
As the paper is investigating the various combinations of implementations, I tend to agree with R1, R2, and R4 that this paper---while having some interesting ideas---needs a bit more precision and breadth to its experiments.",ICLR2021, +Bk76U16SG,1517250000000.0,1517260000000.0,881,Sy_MK3lAZ,Sy_MK3lAZ,ICLR 2018 Conference Acceptance Decision,Reject,The idea studied here is fairly incremental and the empirical evaluation could be improved.,ICLR2018, +BJetiDy-lE,1544780000000.0,1545350000000.0,1,SkgQBn0cF7,SkgQBn0cF7,meta review,Accept (Poster),"This paper explores the use of multi-step latent variable models of the dynamics in imitation learning, planning, and finding sub-goals. The reviewers found the approach to be interesting. The initial experiments were a main weakpoint in the initial submission. However, the authors updated the experimental results to address these concerns to a significant degree. The reviewers all agree that the paper is above the bar for acceptance. I recommend accept.",ICLR2019,4: The area chair is confident but not absolutely certain +Kln9gby-qPT,1642700000000.0,1642700000000.0,1,a1m8Jba-N6l,a1m8Jba-N6l,Paper Decision,Reject,"This paper proposes an extension of mixup (a data augmentation method) to k-mixup using optimal transport. The idea is to select randomly at each iteration two subsets of k samples and compute the optimal transport solution. Each pairs of samples assigned by the optimal transport plan will then be used to perform mixup and promote smoothness in the prediction function. The authors also provide some theoretical results about preservation of the clusters. Finally numeric experiment show the interest of k-mixup on toy and real life dataset classification and study the effect of k and the $\alpha$ parameter (of the $\beta$ distribution). + +All reviewers found the paper interesting and acknowledge that it leads to some performance improvements in practice. But they had several concerns that lead to low scores. The justification of the method an more specifically the link with the theoretical findings was found lacking, indeed the result make sens fr a large $k$ which is not was is done in practice (but experiments also show a decrease sometimes for large $k$). One interesting discussion between the proposed approach and minibatch OT is also missing. In addition the reviewers found the numerical experiments interesting but regret that some mixup approaches have not been compared and also noted a small gap in performance for the proposed approach (with no variance reported). Also the Adversarial robustness measure is now considered weak in the literature and those results could have been made stronger with more modern adversaries. Their final concern was the fact that the method now has two parameters that needs tuning and that can have a large impact on the performance for limited gain. The authors did a detailed reply a,d edition of the paper that was very appreciated by the reviewers but that did not change their opinion that this paper still deserves some more work before being accepted. + +For these reasons the AC recommend to reject the paper but strongly suggests that the authors take into account the reveiwers' comments before resubmitting to a ML venue.",ICLR2022, +B1l4Bk6SM,1517250000000.0,1517260000000.0,540,rkrWCJWAW,rkrWCJWAW,ICLR 2018 Conference Acceptance Decision,Reject,"Meta score: 5 + +The paper explores an interesting idea, addressing a known bias in truncated BPTT by sampling across different truncated history lengths. 
Limited theoretical analysis is presented along with PTB language modelling experimentation. The experimental part could be stronger (e.g. trying to improve over the baseline) and perhaps more than just PTB. + +Pros: + - interesting idea +Cons: + - limited analysis + - limited experimentation + +",ICLR2018, +rkphrk6HG,1517250000000.0,1517260000000.0,660,SkBHr1WRW,SkBHr1WRW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper deals with the important topic of learning better graph representations and shows promise in helping to detect critical substructures of graph that would help with the interpretability of representations. Unfortunately, this work fails to accurately portray how it relates to previous work (in particular, Niepert et al, Kipf et al, Duvenaud et al) and falls short of providing clear and convincing explanations of what it can do that these models can't, without including all of them in experimental comparisons. ",ICLR2018, +SkeDko7xlE,1544730000000.0,1545350000000.0,1,Hk41X2AqtQ,Hk41X2AqtQ,"Interesting latent variable model, borderline paper due to experimental execution and novelty",Reject,"Strengths: Interesting work on using latent variables for generating long text sequences. +The paper shows convincing results on perplexity, N-gram based and human qualitative evaluation. + +Weaknesses: More extensive comparisons with hierarchical VAEs and the approach in Serban et. al in terms of language generation quality and perplexity would have been helpful. Another point of reference for which additional comparisons were desired was: ""A Hierarchical Latent Structure for Variational Conversation Modeling"" by Park et al. Some additional substantive experiments were added during the discussion period. + +Contention: Authors differentiated their work from Park et al. and the reviewer bringing this work up ended up upgrading their score to a 7. The other reviewers kept their scores at 5. + +Consensus: The positive reviewer raised their score to a 7 through the author rebuttal and discussion period. One negative reviewer was not responsive, but the other reviewer giving a 5 asserts that they maintain their position. The AC recommends rejection. Situating this work with respect to other prior work and properly comparing with it seems to be the contentious issue. Authors are encouraged to revise and re-submit elsewhere.",ICLR2019,4: The area chair is confident but not absolutely certain +SLQt1bVfh5,1610040000000.0,1610470000000.0,1,t4EWDRLHwcZ,t4EWDRLHwcZ,Final Decision,Reject,"The reviewers generally like the paper, in particular the scalability of the proposed approach. The author response and revised version clarified some questions of the reviewers, however, it didn't fully mitigate their concerns.",ICLR2021, +wsfdlTlVR7j,1610040000000.0,1610470000000.0,1,F1vEjWK-lH_,F1vEjWK-lH_,Final Decision,Accept (Spotlight),"This paper proposes a scalable optimization method for multi-task learning in multilingual models. + +Pros: +1) Addresses a problem which has not been explored much in the past +2) Presents very good analysis to show the limitations of existing methods. +3) Good results. +4) Well written + +Cons: +1) Some missing details about various choices made in the experiments (mostly addressed in the rebuttal) + +This is a very interesting and useful work and I recommend that it should be accepted. 
+",ICLR2021, +S1lRAhu-eN,1544810000000.0,1545350000000.0,1,Skluy2RcK7,Skluy2RcK7,"Detailed analysis of unit selectivity, but reviewers unconvinced of impact",Reject,"The paper examined the folk-knowledge that there are highly selective units in popular CNN architectures, and performed a detailed analysis of recent measures of unit selectivity, as well as introducing a novel one. The finding that units are not extremely selective in CNNs was intriguing to some (not all) reviewers. Further, they show recent measures of selectivity dramatically over-estimate selectivity. + +There was not tight agreement amongst the reviewers on the paper's rating, but it trended towards rejection. Weaknesses highlighted by reviewers include lack of visual clarity in their demonstrations, the use of a several-generations-old CNN architecture, as well as a lack of enthusiasm for the findings.",ICLR2019,4: The area chair is confident but not absolutely certain +mSmn4jrT_6i,1642700000000.0,1642700000000.0,1,SYB4WrJql1n,SYB4WrJql1n,Paper Decision,Accept (Poster),"Dear Authors, + +The paper was received nicely and discussed during the rebuttal period. The current consensus suggests the paper be accepted, but could have another round of revisions before it gets published + +- Definition of sparsity within theoretical results + clarity of results. This seems to be the main concern by one of the reviewers. +- The reviewers acknowledged that some of the concerns raised could be found somewhere in the appendix, which raises further the concern of the presentation of the results: reviewers suggest a more focused and proper dissemination of the results (main theorems in main text + explanation of the results obtained, etc), which requires another round of revisions and reviewing. + +Best AC",ICLR2022, +Sy4xXkaBG,1517250000000.0,1517260000000.0,59,S1ANxQW0b,S1ANxQW0b,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The main idea of policy-as-inference is not new, but it seems to be the first application of this idea to deep RL, and is somewhat well motivated. The computational details get a bit hairy, but the good experimental results and the inclusion of ablation studies pushes this above the bar. +",ICLR2018, +HyxnRTvyx4,1544680000000.0,1545350000000.0,1,H1lnJ2Rqt7,H1lnJ2Rqt7,not enough support from reviewers to accept this paper,Reject,"I would like to commend the authors on their work engaging with the reviewers and for working to improve training time. However, there is not enough support among the reviewers to accept this submission. The reviewers raised several important points about the paper, but I believe there are a few other issues not adequately highlighted in the reviews that prevent this work from being accepted: + +1. [premises] It has not been adequately established that ""large batch training often times leads to degradation in accuracy"" inherently which is an important premise of this work. Reports from the literature can largely be explained by other things in the experimental protocol. Even the framing of this issue has become confused since, although it may be possible to achieve the same accuracy at any batch size with careful tuning, this might require using (at worst) the same number of steps as the smaller batch size in some cases and thus result in little to no speedup. For example see https://arxiv.org/abs/1705.08741 and recent work in https://arxiv.org/abs/1811.03600 for more information. Even Keskar et al. 
reported that data augmentation eliminated the solution quality difference between their larger batch size and their smaller batch size experiments which indicates that even if noisiness from small batches serving to regularize training other regularization techniques can serve just as well. + +2. [baseline strength] The appropriate baseline is standard minibatch SGD w/momentum (or ADAM or whatever) algorithm with extremely careful tuning of *all* of the hyperparameters. None of the popular learning rate heuristics will always work and other optimization parameters need to be tuned as well. If learning rate decay is used, it should also be tuned especially if one is trying to measure a speedup. The submission does not provide a sufficiently convincing baseline. + +3. [measurement protocol] The protocol for measuring a speedup is not convincing without more information on how the baselines were tuned to achieve the same accuracy in the fewest steps. Approximating the protocols in https://arxiv.org/abs/1811.03600 would be one alternative. + +Additionally there are a variety of framing of issues around hyperparameter tuning, but, because they are easier to fix, they are not as salient for the decision. +",ICLR2019,5: The area chair is absolutely certain +dLRG_68gfeE,1610040000000.0,1610470000000.0,1,Ud3DSz72nYR,Ud3DSz72nYR,Final Decision,Accept (Oral),"This paper tackles the important problem of endowing deep RL agents with added interpretability. Action values are decomposed as the combination of GVFs learned on externally-specified features, offering action explanations in terms of discounted future returns in the space of interpretable quantities. Reviewers praised the approach, as well as the level of detail for reproducibility purposes. R3 had concerns about the generality of the method but follow-up experiments have allayed these concerns. Given the reviewer response and the central importance of the problem considered to the field, I can wholeheartedly recommend acceptance.",ICLR2021, +3170cGfMqSN,1610040000000.0,1610470000000.0,1,krz7T0xU9Z_,krz7T0xU9Z_,Final Decision,Accept (Poster),"The paper shows that under a very restrictive assumption on the data, ReLU networks with one hidden layer and zero bias trained by gradient flow converge two a meaningful predictor provided that the network weights are randomly initialized with sufficiently small variances. While there is some overlap with a paper by Lyu & Li (2020), the paper under review establishes its results for networks with arbitrary widths whereas using the results of Lyu & Li (2020) works, at least so far, only for sufficiently wide networks. The assumption on the data is anything than realistic and actually any ""simple, conventional"" learning algorithm can easily learn in this regime. Nonetheless, getting meaningful results for neural networks is still a notoriously difficult task and for this reason, the paper deserves publication. ",ICLR2021, +ASzGLVLxr_l,1610040000000.0,1610470000000.0,1,UHGbeVORAAf,UHGbeVORAAf,Final Decision,Reject,"This paper introduces an ensemble method to few-shot learning. +Although the introduced method yields competitive results, it is fair to say it is more complicated than much simpler algorithms and does not necessarily perform better. Given that ensembling for few-shot learning has been around for a while, it is not clear that this paper will have a significant audience at ICLR. +Sorry about the bad news, + +AC. 
+",ICLR2021, +I-dzn5yAVm,1576800000000.0,1576800000000.0,1,rygoURNYvS,rygoURNYvS,Paper Decision,Reject,"The paper presents CuBERT (Code Understanding BERT), which is BERT-inspired pretraining/finetuning setup, for source code contextual embedding. The embedding results are tested on classification tasks to demonstrate the effectiveness of CuBERT. + +This is an interesting application paper that extends existing models to source code analysis. The authors did a good job at motivating the applications, describing the proposed models and discussing the experiments. The authors also agree to share all the datasets and source code so that the experiment results can be replicated and compared with by other researchers. + +One major concern is the lack of strong baselines. All reviewers are concerned about this issue. The paper could lead to a good publication in the future if the issues can be addressed. ",ICLR2020, +_xE1TkD5nls,1642700000000.0,1642700000000.0,1,OMxLn4t03FG,OMxLn4t03FG,Paper Decision,Reject,"This paper studies the performance of second-order algorithms on training multi-layers over-parameterized neural networks. The authors propose an algorithm based on the Gram-Gauss-Newton method, tensor-based sketching techniques, and preconditioning to train such a network, whose runtime is subquadratic in the width of the neural network. While some reviewers provide some weak support, none of them are in strong support, even after the author's response. I think one of the reasons is the lack of empirical experiments. Since the main claim of this paper is an efficient second-order algorithm, some experiments are necessary to back up this claim. Unfortunately, the authors did not try to add such an experiment during the rebuttal. I would suggest the authors add such experiments in the revision.",ICLR2022, +Q3tqUSl1gd,1610040000000.0,1610470000000.0,1,RgDq8-AwvtN,RgDq8-AwvtN,Final Decision,Reject,"# Quality: +The technical contribution of the paper seems reasonable and there were only minor points being highlighted by the reviewers. + +# Clarity: +The paper would benefit from being more polished. During the rebuttal, the authors suggested that several reviewers misunderstood the paper. This alone should encourage the authors to improve clarity. + +# Originality: +Several reviewers presented concerns about the claims of the authors and the existence of connections to existing literature. Nonetheless, the proposed approach seems novel to the best of the reviewers and my knowledge. + +# Significance of this work: +The topic of the manuscript is relevant and impactful. However, several reviewers suggested to include additional baselines in the experiments to validate the goodness of the proposed approach. + +# Overall: +The paper presents an interesting idea, with a high potential impact. Despite the interesting topic and some interesting insights, all the reviewers agree that the manuscript is not ready for publication just yet. I want to encourage the authors to keep improve it and resubmit it at the next conference.",ICLR2021, +HJlYUPF-gV,1544820000000.0,1545350000000.0,1,Hyxu6oAqYX,Hyxu6oAqYX,"not enough theoretical guarantees, evaluations are insufficient",Reject,"The authors present an algorithm for label noise correction when the label error is a function of the input features. + +Strengths +- Well motivated problem and a well written paper. 
+ +Weaknesses +- The reviewers raised concerns about theoretical guarantees on generalization; it is not clear why energy based auto-encoder / contrastive divergence would be a good measure of label accuracy especially when the feature distribution has high variance, and when there are not enough clean examples to model this distribution correctly. +- Evaluations are all on toy-like tasks with small training sets, which makes it harder to gauge how well the techniques work for real-world tasks. +- It’s not clear how well the algorithm can be extended to multi-class problems. The authors suggested 1-vs-all, but have no experiments or results to support the claim. + +The authors tried to address some of the concerns raised by the reviewers in the rebuttal, e.g., how to address unavailability of correctly labeled data to train an auto-encoder. But other concerns remain. Therefore, the recommendation is to reject the paper. +",ICLR2019,4: The area chair is confident but not absolutely certain +Wcv1biWXMAR,1610040000000.0,1610470000000.0,1,GA87kjyd-f,GA87kjyd-f,Final Decision,Reject,"The paper proposes a very interesting decomposition of the neural tangent kernel, which promises +to decouple effects of the parameters and data. The authors illustrate the effects of this decomposition +by considering pruning strategies for initialization. +While the approach looks promising, the current paper is somewhat premature: The only ""hard"" +theoretical result, Theorem 1, is a direct consequence of the decomposition. Its consequences for +training discussed in the subsequent paragraph involve quite a few approximations, yet the effects +of these approximations remain unclear. This general, high-level tone is kept when discussing the +initializations. +Finally, the N(0,1)-response to Reviewer 3 worries me.",ICLR2021, +pqU5AIxZgr2,1642700000000.0,1642700000000.0,1,LgjKqSjDzr,LgjKqSjDzr,Paper Decision,Reject,"This paper proposes a method to use Transformers with tabular data by sharing attention. Reviewers raise significant concerns about the motivation, writing and experimental results. Author's did not submit a response. Hence I recommend rejection.",ICLR2022, +PutaSrTYSw9,1642700000000.0,1642700000000.0,1,XK4GN6UCTfH,XK4GN6UCTfH,Paper Decision,Reject,"Finally, all reviewers leaned towards rejection. The main concerns were missing methodological depth and questions regarding the experimental evaluation (unclear link between experimental outcomes and methodological details). The rebuttal was not perceived as being fully convincing, and finally nobody wanted to champion this paper. I think that this work has some potential, but in its present form, it does not seem to be ready for publication.",ICLR2022, +oXztx58SNcj,1642700000000.0,1642700000000.0,1,vds4SNooOe,vds4SNooOe,Paper Decision,Accept (Spotlight),"This work presents an approach to learning good representations for few-shot learning when supervision is provided at the super-class level and is otherwise missing at the sub-class level. + +After some discussion with the authors, all reviewers are supportive of this work being accepted. Two reviewers were even supportive of this work being presented at least as a spotlight. + +The approach presented is well motivated, experiments demonstrate its value and include a nice application in the medical domain, making the work stand out relatively to most work in few-shot classification. 
Therefore, I'm happy to recommend this work be accepted and receive a spotlight presentation.",ICLR2022,
+mMJTM4le-fI,1610040000000.0,1610470000000.0,1,1rxHOBjeDUW,1rxHOBjeDUW,Final Decision,Accept (Poster),"This paper proposes to enhance the robustness of RL and supervised learning algorithms to noise in the observations by dropping input features that are irrelevant for the task. It relies on the information bottleneck framework (well derived in the paper) and learns a parametric compression of the input features that sets them to zero if they are not relevant for the task. The method is extensively evaluated on several RL tasks (exploration in VizDoom and DMLab with a noisy “TV” distractor) and supervised tasks (ImageNet or CIFAR-10 classification with noise).

Reviewers have praised the idea, derivation and writing, as well as the extensive experiments on RL and supervised tasks. Critique focused on:
* the contrived nature of the TV noise (localised always in the same corner of the image -- a standard evaluation according to the authors),
* lack of comparison with other feature selection methods,
* lack of comparison with Conditional Entropy Bottleneck (done during rebuttal),
* more general noise than just specific pixels (clarified by the authors as being the features coming out of a convnet).

Given that the reviewers' comments were largely addressed by the authors, and given the final scores of the paper, I will recommend acceptance.
",ICLR2021,
+rp87VwiG8sVm,1642700000000.0,1642700000000.0,1,VyZRObZ19kt,VyZRObZ19kt,Paper Decision,Reject,"The authors present a new learning-based algorithm for constructing index structures. Existing learned index algorithms use a fixed value; in contrast, the authors show that more refined methods can be used to obtain higher-quality solutions for the problem.

The reviewers, after discussion, found the paper interesting and the experimental results promising, but they feel that the paper in its current form is not yet ready for publication. In particular,
- in the current form the theoretical motivation and the experimental results are a bit detached

Overall, the paper is interesting and the results are promising, but it would probably benefit from significant re-writing before being accepted.",ICLR2022,
+kpupzmwE0Eg,1610040000000.0,1610470000000.0,1,_qoQkWNEhS,_qoQkWNEhS,Final Decision,Reject,"The paper proposes a new defense against adversarial attacks on graphs using a reweighting scheme based on Ricci-flow. Reviewers highlighted that the paper introduces interesting ideas and that the use of Ricci-curvature/flow is a novel and promising contribution. Reviewers also recognized that the paper has significantly improved after rebuttal and clarified some aspects of their initial reviews.

However, there still exist concerns around the current version of the manuscript. In particular, important aspects of the method and algorithm, as well as some design choices, are currently unclear. This includes evaluating and discussing robustness, the training method, and practicality/improvements in real-world scenarios. I agree with the majority of the reviewers that the current version requires an additional revision to iron out the aforementioned issues. 
However, I also agree with the reviewers that the overall idea is promising, and I'd encourage the authors to revise and resubmit their work after considering the feedback from this round of reviews.",ICLR2021,
+DfU2fwvUZ1o,1610040000000.0,1610470000000.0,1,4dXmpCDGNp7,4dXmpCDGNp7,Final Decision,Accept (Poster),"The authors introduce new evaluation criteria and methods for identifying salient features: rather than earlier approaches which attempt to 'remove' or marginalize out features in various ways, here they consider robustness analysis with small adversarial perturbations in an Lp ball. For text classification, a user study is included, which is appreciated.

In discussion, the authors addressed many points and all reviewers converged to recommend acceptance.

A couple of points could be discussed further if space permits:
the impact of the type of perturbation employed; and
the connection between optimizing for adversarial robustness and optimizing for insertion/deletion criteria.",ICLR2021,
+5jvH3YXz59W,1610040000000.0,1610470000000.0,1,OAdGsaptOXy,OAdGsaptOXy,Final Decision,Reject,"The authors propose to improve the LM's ability to model entities by signalling the existence of entities and by allowing the model to represent entities as units. The embeddings of the surface form and the entity unit are then added and passed through a layer to predict the next word.
The paper evaluates on QA and conducts probing tasks, and shows that such entity modelling results in better performance.

All reviewers have found the idea conceptually simple and novel. At the same time a number of concerns are raised, with the most important being the lack of understanding around which and how hyper-parameters matter for this model and, most importantly, the confounder introduced to the model by the much larger number of parameters introduced by the embedding layers. While the authors comment that not all the parameters are used all the time, the size of the embeddings still counts toward the total number of parameters a model has. Thus, without properly controlling for this (e.g., having another model where the extra embedding params are given to another part of the model), it is difficult to determine whether adding more parameters was the solution, or adding more parameters for modelling the entities. ",ICLR2021,
+rZ7A9Zmysh,1576800000000.0,1576800000000.0,1,Hkgpnn4YvH,Hkgpnn4YvH,Paper Decision,Reject,"The paper proposes a method for learning multi-image matching using graph neural networks. The model is learned by making use of cycle consistency constraints and geometric consistency, and it achieves a performance that is comparable to the state of the art. While the reviewers view the proposed method as interesting in general, they raised issues regarding the evaluation, which is limited in terms of both the chosen datasets and prior methods. After rounds of discussion, the reviewers reached a consensus that the submission is not mature enough to be accepted for this venue at this time. Therefore, I recommend rejecting this submission.",ICLR2020,
+J5FabpB6SM,1576800000000.0,1576800000000.0,1,H1lkYkrKDB,H1lkYkrKDB,Paper Decision,Reject,"The paper focuses on extracting the underlying dynamics of objects in video frames, for background/foreground extraction and video classification. Generally speaking, the presentation of the paper should be improved. Novelty should be clarified, contrasting the proposed approach with existing literature. 
All reviewers also agree that the experimental section is too weak in its current form.
+",ICLR2020,
+cMUa60K-1-8,1610040000000.0,1610470000000.0,1,lDjgALS4qs8,lDjgALS4qs8,Final Decision,Reject,"The paper proposes to explain the representation of layer-aware neural sequence encoders with a multi-order-graph (MoG). Based on the MoG explanation, it further proposes Graph-Transformer as a graph-based self-attention network empowered Transformer. As commented by the authors, a main purpose of Graph-Transformer is to show an example application of the MoG explanation.

During the discussion period, after reading the paper and checking the code, the AC had raised a serious concern: there is a big gap between the MoG motivation and the actual implementation. The AC had urged the referees to take a careful look at the implementation details, in particular Lines 524-561 in the attached code: ""supplement/fairseq-0.6.2_halfdim_gate/fairseq/models/transformer.py"". The AC had made the following comments to the referees: ""Whether the performance gain of Graph-Transformer over Transformer is due to the MoG explanation is highly unclear. There is no direct evidence, such as appropriate visualization, to support that. In a high-level description, instead of using a usual skip connection that would combine beforex and x, the actual implementation is to 1) define increamental_x = x - beforex, 2) let increamental_x attend on beforex to produce x1, let beforex attend on increamental_x to produce x2, and let increamental_x attend on increamental_x to produce x3, 3) combine beforex, x1, x2, x3 in a certain way to produce the layer output.""

Reviewer 2 responded to the AC's concern: ""After examining the transformer.py and Section 2 & 3, we cannot understand why the output of self-attentions could be regarded as MoG subgraphs? The authors did not explain the connection. In their code, the graph transformer seems to just utilize 3 multi-head attentions (line 539-541) in their encoder. Using MoG to interpret the outputs of three attentions (line 539-541) is not very convincing. The link is weak. We agree with your comments.""

To summarize, the link between the actual implementation in the code and all the MoG explanations is quite weak, and the technical novelty of the actual implementation is not strong enough for an ICLR publication. Therefore, the AC recommends Reject.

",ICLR2021,
+1VFqyF-x6M-,1642700000000.0,1642700000000.0,1,GMYWzWztDx5,GMYWzWztDx5,Paper Decision,Reject,"This submission proposes a few small changes to the (PreLN) Transformer architecture that enable training with higher learning rates (and therefore can result in faster convergence). The changes include the addition of two layer norm operations as well as a learnable head scaling operation in multi-headed attention. The proposed operations add only a small computational overhead and should be simple to implement. Experiments are conducted on language modeling and masked language modeling, with improved results demonstrated at various scales and according to various evaluation procedures. The paper also includes a good amount of ablation study as well as some analysis. Reviews on the paper were mixed, and a great number of changes were made to the paper during the rebuttal period. 
To summarize the concerns and recommendations, reviewers requested
- better connection between the proposed changes and the purported issue (gradient scale mismatch between early/late layers)
- better analysis of why gradient scale mismatch is a major issue and investigation of where it comes from
- better comparison to existing techniques that allow for higher learning rate training of Transformers
- additional experiments on different model types and ideally different codebases/implementations

I think overall this is a solid submission, since it proposes a simple change that is reasonably likely to be helpful (or at least not harmful). However, I think that there are enough concerns with the current draft and there were enough changes made during rebuttal that this paper should be resubmitted to a future conference. I would suggest the authors take the final updated form from this round, add additional motivation/analysis/experiments, and resubmit, and I suspect a positive outcome.",ICLR2022,
+4j8dRoMv39,1610040000000.0,1610470000000.0,1,ipUPfYxWZvM,ipUPfYxWZvM,Final Decision,Accept (Poster),"The paper investigates the order of Transformer modules and its effect on performance. The proposed approach, IOT, consists of several pre-defined encoders and decoders with different orderings (and weight sharing), along with a light predictor which is trained to choose the best configuration per instance.

Most reviewers found the general idea of predicting the order of Transformer modules at the instance level quite intriguing. Other strengths included the wide range of evaluation tasks, major empirical gains, and novelty.

R1 and R4 raise valid and important concerns about the validity of the results when model size and training time are controlled.
However, after carefully reading the author response and the revised paper, I feel that this issue is resolved.
The authors provide comparisons with larger models, ensemble models, and models trained longer, and in all cases the gains are still obvious.

Overall, I feel that the general idea behind this paper is very exciting and could inspire more research in this direction. Therefore I recommend acceptance. ",ICLR2021,
+HpHHNjDA7HF,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"This paper proposes an online distillation method for efficient object recognition. The main idea is to employ a binary student network to learn frequently occurring classes using the pseudo-labels generated by a teacher network. In order to identify rare vs. frequent classes, an attention triplet loss is used. The proposed scheme is empirically evaluated on the CIFAR-100 and tiny-ImageNet datasets.

The major and common concern from reviewers about this draft is the quality of presentation, which has made it difficult to read and understand the ideas and their underlying motivations. While specific instances of this were mentioned by the reviewers, and responded to by the authors, the readability issues go beyond these instances. In fact, I could independently observe presentation issues different from those mentioned by the reviewers. For example, in Eq (1), what is eta (never defined before), and what is the loss function here for which the gradient update leads to Eq (1)? The proof also is not clear and seems to have issues. For example, in Eq (13), what is the meaning of multiplying two vectors? If it means a dot product, it should be denoted more clearly, e.g. as w^T w^* or ⟨w, w^*⟩. In Eq (12), can alpha be negative? If not, it should be clarified why, and if it can be negative, then (⟨w, w^*⟩)^2 >= (alpha n)^2 does not need to hold, but this inequality is used in Eq (14) anyway.

Overall, I think the submission can benefit from an overhaul of the writing. I encourage the authors to resubmit after improving on that.",ICLR2021,
+hdiYc-3OtPD,1642700000000.0,1642700000000.0,1,KkIE-qePhW,KkIE-qePhW,Paper Decision,Reject,"This paper deals with the important practical problem of speeding up GNNs. 
+Although the proposed method based on LSH may be considered a rather too simple preprocessing, it would be worthwhile to share the practical idea with the community as long as the proposed method is shown to be effective enough. 
+However, as pointed out by several reviewers, there is a concern that the experimental validation of this paper is not sufficient. 
+Further and deeper validations will make this paper stronger.",ICLR2022, +oDVIYe8W6M,1576800000000.0,1576800000000.0,1,HJxV-ANKDH,HJxV-ANKDH,Paper Decision,Accept (Poster),"This paper presents a method for optimizing parameter matrices of deep learning objectives while enforcing orthonormality constraints. While advantageous in certain respects, such constraints can be expensive to maintain when using existing methods. To address this issue, an new algorithm is proposed based on the Cayley Transform and analyzed in terms of convergence. After the discussion period two reviewers supported acceptance while one still voted for rejection. Consequently, in recommending acceptance here for a poster, it is worth examining the significance of unresolved concerns. + +First, the reject reviewer raised the valid point that the convergence proof relies on the assumption of Lipschitz continuous gradients, and yet the experiments use ReLU activation functions that do not satisfy this criteria. In my view though, it is sometimes reasonable to derive useful theory under the assumption of Lipschitz continuous derivatives that nonetheless provides insight into the case where these derivatives may not be Lipschitz on a set of measure zero (which would be the case with ReLU activations). So while ideally it might be nice to extend the theory to remove this assumption, the algorithm seems to work fine with ReLU activations in practice. And this seems reasonable given the improbability of any iterate exactly hitting the measure-zero points where the gradients are discontinuous. Beyond this issue, some criticisms were mentioned in terms of how and where the timing comparisons were presented. However, I believe that these issues can be easily remedied in a final revision.",ICLR2020, +BkH02MIde,1486400000000.0,1486400000000.0,1,Hk3mPK5gg,Hk3mPK5gg,ICLR committee final decision,Accept (Poster),"This paper provides a number of performance enhancements inspired by domain knowledge. Taken together, these produce a compelling system that has shown itself to be the best-in-class as per the related competition. + Experts agree that the authors do a good job at justifying the majority of the design decisions. + + pros: + - insights into the SOTA Doom player + + cons: + - lack of pure technical novelty: the various elements have existed previously + + This paper comes down to a matter of taste in terms of appreciation of SOTA systems or technical novelty. + With the code being released, I believe that this work will have impact as a benchmark, and as a guidebook + as to how features can be combined for SOTA performance on FPS-style scenarios.",ICLR2017, +B1lc77rkx4,1544670000000.0,1545350000000.0,1,rkVOXhAqY7,rkVOXhAqY7,"Somewhat controversial, but interesting new criterion for representation learning",Reject,"This paper proposes a criterion for representation learning, minimum necessary information, which states that for a task defined by some joint probability distribution P(X,Y) and the goal of (for example) predicting Y from X, a learned representation of X, denoted Z, should satisfy the equality I(X;Y) = I(X;Z) = I(Y;Z). The authors then propose an objective function, the conditional entropy bottleneck (CEB), to ensure that a learned representation satisfies the minimum necessary information criterion, and a variational approximation to the conditional entropy bottleneck that can be parameterized using deep networks and optimized with standard methods such as stochastic gradient descent. 
The authors also relate the conditional entropy bottleneck to the information bottleneck Lagrangian proposed by Tishby, showing that the CEB corresponds to the information bottleneck with β = 0.5. An important contribution of this work is that it gives a theoretical justification for selecting a specific value of β rather than testing multiple values. Experiments on Fashion-MNIST show that, in comparison to a deterministic classifier and to variational information bottleneck models with β in {0.01, 0.1, 0.5}, the CEB model achieves good accuracy and calibration, is competitive at detecting out-of-distribution inputs, and is more resistant to white-box adversarial attacks. Another experiment demonstrates that a model trained with the CEB criterion is *unable* to memorize a randomly labeled version of Fashion-MNIST. There was a strong difference of opinion between the reviewers on this paper. One reviewer (R1) dismissed the work as trivial. The authors rebutted this claim in their response and revision, and R1 failed to participate in the discussion, so the AC strongly discounted this review. The other two reviewers had some concerns about the paper, most of which were addressed by the revision. But, crucially, some concerns still remain. R4 would like more theoretical rigor in the paper, while R2 would like a direct comparison against MINE and CPC. In the end, the AC thinks that this paper needs just a bit more work to address these concerns. The authors are encouraged to revise this work and submit it to another machine learning venue.",ICLR2019,4: The area chair is confident but not absolutely certain +-8hAKSHXWye,1610040000000.0,1610470000000.0,1,x1uGDeV6ter,x1uGDeV6ter,Final Decision,Reject,"The main idea of the paper is to use image data to guide radar data acquisition by focusing on the blocks where the object has appeared. Four reviewers have relatively consistent rating: 3 of them rated “Ok but not good enough - rejection”, while 1 rated “clear rejection”. The main concerns include ad-hoc choices of algorithm design, lack of algorithm novelty, not adequate experiments in illustrating the performance, etc. During the rebuttal, the authors made efforts to response to all reviewers’ comments. However, the major concerns remain, and the rating were not changed. While the motivation is clear and the work has merits, the ACs agree with the reviewers’ concerns and this paper can not be accepted at its current state. +",ICLR2021, +Xd_aPgI_eNM,1642700000000.0,1642700000000.0,1,5ziLr3pWz77,5ziLr3pWz77,Paper Decision,Reject,"Four experts reviewed this paper and rated the paper below the acceptance threshold. The reviewers raised many concerns regarding the paper, mainly the lack of empirical studies and clarity. Some reviewers also suggested the authors better positioning the paper in the literature by discussing more related works. The rebuttal did not address all concerns. Considering the reviewers' concerns, we regret that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2022, +LicgLzNpYLm,1610040000000.0,1610470000000.0,1,1-j4VLSHApJ,1-j4VLSHApJ,Final Decision,Reject,"The submission considers a new attack model for adversarial perturbation in a framework where the attacker has neither access to the trained model nor the data used for training the model. 
The submission suggests a""domain adaptation inspired attack"": learn a different model on a similar domain and generate the adversarial perturbations using that model. The authors then also develop a defense for this type of attack and provide some empirical evaluations of the resulting losses on a few NLP benchmark datasets. + +The paper refers to the literature on domain adaptation theory to motivate their suggested defense, but this analysis remains on an intuitive (rather than a formally rigorous) level. Furthermore, the empirical evaluation does not compare to a variety of attacks and the defense is only evaluated with respected to the self-suggested attack. This is a very minimal bar for a defense to meet. + +The reviewers have criticized the submission for the rather minimal extend of empirical evaluation. Given that the submission also doesn't provide a sound theoretical analysis for the proposed attack and defense, I agree with the reviewers that the submission does not provide sufficient novel insight for publication at ICLR. + +In contrast to some of the reviewers, I do find it legitimate (and maybe recommendable even) to focus on one chosen application area such as NLP. I don't see a requirement to also present experiments on image data or re-inforcement learning applciations. However, I would recommend that the authors highlight more explicitly what general lessons a reader would learn from their study. This could be done through a more extensive and systematic set of experiments or a through analysis in a well defined theoretical framework.",ICLR2021, +xNJZnCeHUq,1642700000000.0,1642700000000.0,1,QmKblFEgQJ,QmKblFEgQJ,Paper Decision,Reject,"The authors propose a new algorithm for clustering direct networks. The key idea behind the paper is to introduce a new flow imbalance measures and a new self-supervised GNN model to solve the task. + +Overall, the paper is interesting and it introduces some new ideas although it needs additional work before being published. + +In particular, +- the experiments could be improved by emphasizing more the evaluation on vol_sum/vol_max/etc metrics and by adding additional results on them +- the clarity of the experimental results should also be improved(for example, metrics / claims around Figure 4 still a bit hard to parse) +- finally, the paper would benefit by some theoretical results on the guarantees of the algorithm(most previous work in the area present interesting theoretical guarantees)",ICLR2022, +HkxCyPD-gN,1544810000000.0,1545350000000.0,1,BJgGhiR5KX,BJgGhiR5KX,interesting but not very novel framework,Reject,"Pros: + +- A new framework for learning sentence representations +- Solid experiments and analyses +- En-Zh / XNLI dataset was added, addressing the comment that no distant languages were considered; also ablation tests + +Cons: + +- The considered components are not novel, and their combination is straightforward +- The set of downstream tasks is not very diverse (See R2) +- Only high resource languages are considered (interesting to see it applied to real low resource languages) + +All reviewers agree that there is no modeling contribution. Overall, it is a solid paper but I do not believe that the contribution is sufficient. 
",ICLR2019,4: The area chair is confident but not absolutely certain +Bkg3Wh4-xN,1544800000000.0,1545350000000.0,1,Hke8Do0cF7,Hke8Do0cF7,Not acceptable in current form,Reject,The reviewers agree this paper is not good enough for ICLR.,ICLR2019,5: The area chair is absolutely certain +i33MBEx1WZh,1610040000000.0,1610470000000.0,1,A2gNouoXE7,A2gNouoXE7,Final Decision,Accept (Poster),"This paper proposes a method for bilingual lexicon induction. +The proposed method is efficient, it optimizes a reconstruction and transfer loss. +Extensive experiments are reported, and the methods provides improvements over prior work. +Overall, the paper brings together prior ideas in a useful way.",ICLR2021, +uzhylSdtQ9pw,1642700000000.0,1642700000000.0,1,u6s8dSporO8,u6s8dSporO8,Paper Decision,Accept (Poster),"This paper studies group equivariant neural posterior estimation which seeks to endow conventional neural posterior estimation method with equivariance of both the data and parameters simultaneously. To test the efficacy of the proposed approach the authors experiment with gravitational wave data and show that the proposed GNPE achieves considerable performance gains. + + +Strengths: + +- The approach is independent of neural architectures and does not necessitate knowledge of exact equivariances. +- The method seems to be much better than regular NPE in cases where there are known equivariances. + +Weaknesses: + +- the writing of the paper is not clear, which makes the paper difficult to read and evaluate. +- It is hard to check the correctness of the conclusions and algorithm due to lack of necessary assumptions and derivation steps. +- the authors are knowledgeable about the subject but present material in a slightly callous way which prevents a precise understanding of their techniques. + + +This is a borderline paper with two reviewers in favor of acceptance and two with a slight tendency to reject. The two negative reviewers did not engage in a discussion with the authors or did not complete that discussion, failing to confirm their ratings or provide an update of those ratings. They also do not seem to give strong arguments for rejection. Based on this, I recommend the paper for acceptance. However, I encourage the authors to take into account the reviewers' comments, especially the part on clarity and rigor, to improve the paper for its camera-ready version.",ICLR2022, +HmFOnYDNdx,1576800000000.0,1576800000000.0,1,HkxQRTNYPH,HkxQRTNYPH,Paper Decision,Accept (Talk),"This paper proposes a novel method for considering translations in both directions within the framework of generative neural machine translation, significantly improving accuracy. + +All three reviewers appreciated the paper, although they noted that the gains were somewhat small for the increased complexity of the model. Nonetheless, the baselines presented are already quite competitive, so improvements on these datasets are likely to never be extremely large. + +Overall, I found this to be a quite nice paper, and strongly recommend acceptance, perhaps as an oral presentation.",ICLR2020, +0lFT19rXkk,1576800000000.0,1576800000000.0,1,rJg8TeSFDH,rJg8TeSFDH,Paper Decision,Accept (Spotlight),"After the revision, the reviewers agree on acceptance of this paper. Let's do it.",ICLR2020, +93lopxvJUU,1610040000000.0,1610470000000.0,1,gLWj29369lW,gLWj29369lW,Final Decision,Accept (Poster),"This paper extends the recent theoretical understanding on geometric properties for word embeddings to relations and entities of knowledge graph. 
It categorizes relations into different types and derives requirements for their representations. Empirically, they experiment with several graph embedding approaches and show that when the loss function is aligned with the requirement of the relation type, better performance can be achieved. The reviewers generally find the paper to be solid and well executed, and to provide useful insights. The authors are encouraged to strengthen the discussion of the motivation of this work, and improve the presentation based on reviewers' comments. ",ICLR2021,
+5CmnutmWy,1576800000000.0,1576800000000.0,1,rJxycxHKDS,rJxycxHKDS,Paper Decision,Accept (Poster),"Although some criticism remains for the experiments, I suggest accepting this paper.",ICLR2020,
+Sw68EGL1C0,1610040000000.0,1610470000000.0,1,HNA0kUAFdbv,HNA0kUAFdbv,Final Decision,Reject,"The paper proposes to learn layout representations for graphic design using transformers with a masking approach inspired by BERT. The proposed model is pretrained on a large collection of ~1M slides (the script for crawling the slides will be open-sourced) and evaluated in several downstream tasks.

Review Summary:
The submission received slightly negative reviews, with scores of 4 (R3) and 5 (R2,R4). 
+Reviewers found the paper to be well-written and clear, and the problem of layout embedding to be interesting. Reviewers agree that the use of transformers for layout embedding has not been explored in prior work. However, the paper did not have proper citation and comparisons against prior work for layout embedding, and lacked systematic evaluation. Reviewers also would like to know more details about the dataset that was used for pre-training. + +Pros: +- Novel use of transformers for layout embedding (not yet explored in prior work) +- Use of large dataset of slides + +Cons: +- Lacked proper citation and comparisons against prior work for layout embedding +- Lacked systematic evaluation +- Missing details about the dataset + +Reviewer Discussion: +During the author response period, the authors responded to the reviews indicating that they will improve the draft based on the feedback, but did not submit a revised draft. As there was no revision to the submission, there was limited discussion with the reviewers keeping with their original scores. All reviewers agrees that the direction is interesting but the current submission should not be accepted. + +Recommendation: +The AC agrees with the reviewers that the current version is not ready for acceptance at ICLR, and it would be exciting to see the improved version. We hope the authors will continue to improve their work based on the reviewer feedback and that they will submit an improved version to an appropriate venue. +",ICLR2021, +HpHHNjDA7HF,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"This paper proposes an online distillation method for efficient object recognition. The main idea is to employ a binary student network to learn frequently occurring classes using the pseudo-labels generated by a teacher network. In order to identify rare vs frequent classes, an attention triplet loss is used. The proposed scheme is empirically evaluated on CIFAR-100 and tiny-imagenet datasets. + +The major and common concern from reviewers about this draft is the quality of presentation, which has made it difficult to read and understand the ideas and their underlying motivations. While specific instances of this were mentioned by the reviewers, and responded by the authors, the readability issues go beyond these instances. In fact, I could independently observe presentation issues different from those mentioned by the reviewers. For example, in Eq (1), what is eta (never defined before), what is the loss function here for which the gradient update leads to Eq (1)? The proof also is not clear and seems to have issues. For example, in Eq (13) what is the meaning of multiplying two vectors? If it means dot product, it should be denoted more clearly, e.g. as w^T w^* or . In Eq (12), can alpha be negative? If not, should be clarified why, and if it can be negative, then ()^2 >= (alpha n)^2 does not need to hold, but this inequality is used in Eq (14) anyway. + +Overall, I think the submission can benefit from an overhaul of the writing. I encourage authors to resubmit after improving on that.",ICLR2021, +HJlG73zxgN,1544720000000.0,1545350000000.0,1,Hkesr205t7,Hkesr205t7,Meta-Review,Reject,"The paper addresses generalized zero shot learning (test data contains examples from both seen as well as unseen classes) and proposes to learn a shared representation of images and attributes via multimodal variational autoencoders. +The reviewers and AC note the following potential weaknesses: (1) low technical contribution, i.e. 
the proposed multimodal VAE model is very similar to Vedantam et al (2017) as noted by R2, and to JMVAE model by Suzuki et al, 2016, as noted by R1. The authors clarified in their response that indeed VAE in Vedantam et al (2017) is similar, but it has been used for image synthesis and not classification/GZSL. (2) Empirical evaluations and setup are not convincing (R2) and not clear -- R3 has provided a very detailed review and a follow up discussion raising several important concerns such as (i) absence of a validation set to test generalization, (ii) the hyperparameters set up; (iii) not clear advantages of learning a joint model as opposed to unidirectional mappings (R1 also supports this claim). The authors partially addressed some of these concerns in their response, however more in-depth analysis and major revision is required to assess the benefits and feasibility of the proposed approach. +",ICLR2019,5: The area chair is absolutely certain +m24isqGWA,1576800000000.0,1576800000000.0,1,rkltE0VKwH,rkltE0VKwH,Paper Decision,Reject,"The authors present a method that utilizes intrinsic rewards to coordinate the exploration of agents in a multi-agent reinforcement learning setting. The reviewers agreed that the proposed approach was relatively novel and an interesting research direction for multiagent RL. However, the reviewers had substantial concerns about writing clarity, the significance of the contribution of the propose method, and the thoroughness of evaluation (particularly the number of agents used and limited baselines). While the writing clarity and several technical points (including addition ablations) were addressed in the rebuttal, the reviewers still felt that the core contribution of the work was a bit too marginal. Thus, I recommend this paper to be rejected at this time.",ICLR2020, +FnokJD2AVTt,1642700000000.0,1642700000000.0,1,P7OVkHEoHOZ,P7OVkHEoHOZ,Paper Decision,Accept (Poster),"This paper proposes Hindsight Foresight Relabeling (HFR), an approach for reward relabeling for meta RL. The main contribution is a measure of how useful a given trajectory is for the purpose of meta-task identification as well as the derivation of a task relabeling distribution based on this measure. + +Reviewers agreed that the paper tackles an interesting problem and found the main insight to be simple and intuitive. While the initial reviews raised some concerns regarding novelty, the performance gap, and using the learned Q-function to estimate post-adaptation returns the rebuttal did a good job of addressing these concerns. Overall, the paper proposes a non-trivial extension of hindsight relabeling to meta RL and while the results could be stronger I think the paper provides useful ideas and insights so I recommend acceptance as a poster.",ICLR2022, +ryllGpBiyV,1544410000000.0,1545350000000.0,1,rJg6ssC5Y7,rJg6ssC5Y7,"a useful benchmark for deep learning optimizers, but limited research contribution",Accept (Poster),"The field of deep learning optimization suffers from a lack of standard benchmarks, and every paper reports results on a different set of models and architectures, likely with different protocols for tuning the baselines. This paper takes the useful step of providing a single benchmark suite for neural net optimizers. + +The set of benchmarks seems well-designed, and covers the range of baselines with a variety of representative architectures. It seems like a useful contribution that will improve the rigor of neural net optimizer evaluation. 
+ +One reviewer had a long back-and-forth with the authors about whether to provide a standard protocol for hyperparameter tuning. I side with the authors on this one: it seems like a bad idea to force a one-size-fits-all protocol here. + +As a lesser point, I'm a little concerned about the strength of some of the baselines. As reviewers point out, some of the baseline results are weaker than typical implementations of those methods. One explanation might be the lack of learning rate schedules, something that's critical to get reasonable performance on some of these tasks. I get that using a fixed learning rate simplifies the grid search protocol, but I'm worried it will hurt the baselines enough that effective learning rate schedules and normalization issues come to dominate the comparisons. + +Still, the benchmark suite seems well constructed on the whole, and will probably be useful for evaluation of neural net optimizers. I recommend acceptance. + +",ICLR2019,3: The area chair is somewhat confident +by-VkgNOYQM,1642700000000.0,1642700000000.0,1,eMudnJsb1T5,eMudnJsb1T5,Paper Decision,Accept (Spotlight),The paper proposes to extend mirror descent to sampling with stein operator when the density is defined on a constrained domain and non euclidean geometry. All reviewers agreed on the novelty and the merits of the paper. Accept,ICLR2022, +VSZRtSZCpPu,1610040000000.0,1610470000000.0,1,XdprrZhBk8,XdprrZhBk8,Final Decision,Reject,"The paper provides a functional approximation of the error of ResNets and VGGs pruned with IMP and SynFlow on CIFAR-10 and ImageNet, showing that it is predictable in terms of an invariant tying width, depth, and pruning level. In particular, it formulates the test error as a function of the density of the network after pruning and identifies a low-density high error plateau, a high-density low error plateau, and a power-law behavior for intermediate density. It further demonstrates that networks of different sparsities are freely interchangeable. The paper provides an interesting insight on the power law structure of the error as networks are pruned, however the results are very limited to specific types of networks (ResNets and VGGs), pruning methods (IMP and SynFlow) and datasets (CIFAR-10, ImageNet). Hence, it's not clear if the proposed functional approximation generalizes to other network families, pruning methods, and datasets. I understand that adding a new architecture or dataset is expensive, but fitting the proposed scaling law (the five parameters) requires pruning only a small number of networks, as mentioned by the authors. Comparing the calculated error and the actual error of the pruned network for different architectures and datasets can help verify the findings in the paper, and significantly widens its scope.",ICLR2021, +#NAME?,1642700000000.0,1642700000000.0,1,rTAclwH46Tb,rTAclwH46Tb,Paper Decision,Accept (Poster),"The paper proposes a method to learning rate scheduling that uses information form the eigenvalues of the Hessian. It shows that this scheduler obtains the minimax optimal rate on the noisy quadratic problem; and, empirically, this scheduler demonstrates faster convergence on CIFAR-10 and ImageNet, when the number of epochs is small. 
Using Hessian information in direct and indirect ways is of interest to the community, and the paper does a nice job illustrating that in a context of interest.",ICLR2022, +6-hKx0qbSQo,1642700000000.0,1642700000000.0,1,SsHBkfeRF9L,SsHBkfeRF9L,Paper Decision,Accept (Poster),"This main focus of this paper is graph modeling. Specifically, this paper considers a setting in which data is generated under continuous time dynamics based on neural ODE. Theoretical results regarding parameter estimation are provided. The results are also supported by experiments. + +The reviewers appreciate a thorough response to their questions and think that this paper would be of interest to ICLR and ML community. Please address reviewers comments in your final version.",ICLR2022, +S6nhA7nFE2,1610040000000.0,1610470000000.0,1,e3KNSdWFOfT,e3KNSdWFOfT,Final Decision,Reject,"This paper studies the convergence of gradient descent ascent (GDA) dynamics in a specific class of non-convex non-concave zero-sum games that the authors call ""hidden zero-sum games"". Unlike general min-max games, these games have a well-defined notion of a ""von Neumann solution"". The authors show that if the hidden game is strictly convex-concave then vanilla GDA converges not merely to local Nash, but typically to this von Neumann solution. + +The paper received four high quality reviews and was discussed extensively during the author rebuttal phase. From an application angle, the authors' replies did not convince the reviewers on the relevance of this paper to GANs, and one of the original ""accept"" recommendations was downgraded to a ""reject"" because of this. On the theory side, the novelty over Vlatakis-Gkaragkounis et al. (2019) is not clear and the reviewers found the writing often confusing or hard to connect with practice. The reviewer with the most positive recommendation did not champion the paper post-rebuttal. In the end, the consensus was that the work shows significant promise, but it requires refocusing before appearing at a top-tier conference.",ICLR2021, +xavPsH7wew,1576800000000.0,1576800000000.0,1,r1gPoCEKvH,r1gPoCEKvH,Paper Decision,Reject,"This paper introduces a simple NAS method based on sampling single paths of the one-shot model based on a uniform distribution. Next to the private discussion with reviewers, I read the paper in detail. + +During the discussion, first, the reviewer who gave a weak reject upgraded his/her score to a weak accept since all reviewers appreciated the importance of neural architecture search and that the authors' approach is plausible. +Then, however, it surfaced that the main claim of novelty in the paper, namely the uniform sampling of paths with weight-sharing, is not novel: Li & Talwalkar already introduced a uniform random sampling of paths with weight-sharing in the one-shot model in their paper ""Random Search and Reproducibility in NAS"" (https://arxiv.org/abs/1902.07638), which was on arXiv since February 2019 and has been published at UAI 2019. This was their method ""RandomNAS with weight sharing"". + +The authors actually cite that paper but do not mention RandomNAS with weight sharing. This may be because their paper also has been on arXiv since March 2019 (6 weeks after the one above), and was therefore likely parallel work. 
Nevertheless, now, 9 months later, the situation has changed, and the authors should at least point out in their paper that they were not the first to introduce RandomNAS with weight sharing during the search, but that they rather study the benefits of that previously-introduced method. + +The only real novelty in terms of NAS methods that the authors provide is to use a genetic algorithm to select the architecture with the best one-shot model performance, rather than random search. This is a relatively minor contribution, discussed literally in a single paragraph in the paper (with missing details about the crossover operator used; please fill these in). Also, this step is very cheap, so one could potentially just run random search longer. Finally, the comparison presented may be unfair: evolution uses a population size of 50, and Figure 2 plots iterations. It is unclear whether each iteration for random search also evaluated 50 samples; if not, then evolution got 50x more samples than random search. The authors should fix this in a new version of the paper. + +The paper also appears to make some wrong claims in Section 2. For example, the authors write that gradient-based NAS methods like DARTS inherit the one-shot weights and fine-tune the discretized architectures, but all methods I know of actually retrain from scratch rather than fine-tuning. Also, equation (3) is not what DARTS does; that does a bi-level optimization. +In Section 3, the authors say that their single-path strategy corresponds to a dropout rate of 1. I do not think that this is correct, since a dropout rate of 1 drops every connection (and does not leave one remaining). All of these issues should be rectified. + +The paper reports good results on ImageNet. Unfortunately, these may well be due to using a better training pipeline than other works, rather than due to a better NAS method (no code is available, so there is no way to verify this). On the other hand, the application to mixed-precision quantization is novel and interesting. + +AnonReviewer2 asked about the correlation of the one-shot performance and the final evaluation performance, and this question was not answered properly by the authors. This question is relevant, because this correlation has been shown to be very low in several works (e.g., Sciuto et al: ""Evaluating the search phase of Neural Architecture Search"" (https://arxiv.org/abs/1902.08142), on arXiv since February 2019 and a parallel ICLR submission). In those cases, the proposed approach would definitely not work. + +The high scores the reviewers gave were based on the understanding that uniform sampling in the one-shot model was a novel contribution of this paper. Adjusting for that, the real score is much lower and right at the acceptance threshold. After a discussion with the PCs, due to limited capacity, the recommendation is to reject the current version. I encourage the authors to address the issues identified by the reviewers and in this meta-review and to submit to a future venue. ",ICLR2020, +Bb8Co6P16F,1576800000000.0,1576800000000.0,1,rJgDT04twH,rJgDT04twH,Paper Decision,Reject,"The paper explores the idea of using implicit human feedback, gathered via EEG, to assist deep reinforcement learning. This is an interesting and at least somewhat novel idea. However, it is not clear that there is a good argument why it should work, or at least work well. The experiments carried are more exploratory than anything else, and it is not clear that much can be learned from the results. 
It's a proof of concept more than anything else, of the type that would work well for a workshop paper. More systematic empirical work would be needed for a good conference paper. + +The authors did not provide a rebuttal to reviewers, but rather agreed with their comments and that the paper needs more work. In light of this, the paper should be rejected and we wish the authors best of luck with a new version of the paper. +",ICLR2020, +v4ZZeVAAFXE,1642700000000.0,1642700000000.0,1,6Qvjzr2VGLl,6Qvjzr2VGLl,Paper Decision,Reject,"This paper presents the application of the hierarchical latent variable model, CW-VAE which is originally developed in the vision community, to the speech domain with meaningful modifications, and provide empirical analysis of the likelihood as well as discussions on the likelihood metrics. The reviewers tend to agree that it is a promising direction to study hierarchically structured LVMs for speech, and the introduction/adaptation of CW-VAE is useful. There were some discussion on the suitability of the likelihood evaluation, and it appears a fair comparison with wavenet shall take place at s=1 (single sample), a resolution level the proposed method does not yet scale up to. On the other hand, an important potential use case of the model is representation learning for speech, as it is a common belief that at suitable resolution the features shall discover units like phoneme. But I find the current evaluation of latent representations by LDA and KNN to be somewhat limited, and in fact there is no comparison with suitable baselines in Sec 3.2 in terms of feature quality. A task closer to modern speech recognition (e.g., with end-to-end models) would be preferred.",ICLR2022, +QtxdEfLPMv,1576800000000.0,1576800000000.0,1,S1efxTVYDr,S1efxTVYDr,Paper Decision,Accept (Talk),"This paper addresses the problem of poor generation quality in models for text generation that results from the use of the maximum likelihood (ML) loss, in particular the fact that the ML loss does not differentiate between different ""incorrect"" generated outputs (ones that do not match the corresponding training sequence). The authors propose to train text generation models with an additional loss term that measures the distance from the ground truth via a Gaussian distribution based on embeddings of the ground-truth tokens. This is not the first attempt to address drawbacks of ML training for text generation, but it is simple and intuitive, and produces improvements over the state of the art on a range of tasks. The reviewers are all quite positive, and are in agreement that the author responses and revisions have improved the paper quality and addressed initial concerns. I think this work will be broadly appreciated by the ICLR audience. One negative point is that the writing quality still needs improvement.",ICLR2020, +0KPEk_3NqGF,1642700000000.0,1642700000000.0,1,vBn2OXZuQCF,vBn2OXZuQCF,Paper Decision,Reject,"This is a borderline paper. While reviewers believe the findings from this paper may be of potential interest, they are fully convinced. For instance, if the authors want to claim the proposed mechanism is general for UDA, then they should demonstrate its effectiveness to other application domain(s), such as the NLP domain, where the pretrain-finetuning strategy is widely adopted for transfer learning. However, the authors did not provide correspondingly additional experiments as requested by a reviewer but claimed they only focused on the CV domain. 
If the focus is on the CV domain, then the authors need to explain in detail why in the CV domain, the proposed mechanism works well (while in other domains, it may not). There are many other concerns about the assumptions, experimental settings, etc. + +In summary, this is a borderline paper below the acceptance bar of ICLR.",ICLR2022, +HylSIa-SgV,1545050000000.0,1545350000000.0,1,BJx1SsAcYQ,BJx1SsAcYQ,lack novelty,Reject,"This paper proposes methods to improve the performance of the low-precision neural networks. The reviewers raised concern about lack of novelty. Due to insufficient technical contribution, recommend for rejection. ",ICLR2019,5: The area chair is absolutely certain +BmKJDQSzHHY,1642700000000.0,1642700000000.0,1,GBszJ1XlKDj,GBszJ1XlKDj,Paper Decision,Reject,"This is a nice paper which shows that KL-regularized natural policy gradient (assuming exact access to the MDP, meaning no noise in the reward and Q function estimates), which achieves linear convergence, can use ideas from quasi-newton methods and recover their quadratic convergence. Given the excitement surrounding policy gradient methods and their convergence rates, this is a valuable direction and family of ideas. Unfortunately, the reviewers had many concerns about presentation, and also of the exact meaning and relationship of the results to prior work; I'll add to this and note that one issue with quasi-newton methods is that it is unclear how long the ""burn-in"" phase is, meaning the phase before their quadratic convergence kicks in, and this is still an issue in the present work's theory; another issue, as raised by reviewers, is the difference between the regularized and unregularized optimal policies. As such, it makes sense for this paper to receive more time and polish.",ICLR2022, +HJgG_An0J4,1544630000000.0,1545350000000.0,1,HyxCxhRcY7,HyxCxhRcY7,"Limited novelty, but interesting results",Accept (Poster),"The paper proposes a new fine-tuning method for improving the performance of existing anomaly detectors. + +The reviewers and AC note the limitation of novelty beyond existing literature. + +This is quite a borader line paper, but AC decided to recommend acceptance as comprehensive experimental results (still based on empirical observation though) are interesting.",ICLR2019,4: The area chair is confident but not absolutely certain +eR3twRotB_,1576800000000.0,1576800000000.0,1,S1eWbkSFPS,S1eWbkSFPS,Paper Decision,Reject,Two reviewers are concerned about this paper while the other one is slightly positive. A reject is recommended.,ICLR2020, +JXBb6KtsT4,1576800000000.0,1576800000000.0,1,SJlpy64tvB,SJlpy64tvB,Paper Decision,Reject,"The paper investigates questions around adversarial attacks in a continual learning algorithm, i.e., A-GEM. While reviewers agree that this is a novel topic of great importance, the contributions are quite narrow, since only a single model (A-GEM) is considered and it is not immediately clear whether this method transfers to other lifelong learning models (or even other models that belong to the same family as A-GEM). This is an interesting submission, but at the moment due to its very narrow scope, it seems more appropriate as a workshop submission investigating a very particular question (that of attacking A-GEM). As such, I cannot recommend acceptance.",ICLR2020, +HkgOkmBleN,1544730000000.0,1545350000000.0,1,rkgBHoCqYX,rkgBHoCqYX,Meta review,Accept (Poster),"The manuscript studies a random matrix approach to recover sparse principal components. 
This work extends prior work using soft thresholding of the sample covariance matrix to enable sparse PCA. In this light, the main contribution of the paper is a study of generalizing soft thresholding to a broader class of functions and showing that this improves performance. The contributions of this paper are primarily theoretical. + +The reviewers and AC note issues with the discussion that can be further improved to better illustrate contributions, and place this work in context. In particular, multiple reviewers assumed that ""kernel"" referred to the covariance matrix. The authors provide a satisfactory rebuttal addressing these issues. + +While not unanimous, overall the reviewers and AC have a positive opinion of this paper and recommend acceptance.",ICLR2019,3: The area chair is somewhat confident +HyeR0YkSg4,1545040000000.0,1545350000000.0,1,BJe-DsC5Fm,BJe-DsC5Fm,Effective approach to zero order optimization with good analysis,Accept (Poster),"This is a solid paper that proposes and analyzes a sound approach to zero order optimization, covering a variants of a simple base algorithm. After resolving some issues during the response period, the reviewers concluded with a unanimous recommendation of acceptance. Some concerns regarding the necessity for such algorithms persisted, but the connection to adversarial examples provides an interesting motivation.",ICLR2019,5: The area chair is absolutely certain +rklZnFS-gN,1544800000000.0,1545350000000.0,1,SklVEnR5K7,SklVEnR5K7,Anti-aliasing has been explored before.,Reject,"The reviewers are reasonably positive about this submission although two of them feel the paper is below acceptance threshold. AR1 advocates large scale experiments on ILSVRC2012/Cifar10/Cifar100 and so on. AR3 would like to see more comparisons to similar works and feels that the idea is not that significant. AR2 finds evaluations flawed. On balance, the reviewers find numerous flaws in experimentation that need to be improved. + +Additionally, AC is aware that approaches such as 'Convolutional Kernel Networks' by J. Mairal et al. derive a pooling layer which, by its motivation and design, obeys the sampling theorem to attain anti-aliasing. Essentially, for pooling, they obtain a convolution of feature maps with an appropriate Gaussian prior to sampling. Thus, on balance, the idea proposed in this ICLR submission may sound novel but it is not. Ideas such as 'blurring before downsampling' or 'low-pass filter kernels' applied here are simply special cases of anti-aliasing. The authors may also want to read about aliasing in 'Invariance, Stability, and Complexity of Deep Convolutional Representations' to see how to prevent aliasing. On balance, the theory behind this problem is mostly solved even if standard networks overlook this mechanism. Note also that there exist a fundamental trade-off between shift-invariance plus anti-aliasing (stability) and performance; this being a reason why max-pooling is still preferred over anti-aliasing (better performance versus stability). Though, this is nothing new for those who delve into more theoretical papers on CNNs: this is an invite for the authors to go thoroughly first through the relevant literature/numerous prior works on this topic.",ICLR2019,5: The area chair is absolutely certain +AiaGte_VugE,1642700000000.0,1642700000000.0,1,1uf_kj0GUF-,1uf_kj0GUF-,Paper Decision,Reject,"In this paper, the authors propose a non-parametric approach for learning a two-layer neural net. 
I agree with the authors and reviewers that this is a timely problem. However, the solution in this paper falls short of achieving this goal. In particular, the assumptions are very strong and cannot be generalized (e.g., non-negativity). The authors also need to better spell out the sample complexity.",ICLR2022, +BJjr2GI_g,1486400000000.0,1486400000000.0,1,r1Aab85gg,r1Aab85gg,ICLR committee final decision,Accept (Poster),"This is a nice contribution that presents some novel and interesting ideas. At the same time, the empirical evaluation is somewhat thin and could be improved. Nevertheless, the PCs believe this will make a good contribution to the Conference Track.",ICLR2017, +84Tre0EpZZ0o,1642700000000.0,1642700000000.0,1,KJggliHbs8,KJggliHbs8,Paper Decision,Accept (Poster),"In this submission, the authors presented a framework (GIANT) for self-supervised learning to improve LMs by leveraging graph information. Reviewers agree that the method is somewhat novel, the (partial) theoretical analysis is interesting, and the evaluations are strong. We thank the authors for doing an excellent job in the rebuttal, which cleared up essentially all the questions reviewers initially raised.",ICLR2022, +BJxcjAJ4eN,1544970000000.0,1545350000000.0,1,Bygh9j09KX,Bygh9j09KX,Area chair recommendation,Accept (Oral),"This paper proposes a hypothesis about the kinds of visual information for which popular neural networks are most selective. It then proposes a series of empirical experiments on synthetically modified training sets to test this and related hypotheses. The main conclusions of the paper are contained in the title, and the presentation was consistently rated as very clear. As such, it is both interesting to a relatively wide audience and accessible. + +Although the paper is comparatively limited in theoretical or algorithmic contribution, the empirical results and experimental design are of sufficient quality to inform design choices of future neural networks, and to better understand the reasons for their current behavior. + +The reviewers were unanimous in their appreciation of the contributions, and all recommended that the paper be accepted. + +",ICLR2019,5: The area chair is absolutely certain +8BLREiG-q5,1610040000000.0,1610470000000.0,1,yvQKLaqNE6M,yvQKLaqNE6M,Final Decision,Accept (Poster),"The paper received 3 reviews with positive ratings: 7,6,7. The reviewers appreciated the overall quality of the manuscript, the thoroughness of the evaluation, and the practical importance of this work (mentioning though that the technical novelty is still not high). They also acknowledged impressive empirical performance. The authors provided detailed responses to each of the reviews separately, which seemed to have resolved the remaining concerns. 
+As a result, the final recommendation is to accept this work for presentation at ICLR as a poster.",ICLR2021, +ByeSoVN1l4,1544660000000.0,1545350000000.0,1,r1xywsC9tQ,r1xywsC9tQ,"Interesting exploration of the topic, but no clear contributions",Reject,"All three reviewers found this to be an interesting exploration of a reasonable topic—the use of ontologies in word representations—but all three also expressed serious concerns about clarity, and none could identify a concrete, sound result that the paper contributes to the field.",ICLR2019,5: The area chair is absolutely certain +y-f3SvPFfm,1576800000000.0,1576800000000.0,1,HklliySFDS,HklliySFDS,Paper Decision,Reject,"This manuscript describes a continual learning approach where individual instances consist of sequences, such as language modeling. The paper consists of a definition of a problem setting, tasks in that problem setting, baselines (not based on existing continual learning approaches, which the authors argue is to highlight the need for such techniques, but with which the reviewers took issue), and a novel architecture. + +Reviews focused on the significance of the contribution. R1 and R2, in particular, argued that the paper is written as though the problem/benchmark definition is the main contribution. R2 mentions that, in spite of this, the methods section jumps directly into the candidate architecture. As mentioned above, several reviewers also took issue with the fact that existing CL techniques are not employed as baselines. The authors engaged with reviewers and promised updates, but did not take the opportunity to update their paper. + +As many of the reviewers' comments remain unaddressed and the authors' updates did not materialize, I recommend rejection, and encourage the authors to incorporate the feedback they have received in a future submission.",ICLR2020, +OvKG2N3mS,1576800000000.0,1576800000000.0,1,Bylh2krYPr,Bylh2krYPr,Paper Decision,Reject,"This paper proposes question-answering as a general paradigm to decode and understand the representations that agents develop, with application to two recent approaches to predictive modeling. Even after the rebuttal, some critical issues remain; e.g., as Reviewer#3 pointed out, the submission in its current form lacks an experimental analysis of the proposed conditional probes, especially of the trade-offs in the reliability of the representation analysis when performed with a conditional probe, as well as a clear motivation for the need for a language interface. The authors are encouraged to incorporate the refined motivation and add a more comprehensive experimental evaluation for a possible resubmission.",ICLR2020, +ytto0N98noT,1610040000000.0,1610470000000.0,1,Kkw3shxszSd,Kkw3shxszSd,Final Decision,Reject,"This paper tests out some straightforward data augmentation strategies on the protein inputs to the transformer used in the TAPE paper. Overall, there is insufficient intellectual merit to warrant publication at ICLR. As a side note, the scholarly quality of the manuscript's presentation was overall lacking.",ICLR2021, +KvC6pvYjWM,1576800000000.0,1576800000000.0,1,Syg6jTNtDH,Syg6jTNtDH,Paper Decision,Reject,"This paper proposes better methods to handle numerals within word embeddings. + +Overall, my impression is that this paper is solid, but not super-exciting. The scope is a little bit limited (to only numbers), and it is not by any means the first paper to handle understanding numbers within word embeddings. 
A more thorough theoretical and empirical comparison to other methods, e.g. Spithourakis & Riedel (2018) and Chen et al. (2019), could bring the paper a long way. + +I think this paper is somewhat borderline, but I am recommending not to accept because I feel that the paper could be greatly improved by making the above-mentioned comparisons more complete, and thus it could find a better home, as a stronger paper, at a new venue.",ICLR2020, +kmRrrqPqrQE,1642700000000.0,1642700000000.0,1,K47zHehHcRc,K47zHehHcRc,Paper Decision,Reject,"The paper introduces the notion of interventional consistency of a representation learned using autoencoders, which is claimed to be a desirable property for disentanglement. The reviewers agree that the contributions are novel and relevant, but they also found the paper hard to follow due to a lack of clarity and motivation. Further, they considered the underlying assumptions very strong, and it may be hard to find practical instances where they hold (e.g., the assumption that statistical dependencies in the prior are preserved by the response map). The reviewers also noted that some real-world examples showing interventional consistency would be helpful. + +Overall, the paper contains interesting ideas and we would like to encourage the authors to pursue this line of work. Still, the paper in its current form is not ready for publication. We encourage the authors to address the reviewers' comments explicitly in a future version of the manuscript.",ICLR2022, +1uLQMAPGuW,1610040000000.0,1610470000000.0,1,QnzSSoqmAvB,QnzSSoqmAvB,Final Decision,Reject,"There is a pretty good consensus that this paper should not be accepted at ICLR. The reviewers do not seem to think that extending MuZero to non-deterministic environments constitutes a significant advance. Three reviewers give clear rejects with scores (3, 4, 5), all with good confidence (4). A fourth reviewer gave a score of 6, i.e., borderline accept. While the fifth reviewer recommends acceptance, they do not seem to be very confident and did not step in to champion the paper. The program committee decided that the paper in its current form does not meet the acceptance bar.",ICLR2021, +yzIcqyRxbKS,1642700000000.0,1642700000000.0,1,L_sHGieq1D,L_sHGieq1D,Paper Decision,Reject,"This paper presents a domain generalization method for semantic segmentation. The model is trained on synthetic data (source) and is tested on unseen real datasets (target). The authors propose a simple data augmentation method, AdvStyle, generating unconstrained adversarial examples for the training on the source domain. + +There was no consensus on the method among the reviewers. Several issues were raised. After the rebuttal and discussion, no one really changed their mind. The motivation for focusing just on driving scenes is still questionable. It could certainly be interesting to investigate further why it is not straightforward to obtain gains on other kinds of scenes. Finally, we encourage the authors to address the raised concerns regarding the discussion of previous works and the comparisons for a future publication.",ICLR2022, +3kK9YAtAfKn,1610040000000.0,1610470000000.0,1,IG3jEGLN0jd,IG3jEGLN0jd,Final Decision,Reject,"This promising work proves that the proposed contrastive learning approach to representation learning can recover the underlying topic posterior information given standard topic modelling assumptions. The work provides detailed proofs and experiments. The analysis is interesting and yields interesting insights. 
However, the experimental results are somewhat weak, lacking comparisons with more recent document representation work. + +Pros: +- Good detailed proofs and experiments. +- Interesting idea of using topic modelling to understand representation learning. + +Cons: +- The description of DirectNCE is somewhat hidden and could be better introduced in the paper. +- Experimental baselines are weak, lacking a comparison to recent document representation work such as Arora et al. (2019). +- Stronger classification baselines could be incorporated.",ICLR2021, +kcLvb6eiDN,1576800000000.0,1576800000000.0,1,r1enqkBtwr,r1enqkBtwr,Paper Decision,Reject,"The paper studies, theoretically and empirically, the problem where the generalization error decreases as $n^{-\beta}$ with $\beta$ not equal to $\frac{1}{2}$. It analyses a Teacher-Student problem where the Teacher generates data from a Gaussian random field. The paper provides a theorem that derives $\beta$ for Gaussian and Laplace kernels, and shows empirical evidence supporting the theory using MNIST and CIFAR. + +The reviews contained two low scores, both of which were not confident. A more confident reviewer provided a weak accept score, and interacted multiple times with the authors during the discussion period (which is one of the nice things about the ICLR review process). However, this reviewer also noted that ICLR may not be the best venue for this work. + +Overall, while this paper shows promise, the negative review scores suggest that the topic may not be the best fit for the ICLR audience.",ICLR2020, +xBFxR-EPLhR,1642700000000.0,1642700000000.0,1,7l1IjZVddDW,7l1IjZVddDW,Paper Decision,Accept (Spotlight),"This paper received 4 quality reviews. The rebuttal and discussions were effective. All reviewers raised their ratings after the rebuttal. It finally received 3 ratings of 8, and 1 rating of 5. The AC concurs with the contributions made by this work and recommends acceptance.",ICLR2022, +zemZKR2ztM,1576800000000.0,1576800000000.0,1,rJeO3aVKPB,rJeO3aVKPB,Paper Decision,Reject,"This paper presents a simple trick of taking multiple SGD steps on the same data to improve distributed processing of data and reclaim idle capacity. The underlying idea seems interesting enough, but the reviewers had several concerns. + +1. The method is a simple trick (R2). I don't think this is a good reason to reject the paper, as R3 also noted, so I think this is fine. +2. There are no clear application cases (R3). The authors have given a reasonable response to this, indicating that this method is likely more useful for prototyping than for well-developed applications. This makes sense to me, but both R3 and I felt that this was insufficiently discussed in the paper, despite seeming quite important to arguing the main point. +3. The results look magical, or too good to be true without additional analysis (R1 and R3). This concerns me the most, and I'm not sure that this point has been addressed by the rebuttal. In addition, it seems that extensive hyperparameter tuning has been performed, which also somewhat goes against the idea that ""this is good for prototyping"". If it's good for prototyping, then ideally it should be a method where hyperparameter tuning is not very necessary. +4. The connections with theoretical understanding of SGD are not well elucidated (R1). I also agree this is a problem, but perhaps not a fatal one -- very often simple heuristics prove effective, and then are analyzed later in follow-up papers. 
+ +Honestly, this paper is somewhat borderline, but given the large number of good papers that have been submitted to ICLR this year, I'm recommending that this not be accepted at this time, but certainly hope that the authors continue to improve the paper towards a final publication at a different venue. +",ICLR2020, +k1mFVn3bgmh,1610040000000.0,1610470000000.0,1,BVPowUU1cR,BVPowUU1cR,Final Decision,Reject,The paper received low ratings and the reviewers pointed out a number of issues. The authors' short response failed to address these concerns. ,ICLR2021, +SJoHB1aSG,1517250000000.0,1517260000000.0,563,HyY0Ff-AZ,HyY0Ff-AZ,ICLR 2018 Conference Acceptance Decision,Reject,The reviewers point out that this is a well-known result and is not novel.,ICLR2018, +6Wezr_C6UK,1576800000000.0,1576800000000.0,1,HkgTTh4FDH,HkgTTh4FDH,Paper Decision,Accept (Poster),"This paper provides theoretical guarantees for adversarial training. While the reviews raise a variety of criticisms (e.g., the results are under a variety of assumptions), overall the paper constitutes valuable progress on an emerging problem.",ICLR2020, +ryFq4k6Hf,1517250000000.0,1517260000000.0,414,SkOb1Fl0Z,SkOb1Fl0Z,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper presents a domain-specific language for RNN architecture search, which can be used in combination with a learned ranking function or RL-based search. While the approach is interesting and novel, the paper would benefit from an improved evaluation, as pointed out by reviewers. For example, the paper currently evaluates coreDSL+ranking for language modelling and extendedDSL+RL for machine translation. The authors should use the same evaluation protocol on all tasks, and also compare with the state-of-the-art MT approaches.",ICLR2018, +8YtBPnxSoVJ,1610040000000.0,1610470000000.0,1,tf8a4jDRFCv,tf8a4jDRFCv,Final Decision,Reject,"The authors propose a novel and elegant way of learning parameterized aggregation functions and show that their approach can achieve good performance on several datasets (in many cases outperforming other state-of-the-art methods). This is also appreciated by most of the reviewers. However, there have been several issues regarding the description of the proposed approach and the conducted experiments. These have been partly resolved in the rebuttal phase but should be more carefully assessed in another iteration of reviews. + +More specifically: Experiments regarding the learning of a single LAF versus multiple LAFs should partly be included in the main paper (e.g. Figure 4 showing the performance for different numbers of LAFs). When constructing deep sets in this setting with a similar number of aggregation functions, it appears not very sensible to me to incorporate the same aggregation function multiple times; one would rather include a set of different fixed aggregation functions (these could be derived from the proposed LAFs). The experiments would also benefit from including set transformers as baselines (set transformers are discussed in the paper but not considered in the experiments, as the authors argue that this is an orthogonal approach; while I agree that the goal of set transformers is different, I think there would be big value in understanding how these approaches compare and/or can be combined). + +Beyond that, I think a brief discussion of the related topic of learning pooling operations (e.g., in CNNs) is warranted. 
+ +Some reviewers also find that their concerns are only partially addressed in the rebuttal (e.g., regarding the extension from sets to vectors, and applications in which the achieved performance differences are bigger). + +One point which didn’t come up in the reviews but which I would want to see addressed in a future version of the paper is an extended discussion of Figure 4. While there are cases where LAF clearly performs better, there are also cases where Deep Sets outperform it (these seem to be the cases in which the used aggregation units match the considered task). As LAFs can in theory represent these aggregation functions, it still seems challenging to learn the correct form of the aggregation function; I would appreciate deeper insights and an analysis of this aspect. An immediate heuristic solution for improving performance in many applications thus might be to combine LAFs and standard aggregators. + +In summary, the submitted paper has great potential but should be carefully revised, and the experiments should be extended, before the paper is accepted.",ICLR2021, +h_NfqWcco6c,1610040000000.0,1610470000000.0,1,dqyK5RKMaW4,dqyK5RKMaW4,Final Decision,Reject,"This paper considers the problem of hardware and software co-design for neural accelerators. Specifically, it looks at the hardware and the software compiler that maps DNNs to hardware. It employs Bayesian Optimization (BO) to perform a joint search over hardware and software design parameters in an alternating manner. To handle black-box constraints that cannot be evaluated without performing simulations, the method uses constrained BO algorithms. + +The paper talks about two technical challenges: +1) Black-box constraints. There is a lot of literature on constrained BO. +2) Semi-discrete design variables. The paper didn't propose any generic solution. There are some recent papers on handling mixed variables that may be useful. +https://arxiv.org/abs/1907.01329 +https://arxiv.org/abs/1906.08878 + +The BO methodology is justified. There is recent work on hardware and software co-design for neural accelerators that should be taken into account for both qualitative and quantitative comparison. + +Overall, my assessment is that the paper in its current form lacks technical novelty for acceptance.",ICLR2021, +OCGeLE5ZE68,1610040000000.0,1610470000000.0,1,L7WD8ZdscQ5,L7WD8ZdscQ5,Final Decision,Accept (Poster),"The paper proposes and studies a new SO(2)-equivariant convolution layer for vehicle and pedestrian trajectory prediction. The experiments are detailed and demonstrate the effectiveness of the approach in relation to non-equivariant models.",ICLR2021, +CnoDJzWRFk,1576800000000.0,1576800000000.0,1,BygPO2VKPH,BygPO2VKPH,Paper Decision,Accept (Spotlight),"The paper extends LISTA by introducing gain gates and overshoot gates, which respectively address the underestimation of code components and compensate for the small step size of LISTA. The authors theoretically analyze these extensions and back up the effectiveness of their proposed algorithm with encouraging empirical results. All reviewers are highly positive on the contributions of this paper, and appreciate the rigorous theory which is further supported by convincing experiments. All three reviewers recommended accept. +",ICLR2020, +RDtcf6hkYy2,1610040000000.0,1610470000000.0,1,0xdQXkz69x9,0xdQXkz69x9,Final Decision,Reject,"This paper presents a method for attacking few-shot learners by poisoning a subset of the support set. I believe this might be the first work to address adversarial examples for meta-learners (or few-shot learners), which is a timely issue. A common concern raised by most of the reviewers is the novelty of this work, in the sense that the method builds on a basic attack strategy (such as PGD) from the standard adversarial example setting. The authors responded to this, summarizing what's new in this paper. Episodic training for few-shot learners requires consuming a support set (instead of a single training data point). It is in the nature of most meta-learning methods. Thus, it is easily expected that the adversarial attack for few-shot learners is naturally extended to poisoning a support set (or its subset) instead of a single data point. Certainly such an extension may entail a new strategy. However, during the discussion period with reviewers, concerns about the novelty of such an extension still remained. In particular, the few-shot learning algorithms do not allow big changes in the original model. The algorithms analyzed are prototypical networks, which do not utilize fine-tuning, and MAML, which fine-tunes for a small number of pre-fixed steps. So the transfer of adversarial samples may not be counted as a major contribution. 
+ +-- + +Recommendation and justification: + +This paper should be accepted. Even though novelty in terms of fundamental machine learning components is minimal, but the architecture employing neural models to do symbolic work is a good contribution in a crucial direction (especially in the theme of ICLR).",ICLR2020, +Xyls5Y1Vra,1576800000000.0,1576800000000.0,1,BkxSmlBFvr,BkxSmlBFvr,Paper Decision,Accept (Poster),The authors analyze knowledge graph embedding models for multi-relational link predictions. Three reviewers like the work and recommend acceptance. The paper further received several positive comments from the public. This is solid work and should be accepted.,ICLR2020, +UsEOQVKJ4O,1576800000000.0,1576800000000.0,1,BJeVklHtPr,BJeVklHtPr,Paper Decision,Reject,The paper is rejected based on unanimous reviews.,ICLR2020, +FWywWnOTIf,1576800000000.0,1576800000000.0,1,BJeGlJStPr,BJeGlJStPr,Paper Decision,Accept (Poster),"The authors propose a novel distributed reinforcement learning algorithm that includes 3 new components: a target network for the policy for stability, a circular buffer, and truncated importance sampling. The authors demonstrate that this improves performance while decreasing wall clock training time. + +Initially, reviewers were concerned about the fairness of hyper parameter tuning, the baseline implementation of algorithms, and the limited set of experiments done on the Atari games. After the author response, reviewers were satisfied with all 3 of those issues. + +I may have missed it, but I did not see that code was being released with this paper. I think it would greatly increase the impact of the paper at the authors release source code, so I strongly encourage them to do so. + +Generally, all the reviewers were in consensus that this is an interesting paper and I recommend acceptance.",ICLR2020, +0zSOgdUBpLa,1642700000000.0,1642700000000.0,1,g1SzIRLQXMM,g1SzIRLQXMM,Paper Decision,Accept (Spotlight),"This paper experiments with what is required for a deep neural network to be similar to the visual activity in the ventral stream (as judged by the brainscore benchmark). The authors have several interesting contributions, such as showing that a small number of supervised updates are required to predict most of the variance in the brain activity, or that models with randomized synaptic weights can also predict a significant portion of the variance. These different points serve to better connect deep learning to important questions in neuroscience and the presence of the paper at ICLR would create good questions. The discussion between authors and reviewers resulted in a unanimous vote for acceptance, and the authors already made clarifications to the paper. I recommend acceptance.",ICLR2022, +BJlH7A5CJN,1544630000000.0,1545350000000.0,1,SJxCsj0qYX,SJxCsj0qYX,ICLR 2019 decision,Reject,"This paper proposes new GAN training method with multi generator architecture inspired by Stackelberg competition in game theory. The paper has theoretical results showing that minmax gap scales to \eps for number of generators O(1/\eps), improving over previous bounds. Paper also has some experimental results on Fashion Mnist and CIFAR10 datasets. + +Reviewers find the theoretical results of the paper interesting. However, reviewers have multiple concerns about comparison with other multi generator architectures, optimization dynamics of the new objective and clarity of writing of the original submission. 
While authors have addressed some of these concerns in their response reviewers still remain skeptical of the contributions. Perhaps more experiments on imagenet quality datasets with detailed comparison can help make the contributions of the paper clearer. ",ICLR2019,4: The area chair is confident but not absolutely certain +VPsJwbUEBF,1576800000000.0,1576800000000.0,1,ryxOUTVYDH,ryxOUTVYDH,Paper Decision,Accept (Poster),"This paper proposes an ensemble method to identify noisy labels in the training data of supervised learning. The underlying hypothesis is that examples with label noise require memorization. The paper proposes methods to identify and remove bad training examples by retaining only the training data that maintains low losses after perturbations to the model parameters. This idea is developed in several candidate ensemble algorithms. One of the proposed ensemble methods exceeds the performance of state-of-the-art methods on MNIST, CIFAR-10 and CIFAR-100. + +The reviewers found several strengths and a few weaknesses in the paper. The paper was well motivated and clear. The proposed solution was novel and plausible. The experiments were comprehensive. The reviewers identified several parts of the paper that could be more clear or where more detail could be provided, including a complexity analysis and +extended experiments. The author response addressed the reviewer questions directly and also in a revised document. In the discussion phase, the reviewers were largely satisfied that their concerns were addressed. + +This paper should be accepted for publication as the paper presents a clear problem and solution method along with convincing evidence of method's merits. +",ICLR2020, +u1lT_BrBfu6,1610040000000.0,1610470000000.0,1,mCLVeEpplNE,mCLVeEpplNE,Final Decision,Accept (Poster),"This paper initially received mixed ratings but after the rebuttal, all reviewers recommended acceptance. Reviewers appreciate the novel technical ideas and extensive experimental results. ",ICLR2021, +Byui81Trz,1517250000000.0,1517260000000.0,858,rk8R_JWRW,rk8R_JWRW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agreed that the paper was somewhat preliminary in terms of the exposition and empirical work. They all find the underlying problem quite interesting and challenging (i.e. spiking recurrent networks). However, the manuscript failed to motivate the approach. In particular, everyone agrees that spiking networks are very interesting, but it's unclear what problem the presented work is solving. The authors need to be more clear about their motivation and then close the loop with empirical validation that their approach is solving the motivating problem (i.e. do we learn something about biological plausibility, are spiking networks better than traditional LSTMs at modeling a particular kind of data, or are they more efficiently implemented on hardware?). Motivating the work with one of these followed by convincing experiments would make this a much stronger paper. 
+ +Pros: +- Tackles an interesting and challenging problem at the intersection of neuroscience and ML +- A novel method for creating a spiking LSTM + +Cons: +- The motivation is not entirely clear +- The empirical analysis is too simple and does not demonstrate the advantages of this approach +- The paper seems unfocused and could use rewriting + +",ICLR2018, +A-nteRdwe6,1642700000000.0,1642700000000.0,1,w8HXzn2FyKm,w8HXzn2FyKm,Paper Decision,Reject,"This paper studies a stochastic approximation framework for multi-agent consensus algorithms driven by Markovian noise in the spirit of the classical paper of Kushner & Yin. The authors' main result is that - modulo a series of assumptions, some conceptual, some technical - the generated sequence of play reaches a consensus, and they also estimate the rate of this convergence. + +Even though the paper's premise is interesting, the reviewers identified several weaknesses in the paper, and the reviewers that raised them where not convinced by the authors' replies (especially regarding the relative lack of numerical evidence to demonstrate the claims that are not supported by the theory, such as the role of Assumption 6). After my own reading of the paper and the discussion with the reviewers during the rebuttal phase, I concur that this version of the paper does not clear the bar for acceptance - but, at the same time, I would encourage the authors to submit a suitably revised version at the next opportunity.",ICLR2022, +SJnREyTrG,1517250000000.0,1517260000000.0,470,ryZElGZ0Z,ryZElGZ0Z,ICLR 2018 Conference Acceptance Decision,Reject,"There was substantial disagreement between reviewers on how this paper contributes to the literature; it seems (having read the paper) that the problem tackled here is clearly quite interesting, but it is hard to tease out in the current version exactly what the contribution does to extend beyond current art.",ICLR2018, +pvWIFp0Bx74,1642700000000.0,1642700000000.0,1,sMqybmUh_u8,sMqybmUh_u8,Paper Decision,Reject,"This paper studies meta-learning in hierarchical RL, where the unknown hierarchy is learned during meta-training and then applied to a test task. The authors propose an optimistic algorithm for solving this problem and analyze it. The main contribution of the paper is in the first end-to-end analysis. + +This paper has three borderline reviews and one reject. Despite the differences in the scores, all reviewers share the same opinion. The idea is novel and very interesting. However, the algorithm and its analysis rely on many assumptions, many of which are introduced in this work and not properly discussed. Because of this, the paper needs a major revision and is rejected for now.",ICLR2022, +X83Vnf_qlBs,1642700000000.0,1642700000000.0,1,B5XahNLmna,B5XahNLmna,Paper Decision,Accept (Poster),"This paper reveals that popular data poisoning systems, Fawkes and LowKey, fail to effectively protect user privacy in facial recognition. The methods to defend against poisoning attacks are quite simple---you can either adaptively tune the face recognition models or just wait for more advanced facial recognition systems. Given these “disappointed” findings from the technical solution side, this paper further argues that legislation may be the only viable solution to prevent abuses of facial recognition. + +Overall, all the reviewers highly appreciate the comprehensive and rigorous evaluations provided in this paper and enjoy reading it. 
The biggest concern is raised by Reviewer 6s7m, given that this work fails to discuss or compare with previous works on facial identity anonymization and that the technical contribution is incremental. During the discussion period, all other reviewers reached a consensus that 1) facial identity anonymization is not relevant; and 2) this work makes enough contributions and is worth being heard by the general community; Reviewer 6s7m still holds the opposite opinion, but is okay with accepting this paper anyway. + +In the final version, the authors should include all the clarifications provided in the discussion period.",ICLR2022, +ryl2V8tAy4,1544620000000.0,1545350000000.0,1,HylTXn0qYX,HylTXn0qYX,ICLR 2019 decision,Accept (Poster),"This paper proposes a new method for verifying whether a given point of a two-layer ReLU network is a local minimum or a second-order stationary point, and checks for descent directions. All reviewers agree that the algorithm is based on a number of new techniques involving both convex and non-convex QPs, and is novel. The method proposed in the paper has significant limitations, as it is not robust enough to handle approximate stationary points. Given these limitations, there is a disagreement between reviewers about the significance of the result. While I share the same concerns as R4, I agree with R3 and believe that the new ideas in the paper will inspire future work to extend the proposed method towards addressing these limitations. Hence I suggest acceptance. ",ICLR2019,4: The area chair is confident but not absolutely certain +7jU5b3dP8X,1642700000000.0,1642700000000.0,1,F7_odJIeQ26,F7_odJIeQ26,Paper Decision,Reject,"This paper presents a method for solving symbolic mathematics tasks. It first pretrains a transformer model on language translation, and then fine-tunes the pretrained model on the downstream mathematics tasks. It contains interesting points, but our reviewers have serious concerns which are not fully resolved in the rebuttal. For the integration task, the proposed method achieves good results compared with Lample & Charton 2019 (LC) with much less training data. However, as the authors also noted (see the rebuttal), the higher accuracies in LC are achieved with more data. If the authors could at least show how much data the proposed method needs to achieve the best result in LC, it would be very helpful for understanding the value of this work. In addition, the proposed method did not show similar improvements on the ODE task. So it is hard to see how this proposed method can be generally useful. Our reviewers also have big concerns about the writing. Many sentences are really confusing.",ICLR2022, +mFPp0_ZsFUb,1642700000000.0,1642700000000.0,1,yRYtnKAZqxU,yRYtnKAZqxU,Paper Decision,Reject,"In this paper, data augmentation for graph contrastive learning (GraphCL) is studied. Most reviewers agree that the problems addressed in this paper are interesting and important for the unsupervised graph representation learning literature. However, many reviewers were not fully satisfied with the novelty and with the claim of the main contribution of this paper, a theoretical analysis of the conditions under which data augmentation works in GraphCL, due to the lack of clear explanation and evidence. Unfortunately, no reviewer has suggested acceptance of this paper at this time.",ICLR2022, +JxjQabM2Adv,1642700000000.0,1642700000000.0,1,FqKolXKrQGA,FqKolXKrQGA,Paper Decision,Reject,"The paper introduces a transformer-like architecture to perform network inference in network games. 
While the reviewers acknowledge that the research direction is interesting, they raise concerns regarding the significance of the contribution in terms of methodology, particularly in light of the state of the art, and regarding the experimental evaluation, which in their view did not support the promise of the work. The authors did not reply or follow up on the reviews during the rebuttal period. I would encourage the authors to use the reviewers' comments to revise their paper and resubmit to another conference.",ICLR2022, +KONyY6CB1We,1642700000000.0,1642700000000.0,1,fEcbkaHqlur,fEcbkaHqlur,Paper Decision,Reject,"This paper presents a simple, reasonable alternative to target networks. Given the effectiveness of target networks, and the fact that they are still somewhat poorly understood, this is a good topic for consideration. + +It is unfortunate that the paper did not have more depth, in terms of analysis and/or analytical experiments that expose the properties of the suggested approach, and the mechanism still seems heuristic (and inspired by the success of target networks, and similar) more than principled. That said, the proposed mechanism does seem somewhat effective (even if performance differences are not very pronounced), and is clean and simple to implement. + +This version of the paper is rejected because we believe the paper could be a lot better than it currently is. If the proposed regularisation mechanism is really as good as the authors argue, then it should be possible (and hopefully even easy) to demonstrate this clearly in more settings (e.g., in more algorithms). Alternatively or additionally, the authors could consider digging deeper into the understanding of the method. For instance, the paper often argues that target networks slow down learning, but (naively?) one could argue the exact same point (in general) for regularisation: this will trade off stability for speed. It could be that the proposed mechanism is indeed a better way to achieve this trade-off, but this is currently argued heuristically and not really proven (either theoretically, or with sufficient empirical evidence). +(For what it is worth, I personally did not find Section 3.2 particularly enlightening, because it is known that these TD algorithms are not actually gradient algorithms, and hence considering 'losses' and 'gradients' in this way does not convince me we are getting at an actual deeper understanding of the dynamics of these algorithms.) + +I wholeheartedly encourage the authors to take the comments and suggestions to heart and use these to improve the paper (as they have already started to do during this reviewing cycle), because I believe that there could be quite a good paper on this topic. I hope the authors can demonstrate more convincingly, to themselves and their readers, that this idea is an actual, lasting contribution to the literature. Ultimately, if they can, this will make the paper more impactful. So although I appreciate this decision will come as a disappointment, I hope the authors also see this as an opportunity to make a larger research impact. 
+ +[1] Shangtong Zhang, Hengshuai Yao, Shimon Whiteson (2021). Breaking the Deadly Triad with a Target Network. + +[2] Hessel et al. (2017). Rainbow: Combining Improvements in Deep Reinforcement Learning. + +[3] van Hasselt et al. (2018). Deep Reinforcement Learning and the Deadly Triad.",ICLR2022, +B1lPGBk-xV,1544770000000.0,1545350000000.0,1,HkeoOo09YX,HkeoOo09YX,meta review,Accept (Poster),"This paper proposes to use meta-learning to design MCMC sampling distributions based on Hamiltonian dynamics, aiming to mix faster on set of problems that are related to the training problems. The reviewers agree that the paper is well-written and the ideas are interesting and novel. The main weaknesses of the paper are that (1) there is not a clear case for using this method over SG-HMC, and (2) there are many design choices that are not validated. The authors revised the paper to address some aspects of the latter concern, but are encouraged to add additional revisions to clarify the points brought up by the reviewers. +Despite the weaknesses, the reviewers all agree that the paper exceeds the bar for acceptance. I also recommend accept.",ICLR2019,4: The area chair is confident but not absolutely certain +BJxfXfTRkV,1544630000000.0,1545350000000.0,1,H1lqZhRcFm,H1lqZhRcFm,Novel work,Accept (Poster),"The paper proposes a new unsupervised learning scheme via utilizing local maxima as an indicator function. + +The reviewers and AC note the novelty of this paper and good empirical justifications. Hence, AC decided to recommend acceptance. + +However, AC thinks the readability of the paper can be improved.",ICLR2019,4: The area chair is confident but not absolutely certain +MDoLfokDsDc,1642700000000.0,1642700000000.0,1,_hszZbt46bT,_hszZbt46bT,Paper Decision,Accept (Poster),"The paper proposes contrastive learning for tabular data to improve anomaly detection. +Strengths: +- Interesting and important problem. +- Usage of contrastive learning for anomaly detection in general multi-variate datasets is novel (as prior work mostly focuses on images) +- Extensive experiments with comparisons to multiple baselines on multiple datasets +- Well-written paper + +The reviewers raised some concerns about novelty (in particular, the relationship to the closely related paper ""Neural Transformation Learning for Deep Anomaly Detection Beyond Images""), hyperparameter tuning and additional baselines. The authors did a great job of addressing the concerns and multiple reviewers raised their scores. During the discussion phase, the consensus decision leaned towards accept. I recommend acceptance and encourage the authors to address any remaining concerns in the final version. + +Additional AC comments: +- Please make sure that the camera ready version does not exceed page limits. https://iclr.cc/Conferences/2022/CallForPapers +- "" As far as we can ascertain, masking was not used for one-class classification before."": There's some related work on pre-training BERT for OOD detection (cf. https://arxiv.org/abs/2004.06100 or https://arxiv.org/abs/2106.03004) which might be worth discussing.",ICLR2022, +SyWFL1pHz,1517250000000.0,1517260000000.0,825,SkxqZngC-,SkxqZngC-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a BNP topic model that uses a stick-breaking prior over document topics and performs VAE-style inference over them. 
Unfortunately, the novelty of this work is limited, as VAE-like inference for LDA-like models, inference with stick-breaking priors for VAEs, and placing a prior on the concentration parameter in a non-parametric topic model have all been done before (see e.g. Srivastava & Sutton (2017), Nalisnick & Smyth (2017), and Teh, Kurihara & Welling (2007) respectively). There are also concerns about the correctness of treating topics as parameters (as opposed to random variables) in the proposed model. The authors' clarification regarding this point was helpful but not sufficient to show the validity of the approach.",ICLR2018, +k7BsjiIviL1,1610040000000.0,1610470000000.0,1,7TBP8k7TLFA,7TBP8k7TLFA,Final Decision,Reject,"This work studies the question of universal approximation with neural networks under general symmetries. For this purpose, the authors first leverage existing universal approximation results for shallow fully connected networks defined on infinite-dimensional input spaces, which are then upgraded to provide universal approximation of group-equivariant functions using group-equivariant convolutional networks. + +Reviewers were all appreciative of the scope of this paper, aimed at unifying different UAT results under the same underlying 'master theorem', bringing a much more general perspective on the problem of learning under symmetry. However, reviewers also expressed concerns about the accessibility and readability of the current manuscript, pointing at the lack of examples and connections with existing models/results. The authors did a commendable job of adding these examples and incorporating reviewers' feedback into a much improved revision. + +After taking all the feedback into account, this AC has the uncomfortable job of recommending rejection at this time. Ultimately, the reason is that this AC is convinced that this paper can be made even better by doing an extra revision that helps the reader navigate through the levels of generality. As it turns out, this paper was reviewed by three top senior experts at the interface of ML and groups/invariances, who themselves found that the treatment could be made more accessible --- thus hinting at a difficult read for non-experts. In particular, the main result of this work (theorem 9) is based on a rather intuitive idea (that one can leverage UAT for generic neural nets on the generator of an equivariant function), which requires some technical 'care' in order to be fleshed out. The essence of the proof can be conveyed in simple terms, after which following through the proof is much easier. Similarly, the paper quickly adopts an abstract (yet precise) formalism in terms of infinite-dimensional domains, which again clouds the essential ideas in technical details. While the paper now contains several examples, this AC believes the authors can go the extra mile of connecting them together, and further discussing the shortcomings of the result --- in particular, the remarks on tensor representations and the invariant case are of great importance in practice, and should be discussed more prominently. Finally, while this work is only concerned with universal approximation, an important aspect that is not mentioned here is the quantitative counterpart, i.e., what are the approximation rates for symmetric functions under the considered models. 
+",ICLR2021, +OCGeLE5ZE68,1610040000000.0,1610470000000.0,1,L7WD8ZdscQ5,L7WD8ZdscQ5,Final Decision,Accept (Poster),"This paper studies the *last iterate* convergence of the projected Heavy-ball method (and an adaptive variant) for convex problems, and propose a specific coefficient schedule. All reviewers thought that looking at the last-iterate convergence of the HB method was interesting and that the proofs, while simple, were interestingly novel. Several concerns were raised by the quality of the writing. Several were addressed in a revision and the rebuttal. While R1 did not update their score, the AC thinks that the rebuttal has addressed appropriately their initial concerns. The AC recommends the paper for acceptance, *but* it is important that the authors make an appropriate careful pass over their paper for the camera ready version. + +### comments about the write-up + +- The paper still contains many typos (e.g. missing $1/t$ term in the average after equation (2); many misspelled words, etc.), please carefully proofread your paper again. +- The AC agrees with R1 that the quality of presentation still needs improvement. $\beta_{1t}$ is still used in the introduction without being defined -- please define it properly first e.g. +- The word ""optimal"" and ""optimality"" is usually misused in the manuscript. To refer to the convergence rate of an optimization algorithm, the standard terminology is to talk about the ""suboptimality"" or the ""error"" (e.g. see the terminology used by the cited [Harvey et al. 2019, Jain et al. 2019] papers). For example, one would say that the error or suboptimality of SGD has a $O(1/\sqrt{t})$ convergence rate. Saying ""optimality of"" or ""optimal individual convergence rate"" is quite confusing, and should be corrected. The adjective ""optimal"" (when talking about a convergence rate) should be restricted to when a matching lower bound exists. +- Finally, the text introducing the experimental section should be fixed to clarify the actual results and motivation. Specifically, the ""validate the correctness of our convergence analysis"" only applies in the convex setting. I recommend that a high level description of the convex experiment and the main message of the results is moved from the appendix to the main paper there (there is space). And then, the deep learning experiments can be introduced as just investigating the practical performance of the suggested coefficient schedule for HB.",ICLR2021, +HyglQStBxV,1545080000000.0,1545350000000.0,1,SJl8gnAqtX,SJl8gnAqtX,Lack of technical novelty,Reject,"I tend to agree with reviewers. This is a bit more of an applied type of work and does not lead to new insights in learning representations. +Lack of technical novelty +Dataset too small",ICLR2019,5: The area chair is absolutely certain +HJxKk2wEeV,1545010000000.0,1545350000000.0,1,rJlg1n05YX,rJlg1n05YX,lack of support,Reject,This paper points out methods to obtain sparse convolutional operators. The reviewers have a consensus on rejection due to clarity and lack of support to the claims.,ICLR2019,5: The area chair is absolutely certain +8CQqtcC4rz,1642700000000.0,1642700000000.0,1,XyVXPuuO_P,XyVXPuuO_P,Paper Decision,Reject,"The paper presents a meta-algorithm for learning a posterior-inference algorithm for restricted probabilistic programs. While the reviews agree that this is a very interesting research direction, they also reveal that there are several questions still open. 
One reviewer points out that there learning to infer should take both the time for learning+inference and the generalization to other programs into account, i.e., what happens if the program is too different from the training set? Is benefit than vanishing? Moreover, as pointed out by another review, recursion as well as while loops are not yet supported. Also, the relation to IC needs some further clarification. These issues show that the paper is not yet ready for publication at ICLR. However we would like to encourage the authors to improve the work and submit it to one of the next AI venues.",ICLR2022, +_mmUJ7DqbB,1576800000000.0,1576800000000.0,1,Skx73lBFDS,Skx73lBFDS,Paper Decision,Reject,"The paper presents a linear classifier based on a concatenation of two types of features for protein function prediction. The two features are constructed using methods from previous papers, based on peptide sequence and protein-protein interactions. + +All the reviewers agree that the problem is an important one, but the paper as it is presented does not provide any methodological advance, and weak empirical evidence of better protein function prediction. Therefore the paper would require a major revision before being suitable for ICLR. +",ICLR2020, +P5bO1SSCmb6,1610040000000.0,1610470000000.0,1,dFwBosAcJkN,dFwBosAcJkN,Final Decision,Accept (Poster),"This paper focuses on the adversarial robustness of deep neural networks against multiple and unforeseen threat models, which proposes a threat model called Neural Perceptual Threat Model (NPTM). The philosophy behind sounds quite interesting to me, namely, approximating human perception with a neural neural ""neural perceptual distance"". This philosophy leads to a novel algorithm design I have never seen, i.e., Perceptual Adversarial Training (PAT) which achieves good robustness against various types of adversarial attacks and even could generalize well to unforeseen perturbation types. + +The clarity and novelty are clearly above the bar of ICLR. While the reviewers had some concerns on the significance, the authors did a particularly good job in their rebuttal. Thus, all of us have agreed to accept this paper for publication! Please carefully address all +comments in the final version.",ICLR2021, +VoolRM9eFA,1576800000000.0,1576800000000.0,1,BkePHaVKwS,BkePHaVKwS,Paper Decision,Reject,"Unfortunately, this was a borderline paper that generated disagreement among the reviewers. After high level round of additional deliberation it was decided that this paper does not yet meet the standard for acceptance. The paper proposes a potentially interesting approach to learning surrogates for non-differentiable and non-decomposable loss functions. However, the work is a bit shallow technically, as any supporting theoretical justification is supplied by pointing to other work. The paper would be stronger with a more serious and comprehensive analysis. The reviewers criticized the lack of clarity in the technical exposition, which the authors attempted to mitigate in the rebuttal/revision process. The paper would benefit from additional clarity and systematic presentation of complete details to allow reproduction.",ICLR2020, +LUTeBSo6ik,1576800000000.0,1576800000000.0,1,HJl8SkBYPr,HJl8SkBYPr,Paper Decision,Reject,"The authors leverage advances in semi-supervised learning and data augmentation to propose a method for active learning. 
The AL method is based on the principle that a model should consistently label across perturbation/augmentations of examples, and thus propose to choose samples for active learning based on how much the estimated label distribution changes based on different perturbations of a given example. The method is intuitive and the experiments provide some evidence of efficacy. However, during discussion there was a lingering question of novelty that eventually swayed the group to reject this paper. ",ICLR2020, +FHDVIbJfIr,1576800000000.0,1576800000000.0,1,r1g6ogrtDr,r1g6ogrtDr,Paper Decision,Accept (Poster),"The paper proposes an attention mechanism for equivariant neural networks towards the goal of attending to co-occurring features. It instantiates the approach with rotation and reflection transformations, and reports results on rotated MNIST and CIFAR-10. All reviewers have found the idea of using self-attention on top of equivariant feature maps technically novel and sound. There were some concerns about readability which the authors should try to address in the final version. ",ICLR2020, +EqtW-gWRDf,1610040000000.0,1610470000000.0,1,J8_GttYLFgr,J8_GttYLFgr,Final Decision,Accept (Poster),The paper proposes and studies a new SO(2)-equivariant convolution layer for vehicle and pedestrian trajectory prediction. The experiments are detailed and demonstrate the effectiveness of the approach in relation to non-equivariant models.,ICLR2021, +CnoDJzWRFk,1576800000000.0,1576800000000.0,1,BygPO2VKPH,BygPO2VKPH,Paper Decision,Accept (Spotlight),"The paper extends LISTA by introducing gain gates and overshoot gates, which respectively address underestimation of code components and compensation of small step size of LISTA. The authors theoretically analyze these extensions and backup the effectiveness of their proposed algorithm with encouraging empirical results. All reviewers are highly positive on the contributions of this paper, and appreciate the rigorous theory which is further supported by convincing experiments. All three reviewers recommended accept. +",ICLR2020, +RDtcf6hkYy2,1610040000000.0,1610470000000.0,1,0xdQXkz69x9,0xdQXkz69x9,Final Decision,Reject,"This paper presents a method for attacking few-shot learners with poisoning a subset of support set. I believe this might be the first work to address adversarial examples for meta-learners (or few-shot learners), which is a timely issue. A common concern raised by most of reviewers is in the novelty of this work, in the sense that the method builds on a basic attack strategy (such as PGD) in the standard adversarial example setting. Authors responded to this, summarizing what's new in this paper. Episodic training for few-shot learners requires consuming support set (instead of single training data point). It is a nature of most meta-learning methods. Thus, it is easily expected that the adversarial attack for few-shot learners is naturally extended to poisoning a support set (or its subset) instead of a single data point. Certainly such extension may entail a new strategy. However, during the discussion period with reviewers, concerns on the novelty of such extension still remains. In particular, the few-shot learning algorithms do not allow big changes in the original model. The algorithms analyzed are prototypical networks that do not utilize fine-tuning, and MAML that fine-tunes for a small number of pre-fixed steps. So the transfer of adversarial samples may not be counted as a major contribution. 
+ +",ICLR2021, +X1S0P5GNA,1576800000000.0,1576800000000.0,1,SylKikSYDH,SylKikSYDH,Paper Decision,Accept (Poster),"The paper proposes a ""compressive transformer"", an extension of the transformer, that keeps a compressed long term memory in addition to the fixed sized memory. Both memories can be queried using attention weights. Unlike TransfomerXL that discards the oldest memories, the authors propose to ""compress"" those memories. The main contribution of this work is that that it introduces a model that can handle extremely long sequences. The authors also introduces a new language modeling dataset based on text from Project Gutenberg that has much longer sequences of words than existing datasets. They provide comprehensive experiments comparing against different compression strategies and compares against previous methods, showing that this method is able to result in lower word-level perplexity. In addition, the authors also present evaluations on speech, and image sequences for RL. + +Initially the paper received weak positive responses from the reviewers. The reviewers pointed out some clarity issues with details of the method and figures and some questions about design decisions. After rebuttal, all of the reviewers expressed that they were very satisfied with the authors responses and increased their scores (for a final of 2 accepts and 1 weak accept). + +The authors have provided a thorough and well-written paper, with comprehensive and convincing experiments. In addition, the ability to model long-range sequences and dependencies is an important problem and the AC agrees that this paper makes a solid contribution in tackling that problem. Thus, acceptance is recommended.",ICLR2020, +PCU3R6raGmH,1610040000000.0,1610470000000.0,1,MG8Zde0ip6u,MG8Zde0ip6u,Final Decision,Reject,"This paper received 3 reviews with mixed initial ratings: 9,4,5. The main concerns of R1 and R3, who gave unfavorable scores, included lack of novelty and hence limited value of this work for the ML community. At the same time, R5 strongly advocates for acceptance and mentions meaningful contributions in the context of the specific application, including the new dataset. In response to that, the authors submitted a new revision and provided detailed responses to each of the reviews separately, which did not change the position of the reviewers. +The AC agrees with R1 and R3 that, even though the biometrics-related contributions are relevant, the scope of this work is too narrow and application-driven for presentation in the main track of ICLR. As a result, the final recommendation is reject.",ICLR2021, +rHvmfDP_tr,1576800000000.0,1576800000000.0,1,SJg4Y3VFPS,SJg4Y3VFPS,Paper Decision,Reject,"The authors propose Group Connected Multilayer Perceptron Networks which allow expressive feature combinations to learn meaningful deep representations. They experiment with different datasets and show that the proposed method gives improved performance. + +The authors have done a commendable job of replying to the queries of the reviewers and addresses many of their concerns. However, the main concern still remains: The improvements are not very significant on most datasets except the MNIST dataset. I understand the author's argument that other papers have also reported small improvements on these datasets and hence it is ok to report small improvements. However, the reviewers and the AC did not find this argument very convincing. 
Given that this is not a theoretical paper and that the novelty is not very high (as pointed out by R1), strong empirical results are expected. Hence, at this point, I recommend that the paper not be accepted. + +",ICLR2020, +h2UFXR4cjRT,1642700000000.0,1642700000000.0,1,AXWygMvuT6Q,AXWygMvuT6Q,Paper Decision,Accept (Poster),"This paper proposes a framework for learning disentangled representations of content and style in an unsupervised way, using a permutation invariant network. It adopts a VQ network for content encoding, and Cross-Attention for Style and Linking Attention at the decoder. It is shown to be domain agnostic, working well in the image and audio domains. Experiments are conducted on speech and image datasets. + +The paper is recommended as an accept (weak) to ICLR. The reviewers have given detailed feedback and suggestions -- please address them in the next revision of the paper.",ICLR2022, +bMSXeuQfN,1576800000000.0,1576800000000.0,1,BklekANtwr,BklekANtwr,Paper Decision,Reject," The paper proposes to train LSTMs to encode car crashes (a temporal sequence of 3D mesh representations). Decoder LSTMs can then be used to 1) reconstruct the input or 2) predict the future sequence of structural geometry. The authors propose to use a spectral feature representation based on prior work as input into the encoding LSTM. The main contribution of the paper (based on the author response) is the introduction of this spectral feature representation to the ML community. The authors used a single 3D truck model to generate 205 simulations, of which 105 were used for training, and 100 for testing. The authors presented reconstruction errors and a TSNE visualization of the LSTM's reconstruction weights. + +Discussion Summary: +The paper got three weak rejects. The response provided by the authors failed to convince any of the reviewers to adjust their scores. The authors did not provide a revision based on the reviewer comments. + +Overall, the reviewers found the problem statement to be interesting. However, they had concerns about the following: +1. It's unclear what the main technical contribution of the work is. +Several of the reviewers pointed out the lack of technical novelty. From the writing, it's unclear whether the proposed spectral feature representation is taken directly from prior work or there was some additional innovation in this submission. Based on the author response, it seems the proposed feature representation is taken directly from prior work, as the authors themselves acknowledge that the submission is taking two known ideas and combining them. This can be made more explicit in the paper itself. + +2. Lack of comparison with existing work and experimental analysis +There is no comparison against existing work on predicting 3D structure deformation over time. While the proposed representation is interesting, there is no comparison with other methods or other alternative representations. Without any comparisons it is difficult to judge how the reconstruction error corresponds to actual reconstruction quality. How much error is acceptable? The submission also fails to elucidate when the proposed representation should be used. Is it better than alternative representations (use 3D mesh directly? use point clouds? use alternate basis functions?) + +3. What is being learned by the model? 
+R3 pointed out that the authors mention that the model is trained in just half an hour, and questioned whether the dynamics function is trivial to learn and that only two parts of the 3D structure are analyzed. The authors responded that the ""coarse"" dynamics are easier to learn than the ""fine""-scale dynamics. Is what is learned by the model sufficient? How well would a model that just modeled the car as a rigid object and predicted the position do? The lack of comparison against baselines and alternative methods/representations makes it difficult to judge the usefulness of the representation/approach that is presented. + +4. The paper also has minor typos. +Page 5: ""treat the for beams"" --> ""treat the four beams"" +Page 7: ""marrked"" --> ""marked"" + +Overall the paper addresses an interesting problem domain, and introduces an interesting representation to the ML community, but fails to do a proper experimental analysis showing how the representation compares to alternatives. Since the paper does not claim the novelty of the representation as its contribution, it is essential that it performs a thorough investigation of the task and performs empirical studies comparing the proposed representation/method against baselines and alternatives.",ICLR2020, +3BLb9ABsuIk,1610040000000.0,1610470000000.0,1,nEMiSX_ipXr,nEMiSX_ipXr,Final Decision,Reject,"The paper considers new notions of adversarial accuracy and risk, called ""genuine"", with an aim to fix issues with the existing definitions in the literature. A number of issues in the paper, including lack of motivation and intuition, and poor formalism, were identified by the reviewers. The paper also fails to cite some of the previous literature that has identified similar issues. The authors have only responded to some of the questions raised by the reviewers. ",ICLR2021, +uMu1stIs41,1576800000000.0,1576800000000.0,1,r1eWdlBFwS,r1eWdlBFwS,Paper Decision,Reject,"The paper proposes a hierarchical Bayesian model over multiple data sets that +has both data set specific as well as shared parameters. +The data set specific parameters are further encouraged to only capture aspects +that vary across data sets by an additional mutual information contribution to the +training loss. +The proposed method is compared to standard VAEs on multiple data sets. + +The reviewers agree that the main approach of the paper is sensible. However, +concerns were raised about general novelty, about the theoretical justification +for the proposed loss function and about the lack of non-trivial baselines. +The authors' rebuttal did not manage to fully address these points. + +Based on the reviews and my own reading, I think this paper is slightly +below the acceptance threshold.",ICLR2020, +1iFupR0wyE,1576800000000.0,1576800000000.0,1,SJgob6NKvH,SJgob6NKvH,Paper Decision,Accept (Poster),"This paper proposes RTFM, a new model in the field of language-conditioned policy learning. This approach is promising and important in reinforcement learning because of the difficulty of learning policies in new environments. + +Reviewers appreciate the importance of the problem and the effective approach. After the author response, which addressed some of the major concerns, reviewers feel more positive about the paper. They comment, though, that the presentation could be clearer, and the limitations of using synthetic data should be discussed in depth. 
+ +I thank the authors for submitting this paper.",ICLR2020, +UWCmNRX6DhV,1610040000000.0,1610470000000.0,1,o29tNZZqGcN,o29tNZZqGcN,Final Decision,Reject,"The reviewers initially assessed this paper as slightly below the acceptance threshold. The reviewers seem to agree on the novelty and potential impact of this project, but they also highlighted the lack of clarity of the manuscript, including lack of clarity in the method used to encode the graph data. + +As the authors noted, graph-related questions were the focus of most of the comments and questions from the reviewers. This is not because the reviewers did not understand and assess the method from the continual-learning side (I am also meta-reviewing several continual-learning papers and I believe that I can assess the novelty of this work). As I wrote above, reviewers were convinced of the paper's motivation. + +The authors provided good responses and discussed with at least one reviewer thoroughly. These interactions seem to have clarified important aspects of your proposed methodology and notably the properties of your graph-construction method. I found that your new results on larger datasets also provide an improvement. However, to be properly assessed, this number of clarifications regarding the core method requires a new round of reviews. The discussions have also highlighted some of the limits of your approach which do not seem to be acknowledged in your paper. This includes the discussion with reviewer2 regarding constraints on L & K, node classification (though I find that one less important), and the comparison to GraphSage in the non-lifelong-learning scenario. + +Overall, and while I agree that continual learning from graph data is an important and unexplored problem, I also find that the current manuscript lacks clarity and, even though the ICLR discussion allowed reviewers to discuss these issues with the authors, there are still significant ways to improve the clarity of the current manuscript. As a result, I do not recommend acceptance of the current manuscript. + +I strongly suggest the authors keep on working on their manuscript, as their idea seems to have potential and I would imagine that it may become one of the first works in a new interesting line of research.",ICLR2021, +MWQMaNzp8_-,1642700000000.0,1642700000000.0,1,LQCUmLgFlR,LQCUmLgFlR,Paper Decision,Reject,"This paper studies the problem of characterizing the optimal early stopping time in overparameterized learning as a function of model dimension and sample size. To do this the paper uses an explicit form of the gradient flow from prior work to present high-probability bounds in the over-parameterized setting and characterizes various properties of the optimal stopping time. The authors also conduct various experiments to verify the theory. The reviewers thought the paper was interesting and insightful. They also raised some concerns about the (1) restrictiveness of the distributional assumptions, (2) poor explanation of the theoretical results, (3) novelty with respect to other work, and (4) other technical issues. The discussion and response mitigated these concerns but the reviewers decided to mostly keep their original scores. My own reading of the paper is that there are good ideas in this paper and I agree with the authors that some of the technical issues raised by the reviewers are incorrect. 
However, it is also clear that the paper needs a bit more work to put it into the right context, and the proofs need to be more clearly and carefully written before this paper can be accepted. Therefore I recommend rejection but encourage the authors to submit to a future ML venue after a thorough revision.",ICLR2022, +u_DJdrV1-4p,1610040000000.0,1610470000000.0,1,mOO-LfEVZK,mOO-LfEVZK,Final Decision,Reject,Two reviewers expressed clear concerns about the paper but the authors did not provide any response. ,ICLR2021, +Dh80LxCzyGA,1642700000000.0,1642700000000.0,1,fXHl76nO2AZ,fXHl76nO2AZ,Paper Decision,Accept (Poster),"The paper proposes an imputation-free method to handle missing data by learning an input encoding matrix using RL, with the prediction error as the reward/penalty signal. Reviewers appreciate the interesting setup where RL is used to deal with missing data, and the method being imputation-free. Three out of four reviewers (reviewers he3p, azSY, and 4Cb5) have raised concerns about the complexity of the proposed method, but it seems like all the reviewers see the strengths of the work as outweighing the weaknesses.",ICLR2022, +3Orygi4zw,1576800000000.0,1576800000000.0,1,ByliZgBKPH,ByliZgBKPH,Paper Decision,Reject,"The reviewers were not convinced about the significance of this work. There is no empirical or theoretical result justifying why this method has advantages over the existing methods. The reviewers also raised concerns related to the scalability of the proposal. Since none of the reviewers were enthusiastic about the paper, including the expert ones, I cannot recommend acceptance of this work.",ICLR2020, +H9JsiYdFZ9,1610040000000.0,1610470000000.0,1,kmBFHJ5pr0o,kmBFHJ5pr0o,Final Decision,Reject,"In this paper, the authors claim to propose a distributed large-batch adversarial training framework to robustify DNNs. +Although the authors made efforts to clarify reviewers' concerns, it is clear that the authors still could not convince some reviewers on several points after several rounds of discussion between reviewers and authors. + +The reviewers were not in consensus on acceptance and some concerns were still not clearly addressed in the rebuttal phase. +Hence, I recommend acceptance only if there is room. +",ICLR2021, +BJR58kaBf,1517250000000.0,1517260000000.0,849,HyHmGyZCZ,HyHmGyZCZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes a method for refining distributional semantic representations at the lexical level. The reviews are fairly unanimous in that they found both the initial version of the paper, which was deemed quite rushed, and the substantial revision unworthy of publication in their current state. The weakness of both the motivation and the experimental results, as well as the lack of a clear hypothesis being tested, or of an explanation as to why the proposed method should work, indicates that this work needs revision and further evaluation beyond what is possible for this conference. I unfortunately must recommend rejection.",ICLR2018, +r1eejX0lgE,1544770000000.0,1545350000000.0,1,H1f7S3C9YQ,H1f7S3C9YQ,Meta Review,Reject,"This paper presents a model to identify entity mentions that are synonymous. This could have utility in practical scenarios that handle entities. + +The main criticism of the paper is regarding the baselines used. Most of the baselines that are compared against are extremely simple. 
There is a significant body of literature that models paraphrase and entailment, and many of those baselines are missing (decomposable attention, DIIN, other cross-attention mechanisms). Adding those experiments would make the experimental setup stronger. + +There is a bit of a disagreement between reviewers, but I agree with the two reviewers who point out the weakness of the experimental setup, and fixing those issues could improve the paper significantly.",ICLR2019,4: The area chair is confident but not absolutely certain +rJgTYBCxlN,1544770000000.0,1545610000000.0,1,Bkl87h09FX,Bkl87h09FX,perhaps not strong novelty but interesting insights based on extensive experiments on ELMO,Reject,"This paper presents an extensive empirical study of sentence-level pre-training. The paper compares pre-trained language models to other potential alternative pre-training options, and concludes that while pre-trained language models are generally stronger than other alternatives, the robustness and generality of the currently available method is less than ideal, at least with respect to ELMO-based pretraining. + +Pros: +The paper presents an extensive empirical study that offers new insights on pre-trained language models with respect to a variety of sentence-level tasks. + +Cons: +The primary contributions of this paper are empirical, and the technical novelty is relatively weak. Also, the insights are based just on ELMO, which may have a relatively weak empirical impact. The reviews were generally positive but only marginally so, which reflects that the insights are interesting but not overwhelmingly interesting. None of these is a deal-breaker per se, but the paper does not provide sufficiently strong novelty, whether based on insights or otherwise, relative to other papers being considered for acceptance. + +Verdict: +Leaning toward reject due to relatively weak novelty and empirical impact. + +Additional note on the final decision: +The insights provided by the paper are valuable, thus the paper was originally recommended for an accept. However, during the calibration process across all areas, it became evident that we cannot accept all valuable papers, each presenting different types of hard work and novel contributions. Consequently, some papers with mostly positive (but marginally positive) reviews could not be included in the final cut, despite their unique values, hard work, and novel contributions. ",ICLR2019,3: The area chair is somewhat confident +jeQSJYwzGvX,1610040000000.0,1610470000000.0,1,ml1LSu49FLZ,ml1LSu49FLZ,Final Decision,Reject,"This paper proposes enhancing contextualized word embeddings learned by Transformers by modeling long-range dependencies via a deep topic model, using a Poisson Gamma Belief Network (PGBN). The experimental results show incorporating topic information can further improve the performance of Transformers. While this is an interesting idea, reviewers pointed out some weaknesses: +- GLUE evaluation is not a test of long-term dependencies; it remains unclear whether providing topic information of preceding segments is enough to allow the model to draw information from these segments that is useful for a task. +- The improvement over the baseline does not seem to be significant. +- The ablation study could be improved, and more experiments could be done to understand the effect of hyperparameter choices from the topic model, such as the number of layers of PGBN as well as the topic number of each layer. 
+- A comparison of the model performance for different lengths of input sequences would be helpful. +- There are many recent long-range transformer variants; it would be interesting to compare them against the proposed latent topic-based method. + +Unfortunately, no answers are provided by the authors to the questions asked by the reviewers, which makes me recommend rejection.",ICLR2021, +ZLeLaRpW6c,1642700000000.0,1642700000000.0,1,dHJtoaE3yRP,dHJtoaE3yRP,Paper Decision,Reject,"The paper proposes NAFS (Node-Adaptive Feature Smoothing), which constructs node representations by only using smoothing without parameter learning. The authors first provide a formulation for the smoothing operator. They then define an over-smoothing distance to assess how close a node is to the stationary state. Finally, they use the over-smoothing distance to calculate a smoothing weight for each node. Experiments are conducted to verify the efficacy. + +Strengths +* The paper tackles the problem of over-smoothing, which is a well-known issue in GNNs. +* The solution appears to be effective. +* The paper is generally clearly written. + +Weaknesses +* The novelty and significance of the work might not be enough. Aspects of the contributions exist in prior work. + +--- + +Additional experiments have been conducted during the rebuttal. The reviewers appreciate the efforts. + +After rebuttal: + +Reviewer SHxg increased the score accordingly. + +Reviewer w2Qg says “Given the concerns proposed by the other reviewers, I adjusted my score.” + +Reviewer YM4P says “I read the rebuttal and slightly increased my score.”",ICLR2022, +bWPj2DB5Q7,1576800000000.0,1576800000000.0,1,rJlnOhVYPS,rJlnOhVYPS,Paper Decision,Accept (Poster),"The paper proposes an unsupervised framework for domain adaptation in the context of person re-identification to reduce the effect of noisy labels. They use refined soft labels and propose a soft softmax-triplet loss to support learning with these soft labels. + +All reviewers have unanimously agreed to accept the paper and appreciated the comprehensive experiments on four datasets and the ablation studies, which give some insights about the proposed method. I agree with the assessment of the reviewers and recommend that this paper be accepted.",ICLR2020, +tSMNdV8M2Uk,1642700000000.0,1642700000000.0,1,5ECQL05ub0J,5ECQL05ub0J,Paper Decision,Accept (Poster),"This paper studies online learning using SGD with momentum for nonstationary data. For the specific setting of linear regression with Gaussian noise and oscillatory covariate shift, a linear oscillator ODE is derived that describes the dynamics of the learned parameters. This then allows analysis of convergence/divergence of learning for different settings of the learning rate and momentum. The theoretical results are validated empirically, and are shown to generalize to other settings such as those with other optimizers (Adam) or other models (neural nets). The reviewers praise the clear writing and the rigorous and systematic analysis. + +3 out of 4 reviewers recommend accepting the paper. The negative reviewer does not find the main contribution interesting and significant enough for acceptance. Although I think this is a reasonable objection, it is not shared by the other 3 reviewers. Since the negative reviewer does not point out any critical flaws in the paper, I think the positive opinions should outweigh the negative one in this case. 
I therefore recommend accepting the paper.",ICLR2022, +Sk2XnzUul,1486400000000.0,1486400000000.0,1,Bygq-H9eg,Bygq-H9eg,ICLR committee final decision,Reject,The paper presents an evaluation of off-the-shelf image classification architectures. The findings are not too surprising and don't provide much new insight.,ICLR2017, +rJgq_VM_eV,1545250000000.0,1545350000000.0,1,ryepUj0qtX,ryepUj0qtX,"An interesting, novel approach to the network embedding problem on challenging graph structures, with uniformly better than state-of-the-art empirical results",Accept (Poster),"The conditional network embedding approach proposed in the paper seems nice and novel, and consistently outperforms the state of the art on a variety of datasets; a scalability demonstration was added during rebuttals, as well as multiple other improvements; although the reviewers did not respond by changing the scores, this paper with the augmentations provided during the rebuttal appears to be a useful contribution worthy of publishing at ICLR. ",ICLR2019,4: The area chair is confident but not absolutely certain +SJefJYu-x4,1544810000000.0,1545350000000.0,1,H1eH4n09KX,H1eH4n09KX,"interesting approach, but results are not compelling enough",Reject,"The paper presents an algorithm for audio super-resolution using adversarial models along with additional losses, e.g. using auto-encoders and reconstruction losses, to improve the generation process. + +Strengths +- Proposes audio super-resolution based on GANs, extending some of the techniques proposed for vision / images to audio. +- The authors improved the paper during the review process by including results from a user study and an ablation analysis. + +Weaknesses +- Although the paper presents an interesting application of GANs for the audio task, overall novelty is limited since the setup closely follows what has been done for vision and related tasks, and the baseline system. This is also not the first application of GANs for audio tasks. +- Performance improvement over previously proposed (U-Net) models is small. It would have been useful to also include UNet4 in the user study, as one of the reviewers pointed out, since it sounds better in a few cases. +- It is not entirely clear if the method would be an improvement over state-of-the-art audio generative models like WaveNet. + +Reviewers agree that the general direction of this work is interesting, but the results are not compelling enough at the moment for the paper to be accepted to ICLR. Given these review comments, the recommendation is to reject the paper.",ICLR2019,5: The area chair is absolutely certain +rJIVHypSG,1517250000000.0,1517260000000.0,545,ryY4RhkCZ,ryY4RhkCZ,ICLR 2018 Conference Acceptance Decision,Reject,"Meta score: 4 + +The paper concerns the development of a density network for estimating uncertainty in recommender systems. The submitted paper is not very clear and it is hard to completely understand the proposed method from the way it is presented. This makes assessing the contribution of the paper difficult. 
+ +Pros: + - addresses an interesting and important problem + - possible novel contribution + +Cons: + - poorly written, hard to understand precisely what is done + - difficult to compare with the state-of-the-art, not helped by a disorganised literature review + - experimentation could be improved + +The paper needs more work before being ready for publication.",ICLR2018, +lpT9eHmJcb,1642700000000.0,1642700000000.0,1,GQd7mXSPua,GQd7mXSPua,Paper Decision,Accept (Poster),"Reviewers all found the work well-motivated in addressing uncertainty, a topic that has not seen much focus in meta-learning and few-shot learning. They describe the challenges well: small sample sizes and OOD shift. They then propose a solution that they find works well empirically, overcoming these challenges with a set encoder and an energy function, respectively. + +The proposal is largely one of engineering components that have been found to work well in the literature. I'm sympathetic to this style of research (particularly in today's neural network research), although the reviewers raise a primary concern about whether the choices leading to the proposal are justified. In particular, two reviewers argue that there are no clear ablations compared to alternative simpler approaches, and so the approach of selecting a Set Transformer is rather arbitrary. My perspective is that theory provides one sufficient but not necessary angle to do this, and I do find the authors' replies to the two reviewers convincing. In particular, they add a baseline to estimate covariances suggested by Reviewer Zz5v and they describe how the current baselines do in fact use the shrinkage suggestion by Reviewer QrCN. + +I recommend the authors use the reviewers' feedback to enhance their submission's clarity and overall quality.",ICLR2022, +rksHNkprG,1517250000000.0,1517260000000.0,347,r1lfpfZAb,r1lfpfZAb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"I (and some of the reviewers) find the general motivation quite interesting (operationalizing the Gricean maxims in order to improve language generation). However, we are not convinced that the actual model encodes these maxims in a natural and proper way. Without this motivation, the approach can be regarded as a set of heuristics which happen to be relatively effective on a couple of datasets. In other words, the work seems too preliminary to be accepted at the conference. + +Pros: +-- Interesting motivation (and potential impact on follow-up work) +-- Good results on a number of datasets +Cons: +-- The actual approach can be regarded as a set of heuristics, not necessarily following from the maxims +-- More serious evaluation needed (e.g., image captioning or MT) and potentially better ways of encoding the maxims + +It is suitable for the workshop track, as it is likely to stimulate an interesting discussion and more convincing follow-up work. + +",ICLR2018, +Cp0gXjcBB,1576800000000.0,1576800000000.0,1,ryg7jhEtPB,ryg7jhEtPB,Paper Decision,Reject,"The authors argue that directly optimizing the IS proposal distribution as in RWS is preferable to optimizing the IWAE multi-sample objective. They formalize this with an adaptive IS framework, AISLE, that generalizes RWS, IWAE-STL and IWAE-DREG. + +Generally reviewers found the paper to be well-written and the connections drawn in this paper interesting. 
However, all reviewers raised concerns about the lack of experiments (Reviewer 3 suggested several experiments that could be done to clarify remaining questions) and practical takeaways. + +The authors responded by explaining that ""the main ""practical"" takeaway from our work is the following: If one is interested in the bias-reduction potential offered by IWAEs over plain VAEs then the adaptive importance-sampling framework appears to be a better starting point for designing new algorithms than the specific multi-sample objective used by IWAE. This is because the former retains all of the benefits of the latter without inheriting its drawbacks."" I did not find this argument convincing as a primary advantage of variational approaches over WS is that the variational approach optimizes a unified objective. At least in principle, this is a serious drawback of the WS approaches. Experiments and/or a discussion of this is warranted. + +This paper is borderline, and unfortunately, due to the high number of quality submissions this year, I have to recommend rejection at this point. +",ICLR2020, +t3ThYgE2pw,1576800000000.0,1576800000000.0,1,rJe4_xSFDB,rJe4_xSFDB,Paper Decision,Accept (Poster),"This paper improves upper bound estimates on Lipschitz constants for neural networks by converting the problem into a polynomial optimization problem. The proposed method also exploits sparse connections in the network to decompose the original large optimization problem into smaller ones that are more computationally tractable. The bounds achieved by the method improve upon those found from a quadratic program formulation. The method is tested on networks with random weights and networks trained on MNIST and provides better estimates than the baselines. + +The reviews and the author discussion covered several topics. The reviewers found the paper to be well written. The reviewers liked that tighter bounds on the Lipschitz constants can be found in a computationally efficient manner. They also liked that the method was applied to a real-world dataset, though they noted that the sizes of the networks analyzed here are smaller than the ones in common use. The reviewers pointed out several ways that the paper could be improved. The authors adopted these suggestions including additional comparisons, computation time plots, error bars, and relevant references to related work. The reviewers found the discussion and revised paper addressed most of their concerns. + +This paper improves on existing methods for analyzing neural network architectures and it should be accepted.",ICLR2020, +fMw0PHMXls,1576800000000.0,1576800000000.0,1,rJgsskrFwH,rJgsskrFwH,Paper Decision,Accept (Spotlight),"This paper presents an approach for scalable autoregressive video generation based on a three-dimensional self-attention mechanism. As rightly pointed out by R3, the proposed approach ’is individually close to ideas proposed elsewhere before in other forms ... but this paper does the important engineering work of selecting and combining these ideas in this specific video synthesis problem setting.’ +The proposed method is relevant and well-motivated, and the experimental results are strong. All reviewers agree that experiments on the Kinetics dataset are particularly appealing. In the initial evaluation, the reviewers have raised several concerns such as performance metrics, ablation study, training time comparison, empirical evaluation of the baseline methods on Kinetics, that were addressed by the authors in the rebuttal. 
+In conclusion, all three reviewers were convinced by the author’s rebuttal, and AC recommends acceptance of this paper – congratulations to the authors!",ICLR2020, +SonXgBfOmfL,1610040000000.0,1610470000000.0,1,QB7FkNVAfxa,QB7FkNVAfxa,Final Decision,Reject,"The authors provide a new analysis of learning of two-layer linear networks with gradient flow, leading to some novel optimization and generalization guarantees incorporating a notion of the imbalance in the weights. While there was some diversity of opinion, the prevailing view was that the results were not sufficiently significant for publication in ICLR.",ICLR2021, +rJeQjWJMeE,1544840000000.0,1545350000000.0,1,rJlHIo09KQ,rJlHIo09KQ,Interesting approach but somewhat limited analysis,Reject,"This paper proposes to unroll power iterations within a Slow-Feature-Analysis learning objective in order to obtain a fully differentiable slow feature learning system. Experiments on several datasets are reported. + +This is a borderline submissions, with reviewers torn between acceptance and rejection. They were generally positive about the clarity and simplicity of the presentation, whereas they raised concerns about the relative lack of novelty (especially related to the recent SpIN model), as well as the current limitations of the approach on large-scale problems. Reviewers also found authors to be responsive and diligent during the rebuttal phase. The AC agrees with this assessment, and therefore recommends rejection at this time, encouraging the authors to resubmit to the next conference cycle after addressing the above points. ",ICLR2019,4: The area chair is confident but not absolutely certain +BUCtqYo2qIB,1642700000000.0,1642700000000.0,1,xS8AMYiEav3,xS8AMYiEav3,Paper Decision,Accept (Poster),"Reviewers were almost unanimous in favor of this paper, with scores of 5,8,6,8. +I think it's a neat idea and am inclined to accept despite some issues w/ motivation / scalability. +Science proceeds in increments, and it's OK to propose something with scalability issues that someone else later tries to fix, etc.",ICLR2022, +LtsIRbTdWmJ,1642700000000.0,1642700000000.0,1,R11xJsRjA-W,R11xJsRjA-W,Paper Decision,Reject,"Motivated by the connections between privacy and generalization, this paper studies the correlation between MI attack accuracy and OOD accuracy on synthetic and real-world datasets. It shows that the measurements are not always correlated. I found the connection between the motivation and actual measurements performed in the experiments to be rather tenuous. Therefore it is hard to draw any insightful conclusions from the empirical results. It should also be noted that somewhat related disconnect between accuracy of MIA and generalization has already been observed in prior work.",ICLR2022, +8efzvgGukX,1576800000000.0,1576800000000.0,1,r1eX1yrKwB,r1eX1yrKwB,Paper Decision,Reject,"This paper addresses the problem of unsupervised domain adaptation and proposes explicit modeling of the source and target feature distributions to aid in cross-domain alignment. + +The reviewers all recommended rejection of this work. Though they all understood the paper’s position of explicit feature distribution modeling, there was a lack of understanding as to why this explicit modeling should be superior to the common implicit modeling done in related literature. 
As some reviewers raised concern that the empirical performance of the proposed approach was marginally better than competing methods, this experimental evidence alone was not sufficient justification of the explicit modeling. There was also a secondary concern about whether the two proposed loss functions were simultaneously necessary. + +Overall, after reading the reviewers and authors comments, the AC recommends this paper not be accepted. ",ICLR2020, +9ckjYt9BJ,1576800000000.0,1576800000000.0,1,HJeOekHKwr,HJeOekHKwr,Paper Decision,Accept (Poster),"The paper provides a theoretical study of what regularizations should be used in GAN training and why. The main focus is that the conditions on the discriminator that need to be enforced, to get the Lipshitz property of the corresponding function that is optimized for the generator. Quite a few theorems and propositions are provided. As noted by Reviewer3, this adds insight to well-known techniques: the Reviewer1 rightfully notes that this does not lead to any practical conclusion. +Moreover, then training of GANs never goes to the optimal discriminator, that could be a weak point; rather than it proceeds in the alternating fashion, and then evolution is governed by the spectra of the local Jacobian (which is briefly mentioned). This is mentioned in future work, but it is not clear at all if the results here can be helpful (or can be generalized). + At some point of the paper it gets to ""more theorems mode"" which make it not so easy and motivating to read. +The theoretical results at the quantitative level are very interesting. I have looked for a long time on Figure 1: does this support the claims? First my impression was it does not (there are better FID scores for larger learning rates). But in the end, I think it supports: the convergence for a smaller that $\gamma_0$ learning rate to the same FID indicated the convergence to the same local minima (probably). This is perfectly fine. Oscillations afterwards move us to a stochastic region, where FID oscillates. So, the theory has at least minor confirmation. + +",ICLR2020, +rkGdIy6HM,1517250000000.0,1517260000000.0,813,HJWGdbbCW,HJWGdbbCW,ICLR 2018 Conference Acceptance Decision,Reject,"While the reviewers agree that this paper does provide a contribution, it is small and does overlap with several concurrent works. it is a bit hand-engineered. +The authors have provided a lengthy rebuttal, but the final reviews are not strong enough. ",ICLR2018, +XgDWTjS7hm,1576800000000.0,1576800000000.0,1,SJgvl6EFwH,SJgvl6EFwH,Paper Decision,Reject,"This paper presents a conditional CNF based on the InfoGAN structure to improve ODE solvers. Reviewers appreciate that the approach shows improved performances over the baseline models. + +Reviewers all note, however, that this paper is weak in clearly defining the problem and explaining the approach and the results. While the authors have addressed some of the reviewers concerns through their rebuttal, reviewers still remain concerned about the clarity of the paper. + +I thank the authors for submitting to ICLR and hope to see a revised paper at a future venue.",ICLR2020, +S39Mmu6hZ,1576800000000.0,1576800000000.0,1,B1xybgSKwB,B1xybgSKwB,Paper Decision,Reject,"The paper introduces a novel approach to transfer learning in RL based on credit assignment. The reviewers had quite diverse opinions on this paper. The strength of the paper is that it introduces an interesting new direction for transfer learning in RL. 
However, there are some questions regarding design choices and whether the experiments sufficiently validate the idea (i.e., the sensitivity to hyperparameters is a question that is not sufficiently addressed). Overall, this research has great potential. However, a more extensive empirical study is necessary before it can be accepted.",ICLR2020, +POb6Prd9u,1576800000000.0,1576800000000.0,1,SJxhNTNYwB,SJxhNTNYwB,Paper Decision,Accept (Poster),This paper proposes a new black-box adversarial attack approach which learns a low-dimensional embedding using a pretrained model and then performs efficient search in the embedding space to attack target networks. The proposed approach can produce perturbation with semantic patterns that are easily transferable and improve the query efficiency in black-box attacks. All reviewers are in support of the paper after author response. I am very happy to recommend accept. ,ICLR2020, +jbc0bHlS-R,1576800000000.0,1576800000000.0,1,Sklyn6EYvH,Sklyn6EYvH,Paper Decision,Reject,"This paper that defines a “Residual learning” mechanism as the training regime for variational autoencoder. The method gradually activates individual latent variables to reconstruct residuals. + +There are two main concerns from the reviewers. First, residual learning is a common trick now, hence authors should provide insights on why residual learning works for VAE. The other problem is computational complexity. Currently, reviews argue that it seems not really fair to compare to a bruteforce parameter search. The authors’ rebuttal partially addresses these problems but meet the standard of the reviewers. + +Based on the reviewers’ comments, I choose to reject the paper. +",ICLR2020, +LF1xdc47TQ,1576800000000.0,1576800000000.0,1,H1lXCaVKvS,H1lXCaVKvS,Paper Decision,Reject,"One of the reviewers pointed out similarity to existing very recent work which would require significant reframing of the current paper. Hence, this work is below the bar at the moment.",ICLR2020, +X4Soaje2gJl,1610040000000.0,1610470000000.0,1,AWOSz_mMAPx,AWOSz_mMAPx,Final Decision,Accept (Poster),"This paper treats the problem of running gradient descent-ascent (GDA) in min-max games with a different step-size for the two players. Earlier work by Jin et al. has shown that, when the ratio of the step-sizes is large enough, the stable fixed points of GDA coincide with the game's strict local min-max equilibria. The main contribution of this paper is an explicit characterization of a threshold value $\tau^*$ of this ratio as the maximum eigenvalue of a specific matrix that involves the second derivatives of the game's min-max objective at each (strict local) equilibrium. + +This paper generated a fairly intense discussion, and the reviewers showed extraordinary diligence in assessing the authors' work. Specifically, the reviewers raised a fair number of concerns concerning the initial write-up of the paper, but these concerns were mostly addressed by the authors in their revision and replies. As a result, all reviewers are now in favor of acceptance. + +After my own reading of both versions of the paper and the corresponding discussion, I concur with the reviewers' view and I am recommending acceptance subject to the following revisions for the final version of the paper: +1. Follow the explicit recommendations of AnonReviewer3 regarding the numerical simulations (or, failing that, remove them altogether). 
[The authors' phrase that ""The theory we provide also does not strictly apply to using RMSprop"" does not suffice in this regard] +2. Avoid vague statements like $\tau \to \infty$ in the introduction regarding the work of Jin et al. and state precisely their contributions in this context. In the current version of the paper, a version of this is done in page 4, but the introduction is painting a different picture, so this discussion should be transferred there. +3. A persisting concern is that the authors' characterization of $\tau^*$ cannot inform a practical choice of step-size scaling (because the value of $\tau^*$ derived by the authors depends on quantities that cannot be known to the optimizer). Neither the reviewers nor myself were particularly convinced by the authors' reply on this point. However, this can also be seen as an ""equilibrium refinement"" result, i.e., for a given value of $\tau$ only certain equilibria can be stable. I believe this can be of interest to the community, even though the authors' characterization cannot directly inform the choice of $\gamma_1$ and $\gamma_2$ (or their ratio). + +Modulo the above remarks (which the authors should incorporate in their paper), I am recommending acceptance.",ICLR2021, +ssq7UfuK1C-,1642700000000.0,1642700000000.0,1,gFDFKC4gHL4,gFDFKC4gHL4,Paper Decision,Accept (Poster),"The paper studies real world ML APIs' performance shifts due to API updates/retraining and proposes a framework to efficiently estimate those shifts. The problem is very important and the presented approach definitely novel. My concern is about limited novelty of the theoretical analysis and weak experimental evaluation (just two dates, limited number of systems tested, small number of ablations). As of now the paper looks like an interesting but unfinished proposal. Looking forward to the discussion between the authors and the reviewers to address the concerns. + +In the rebuttal, the authors have addressed reviewers' comments, in particular by adding additional experiments that strengthen the paper. All the reviewers recommend the paper to be accepted. It is suggested that in the camera-ready version the authors will add additional details regarding the experiments, as some of the reviewers mentioned.",ICLR2022, +SOTyFRjZRg_,1642700000000.0,1642700000000.0,1,AsQz_GFFDQp,AsQz_GFFDQp,Paper Decision,Reject,"The paper studies two aspects of personalized federated learning: (1) Clients having their own labeling scheme. (2) Domain heterogeneity across clients. They propose a way to collaborate across clients by similarity matching. The key novelty is to measure similarity of client pairs, based on on how much their representation layer agrees (measured with cosine similarity). A second novelty is a low-rank factorization of model weights. Empirical evaluations show wins on MNIST, CIFAR10, 100. + +Reviewers had various grave concerns. On the method side, they were concerned that thee is not enough theoretical insight and analysis of the proposed approach, esp. the kernel factorization and its effect. On the empirical side, they were concerned that comparisons were not made with most recent baselines. There was a large number of PFL approaches published in 2021. e.g. FedBN. Among these, its worth noting pFedHN (ICML2021) which actually discussed the case of heterogeneous (permuted) labels (their Sec 3.3). + +In a discussion, reviewers appreciated the responses by the authors, the additional experiments and ablation studies. 
Unfortunately however, they found that the paper is not ready for publication in ICLR.",ICLR2022, +6pxNd8dS-bl,1642700000000.0,1642700000000.0,1,tx4qfdJSFvG,tx4qfdJSFvG,Paper Decision,Reject,"This paper presents a Feature Propagation (FP) method for dealing with missing features in graph learning tasks. The FP method is based on minimization of the Dirichlet energy and leads to a diffusion-type differential equation on the graph. Empirical results demonstrated the effectiveness. However, after rebuttal major concerns still remain on the novelty and siginificance, in particular, the connection with label propogation should be better elaborated, which is crucial to understand the contributions of this paper. Considering that, I can't recommend accept the current manuscript. The authors are encouraged to further improve for a more solid publication in the future.",ICLR2022, +rJeF4J6HG,1517250000000.0,1517260000000.0,393,Bya8fGWAZ,Bya8fGWAZ,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper and reviews makes for a difficult call. The reviewers appear to be in agreement that Value Propagation provides an interesting algorithmic advance over earlier work on Value Iteration networks. AnonReviewer1 gives a strong rationale why the advance is both original and significant. Their experiments also show very nice results with VProp and MVProp in 2-D grid-worlds. + +However, I also fully agree with AnonReviewer2 that testing in other domains beyond 2-D grid-world is necessary. Earlier work on VIN was also tested on a Mars Rover / continuous control domain, as well as graph-based web navigation task. The authors' rebuttal on this point comes across as weak. In their view, they can't tackle real-world domains until VProp has been proven effective in large, complex grid-worlds. I don't buy this at all -- they could start initial experiments right away, which would perhaps yield some surprising results. Given this analysis, the committee recomments this paper for workshop. + +Pros: significant algorithmic advance, good technical quality and writeup, nice results in 2-D grid world. + +Con: Validation is only in 2-D grid-world domains. ",ICLR2018, +dn2bTg2QzM,1610040000000.0,1610470000000.0,1,5jzlpHvvRk,5jzlpHvvRk,Final Decision,Accept (Poster),"This paper received borderline scores but overall lean positive. + +The reviewers point out that the paper presents interesting new ideas and an effective solution to the problem of automatically searching for loss functions. The empirical results are convincing, although the baselines are not the strongest possible in terms of absolute performance. Overall, the ACs find that the paper has sufficient novelty and technical contribution to be accepted. ",ICLR2021, +SJ5y6MUdx,1486400000000.0,1486400000000.0,1,rJ0-tY5xe,rJ0-tY5xe,ICLR committee final decision,Accept (Poster),"The program committee appreciates the authors' response to concerns raised in the reviews. While there are some concerns with the paper that the authors are strongly encouraged to address for the final version of the paper, overall, the work has contributions that are worth presenting at ICLR.",ICLR2017, +ObEr38Q6Qd,1576800000000.0,1576800000000.0,1,BylWglrYPH,BylWglrYPH,Paper Decision,Reject,"Thanks for clarifying several issues raised by the reviewers, which helped us understand the paper. + +After all, we decided not to accept this paper due to the weakness of its contribution. 
I hope the updated comments by the reviewers help you strengthen your paper for potential future submission.",ICLR2020, +BJgvmyaSf,1517250000000.0,1517260000000.0,152,HktJec1RZ,HktJec1RZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"this submission introduces soft local reordering to the recently proposed SWAN layer [Wang et al., 2017] to make it suitable for machine translation. although only in small-scale experiments, the results are convincing.",ICLR2018, +eQV7LbDISS,1610040000000.0,1610470000000.0,1,ASAJvUPWaDI,ASAJvUPWaDI,Final Decision,Reject,"All reviewers feel this paper addresses and important topic, and has many merits. However, it is difficult to recommend publication at this time. The primary concern is that the paper has its theoretical optimality as an important contribution, but the reviewers and myself (in a non-public thread) were unable to verify the correctness of the proofs. In part unfortunately this is due to edits to the proofs happening late in the revision period, too late for further discussion with the authors. Some of the particular questions in the proof of theorem 1 (appendix B) include: clarifying the value of $\rho$ which makes the unnumbered equation above equation (6) equivalent to definition 1, and in particular whether the $1/|X_k|$ term should be inside or outside the absolute value; and clarifying various undefined symbols which are introduced in the equation at the top of page 13, but are never defined, including $M$, $b$, and $z_i$. Reviewers also had some concern that the algorithm should be benchmarked against more recent / better performant baselines than Kamiran et al. (2012).",ICLR2021, +xAis1gA73k_,1642700000000.0,1642700000000.0,1,ySQH0oDyp7,ySQH0oDyp7,Paper Decision,Accept (Poster),"This paper proposes a simple, theoretically motivated approach for post-training quantization. The authors justify its effectiveness with both a sound theoretical analysis, and strong empirical results across many tasks and models, including a state-of-the-art result for 2-bit quantized weights/activations. All reviewers agreed the paper is worth accepting, with 3/4 rating it as a clear accept following the discussion period, and the fourth reviewer not giving strong reasons not to accept.",ICLR2022, +Q1ImzIjMYQ,1576800000000.0,1576800000000.0,1,SJx0q1rtvS,SJx0q1rtvS,Paper Decision,Accept (Poster),"Thanks for the submission. This paper leverages the stability of differential privacy for the problems of anomaly and backdoor attack detection. The reviewers agree that this application of differential privacy is novel. The theory of the paper appears to be a bit weak (with very strong assumptions on the private learner), although it reflects the basic underlying idea of the detection technique. The paper also provides some empirical evaluation of the technique.",ICLR2020, +bKIuvP943r,1576800000000.0,1576800000000.0,1,ryedqa4FwS,ryedqa4FwS,Paper Decision,Reject,"This paper introduces a NAS algorithm based on multi-agent optimization, treating each architecture choice as a bandit and using an adversarial bandit framework to address the non-stationarity of the system that results from the other bandits running in parallel. + +Two reviewers ranked the paper as a weak accept and one ranked it as a weak reject. The rebuttal answered some questions, and based on this the reviewers kept their ratings. The discussion between reviewers and AC did not result in a consensus. 
The average score was below the acceptance threshold, but since it was close I read the paper in detail myself before deciding. + +Here is my personal assessment: + +"" +Positives: +1. It is very nice to see some theory for NAS, as there isn't really any so far. The theory for MANAS itself does not appear to be very compelling, since it assumes that all but one bandit is fixed, i.e., that the problem is stationary, which it clearly isn't. But if I understand correctly, MANAS-LS does not have that problem. (It would be good if the authors could make these points more explicit in future versions.) + +2. The absolute numbers for the experimental results on CIFAR-10 are strong. + +3. I welcome the experiments on 3 additional datasets. + +Negatives: +1. The paper crucially omits a comparison to random search with weight sharing (RandomNAS-WS) as introduced by Li & Talwalkar's paper ""Random Search and Reproducibility for Neural Architecture Search"" (https://arxiv.org/abs/1902.07638), on arXiv since February and published at UAI 2019. This method is basically MANAS without the update step, using a uniform random distribution at step 3 of the algorithm, and therefore would be the right baseline to see whether the bandits are actually learning anything. RandomNAS-WS has the same memory improvements over DARTS as MANAS, so this part is not new. Similarly, there is GDAS as another recent approach with the same low memory requirement: http://openaccess.thecvf.com/content_CVPR_2019/html/Dong_Searching_for_a_Robust_Neural_Architecture_in_Four_GPU_Hours_CVPR_2019_paper.html +This is my most important criticism. + +2. I think there may be a typo somewhere concerning the runtimes of MANAS. It would be extremely surprising if MANAS truly takes 2.5 times longer when run with 20 cells and 500 epochs than when run with 8 cells and 50 epochs. It would make sense if MANAS gets 2.5 slower when just going from 8 to 20 cells, but when going from 50 to 500 epochs the cost should go up by another factor of 10. And the text states specifically that ""for datasets other than ImageNet, we use 500 epochs during the search phase for architectures with 20 cells, 400 epochs for 14 cells, and 50 epochs for 8 cells"". Therefore, I think either that text is wrong or MANAS got 10x more budget than DARTS. + +3. Figure 2 shows that on Sport-8, MANAS actually does *significantly worse* when searching on 14 cells than on 8 cells (note the different scale of the y axis). It's also slightly better with 8 cells on MIT-67. I recommend that the authors discuss this in the text and offer some explanation, rather than have the text claim that 14 cells are better and the figure contradict this. Only for MANAS-LS, the 14-cell version actually works better. + +4. The authors are unclear about whether they compare to random search or random sampling. These are two different approaches. Random sampling (as proposed by Sciuto et al, 2019) takes a single random architecture from the search space and compares to that. Standard random search iteratively samples N random architectures and evaluates them (usually on some proxy metric), selecting and retraining the best one found that way. The number N is chosen for random search to use the same computational resources as the method being compared. The authors call their method random search but then appear to be describing random sampling. 
+ +Also, with several recent papers showcasing problems in NAS evaluation (many design decisions affect NAS performance), it would be a big plus to have code available to ensure reproducibility. Many ICLR papers are submitted with an anonymized code repository, and if possible, I would encourage the authors to do this for a future version. +"" + +The prior rating based on the reviewers was slightly below the acceptance threshold, and my personal judgement did not push the paper above the acceptance threshold. I encourage the authors to improve the paper by addressing the reviewer's points and the points above and resubmit to a future venue. Overall, I believe this is very interesting work and am looking forward to a future version.",ICLR2020, +OvsQ_u7sg8,1576800000000.0,1576800000000.0,1,S1xaf6VFPB,S1xaf6VFPB,Paper Decision,Reject,"The authors present a neural framework for learning SAT solvers that takes the form of probabilistic inference. The whole process consists of propagation, decimation and prediction steps, so unlike other prior work like Neurosat that learns to predict sat/unsat and only through this binary signal, this work presents a more modular approach, which is learned via energy minimization and it aims at predicting assignments (the assignments are soft which give rise to a differentiable loss). On the other hand, at test time the method returns the first soft assignment whose hard version (obtained by thresholding) satisfies the formula. Reviewers found this to be an interesting submission, however there were some concerns regarding (among others) comparison to previous work. + +Overall, this submission has generated a lot of discussion among the reviewers (also regarding how this model actually operates) and it is currently borderline without a strong champion. Due to the concerns raised and the limited space in the conference's program, unfortunately I cannot recommend this work for acceptance. +",ICLR2020, +s51GZ-g4ee,1576800000000.0,1576800000000.0,1,SJeWHlSYDB,SJeWHlSYDB,Paper Decision,Reject,"This paper studies spread divergence between distributions, which may exist in settings where the divergence between said distributions does not. The reviewers feel this work does not have sufficient technical novelty to merit acceptance at this time.",ICLR2020, +4mOAMP_p_,1576800000000.0,1576800000000.0,1,BJlnmgrFvS,BJlnmgrFvS,Paper Decision,Reject,"The authors propose a novel algorithm for batch RL with offline data. The method is simple and outperforms a recently proposed algorithm, BCQ, on Mujoco benchmark tasks. + +The main points that have not been addressed after the author rebuttal are: +* Lack of rigor and incorrectness of theoretical statements. Furthermore, there is little analysis of the method beyond the performance results. +* Non-standard assumptions/choices in the algorithm without justification (e.g., concatenating episodes). +* Numerous sloppy statements / assumptions that are not justified. +* No comparison to BEAR, making it challenging to evaluate their state-of-the-art claims. +The reviewers also point out several limitations of the proposed method. Adding a brief discussion of these limitations would strengthen the paper. + +The method is interesting and simple, so I believe that the paper has the potential to be a strong submission if the authors incorporate the reviewers suggestions in a future submission. 
However, at this time, the paper falls below the acceptance bar.",ICLR2020, +HylFpbzsy4,1544390000000.0,1545350000000.0,1,ryf7ioRqFX,ryf7ioRqFX,a simple but well motivated trick for stabilizing LSTM optimization,Accept (Poster),"This paper presents a method for preventing exploding and vanishing gradients in LSTMs by stochastically blocking some paths of the information flow (but not others). Experiments show improved training speed and robustness to hyperparameter settings. + +I'm concerned about the quality of R2, since (as the authors point out) some of the text is copied verbatim from the paper. The other two reviewers are generally positive about the paper, with scores of 6 and 7, and R1 in particular points out that this work has already had noticeable impact in the field. While the reviewers pointed out some minor concerns with the experiments, there don't seem to be any major flaws. I think the paper is above the bar for acceptance. +",ICLR2019,5: The area chair is absolutely certain +h1l5jl__g,1576800000000.0,1576800000000.0,1,HJgpugrKPS,HJgpugrKPS,Paper Decision,Accept (Poster),"This work presents a theory for building scale-equivariant CNNs with steerable filters. The proposed method is compared with some of the related techniques. SOTA is achieved on the MNIST-scale dataset, and gains on STL-10 are demonstrated. The reviewers had some concerns related to the method, clarity, and comparison with related works. The authors have successfully addressed most of these concerns. Overall, the reviewers are positive about this work and appreciate the generality of the presented theory and its good empirical performance. All the reviewers recommend acceptance.",ICLR2020, +5Z_7HMzo-tR,1610040000000.0,1610470000000.0,1,6isfR3JCbi,6isfR3JCbi,Final Decision,Accept (Poster),"This paper provides a privacy-preserving method to boost the sample quality after training a GAN. The reviewers were unanimous that this paper should be presented at ICLR, with an important contribution to privacy-preserving GANs.",ICLR2021, +n9YRQhUqhMI,1642700000000.0,1642700000000.0,1,0n1UvVzW99x,0n1UvVzW99x,Paper Decision,Reject,"An algorithm for learning a prototype-based nearest-neighbor regression model is presented. This algorithm minimizes an MSE on training examples w.r.t. the prototype centers and the prototype outputs by block coordinate descent. The main contribution is the optimization algorithm for finding the prototypes. +Major concerns in the reviews include missing mathematical rigor, poor description of the experiments, and unclear novelty. From my own reading I would like to add that the main theoretical contribution (Theorem 1) makes assumptions that are beyond any reasonable constraint, in particular since we have known for more than 40 years that such assumptions are superfluous for many, many other algorithms. + +In summary, a clear reject.",ICLR2022, +4Da-RPFlSK9,1642700000000.0,1642700000000.0,1,B72HXs80q4,B72HXs80q4,Paper Decision,Accept (Poster),"In this paper, the authors introduce a simple mixture-of-experts model by greatly simplifying the routing mechanism: experts are randomly activated at both training and inference time. A consistency loss function is added for training the proposed models, enforcing all experts to make consistent predictions. The proposed method, called THOR, is evaluated on machine translation tasks, including multi-lingual MT, and outperforms the recently proposed Switch Transformer MoE. 
+ +The reviews note that the paper is well written and easy to follow, and that the proposed method is simple. While the results look promising, the reviewers also raised concerns regarding comparisons to previous work, some of which were addressed in the rebuttal. Finally, a reviewer raised the concern that this method is related to ensembles, which work well for machine translation but are neither discussed nor compared against. For these reasons, I believe that the paper is borderline, leaning toward acceptance.",ICLR2022, +M-tH3al04L,1576800000000.0,1576800000000.0,1,HkxdQkSYDB,HkxdQkSYDB,Paper Decision,Accept (Poster),"The work proposes a graph convolutional network-based approach to multi-agent reinforcement learning. This approach is designed to adaptively capture changing interactions between agents. Initial reviews highlighted several limitations, but these were largely addressed by the authors. The resulting paper makes a valuable contribution by proposing a well-motivated approach, and by conducting extensive empirical validation and analysis that result in novel insights. I encourage the authors to take on board any remaining reviewer suggestions as they prepare the camera-ready version of the paper.",ICLR2020, +EBa9tvs6Aw,1576800000000.0,1576800000000.0,1,r1nSxrKPH,r1nSxrKPH,Paper Decision,Reject,"The submission proposes a complex, hierarchical architecture for continuous control RL that combines Hindsight Experience Replay, vision-based planning with privileged information, and low-level control policy learning. The authors demonstrate that the approach can achieve transfer of the different control levels between different bodies in a single environment. + +The reviewers were initially all negative, but 2 were persuaded towards weak acceptance by the improvements to the paper and the authors' rebuttal. The discussion focused on remaining limitations: the use of a single maze environment for evaluation, as well as whether the baselines were fair (HAC in particular). After reading the paper, I believe that these limitations are substantial. In particular, this is not a general approach, and its relevance is severely limited unless the authors demonstrate that it will work as well in a more general control setting, which they already list as future work. + +Thus I recommend rejection at this time.",ICLR2020, +nIpqtp-gG_i,1642700000000.0,1642700000000.0,1,Is5Hpwg2R-h,Is5Hpwg2R-h,Paper Decision,Reject,"This paper proposes a new approach that combines offline reinforcement learning with learning in simulation. There were different views on the paper among the reviewers, and we had quite a lot of discussion. As a consequence, there were still serious concerns remaining, e.g., whether the results are significant enough and whether there are clear advantages of the proposed method over directly using offline RL methods. It is not justified whether the proposed framework can use offline data more efficiently or better reduce the gap between mismatched simulators and offline data. The reviewer who gave the highest score decided not to champion the paper. Considering all the discussions, we believe the paper is not ready for publication at ICLR yet.",ICLR2022, +TyDXRoU29Yl,1610040000000.0,1610470000000.0,1,TmUfsLjI-1,TmUfsLjI-1,Final Decision,Reject,"The paper studies efficient strategies for selection of pre-trained models for a downstream task. 
The main concerns consistently raised by the reviewers were limited methodological novelty, insufficient experimental analysis, unclear findings, and poor positioning of the paper with respect to related work, which was ignored in the initial version. After the author response, R4 raised the score to borderline accept (still indicating the paper is weak without proper comparisons with other methods), whereas all other reviewers remained negative. The paper does have merits, as the methods are simple, and the problem is very practical (and somewhat understudied). However, the AC agrees with the majority that the paper is not ready for ICLR. The novelty is limited, and the paper would benefit from more experiments, such as comparisons with simple baselines like early stopping, as indicated by R1 and R3, and with other methods such as Task2vec, which address the same problem. The authors are encouraged to revise the paper according to the reviewers' comments and submit it to another top conference.",ICLR2021, +SkdnEk6HM,1517250000000.0,1517260000000.0,440,H1bM1fZCW,H1bM1fZCW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes a way to automatically weight different tasks in a multi-task setting. The problem is a bit niche, and the paper had a lot of problems with clarity, as well as with the motivation for the experimental setup and evaluation.",ICLR2018, +ryIzS1TSz,1517250000000.0,1517260000000.0,520,BkDB51WR-,BkDB51WR-,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting your paper to ICLR. Two of the reviewers are concerned that the paper's contributions are not significant enough, either in terms of the theoretical or the experimental contribution, to warrant publication. The authors have improved the experimental aspect to include a more comprehensive comparison, but this has not moved the reviewers. ",ICLR2018, +BUKCn3naaq3,1642700000000.0,1642700000000.0,1,34mWBCWMxh9,34mWBCWMxh9,Paper Decision,Reject,"This paper proposes a spatial smoothing layer for CNNs which is composed of a feature range bounding layer (referred to as prob) and a blurring layer (referred to as blur). An empirical analysis shows that the proposed layer improves the accuracy and uncertainty of both deterministic CNNs and Bayesian NNs (BNNs) approximated by MC-dropout. The paper further provides theoretical arguments for the hypothesis that blurring corresponds to an ensemble and presents the proposed method as a strategy to reduce the number of samples required during inference in BNNs. + +Reviewers valued the extensive (theoretical as well as practical) analyses. However, the theoretical analysis should still be improved. First of all, the proposed technique is motivated in the context of BNNs, which is not very strongly supported. Second, the argument that „the smoothing layer is an ensemble“ is based on the observation that it has some properties ensembles have as well: (1) they reduce feature map variances, (2) filter out high-frequency signals, and (3) flatten the loss landscape. But two things sharing the same properties do not need to be the same thing. Moreover, the proofs of the propositions stating the properties are difficult to follow and may contain some flaws. Furthermore, the paper is not well self-contained and relies heavily on the appendix. +Given these issues, the paper cannot be accepted in its current state. + +A future version could improve over the current manuscript by making the theoretical statements and proofs clearer. 
Another option would be to analyze the contribution without connecting it to a Bayesian setting and ensembles, and instead focus on showing that the proposed smoothing layer has those good properties, doing detailed empirical studies, and showing that CNN components like global average pooling and ReLU + BN are special cases of the proposed method.",ICLR2022, +VXwD3wEpJU4,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This submission has been evaluated by 5 reviewers, with 3 leaning towards borderline accept and 2 leaning towards borderline reject. Reviewers have been consistently concerned about several aspects of this work, namely that *the method is only demonstrated on toy datasets*, that there is an issue with the scalability to larger substructures, that the proposed approach did not excel *in the simple task of triangle counting* or even that *the authors did not perform any other experiments even on a toy dataset*, that comparisons with Deep-LRP regarding efficiency were not provided, and that *more complex settings and sensitivity* were not investigated. Reviewers also noted that the general idea of recursion has already appeared in GNNs in one setting or another. + +In making this decision, the AC agrees that there is some potential in the proposed analysis, and reviewers also highlighted this as a positive side of the submission. Yet, at the same time, it is really hard to overlook the rebuttal, where the authors had the chance to address all reviewers' comments regarding the experiments, their various details, and their variations. + +Failing to address these comments to the satisfaction of the majority of reviewers makes it impossible for the AC to recommend acceptance, even though there is every chance that the paper will ultimately make it to a high-quality venue after a thorough revision (the reviewers have really given a fair number of good suggestions that should assist the authors).",ICLR2022, +G-QprYOczYO,1642700000000.0,1642700000000.0,1,l431c_2eGO2,l431c_2eGO2,Paper Decision,Reject,"This paper proposes a new regularizer, based on entropy maximization of samples near the decision boundary, to improve the calibration of neural networks while maintaining their accuracy. + +The method seems simple, sufficiently novel, and has promising results. However, based on the review process (described below), I feel the paper needs to significantly improve its evaluation and presentation before it can be accepted. + +The review process summary: + +* Two reviews were eventually weakly positive about the paper: without major concerns, but not enthusiastic. + +* One review (L8Yz) was not sufficiently informative. + +* One review (ESue) raised many points. I disagreed with most of these points, following the authors' discussion. However, a few points seemed valid, such as the not-so-impressive performance for OOD detection, which the authors did not address. + +* I therefore asked for an additional review (iva2). The review concluded the paper is interesting and potentially useful, but requires another round of revision before it can be accepted, mainly because of its clarity and missing comparisons. I agree with these conclusions.",ICLR2022, +Fz6mqvM4poK,1642700000000.0,1642700000000.0,1,gX9Ub6AwAd,gX9Ub6AwAd,Paper Decision,Reject,"This paper handles anomaly detection in surveillance videos. The authors propose to use a frame-group attention method for this task. However, all the reviewers have concerns about the novelty, clarity, and experimental evaluations of this work. 
Moreover, no rebuttal is provided by the authors.",ICLR2022, +9fXtaf8OWPw,1642700000000.0,1642700000000.0,1,uorVGbWV5sw,uorVGbWV5sw,Paper Decision,Accept (Spotlight),"All the reviewers think that the work is significant and new. Therefore, they support the paper to be published at ICLR 2022. Given the strong results and the “accept” consensus from the reviewers, I accept the paper as “spotlight”. The authors should implement all the reviewers’ suggestions into the final version.",ICLR2022, +GSjHcX_BK4J,1642700000000.0,1642700000000.0,1,bB6YLDJewoK,bB6YLDJewoK,Paper Decision,Reject,"A number of suggestions have been given about the manuscript. The evaluation raised questions about clarify, placement with respect to other approaches, choices for the design, etc. There are no immediate replies from authors, so I hope the suggestions are useful for future work.",ICLR2022, +NMkwkr7Ean,1576800000000.0,1576800000000.0,1,SyeZIkrKwS,SyeZIkrKwS,Paper Decision,Reject,"The paper proposed the use of dynamic convolutional kernels as a way to reduce inference computation cost, which is a linear combination of static kernels and fused after training for inference to reduce computation cost. The authors evaluated the proposed methods on a variety models and shown good FLOPS reduction while maintaining accuracy. + +The main concern for this paper is the limited novelty. There have been many works use dynamic convolutions as pointed out by all the reviewers. The most similar ones are SENet and soft conditional computation. Although the authors claim that soft conditional computation ""focus on using more parameters to make models to be more expressive while we focus on reducing redundant calculations"", the methods are pretty the same and moreover in the abstract of soft conditional computation they have ""CondConv improves the performance and inference cost trade-off"".",ICLR2020, +CkMBuad13,1576800000000.0,1576800000000.0,1,HklJdaNYPH,HklJdaNYPH,Paper Decision,Reject,"This paper proposes a modification to the Transformer architecture in which the self-attention and feed-forward layer are merged into a self-attention layer with ""persistent"" memory vectors. This involves concatenating the contextual representations with global, learned memory vectors, which are attended over. Experiments show slight gains in character and word-level language modeling benchmarks. + +While the proposed architectural changes are interesting, they are also rather minor and had a small impact in performance and in number of model parameters. The motivation of the persistent memory vector as replacing the FF-layer is a bit tenuous since Eqs 5 and 9 are substantially different. Overall the contribution seems a bit thin for a ICLR paper. I suggest more analysis and possibly experimentation in other tasks in a future iteration of this paper.",ICLR2020, +jOnPym6Ro9,1576800000000.0,1576800000000.0,1,rklk_ySYPB,rklk_ySYPB,Paper Decision,Accept (Poster),"This paper extends the degree to which ReLU networks can be provably resistant to a broader class of adversarial attacks using a MMR-Universal regularization scheme. In particular, the first provably robust model in terms of lp norm perturbations is developed, where robustness holds with respect to *any* p greater than or equal to one (as opposed to prior work that may only apply to specific lp-norm perturbations). 
+ +While I support accepting this paper based on the strong reviews and significant technical contribution, one potential drawback is the lack of empirical tests with a broader cohort of representative CNN architectures (as pointed out by R1). In this regard, the rebuttal promises that additional experiments with larger models will be added in the future to the final version, but obviously such results cannot be used to evaluate performance at this time.",ICLR2020, +SkxL_p32yV,1544500000000.0,1545350000000.0,1,S1lPShAqFm,S1lPShAqFm,ICLR 2019 decision,Reject,This paper studies the behavior of training of over parametrized models. All the reviewers agree that the questions studied in this paper are important. However the experiments in the paper are fairly preliminary and the paper does not offer any answers to the questions it studies. Further the writing is very loose and the paper is not ready for publication. I advise authors to take the reviews seriously into account before submitting the paper again. ,ICLR2019,4: The area chair is confident but not absolutely certain +Hk6eaz8ux,1486400000000.0,1486400000000.0,1,SJZAb5cel,SJZAb5cel,ICLR committee final decision,Reject,"There is a bit of spread in the reviewer scores, but ultimately the paper does not meet the high bar for acceptance to ICLR. The lack of author responses to the reviews does not help either.",ICLR2017, +HJxbryTHf,1517250000000.0,1517260000000.0,502,SkFEGHx0Z,SkFEGHx0Z,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes a non-parametric method for metric learning and classification. One of the reviewers points out that it can be viewed as an extension of NCA. There is in fact a non-linear version of NCA that was subsequently published, see [1]. In this sense, the approach here appears to be a version of nonlinear NCA with learnable per-example weights, approximate nearest neighbour search, and the allowance of stale exemplars. In this view, there is concern from the reviewers that there may not be sufficient novelty for acceptance. + +The reviewers have concerns with scalability. It would be helpful to include clarification or even some empirical results on how this scales compared to softmax. It is particularly relevant for larger datasets like Imagenet, where it may be impossible to store all exemplars in memory. + +It is also recommended to relate this approach to metric-learning approaches in few-shot learning. Particularly to address the claim that this is the first approach to combine metric learning and classification. + +[1]: Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. Ruslan Salakhutdinov and Geoffrey Hinton. AISTATS 2007 ",ICLR2018, +GacCzur68y,1610040000000.0,1610470000000.0,1,yWkP7JuHX1,yWkP7JuHX1,Final Decision,Accept (Oral),"The paper proposes to bring together a GAN, a differentiable renderer, and an inverse graphics model. This combined model learns 3D-aware image analysis and synthesis with very limited annotation effort (order of minutes). The results look impressive, even compared to training on a labeled dataset annotation of which took several orders of magnitude more time. + +The reviewers point out the novelty of the proposed system and the very high quality of the results. On the downside, R2 mentions that the model appears over-engineered and some important experimental results are missing. The authors’ response addresses these concerns quite well. 
+ +Overall, this is a really strong work with compelling results, taking an important step towards employing generative models and neural renderers “in the wild”. I think it can make for a good oral. +",ICLR2021, +wnUrTCh3kDu,1642700000000.0,1642700000000.0,1,XbatFr32NRm,XbatFr32NRm,Paper Decision,Reject,"The reviewers unanimously recommend rejecting this submission, and I concur with that recommendation. This submission is not appropriate for a machine learning conference like ICLR. It does not display a thorough understanding of the literature nor does it make a sufficiently valuable contribution. There is no need to ""generalize"" MLPs, the community knows quite well that we can use dropout, skip connections, and batch norm with them. Even the original dropout paper applies dropout on fully connected ReLU MLPs. + +As another example, the submission attributes skip connections to He et al. 2016, but skip connections (also known as ""shortcut connections"" were in common use in the late 1980s and throughout the 1990s in the Connectionist community, including for non-convolutional simple feedforward neural networks or ""MLPs"". They were a well known technique throughout neural network history, although the advent of deeper layered neural network architectures perhaps gave them new importance. He et al. certainly popularized them for modern neural network architectures and popularized their residual formulation. The earliest reference I could find easily for skip connections was ""Learning to Tell Two Spirals Apart"" which was published in 1988 by Kevin J. Lang and Michael. J. Witbrock, but in general such architectural tricks were not viewed as particularly remarkable in the 1990s neural networks literature.",ICLR2022, +HklonHkSx4,1545040000000.0,1545350000000.0,1,Skh4jRcKQ,Skh4jRcKQ,Progress on the theoretical understanding of straight-through estimation for linear networks,Accept (Poster),"The paper contributes to the understanding of straight-through estimation for single hidden layer neural networks, revealing advantages for ReLU and clipped ReLU over identity activations. A thorough and convincing theoretical analysis is provided to support these findings. After resolving various issues during the response period, the reviewers concluded with a unanimous recommendation of acceptance. Valid criticisms of the presentation quality were raised during the review and response period, and the authors would be well served by continuing to improve the paper's clarity.",ICLR2019,5: The area chair is absolutely certain +lqdGqI4fcO,1576800000000.0,1576800000000.0,1,r1gNLAEFPS,r1gNLAEFPS,Paper Decision,Reject,"This paper addresses the classic medial image segmentation by combining Neural Ordinary Differential Equations (NODEs) and the level set method. The proposed method is evaluated on kidney segmentation and salient object detection problems. Reviewer #1 provided a brief review concerning ICLR is not the appropriate venue for this work. Reviewer #2 praises the underlying concept being interesting, while pointing out that the presentation and experiments of this work is not ready for publication yet. Reviewer #3 raises concerns on whether the methods are presented properly. The authors did not provide responses to any concerns. 
Given these concerns and overall negative rating (two weak reject and one reject), the AC recommends reject.",ICLR2020, +SJxWt6iVx4,1545020000000.0,1545350000000.0,1,H1lADsCcFQ,H1lADsCcFQ,Technical correctness issues,Reject,"On the positive side, this is among the first papers to exploit non-Euclidean geometry, specifically curvature for adversarial learning. However, reviewers are largely in agreement that the technical correctness of this paper is unconvincing despite substantial technical exchanges with the authors. ",ICLR2019,4: The area chair is confident but not absolutely certain +NxhQVbMFh5C,1642700000000.0,1642700000000.0,1,apv504XsysP,apv504XsysP,Paper Decision,Accept (Spotlight),"This paper builds on the success of the FermiNet neural wave function framework by pairing it with a graph neural network which predicts the parameters of neural wave function from the geometry. The resulting PESNet trains significantly faster, with no loss of accuracy. This method constitutes an important advance in ML-powered quantum mechanical calculations. + +The reviewers unanimously recommend acceptance.",ICLR2022, +JTjzaDPu5k,1576800000000.0,1576800000000.0,1,ByeMPlHKPH,ByeMPlHKPH,Paper Decision,Accept (Poster),"This paper presents an efficient architecture of Transformer to facilitate implementations on mobile settings. The core idea is to decompose the self-attention layers to focus on local and global information separately. In the experiments on machine translation, it is shown to outperform baseline Transformer as well as the Evolved Transformer obtained by a costly architecture search. +While all reviewers admitted the practical impact of the results in terms of engineering, the main concerns in the initial paper were the clarification of the mobile settings and scientific contributions. Through the discussion, reviewers are fairly satisfied with the authors’ response and are now all positive to the acceptance. Although we are still curious how it works on other tasks (as the title says “mobile applications”), I think the paper provides enough insights valuable to the community, so I’d like to recommend acceptance. +",ICLR2020, +H11bLkpSM,1517250000000.0,1517260000000.0,717,HJZiRkZC-,HJZiRkZC-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a method for using byte level convolutional networks for building text-based autoencoders. They show that these models do well compared to RNN-based methods which model text in a sequence. Evaluation is solely based on byte level prediction error. The committee feels that the paper would have been stronger if evaluation was on some actual task (say summarization, Miao and Blunsom, for example) and show that it works as well as RNNs, the paper would have been stronger.",ICLR2018, +7s5-q1YV6mq,1642700000000.0,1642700000000.0,1,FvfV64rovnY,FvfV64rovnY,Paper Decision,Reject,"This paper investigates the scaling laws of neural networks with respect to the number of training samples $D$ and parameters $P$ for some estimators in two regimes: the variance-limited regime and the resolution-limited regime. The theoretical results are supported by some numerical experiments. + +Unfortunately, the paper has several critical issues, in particular, in its novelty and technical correctness. +1. The theoretical analyses lack much of their rigor. The assumptions and problem setups are not precisely introduced. Accordingly, the statement of each theorem is presented in an inaccurate way. 
Moreover, some theoretical consequences contain technical flaws (e.g., $1/P$ should be replaced by $1/\sqrt{P}$ without an appropriate assumption on the loss function such as strong convexity and smoothness). +2. Many of the presented results are already known in the literatures. It is unfortunate that the authors did no cite relevant existing literatures and did not discuss its novelty compared with the existing work. + +For those reasons, this paper lacks its novelty and the quality of the paper is not sufficient to be accepted. +I recommend the authors to thoroughly survey the literature of the statistical learning theory from classic nonparametric regression analyses to recent advances on overparameterization.",ICLR2022, +zaw8umP18z9,1642700000000.0,1642700000000.0,1,qESp3gXBm2g,qESp3gXBm2g,Paper Decision,Reject,"Authors propose an autoencoding echo state machine for a one-shot one-class time series classification task. Their approach feeds a (one-dimensional) error signal over time relative to a reference training datum to SVMs. Training is very fast by design. OFC signal analysis has practical value in neuroscience. But only one benchmark (seq-MNIST) was used to evaluate their method. While the performance seem impressive, no explanation of why the internal representation learned by the proposed system is superior and robust to noise was provided. No sequential autoencoders or latent neural trajectory inference methods were compared. Although the manuscript has greatly improved through the review-rebuttal process, there are missing key details (e.g. length of E(t) used for classification--important for real-time application, initial state for the reservoir, choice of W_in --important since it seems to be a chaotic network that's driven by strong input). While there is novelty in the approach, there is a general lack of enthusiasm among the reviewers for the manuscript as is. The reviewers and AC strongly encourage the authors to further developed these ideas and add thorough analyses for another conference. + +(BTW, perhaps it's worth citing https://doi.org/10.1109/IJCNN.2016.7727309, since autoencoder combined with reservoir computing has been used for anomaly detection.)",ICLR2022, +cn8KrzsBzf,1610040000000.0,1610470000000.0,1,c9-WeM-ceB,c9-WeM-ceB,Final Decision,Accept (Poster),"The reviewers all agreed on accepting this paper, stating that it makes a compelling point about the usefulness of saliency methods to diagnose generalization. The reviewers found that the experiments were a strong point and applauded the thorough hyperparameter tuning and re-runs for statistical significance. One reviewer commented that the paper was too dense with information, so much so as to make it difficult to digest. However, overall this seems like an interesting paper that is relevant to the community and will hopefully foster some good discussion about the shortcomings and future directions of saliency methods.",ICLR2021, +cDr24yak532,1642700000000.0,1642700000000.0,1,mMiKHj7Pobj,mMiKHj7Pobj,Paper Decision,Reject,"This paper proposes to study Auto-induced Distribution Shift (ADS), the phenomenon that models can create a feedback-loop: the predictions of a model influence user behaviors when it is deployed, which, in turn, affects the accuracy measure of the model. The paper empirically shows that a meta-learning algorithm called PBT causes a distribution shift instead of maximizing accuracy. 
While the premise of this paper is interesting, the proposed frameworks are very similar to the idea of strategic behavior in machine learning, and of ""Performative Prediction"" (Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, Moritz Hardt). However, this line of work is neither cited nor discussed in this paper. In addition, the paper is hard to read in certain parts. We encourage the authors to compare their work with performative prediction. We hope the authors find the reviews helpful.",ICLR2022, +ZCCeMCRece,1610040000000.0,1610470000000.0,1,7eD88byszZ,7eD88byszZ,Final Decision,Reject,"The paper proposes a fast, nearly-linear time, algorithm for finding a sparsifier for general directed and undirected graphs that approximately preserves the spectral properties of the original graph. The reviewers appreciated the main contribution of the paper, but they were concerned about the correctness and clarity of the paper, as well as the relevance of the contribution to machine learning. Following the discussion with the authors, the reviewers still felt that these concerns had not been fully addressed by the authors' responses and the subsequent revision of the paper. After taking these concerns into account as well as evaluating the paper relative to other ICLR submissions, I recommend reject.",ICLR2021, +BJxjVto0JN,1544630000000.0,1545350000000.0,1,ByxmXnA9FQ,ByxmXnA9FQ,Arguable choices of parameters and the performance degradation issue,Reject,"The paper proposes a new framework for out-of-distribution detection, based on variational inference and a prior Dirichlet distribution. + +The reviewers and AC note the following potential weaknesses: (1) arguable and not well justified choices of parameters and (2) the performance degradation under many classes (e.g., CIFAR-100). + +For (2), the authors mentioned that this is because ""there are more than 20% of misclassified test images"". But, AC rather views it as a limitation of the proposed approach. The out-of-detection detection problem is a one or two classification task, independent of how many classes exist in the neural classifier. + +In overall, the proposed idea is interesting and makes sense but AC decided that the authors need more significant works to publish the work.",ICLR2019,4: The area chair is confident but not absolutely certain +Hy0b816HM,1517250000000.0,1517260000000.0,730,r1TA9ZbA-,r1TA9ZbA-,ICLR 2018 Conference Acceptance Decision,Reject,"All reviewers agree that the contribution of this paper, a new way of training neural nets to execute Monte-Carlo Tree Search, is an appealing idea. For the most part, the reviewers found the exposition to be fairly clear, and the proposed architecture of good technical quality. Two of the reviewers point out flaws in implementing in a single domain, 10x10 Sokoban with four boxes and four targets. Since their training methodology uses supervised training on approximate ground-truth trajectories derived from extensive plain MCTS trials, it seems unlikely that the trained DNN will be able to generalize to other geometries (beyond 10x10x4) that were not seen during training. Sokoban also has a low branching ratio, so that these experiments do not provide any insight into how the methodology will scale at much higher branching ratios. + +Pros: Good technical quality, interesting novel idea, exposition is mostly clear. Good empirical results in one very limited domain. +Cons: Single 10x10x4 Sokoban domain is too limited to derive any general conclusions. 
+ +Point for improvement: The paper compares performance of MCTSnet trials vs. plain MCTS trials based on the number of trials performed. This is not an appropriate comparison, because the NN trials will be much more heavyweight in terms of CPU time, and there is usually a time limit to cut off MCTS trials and execute an action. It will be much better to plot performance of MCTSnet and plain MCTS vs. CPU time used.",ICLR2018, +H_KFwNCGaHN,1610040000000.0,1610470000000.0,1,aJLjjpi0Vty,aJLjjpi0Vty,Final Decision,Reject,"This paper mostly received negative scores. A few reviewers pointed out that the idea of modeling user preference in the frequency domain seems novel and interesting. However, there are a few concerns around the clarity of the paper, the motivation of the proposed approach, as well as the experimental results being unconvincing (both in terms of execution as well as exploration of the results). The authors did not provide a response. Therefore, I recommend reject. ",ICLR2021, +0fqZFeKUgp,1576800000000.0,1576800000000.0,1,SJev6JBtvH,SJev6JBtvH,Paper Decision,Reject,"The paper proposes a new method for testing whether new data comes from the same distribution as training data without having an a-priori density model of the training data. This is done by looking at the intersection of typical sets of an ensemble of learned models. + +On the theoretical side, the paper was received positively by all reviewers. The theoretical results were deemed strong, and the ideas in the paper were considered novel. The problem setting was considered relevant, and seen as a good proposal to deal with the shortcoming of models on out of distribution data. + +However, the lack of empirical results on at least somewhat realistic datasets (e.g. MNIST) was commented on by all reviewers. The authors only present a toy experiment. The authors have explained their decision, but I agree with R1 that it would be appropriate in such situations to present the toy experiment next to a more realistic dataset. This also means that the effectiveness of the proposed method in real settings is as of yet unclear. Although the provided toy example was considered clear and illuminating, the clarity of the text could still be improved. + +Although the reviewers had a spread in their final score, I think they would all agree that the direction this paper takes is very exciting, but that the current version of the paper is somewhat premature. Thus, unfortunately, I have to recommend rejection at this point. + +",ICLR2020, +ONZA6dxbvd,1576800000000.0,1576800000000.0,1,Hkee1JBKwB,Hkee1JBKwB,Paper Decision,Reject,"This paper proposes Conv-TT-LSTM for long-term video prediction. The proposed method saves memory and computation by low-rank tensor representations via tensor decomposition and is evaluated in Moving MNIST and KTH datasets. + +All reviews argue that the novelty of the paper does not meet the standard of ICLR. In the rebuttal, the authors polish the experiment design, which fails to change any reviewer’s decision. 
+ +Overall, the paper is not good enough for ICLR.",ICLR2020, +RsGX9-F-C2o,1642700000000.0,1642700000000.0,1,SjGRJ4vSZlP,SjGRJ4vSZlP,Paper Decision,Reject,"The paper studies an interesting problem, but as pointed out by reviewers, the presentation of the problem statement and contributions need to be improved.",ICLR2022, +H1gE1BEHeN,1545060000000.0,1545350000000.0,1,SyMhLo0qKQ,SyMhLo0qKQ,meta-review,Accept (Poster),"All the reviewers and AC agrees that the main strength of the paper that it studies a rather important question of the validity of using linear interpolation in evaluating GANs. The paper gives concrete examples and theoretical and empirical analysis that shows linear interpolation is not a great idea. The potential weakness is that the paper doesn't provide a very convincing new evaluation to replace the linear interpolation. However, given that it's largely unclear what are the right evaluations for GANs, the AC thinks the ""negative result"" about linear interpolation already deserves an ICLR paper. ",ICLR2019,5: The area chair is absolutely certain +ZQqMgq4Kj3k,1610040000000.0,1610470000000.0,1,Bx05YH2W8bE,Bx05YH2W8bE,Final Decision,Reject,"The paper builds upon hypergraph convolutional networks (HCN), extending them to time-varying hypergraphs in dynamical settings. However, as some of the reviewers pointed out, it would be useful to explore other system variations to better justify the choices in this particular approach; perhaps an evaluation on a wider set of datasets would also strengthen the contribution of the paper, as well as adding evaluation metrics that can be more appropriate for the application considered (stock market prediction). Also, concerns were raised by several reviewers regarding the somewhat incremental improvement over the state of art, and the degree of novelty in the proposed approach. Overall, while the problem considered is important and the approach is promising, the paper in its current shape is somewhat borderline and may require a bit of additional work to be ready for publication. +",ICLR2021, +5sGaISizZu9,1610040000000.0,1610470000000.0,1,F8xpAPm_ZKS,F8xpAPm_ZKS,Final Decision,Reject,"In this paper, the authors aim to develop a new method for credit assignment, where certain types of future information is conditioned on. The authors are well-aware that naive conditioning on future information introduces bias due to Berkson's paradox (explaining away), and introduce a number of corrections (described in section 2.4 and 2.5). + +The authors illustrate their approach via a number of simulation studies and constructed problems. + +I think it would be nice if the authors found a way of connecting their notion of counterfactual to one used in causal inference (for instance, I think there is a connection via e.g. importance correction terms). + +Reviewers were worried about the contribution being incremental given existing work (from 2019), and relative simplicity of the evaluation of the approach, compared to existing similar work. +",ICLR2021, +j0rjRf0VT6K,1642700000000.0,1642700000000.0,1,3PN4iyXBeF,3PN4iyXBeF,Paper Decision,Accept (Poster),"The paper presents a quite rigorous analysis of approximate implicit differentiation with warm starts applied to strongly convex upper level/strongly convex lower level and nonconvex upper level/strongly convex lower level bilevel optimization algorithms in a very general yet also very practical framework. 
They allow for stochastic errors in the algorithms solving the upper and lower level problems, making their work practical and applicable to real problems in machine learning (hyperparameter optimization), while analyzing in a way that is agnostic towards which algorithms are specifically used for the lower and upper level problems. + +Three out of four reviewers were rather positive of the paper (scores: 6, 6, 8). One reviewer was very negative (score: 3). To my knowledge, the authors have convincingly answered all the points raised by the reviewer. Unfortunately, the reviewer did not follow up. + +Similarly to reviewers, I found sections 1-3 to be extremely well-written and to give a nice overview of the field. Section 4 had slight clarity issues (dense notation) that were addressed in the revision. Reviewer 6zLQ partially proof-read proofs. + +Overall, I recommend acceptance as a poster, as this paper is advancing stochastic implicit differentiation and should be of interest to many at the ICLR conference.",ICLR2022, +SvrhL6S1nD,1576800000000.0,1576800000000.0,1,BJe1334YDH,BJe1334YDH,Paper Decision,Accept (Poster)," The paper proposed the use of a combination of RL-based iterative improvement operator to refine the solution progressively for the capacitated vehicle routing problem. It has been shown to outperform both classical non-learning based and SOTA learning based methods. The idea is novel and the results are impressive, the presentation is clear. Also the authors addressed the concern of lacking justification on larger tasks by including an appendix of additional experiments. ",ICLR2020, +BJgKEmuGeV,1544880000000.0,1545350000000.0,1,BkeStsCcKQ,BkeStsCcKQ,Interesting observations about critical learning periods in deep networks in a well-written paper,Accept (Poster),"Irrespective of their taste for comparisons of neural networks to biological organisms, all reviewers agree that the empirical observations in this paper are quite interesting and well presented. While some reviewers note that the paper is not making theoretical contributions, the empirical results in themselves are intriguing enough to be of interest to ICLR audiences.",ICLR2019,5: The area chair is absolutely certain +mQm0s8ccJo,1576800000000.0,1576800000000.0,1,HkxQzlHFPr,HkxQzlHFPr,Paper Decision,Reject,"This paper proposes using first order logic to rule out superficial information for improved natural language inference. While the topic is of interest, reviewers find that the paper misses much of the previous literature on semantics which is highly relevant. + +I thank the authors for submitting this paper to ICLR. Please take the reviewers' comments, especially recommended references, to improve the paper for future submission.",ICLR2020, +r1JRM1THG,1517250000000.0,1517260000000.0,39,H1zriGeCZ,H1zriGeCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper introduces an algorithm for optimization of discrete hyperparameters based on compressed sensing, and compares against standard gradient-free optimization approaches. + +As the reviewers point out, the provable guarantees (as is usually the case) don't quite make it to the main results section, but are still refreshing to see in hyperparameter optimization. + +The method itself is relatively simple compared to full-featured Bayesopt (spearmint), although not as widely applicable. 
+",ICLR2018, +SJljqe6RyV,1544630000000.0,1545350000000.0,1,SylCrnCcFX,SylCrnCcFX,"Novel work, and potentially of broader interest",Accept (Poster),"The paper aims to encourage deep networks to have stable derivatives over larger regions under networks with piecewise linear activation functions. + +All reviewers and AC note the significance of the paper. AC also thinks this is also a very timely work and potentially of broader interest of ICLR audience.",ICLR2019,5: The area chair is absolutely certain +NnEVCHLFH8,1610040000000.0,1610470000000.0,1,1FvkSpWosOl,1FvkSpWosOl,Final Decision,Accept (Poster),"This paper introduces an alternative to self-attention, based on matrix factorization, and apply it to computer vision problems such as semantic segmentation. The method is simple and novel and obtains competitive results compared to existing approaches. The reviewers found the paper well written and easy to understand. For these reasons, I recommend to accept the paper.",ICLR2021, +RY87fbkqY1b,1642700000000.0,1642700000000.0,1,clwYez4n8e8,clwYez4n8e8,Paper Decision,Reject,"This paper proposes a method for 4-bit quantized training of NNs (forward and backward), obtaining SOTA 4-bit training quantization, motivated by an analysis of rounding schemes (an important aspect) in quantized training. The main concerns from the reviewers were that the approach was not practical (both a general concern, and of specific note here since the word is used in the title and motivation of the work), due to lack of compatibility with (current) general purpose hardware, and lack of suitability of the approach for specialized hardware, so it is unclear what the actual use case is for the approach. The authors argued that (1) this is not a problem on some hardware and (2) that past works have not been held to this standard. I did not find the authors to provide a strong argument during the discussion period to address these concerns.",ICLR2022, +JbQRtKvp0,1576800000000.0,1576800000000.0,1,BJg4NgBKvH,BJg4NgBKvH,Paper Decision,Accept (Poster),"This paper proposes methodology to train binary neural networks. + +The reviewers and authors engaged in a constructive discussion. All the reviewers like the contributions of the paper. + +Acceptance is therefore recommended.",ICLR2020, +SMZKHVhdmq,1576800000000.0,1576800000000.0,1,H1eqQeHFDS,H1eqQeHFDS,Paper Decision,Accept (Poster),"This paper treats the task of point cloud learning as a dynamic advection problem in conjunction with a learned background velocity field. The resulting system, which bridges geometric machine learning and physical simulation, achieves promising performance on various classification and segmentation problems. Although the initial scores were mixed, all reviewers converged to acceptance after the rebuttal period. For example, a better network architecture, along with an improved interpolation stencil and initialization, lead to better performance (now rivaling the state-of-the-art) as compared to the original submission. This helps to mitigate an initial reviewer concern in terms of competitiveness with existing methods like PointCNN or SE-Net. Likewise, interesting new experiments such as PIC vs. FLIP were included.",ICLR2020, +kxetRUZ_WUY,1642700000000.0,1642700000000.0,1,ffS_Y258dZs,ffS_Y258dZs,Paper Decision,Reject,"All reviewers have agreed that the topic of evaluating compositional skills of agents is an important one and cast it as compositional learning as meta-reinforcement learning is an interesting approach. 
At the same time, reviewers have raised concerns with respect to the benchmark itself, the exposition and clarify of the ideas as well as the experimental evidence used to support some of the claims. The authors have not provided an author response but have acknowledged the reviewers feedback. + +As this paper stands I cannot recommend acceptance for the current manuscript.",ICLR2022, +BJXZEJ6Hz,1517250000000.0,1517260000000.0,289,rydeCEhs-,rydeCEhs-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes a method for having a meta deep learning model generate the weights of a main model given a proposed architecture. This allows the authors to search over the space of architectures efficiently. The reviewers agreed that the paper was very well composed, presents an interesting and thought provoking idea and provides compelling empirical analysis. An exploration of the failure modes of the approach is highly appreciated. The lowest score was also of quite low confidence, so the overall score should probably be one point higher. + +Pros: +- Very well written and composed +- ""Thought provoking"" +- Some strong experimental results +- Analysis of weaker experimental results (failure modes) + +Cons: +- Some weak results (also in pros, however)",ICLR2018, +B1upnfUOl,1486400000000.0,1486400000000.0,1,Hku9NK5lx,Hku9NK5lx,ICLR committee final decision,Accept (Poster),The reviewers unanimously recommended accepting the paper.,ICLR2017, +sLUhqC63w4,1576800000000.0,1576800000000.0,1,BJleph4KvS,BJleph4KvS,Paper Decision,Reject,"This paper presents a new graph pooling method, called HaarPooling. Based on the hierarchical HaarPooling, the graph classification problems can be solved under the graph neural network framework. + +One major concern of reviewers is the experiment design. Authors add a new real world dataset in revision. Another concern is computational performance. The main text did not give a comprehensive analysis and the rebuttal did not fully address these problems. + +Overall, this paper presents an interesting graph pooling approach for graph classification while the presentation needs further polish. Based on the reviewers’ comments, I choose to reject the paper. +",ICLR2020, +fRfSiaseoP,1576800000000.0,1576800000000.0,1,r1e0G04Kvr,r1e0G04Kvr,Paper Decision,Reject,"This paper studies a problem of graph translation, which aims at learning a graph translator to translate an input graph to a target graph using adversarial training framework. The reviewers think the problem is interesting. However, the paper needs to improve further in term of novelty and writing. ",ICLR2020, +ryxR2SRxgE,1544770000000.0,1545350000000.0,1,S1E3Ko09F7,S1E3Ko09F7,Solid technical novelty with convincing empirical results.,Accept (Poster),"The paper presents two new methods for model-agnostic interpretation of instance-wise feature importance. + +Pros: +Unlike previous approaches based on the Shapley value, which had an exponential complexity in the number of features, the proposed methods have a linear-complexity when the data have a graph structure, which allows approximation based on graph-structured factorization. The proposed methods present solid technical novelty to study the important challenge of instance-wise, model-agnostic, linear-complexity interpretation of features. + +Cons: +All reviewers wanted to see more extensive experimental results. Authors responded with most experiments requested. 
One issue raised by R3 was the need for comparing the proposed model-agnostic methods to existing model-specific methods. The proposed linear-complexity algorithm relies on the markov assumption, which some reviewers commented to be a potentially invalid assumption to make, but this does not seem to be a deal breaker since it is a relatively common assumption to make when deriving a polynomial-complexity approximation algorithm. Overall, the rebuttal addressed the reviewers' concerns well enough, leading to increased scores. + +Verdict: +Accept. Solid technical novelty with convincing empirical results.",ICLR2019,4: The area chair is confident but not absolutely certain +CgNlsVf8GF,1576800000000.0,1576800000000.0,1,HkxIIaVKPB,HkxIIaVKPB,Paper Decision,Reject,"This work proposes a VAE-based model for learning transformations of sequential data (the main here intuition is to have the model learn changes between frames without learning features that are constant within a time-sequence). All reviewers agreed that this is a very interesting submission, but have all challenged the novelty and rigor of this paper, asking for more experimental evidence supporting the strengths of the model. After having read the paper, I agree with the reviewers and I currently see this one as a weak submission without potentially comparing against other models or showing whether the representations learned from the proposed model lead in downstream improvements in a task that uses this representations.",ICLR2020, +3fvRdDU6i,1576800000000.0,1576800000000.0,1,SkgJOAEtvr,SkgJOAEtvr,Paper Decision,Reject,"This work examines how internal consistency objectives can help emergent communication, namely through possibly improving ability to refer to unseen referents and to generalize across communicative roles. Experimental results support the second hypothesis but not the first. +Reviewers agree that this is an exciting object of study, but had reservations about the rationale for the first hypothesis (which was ultimately disproven), and for how the second hypothesis was investigated (lack of ablations to tease apart which part was most responsible for improvement, unsatisfactory framing). These concerns were not fully addressed by the response. +While the paper is very promising and the direction quite interesting, this cannot in its current form be recommended for acceptance. We encourage authors to carefully examine reviewers' suggestions to improve their work for submission to another venue.",ICLR2020, +4IIz4JjqHT,1576800000000.0,1576800000000.0,1,Syee1pVtDS,Syee1pVtDS,Paper Decision,Reject,"The paper proposes a decentralized algorithm with regret for distributed online convex optimization problems. The reviewers worry about the assumptions and the theoretical settings, they also find that the experimental evaluation is insufficient.",ICLR2020, +nB0ov2Hno-D,1610040000000.0,1610470000000.0,1,xBoKLdKrZd,xBoKLdKrZd,Final Decision,Reject,"This paper presents an empirical study of different efficient ways to estimate the performance of architectures in NAS, focussing on weight sharing and performance prediction methods. +Most reviewers appreciated the paper's goal of performing a careful, controlled study of different factors that can affect the ranking of architectures. However, all reviewers also had substantial concerns. Many of these could in principle be fixed by additional experiments, but the short time window of the author response period did not allow for this. 
+As a result, all reviewers voted for rejection, and I will follow that recommendation. +Nevertheless, I would like to encourage the authors to continue this work, as I believe that the NAS community needs more careful controlled studies of this type. For their next version, I encourage the authors to take into account the many points mentioned by the reviews.",ICLR2021, +D73iDiSRml,1576800000000.0,1576800000000.0,1,B1xv9pEKDS,B1xv9pEKDS,Paper Decision,Reject,"This paper proposes a two-stage distillation from pretrained language models, where the knowledge distillation happens in both the pre-training and the fine-tune stages. Experiments show improvement on BERT, GPT and MASS. All reviewers pointed that the novelty of the work is very limited.",ICLR2020, +tdWJ7FsJOs,1576800000000.0,1576800000000.0,1,SkgC6TNFvr,SkgC6TNFvr,Paper Decision,Accept (Poster),"Authors propose a novel scheme to perform active learning on image segmentation. This structured task is highly time consuming for humans to perform and challenging to model theoretically as to potentially apply existing active learning methods. Reviewers have remaining concerns over computation and that the empirical evaluation is not overwhelming (e.g., more comparisons). Nevertheless, the paper appears to bring new ideas to the table for this important problem. ",ICLR2020, +Syg2hNyWlV,1544770000000.0,1545350000000.0,1,HklyMhCqYQ,HklyMhCqYQ,The paper needs to be improved,Reject,"The main novelty of the paper lies in using multiple noise vectors to reconstruct the high resolution image in multiple ways. Then, the reconstruction with minimal loss is selected and updated to improve the fit against the target image. The most important control experiment in my opinion should compare this approach against the same architecture with only with m=1 noise vector (i.e., using a constant noise vector all the time). Unfortunately, the paper does not include such a comparison, which means the main hypothesis of the paper is not tested. Please include this experiment in the revised version of the paper. + +PS: There is another high level concern regarding the use of PSNR or SSIM for evaluation of super-resolution methods. As shown by ""Pixel recursive super resolution (Dahl et al.)"" and others, PSNR and SSIM metrics are only relevant in the low magnification regime, in which techniques based on MSE (mean squared error) are very competitive. Maybe you need to consider large magnification regime in which GAN and normalized flow-based models are more relevant.",ICLR2019,4: The area chair is confident but not absolutely certain +lLSAS21o3i,1642700000000.0,1642700000000.0,1,62r41yOG5m,62r41yOG5m,Paper Decision,Reject,"Description of paper content: + +The paper describes a technique to learn option policies using behavioral cloning and then recombine them using a high-level controller trained by RL. The underlying options are frozen. The method is tested in two published environments: a discrete grid world environment and a continuous action space robot. It is compared to three baselines. + +Summary of paper discussion: + +All reviewers moved to reject based on a lack of novelty and a lack of significant empirical results. 
No rebuttals were provided.",ICLR2022,
q2cLUXoDz3q,1642700000000.0,1642700000000.0,1,4l5iO9eoh3f,4l5iO9eoh3f,Paper Decision,Reject,"This paper formulates and solves a capacitated vehicle routing problem (CVRP) in the presence of costs for deploying additional vehicles: a mixture of supervised learning, algorithms, and OR techniques is used. In particular, a mix of greedy decoding, repairing of the solution, and post-processing with OR tools is used to extract a feasible solution from the probabilistic prediction. +
+The paper makes a good case that existing methods do not solve the CVRP with a hard constraint on the fleet size. On the other hand, there is a strong dependence on heuristic improvements: e.g., on the post-processing, and on an additional repair procedure for the decoding process. The authors are encouraged to investigate how such improvements would work with existing approaches: i.e., how novel the new model’s contributions are.",ICLR2022,
NbVCpFpzjro,1642700000000.0,1642700000000.0,1,nEfdkfAyRT8,nEfdkfAyRT8,Paper Decision,Reject,"The authors provide a cubic regularization approach to non-convex concave minimax problems. The reviewers highlight that the paper in its current form is not ready for publication due to issues such as the gap between the theory and the implementable algorithm.",ICLR2022,
kQEke-_G1L,1576800000000.0,1576800000000.0,1,ryxmrpNtvH,ryxmrpNtvH,Paper Decision,Reject,"This paper provides a series of empirical evaluations on a small neural architecture search space with 64 architectures. The experiments are interesting, but limited in scope and limited to 64 architectures trained on CIFAR-10. It is unclear whether lessons learned on this search space would transfer to large search spaces. One upside is that code is available, making the work reproducible. +
+All reviewers read the rebuttal and participated in the private discussion of reviewers and AC, but none of them changed their mind. All gave a weak rejection score. +
+I agree with this assessment and therefore recommend rejection.",ICLR2020,
SyfE3GL_l,1486400000000.0,1486400000000.0,1,S1LVSrcge,S1LVSrcge,ICLR committee final decision,Accept (Poster),"This paper describes a new way of variable computation, which uses a different number of units depending on the input. This is different from other methods for variable computation that compute over multiple time steps. The idea is clearly presented and the results are shown on LSTMs and GRUs for language modeling and music prediction. +
+ Pros: + - new idea + - convincing results in a head-to-head comparison between different setups. +
+ Cons: + - results are not nearly as good as the state of the art on the reported tasks. +
+ The reviewers and I had several rounds of discussion on whether or not to accept the paper. One reviewer had significant reservations about the paper since the results were far from SOTA. However, since getting SOTA often requires a combination of several tricks, I felt that perhaps it would not be fair to require this, and gave them the benefit of the doubt (especially because the other two reviewers did not think this was a dealbreaker). 
In my opinion, the authors did a fair enough job on the head-to-head comparison between their proposed VCRNN models and the underlying LSTMs and GRUs, which showed that the model did well enough.",ICLR2017,
HLiwLg0sDl,1610040000000.0,1610470000000.0,1,jOQbDGngsg8,jOQbDGngsg8,Final Decision,Reject,"This paper studies synthetic data generation for graphs under the constraint of edge differential privacy. There were a number of concerns/topics of discussion, which we consider separately: +1. Theoretical contributions. There are not that many theoretical contributions in this paper. I think this is OK, if the other components are compelling enough. On the theory, the authors mention that accounting for the constants is important in the analysis of DPSGD. On the contrary, I would say that these constants are not very important: if one requires specific constants, numerical procedures can determine their values; otherwise, for the sake of theory, no one generally needs these constant factors. +
+2. Empirical/experimental contributions. This was the primary axis of evaluation for this paper. None of the reviewers were especially compelled by the results. The methods are essentially combinations of known tools from the literature, and it is not clear why these are the right ones to solve this problem in particular. If the results were very exciting, that might be sufficient to warrant acceptance, but it is still not clear how significant the cost of privacy is in this setting. The experiments are not thorough enough to give serious insight here. It is a significant oversight to not provide results on DPGGAN without the privacy constraint, as this is the best performing model with privacy. The omission of something as important as this (and lack of inclusion in the response, with only a promise to include later) is an indication that the experiments are not sufficiently mature to warrant publication at this time. The decision of rejection is primarily based on concerns related to the empirical and experimental contributions. +
+3. Privacy versus link reconstruction. Reviewer 4 had concerns about the notion of privacy, claiming that it does not correspond to the probability of a link being irrecoverable. This is differential privacy ""working as intended"", which is not intended to make each link be irrecoverable: it is simply to make sure the answer would be similar whether or not the edge were actually present, so it may be possible to predict the presence of an edge even if we are differentially private with respect to it (e.g., the presence of many other short paths between two nodes is likely to imply the presence of an edge). Some discussion of this apparent contradiction might be warranted, as this might mislead readers who are specifically trying to prevent edge recovery. It might also be worthwhile to have a discussion of node DP in the final paper. The authors comment ""we focus on edge privacy because it is essential for the protection of object interactions unique for network data compared with other types of data"" -- the stronger notion of node differential privacy might also be applicable here. It would indeed be interesting to know whether it can preserve the relevant statistics (some of which seem more ""global"" and thus preservable via node DP). 
+ +",ICLR2021, +yJ_4TI698N,1576800000000.0,1576800000000.0,1,rkevSgrtPr,rkevSgrtPr,Paper Decision,Accept (Poster),"This is a nice paper on the classical problem of universal approximation, but giving a direct proof with good approximation rates, and providing many refinements and ties to the literature. + +If possible, I urge the authors to revise the paper further for camera ready; there are various technical oversights (e.g., 1/lambda should appear in the approximation rates in theorem 3.1), and the proof of theorem 3.1 is an uninterrupted 2.5 page block (splitting it into lemmas would make it cleaner, and also those lemmas could be useful to other authors).",ICLR2020, +0GAksCzRjs,1576800000000.0,1576800000000.0,1,B1gqipNYwH,B1gqipNYwH,Paper Decision,Accept (Poster),"This paper tackles the problem of autonomous skill discovery by recursively chaining skills backwards from the goal in a deep learning setting, taking the initial conditions of one skill to be the goal of the previous one. The approach is evaluated on several domains and compared against other state of the art algorithms. + +This is clearly a novel and interesting paper. Two minor outstanding issues are that the domains are all related to navigation, and it would be interesting to see the approach on other domains, and that the method involves a fair bit of engineering in piecing different methods together. Regardless, this paper should be accepted.",ICLR2020, +S1gaH4sMlN,1544890000000.0,1545350000000.0,1,SkgZNnR5tX,SkgZNnR5tX,"Very interesting approach, unsure about the usefulness in the current state of experiments",Reject,"The paper presents adversarial ""attacks"" to maze generation for RL agents trained to perform 2D navigation tasks in 3D environments (DM Lab). + +The paper is well written, and the rebuttal(s) and additional experiments (section 4) make the paper better. The approach itself is very interesting. However, there are a few limitations, and thus I am very borderline on this submission: + - the analysis of why and how the navigation trained models fail, is rather succinct. Analyzing what happens on the model side (not just the features of the adversarial mazes vs. training mazes) would make the paper stronger. + - (more importantly) Section 4: ""adapting the training distribution"" by incorporating adversarial mazes into training feels incomplete. That is a pithy as giving an adversarial attack for RL trained navigation agents would be much more complete of a contribution if at least the most obvious way to defend the attack was studied in depth. The authors themselves are honest about it and write ""Therefore, it is possible that many more training iterations are necessary for agents to learn to perform well in each adversarial setting."" (under 4.4 / Expensive Training). + +I would invite the authors to submit this version to the workshop track, and/or to finish the work started in Section 4 and make it a strong paper.",ICLR2019,3: The area chair is somewhat confident +rJlJZP7ElN,1544990000000.0,1545350000000.0,1,SyxXhsAcFQ,SyxXhsAcFQ,"Interesting ideas, but currently not sufficiently well presented",Reject,"This paper studies group equivariant neural network representations by building on the work by [Cohen and Welling, '14], which introduced learning of group irreducible representations, and [Kondor'18], who introduced tensor product non-linearities operating directly in the group Fourier domain. 
+Reviewers highlighted the significance of the approach, but were also unanimously concerned by the lack of clarity of the current manuscript, which would limit its impact within ICLR, and by the lack of a large-scale experiment that corroborates the usefulness of the approach. They were also very positive about the improvements of the paper during the author response phase. The AC completely agrees with this assessment of the paper. Therefore, the paper cannot be accepted at this time, but the AC strongly encourages the authors to resubmit their work in the next conference cycle by addressing the above remarks (improve clarity of presentation and include a large-scale experiment). ",ICLR2019,5: The area chair is absolutely certain
RBbQLaxbOQg,1642700000000.0,1642700000000.0,1,YX0lrvdPQc,YX0lrvdPQc,Paper Decision,Accept (Poster),"This paper focuses on understanding how the angle between two inputs changes as they are propagated through the layers of a randomly-initialized convolutional neural network. They demonstrate very different behavior in different settings and provide rigorous measure concentration results. The reviewers thought the paper was well written and easy to read, with nice theoretical results. They did raise a variety of technical concerns that were mostly addressed by the authors' rebuttal. My own reading of the paper is that this is a nice contribution. I therefore agree with the reviewers and recommend acceptance.",ICLR2022,
DsqaINQe3ls,1610040000000.0,1610470000000.0,1,uFk038O5wZ,uFk038O5wZ,Final Decision,Reject,"The authors address the important task of improving dialogue summarization using conversation structure and factual knowledge. +
+Pros: +1) Clearly written and well motivated (as acknowledged by all reviewers) +2) Technically sound (the proposed architecture is clearly in line with the problem that the authors are trying to solve) +3) Significant upgrades to the paper after the reviewer comments (in particular the authors have added detailed ablation studies and results on non-dialogue datasets) +
+Cons: +1) There is a significant difference between the results in the ablation studies in the original version and in the new version. Originally, the differences between KGEDCg and KGEDCg-GE and KGEDCg-FKG were very minor, but now the margins are as large as 7+ pts. I would request the authors to explain this in the final version. +
+The reviewing team felt that while many Qs were sufficiently addressed by the authors, the large difference in the numbers reported for the ablation study in the initial and final version of the paper raises some new Qs which need to be addressed before the paper can be accepted. ",ICLR2021,
G0OuHbuhmBV,1610040000000.0,1610470000000.0,1,6NFBvWlRXaG,6NFBvWlRXaG,Final Decision,Accept (Poster),"The paper presents a theoretical analysis of the expressive power of equivariant models for point clouds with respect to translations, rotations and permutations. The authors provide sufficient conditions for universality, and prove that recently introduced architectures, e.g. Tensor Field Networks (TFN), do fulfil this property. +
+The submission received positive reviews; after the rebuttal, all reviewers recommend acceptance and highlight the valuable paper modifications made by the authors to clarify the intuitions behind the proofs. +
+The AC also considers that this paper is a solid contribution for ICLR, which will draw interest from both theoreticians and practitioners in the community. +Therefore, the AC recommends acceptance. 
",ICLR2021, +TAvtT35E3Is,1642700000000.0,1642700000000.0,1,KUmMSZ_r28W,KUmMSZ_r28W,Paper Decision,Reject,"The manuscript extends the popular ""RL as inference"" framework with a generalized divergence minimization perspective. The authors observe that most policy optimization can be thought of as minimizing a reverse KL divergence, which has potentially undesirable mode-seeking properties. The authors propose a particle-based scheme wherein samples generated via Langevin dynamics are used for learning. + +Several reviewers found the ideas presented interesting, and cited potential novelty and high potential for tackling an important problem. Unfortunately, all reviewers found major shortcomings, from presentation (""messy"" presentation, lack of definition of notation and inconsistent use, issues around motivation and logical flow, vague and imprecise use of language, etc.). Several reviewers also had more fundamental criticisms, notably Uu6f who helpfully provided quite actionable feedback on the presentation. Unfortunately, discussion ended with the reviews: the authors offered no rebuttal or updates. The AC considers this a missed opportunity. + +The AC concurs with, first and foremost, the concerns around presentation. The current state of the manuscript makes it difficult to parse apart the contribution being made, and in light of all 4 reviewers recommending rejection either strongly or weakly and with no rebuttals or responses put forth, I have no basis to recommend anything other than rejection.",ICLR2022, +iUbFkHvZaJ,1610040000000.0,1610470000000.0,1,agyFqcmgl6y,agyFqcmgl6y,Final Decision,Reject,"This paper presents a method to formulate learning of causally disentangled representation as a part of the encode-decoder framework. Although the reviewers agree that the paper presents some interesting ideas, they feel the paper is not ready for publication yet. In particular, I encourage the authors to take the feedback of reviewer R2 into account, which is quite detailed and provides substantive ways of improving the work. After all, I recommend rejection. + +",ICLR2021, +VWa26osD3ul,1642700000000.0,1642700000000.0,1,ZAA0Ol4z2i4,ZAA0Ol4z2i4,Paper Decision,Reject,"There was some disagreement between reviewers regarding the quality of the paper. Reading the paper, I had difficulty understanding what you were trying to achieve and, similarly to reviewer VgPP, felt the experimental section to be weak. While I can appreciate that compute is expensive, it would have been relevant to design more controllable continuous environments to get cleaner results in addition to those on MuJoCo. As it is, there is a lot of noise (and Table 1 does not contain confidence intervals) which, added to the general brittleness of RL algorithms, makes the experiments lack convincing power. + +I encourage the authors to take all the feedback from the authors into account and resubmit an improved version of their work to another conference.",ICLR2022, +FXiuXBGXA2,1576800000000.0,1576800000000.0,1,BJlEEaEFDS,BJlEEaEFDS,Paper Decision,Reject," This paper presents an empirical analysis of the reasons behind BatchNorm vulnerability to adversarial inputs, based on the hypothesis that such vulnerability may be caused by using different statistics during the inference stage as compared to the training stage. 
While the paper is interesting and clearly written, reviewers point out that the empirical evaluation is insufficient to make the claim convincing.",ICLR2020,
LZg47iZFXd,1642700000000.0,1642700000000.0,1,s6roE3ZocH1,s6roE3ZocH1,Paper Decision,Reject,"The paper describes a genetic algorithm for molecular optimization under constraints. The aim is to generate molecules with better properties while staying close to an initial lead molecule. The proposed approach is a two-stage one. The first stage aims to satisfy constraints and searches for feasible molecules that are similar to the lead. The second stage optimizes the molecular property. The method is evaluated on the logP optimization task, with minor improvement over previous work. +
+The reviewers point out the following strengths and weaknesses: +
+Strengths: +
+- Molecular optimization under structural constraints is an important research direction. +- Comprehensive related work section. +
+Weaknesses: +
+- Lack of novelty, because it is a standard application of a genetic algorithm. +- The results show that the proposed method did not outperform existing baselines. +- The main claim of the paper (the benefit of the two-stage procedure) is not supported by an ablation study. +- The authors only conduct experiments on improving logP, which is a benchmark that is too easy to be challenging. +- The objective function and cross-over operation are the same or very similar to previous work. +- The experimental evaluation is limited, and the overall setting is not very relevant to real-world tasks. +
+Overall, all reviewers vote for rejection. It is clear that the paper needs more work before it can be published.",ICLR2022,
8rj6BzV_qUw,1642700000000.0,1642700000000.0,1,8qQ48aMXR_g,8qQ48aMXR_g,Paper Decision,Reject,"This paper analyses the generalization ability of graph neural networks in terms of the distance between test data points and training data points, where the labels of a subset of the vertices are observed as the training data and a test data point is selected from the remaining vertices. The theoretical result indicates that if the training data ""cover"" all the vertices of the graph, then the test accuracy will be better. This theoretical finding is supported by some numerical experiments. +
+The problem this paper considers is interesting and would be worth investigating. However, the theoretical results presented in the paper are based on quite strong assumptions, and the statement of the theorem is not clearly presented. +- First, the paper assumes that a distortion map is obtained by training and that the training procedure can produce zero training error. Although these assumptions are far from obvious in practice, the paper lacks justification for them. Hence, these assumptions seem to be made only for the sake of the proof. +- Second, the constants appearing in the theorems are not precisely specified. How the different constants relate to one another is not properly explained. +
+As for the experiments, they are not so strong: only Cora is used, and the training data size is small. +For these reasons, this paper is not sufficiently mature to appear in ICLR.",ICLR2022,
2jEMFvLVSXG,1610040000000.0,1610470000000.0,1,8SP2-AiWttb,8SP2-AiWttb,Final Decision,Reject,"The paper identifies a subtle gradient problem in adversarial robustness -- imbalanced gradients -- which can create a false sense of adversarial robustness. The paper provides insights into this problem and proposes a margin-decomposition-based solution for the PGD attack. 
+ +Pros: +- Novel insights into why some adversarial defenses may make some versions of PGD overestimate robustness. +- Proposes a method that is motivated by such findings of imbalanced gradients. +- The proposed attacks are shown effective across a wide range of defenses. + +Cons: +- The proposed gradient imbalance ratio could be better motivated: i.e. how is it connected to the scheme of margin decomposition? +- Limited novelty in the attacks: i.e. variant of the existing PGD and MT attacks with some proposed changes. +- Various concerns with experiments (i.e. stepsize tuning, choice of hyperparameters). + +Overall, the reviewers felt that there were some interesting ideas and directions presented in the paper; however, the reviewers also felt that the contribution was of marginal significance and more confidence in the various components (i.e. how the proposed metrics measure the imbalanced gradient effect and various concerns in the experiments) would have made the paper more convincing.",ICLR2021, +B1xE5TLZlN,1544810000000.0,1545350000000.0,1,BygRNn0qYX,BygRNn0qYX,"Evaluations, complexity, comparisons to the most recent methods.",Reject,"AR1 is concerned with the presentation of the paper and the complexity as well as missing discussion on recent embedding methods. AR2 is concerned about comparison to recent methods and the small size of datasets. AR3 is also concerned about limited comparisons and evaluations. Lastly, AR4 again points out the poor complexity due to the spectral decomposition. While authors argue that the sparsity can be exploited to speed up computations, AR4 still asks for results of the exact model with/without any approximation, effect of clipping spectrum, time complexity versus GCN, and more empirical results covering all these aspects. On balance, all reviewers seem to voice similar concerns which need to be resolved. However, this requires more than just a minor revision of the manuscript. Thus, at this time, the proposed paper cannot be accepted. + +",ICLR2019,5: The area chair is absolutely certain +ygivoWjfk9V,1642700000000.0,1642700000000.0,1,hniLRD_XCA,hniLRD_XCA,Paper Decision,Accept (Poster),"The paper was seen positively by all reviewers. The strength of the paper are: +- Intuitive and interesting combination of Koopman Operators and Optimal Control for Reinforcement Learning +- Convincing experiments on challenging benchmark tasks +- All of the issues of the reviewers (advantages to SAC, gaps in the theory and missing references) have been properly addressed in the rebuttal. + +I therefore recommend acceptance of the paper.",ICLR2022, +T8S-T8sjKz2,1642700000000.0,1642700000000.0,1,moHCzz6D5H3,moHCzz6D5H3,Paper Decision,Accept (Poster),"I recommend acceptance. This paper presents an interesting ""in-between"" of work on lottery tickets and work on supermasks, and I think it is sufficiently novel to merit acceptance even if the significance of the results will need to be left to the judgment of future researchers. The reviewers seem broadly in favor of acceptance, and I defer to their judgment as a proxy for that signal. + +For a quick bit of context, work on ""supermasks"" (Zhou et al., 2019) has shown that randomly initialized networks contain subnetworks that can reach high accuracy without training the weights themselves. That is to say, within randomly initialized networks are high-accuracy subnetworks. This work is interesting in its own right and has had a number of interesting implications for the theoretical community. 
This work derives from work on the lottery ticket hypothesis (LTH; Frankle & Carbin 2019), which shows that randomly initialized networks contain subnetworks that can train to full accuracy on their own. The key distinctions between these two kinds of work are (1) the LTH trains the subnetworks, while supermask work does not and (2) the LTH work requires that the subnetworks train to full accuracy, while work on supermasks obtain high (but not full) accuracy in many cases. No one approach is ""better"" than the other; they simply showcase different properties of neural networks. + +As far as I understand, this paper creates space for an ""in-between:"" high-accuracy subnetworks are created by finding subnetworks at random initialization and flipping the signs of some of the weights to improve accuracy. This is a limited modification to the subnetworks that falls short of actually training them (LTH work) but is more than leaving them at their random initializations (supermask work). Doing so appears to produce subnetworks that perform better than in supermask work but with a lighter-weight procedure than LTH work. The procedure for accomplishing this feat is different than either approach (using SynFlow to find the subnetwork and a binary neural network training scheme to find the signs), and there is probably significant room for improvement in this new algorithmic space (just as there was for both LTH and supermasks). + +This is novel and interesting, and I defer to the reviewers who find it worthy of acceptance. I have reservations about the eventual significance of the work, but that determination will be made by future researchers.",ICLR2022, +_eOwfOa_-6N,1610040000000.0,1610470000000.0,1,78SlGFxtlM,78SlGFxtlM,Final Decision,Reject,"This paper was evaluated by four reviewers. After rebuttal, several concerns remained, e.g. Rev. 1 is interested in more thorough comparisons even if the model is claimed to be backbone-agnostic. Rev. 2 is concerned about re-print of some theories and authors' response that 'contribution is not in theoretical innovation'. Rev. 3 is overall not impressed with the clarity of the paper. Finally, Rev. 4 also remains unconvinced after rebuttal due to several somewhat loose explanations provided by authors. + +At this point, AC agrees with reviewers that the paper requires more clear-cut theoretical contributions, ablations and improvements in writing clarity. While some reviewers might have been more inspired by the aspect of noisy labels, even ignoring this aspect, the overall consensus among all reviewers stands.",ICLR2021, +_oorAMImKBF,1642700000000.0,1642700000000.0,1,7r6kDq0mK_,7r6kDq0mK_,Paper Decision,Accept (Poster),"This paper proposes a self-supervised auto-encoder latent image animator that animates images via latent space navigation. The task of transferring motion from driving videos to source images is formulated as learning linear transformations in the latent space. Experiments conducted on real-world videos demonstrate that the proposed framework can successfully animate still images. The proposed framework is novel, the experimental results are supportive and promising. However, some related works are still missing and might need to be added to the current paper for discussion and comparison. + +The rebuttal has addressed all major concerns raised by all 5 reviewers. The revised paper also included some feedback from the reviewers, except those discussions and comparisons with some missing related works pointed out by reviewers. 
After the rebuttal, all reviewers tend to accept the paper. AC agrees with the reviewers and recommends accepting the paper as a poster. Lastly, AC urges the authors to further improve their paper by incorporating the discussion on other missing related works suggested by the reviewers.",ICLR2022, +CyIYYSAmNDV,1642700000000.0,1642700000000.0,1,Iog0djAdbHj,Iog0djAdbHj,Paper Decision,Accept (Poster),"The authors made substantial improvements to the originally submitted manuscript; however, reviewers initially remained reluctant to support the paper for acceptance based on the degree to which they were confident in the underlying arguments / position taken by the authors and the evidence provided to support their position and arguments. There are also concerns about the significance of the gains in performance afforded by the proposed approach. + +During the author response period two reviewers became satisfied with the additions and modifications leading to an increase in the final score. It will be critical for the authors to try to add ImageNet results if possible in addition to other promises made to reviewers. + +The AC recommends accepting this paper.",ICLR2022, +PeWTwo1bVoj3,1642700000000.0,1642700000000.0,1,9L1BsI4wP1H,9L1BsI4wP1H,Paper Decision,Accept (Poster),"This paper studies the problem of producing distribution-free prediction sets using conformal prediction that are robust to test-time adversarial perturbations of the input data. The authors point out that these perturbations could be label and covariate dependent, and hence different from covariate-shift handled in Tibshirani et al 19, the label-shift handled in Podkopaev and Ramdas 21, and the f-divergence shifts of Cauchois et al 2021. + +The authors propose a relatively simple idea that has appeared in other literatures like optimization but appears to be new to the conformal literature: (i) use a smoothed (using Gaussian noise on X, and inverse Gaussian CDF) nonconformity score function, in order to control its Lipschitz constant, (ii) utilize a larger score cutoff than the standard 1-alpha quantile of calibration scores employed in conformal prediction. The observation that point (i) alone lends some robustness to adversarial perturbations of the data is interesting. As several experiments in the paper and responses to reviewers show, this comes at the (apparently necessary) price of larger prediction sets. + +I read through all the comments and also the supplement. The authors have responded very well to all the reviewers questions/concerns, adding significant sections to their supplement as a result. Three reviewers are convinced, but one remaining reviewer requested additional experiments to compare with Cauchois et al (in addition to all the others already produced by the authors originally and in response to reviewers). However, the authors point out that the code in the aforementioned paper was not public, but they were able to privately get the code from the authors during the rebuttal period. At this point, I recommend acceptance of the paper even without those additional experiments, since it is not the authors' fault that the original code was not public. Nevertheless, I suggest to the authors that, if possible, they could add some comparisons to the camera-ready since they now have the code. + +I congratulate the authors on a nice work, a very solid rebuttal, and also the astute reviewers on pointing out various aspects that could be improved. 
+
+Minor point for the authors (for the camera-ready): I would like to comment on the Rebuttal point 4.4 in the supplement, which then got further discussed in the thread. The reviewer points out four references [R1-R4]. I will add one more to the list [R5] https://arxiv.org/pdf/1905.10634.pdf (Kivaranovic et al., appeared in 2019, published in 2020). I think the literature reviews in this area are starting to be messy, and all authors need to do a better job. Clearly, the original paper of Vovk et al. already establishes various types of conditional validity (and calls it a PAC-style guarantee), produces guarantees that others in this area produce, and it appears that much reinvention of the wheel is occurring. For example, [R2, R4] do not cite [R5], despite [R5] appearing earlier and being published earlier, and having PAC-style guarantees and experiments with neural nets, etc. However, in turn, [R5] does not cite Vovk [R1], but [R2, R4] do cite [R1]. (And [R3] does not seem to be relevant to this discussion of conditional validity?) In any case, I am not sure any of these papers need citing, since the current paper does not deal with conditional validity. If at all, just one sentence like ""Conditional validity guarantees, of the styles suggested by Vovk [2012], would be an interesting avenue for future work"".",ICLR2022,
4XOD_qt3Vsc,1610040000000.0,1610470000000.0,1,ucEXZQncukK,ucEXZQncukK,Final Decision,Reject,"This paper proposes an online meta-learning algorithm. 3 out of 4 reviews were borderline. The main concern during the discussion was that it is unclear what kind of online learning this paper does. For instance, in theory, the online learner competes with the best solution in hindsight. This is a regret-minimizing point of view. The other kind of online learning is streaming. In this case, there is no regret. The goal is a sublinear representation that is competitive with some baseline that uses all space. +
+After the discussion, I read the paper to understand the points raised by the reviewers. I agree that this paper is not ready to be accepted. My quick review is below: +
+The authors combine MAML and BOL to have online updates (not all tasks are required beforehand) and handle distribution shift. But the way of combining these is not well justified. In particular, +
+1) The distribution-shift story is not convincing. The reason is that the proposed algorithm is posterior-based. By definition, when you use posteriors as in (3)-(5), you assume that the datasets are sampled i.i.d. given \theta. This means independently and identically. So no distribution shift. I am familiar with Kalman filtering. For that, you need p(\theta_t | \theta_{t - 1}) in (3)-(5), which would be sufficient for tracking stochastic distribution shifts. +
+2) I find the use of BOL unnatural. Since MAML is gradient-based, it would be more natural to have a gradient-based online learner. Gradient descent has online guarantees and does not require i.i.d. assumptions. +
+3) The authors should clearly state what the objective of their online algorithm is. In particular, the informal justification of (3)-(5) as doing something similar to MAML (the paragraph around (6)) is highly confusing. I could not understand what the authors mean.",ICLR2021,
CnnZnnN3yPT,1610040000000.0,1610470000000.0,1,LDSeViRs4-Q,LDSeViRs4-Q,Final Decision,Reject,"The paper proposes a margin-based adversarial training procedure. The paper is lacking in terms of a proper discussion of the related literature, e.g., 
the similarity to and differences from MMA; the ""theoretical"" discussion on page 5 is incomplete, as there is no way one can estimate the perturbed samples to do the analysis (the authors seem to implicitly assume from the outset that the adversarial samples lie on the decision boundary), and the underlying assumptions are not clearly stated; and the reported robust accuracies +(see https://github.com/fra31/auto-attack for a leaderboard of adversarial defenses) on MNIST and CIFAR10 are worse than those of MMA, which are in turn worse than SOTA. Thus this paper is below the bar for ICLR.",ICLR2021,
AvSsmGIvAR,1576800000000.0,1576800000000.0,1,rJejta4KDS,rJejta4KDS,Paper Decision,Reject,"This paper proposes an attack method to improve the transferability of adversarial examples under black-box attack settings. +
+Despite the simplicity of the proposed idea, reviewers and AC commonly think that the paper is far from being ready to publish in various aspects: (a) the presentation/writing quality, (b) in-depth analysis and (c) experimental results. +
+Hence, I recommend rejection.",ICLR2020,
WJpj1hcLN-S,1610040000000.0,1610470000000.0,1,USCNapootw,USCNapootw,Final Decision,Accept (Poster),"This work describes a system for collaborative learning in which several agents holding data want to improve their models by asking other agents to label their points. The system preserves confidentiality of queries using MPC and also throws in differentially private aggregation of labels (taken from the PATE framework). It provides experiments showing the computational feasibility of the system. The techniques use active learning to improve the models. +
+Overall the ingredients are fairly standard but are put together in a new way (to the best of my, admittedly limited, knowledge of this area). This seems like a solid attempt to explore approaches for learning in a federated setting with strong limitations on data sharing.",ICLR2021,
7Te5Wq2-IJ3,1610040000000.0,1610470000000.0,1,3FkrodAXdk,3FkrodAXdk,Final Decision,Reject,"This work studies statistics of ensemble models that capture the prediction diversity between ensemble members. The goal of the work is to identify or construct a metric which is predictive of the holdout accuracy achieved by the ensemble prediction. +
+Pros: +* Studies empirically how measures of ensemble diversity relate to ensemble prediction accuracy. +* Proposes improvements to diversity metrics that correlate better with accuracy. +
+Cons: +* Unclear/confusing presentation. +* Limited empirical validation that relies mostly on CIFAR-10 results to justify claims. +* Some claims made (the trend between ensemble diversity and accuracy, Q diversity not capturing negative correlations) are not substantiated. +
+All reviewers recommend that this paper be rejected, and the authors did not reply to any reviews.",ICLR2021,
eAbLaBmB3Uc,1610040000000.0,1610470000000.0,1,aCgLmfhIy_f,aCgLmfhIy_f,Final Decision,Accept (Poster),"This paper proposes a method for regularizing the pre-training of an embedding function for relation extraction from text that encourages well-formed clusters among the relation types. Experiments on FewRel, SemEval 2010 Task 8, and a proposed FuzzyRed dataset show that the proposed prototype method generally outperforms the prior state-of-the-art, including MTB (Soares et al., 2019), which was the strongest. The key, novel idea is to model prototype representations for target relations as part of the learning process. A contribution of the work is to show that learning prototype representations is useful in supervised deep learning architectures even beyond few-shot learning. 
This additional learning objective is useful as an inductive bias, and is perhaps of interest even beyond relation extraction research. + +Reviewers generally found the proposed method sound and intuitive, and the original set of experiments promising. Some of the reviewers raised concerns about the setup of the experiments, including the relationship between the pre-training and target tasks, and the need for several additional baselines. The authors were able to address these concerns, and the reviewers did not raise any follow-up concerns.",ICLR2021, +a128u8vaC,1576800000000.0,1576800000000.0,1,Byxv2pEKPH,Byxv2pEKPH,Paper Decision,Reject,"This paper proposes a new normalization scheme that attempts to prevent all units in a ReLU layer from being dead. The experimental results show that this normalization can effectively be used to train deep networks, though not as well as batch normalization. A significant issue is that the paper does not sufficiently establish that their explanation for the success of Farkas layer is valid. For example, do networks usually have layers with only inactive units in practice?",ICLR2020, +1Ik9rl447jR,1642700000000.0,1642700000000.0,1,BNIt2myzSzS,BNIt2myzSzS,Paper Decision,Reject,"The paper tackles the problem of missing data in centralized training multi-agent RL approaches. The authors propose 1) using generative adversarial imputation networks for imputing missing data and 2) discarding training data where data from multiple consecutive timesteps is missing. + +Reviewers agreed that the problem of missing data in multi-agent RL is interesting. At the same time, several reviewers shared two main concerns about the experimental evaluation: +* The lack of comparisons to baselines other than MADDPG, especially decentralized critic approaches. +* The lack of experiments on non-toy domains such as SMAC. + +The author response did not sufficiently address these concerns leaving the reviewers in agreement that the paper should not be accepted without these additional experiments.",ICLR2022, +ryxPJUOZxN,1544810000000.0,1545350000000.0,1,BJl_VnR9Km,BJl_VnR9Km,Significant revisions in review,Reject,"There was major disagreement between reviewers on this paper. Two reviewers recommend acceptance, and one firm rejection. The initial version of the manuscript was of poor quality in terms of exposition, as noted by all reviewers. However, the authors responded carefully and thoroughly to reviewer comments, and major clarity and technical issues were resolved by all authors. + +I ask PCs to note that the paper, as originally submitted, was not fit for acceptance, and reviewers noted major changes during the review process. I do believe this behavior should be discouraged, since it effectively requires reviewers to examine the paper twice. Regardless, the final overall score of the paper does not meet the bar for acceptance into ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +GD00YpsY8Gr,1642700000000.0,1642700000000.0,1,R8sQPpGCv0,R8sQPpGCv0,Paper Decision,Accept (Poster),"This submission proposes a simple, efficient, and effective position representation method for the Transformer architecture called ALiBi. ALiBi enables better extrapolation and performance (in terms of efficiency and task performance). The submission also includes careful analysis and extensive experiments, and notably suggests that the gains of ALiBi may be less pronounced in more scaled-up settings. All reviewers agreed the paper should be accepted. 
I think it's reasonably likely that ALiBi will become a common choice in future Transformer models, or at the very least that this work will prompt further work on developing improved position representations for Transformer models. I therefore recommend acceptance.",ICLR2022, +bhpo2DZPM8x,1610040000000.0,1610470000000.0,1,xpFFI_NtgpW,xpFFI_NtgpW,Final Decision,Accept (Poster),"This paper makes a thorough investigation on the idea of decoupling the input and output word embeddings for pre-trained language models. The research shows that the decoupling can improve the performance of pre-trained LMs by reallocating the input word embedding parameters to the Transformer layers, while further improvements can be obtained by increasing the output embedding size. Experiments were conducted on the XTREME benchmark over a strong mBERT. R1&R2&R3 gave rather positive comments while R4 raised concerns on the model size. The authors gave detailed response on these concerns but R4 still thought the paper is overclaimed because the experiments were only conducted in a multilingual scenario. ",ICLR2021, +SygY7w18xN,1545100000000.0,1545350000000.0,1,BJgklhAcK7,BJgklhAcK7,Good contribution on meta-learning,Accept (Poster),"This work builds on MAML by (1) switching from a single underlying set of parameters to a distribution in a latent lower-dimensional space, and (2) conditioning the initial parameter of each subproblem on the input data. +All reviewers agree that the solid experimental results are impressive, with careful ablation studies to show how conditional parameter generation and optimization in the lower-dimensional space both contribute to the performance. While there were some initial concerns on clarity and experimental details, we feel the revised version has addressed those in a satisfying way.",ICLR2019,5: The area chair is absolutely certain +KuXYTx39CndE,1642700000000.0,1642700000000.0,1,KDAEc2nai83,KDAEc2nai83,Paper Decision,Reject,"This paper introduces a variant of DQN optimized for desktop environments to make large scale experiments more feasible for anyone. + +This paper was close. The reviewers appreciated the effort and motivation, but in the end the reviewers all seemed to think that the paper was not ready. The main contribution is framed as making DQN training more feasible, but the reviewers expected the paper to show examples of what the workflow for another architecture would look like and ideally present results for domains beyond Atari. In addition, several reviewers thought the paper could be more precise about (1) ruling out speed differences due to hardware and low-level software, and (2) contextualizing the speedups reported---does 3x matter, what should we expect, etc. + +This is certainly an interesting direction. The AC personally thinks that if the authors take some steps to address the points above this will be a great and potentially high impact paper.",ICLR2022, +LF2Cb7DMZq,1576800000000.0,1576800000000.0,1,B1eyO1BFPr,B1eyO1BFPr,Paper Decision,Accept (Poster),"The authors propose a simple modification of local SGD for parallel training, starting with standard SGD and then switching to local SGD. The resulting method provides good results and makes a practical contribution. Please carefully account for reviewer comments in future revisions.",ICLR2020, +4ISTrC063UW,1610040000000.0,1610470000000.0,1,MkrAyYVmt7b,MkrAyYVmt7b,Final Decision,Reject,"Firstly, thank you authors for your thought-provoking submission and discussion. 
The key point of disagreement clearly is the fundamental assumption that ""the result of an anomaly detection method should be invariant to any continuous invertible reparametrization f."" All reviewers found this assumption to be too strong, leading all four to recommend rejection. +
+I also recommend rejection at this time. To me, it seems reasonable and practical to assume that anomalies are defined based on distance (in a fixed feature space). So if we are allowed to deform the space, clearly this definition breaks down and the concept of an anomaly becomes empty. Perhaps I am wrong about this, but nevertheless, the paper could do a much better job of convincing the reader that its fundamental reparametrization assumption is appropriate and of consequence in practice.",ICLR2021,
NRhMgzZatzW,1610040000000.0,1610470000000.0,1,bK-rJMKrOsm,bK-rJMKrOsm,Final Decision,Reject,"This paper proposes an interesting collaborative multi-head attention (MHA) method to enable heads to share projections, which can reduce the parameters and FLOPs of transformer-based models without hurting performance on En-De translation tasks. For pre-trained language models, a tensor decomposition method is used to easily convert the original MHA to its collaborative version without retraining. +
+This paper received 3 weak reject and 1 weak accept recommendations. On one hand, all the reviewers agree that the paper is well motivated and the proposed idea is interesting. On the other hand, all the reviewers also commented that the current empirical results and comparisons are weak, and are not enough to support the paper's main claim. From the current results, it is difficult to draw the conclusion that collaborative MHA is better. +
+Specifically, (i) from Table 2, it can be seen that the proposed method is not effective for pre-trained models, i.e., even if the model size is not reduced much, the performance can drop significantly. (ii) More experiments, such as QA and additional translation/generation tasks, would make this paper more convincing. (iii) More rigorous experiments are needed to justify the practical value of the proposed method. If the authors try to emphasize that they go beyond the practical realm, then probably a careful re-positioning of the paper is needed, which may not be a trivial task. +
+The rebuttal unfortunately did not fully address the reviewers' main concerns. Therefore, the AC regrets that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2021,
wa7sQ4jpbk,1610040000000.0,1610470000000.0,1,whNntrHtB8D,whNntrHtB8D,Final Decision,Reject,"The range of the initial reviews was fairly wide, with overall scores ranging from 4 to 7. +
+The authors provided a good response that answered most of the reviewers' comments and questions. One of the reviewers even increased their score following the authors' response. +
+The focus of some of our discussions, and what ultimately led to my recommendation, was the related work of MIR [1]. The methodological differences between (the three versions of) MIR in [1] and GMED [this paper] appear less significant than what the current submission suggests. While there is some disagreement between the authors and Reviewer1 about the exact differences, I find that the current manuscript does not acknowledge the close relationship between these two contributions. 
Further, from the experimental standpoint and without further justification, the gains from GMED+MIR could be attributed to using more replay (from combining GMED and MIR). +
+
+In their response, the authors disputed the view of Reviewer1. I believe the source of the confusion between the authors and the reviewer might be captured in this sentence from the author response to Reviewer1: The approach does not learn a generator that can “generate examples that are more forgettable for the classifier”; instead, feeding more forgettable examples in GEN-MIR aims at reducing the forgetting of the generator. +
+Looking at Equations 2, 3 and Algorithm 1 from [1], while in GEN-MIR two different procedures are used to obtain forgettable examples for the generator (B_G in Alg. 1) and the classifier (B_C in Alg. 1), the generator is used in both cases. In other words, the generator is used to generate examples for both itself and for the classifier. So, I think it's fair to say that the generator does indeed generate examples that are more forgettable for the classifier (Eq. 2). +
+I strongly encourage the authors to prepare another version of their work where the differences between MIR [1] and their contribution are clearly highlighted and the results show the advantages of GMED (including memory-editing in data space). +
+[1] Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, and Tinne Tuytelaars. Online continual learning with maximally interfered retrieval. In NeurIPS 2019. https://arxiv.org/abs/1908.04742",ICLR2021,
C0X-6ED9d3,1642700000000.0,1642700000000.0,1,ODnCiZujily,ODnCiZujily,Paper Decision,Reject,"The authors propose a novel operator splitting method for solving convex relaxations of neural network verification problems, and develop and validate an optimized implementation of the same on large-scale networks, focusing on the problem of verifying robustness to norm-bounded adversarial perturbations. +
+The reviewers agree that the paper contains interesting ideas that are worthy of further development and that these ideas may eventually prove useful in pushing the envelope of what is possible in neural network verification. However, in its current form, the paper misses some key experimental evidence to rigorously evaluate the value of the contributions made: +1) Comparison against SOTA incomplete verifiers: The authors do not provide detailed and rigorous comparisons against well-known baselines (for example, the incomplete verifiers from Fast-and-complete (Xu et al., 2021) and Beta-CROWN (Wang et al., 2021)). +2) Incorporating tighter relaxations: It would be valuable for the community to understand whether the proposed algorithm is compatible with tighter relaxations like those of (Tjandraatmadja et al., 2020). Even if they are not, it would be interesting to understand the comparison against standard solvers for these tighter relaxations compared against the advanced solver developed by the authors applied to the weaker relaxation. +3) Showing performance in the context of complete verification: While this is not a requirement, it would be great to see how the method performs in conjunction with a branch-and-bound search, as this sometimes reveals surprising tradeoffs or weaknesses of incomplete verifiers (as observed in the results of Beta-CROWN (Wang et al., 2021)). 
+ +I encourage the authors to strengthen the paper adding these experiments and resubmit to a future venue.",ICLR2022, +ejhRgJcjIHr,1642700000000.0,1642700000000.0,1,w_drCosT76,w_drCosT76,Paper Decision,Accept (Poster),"This paper was a tough call. The key contribution of the paper is a genuinely useful technique for generating chemical compounds satisfying desired properties. However, there are some key issues with paper. + +Reviewer *BjiD* found out that baselines are weak. Most importantly, he run thorough experiments with GraphGA, outperforming by a significant margin the baselines. With minor tweaks (e.g. enabling generating larger molecules). GraphGA achieves comparable though slightly weaker results to DST. Importantly, as pointed out by Reviewer BjiD, there is an important flaw in the experiments that some methods have a cap on the number of atoms they can add. For example, on the logP optimization task, it is possible to optimize the score by just adding carbon atoms. I would like to thank very much Reviewer for going beyond and running these experiments. + +All reviewers emphasized the novelty as a key contribution. In internal discussion, I raised concern about novelty and framing of the work. One could argue that any autoregressive model (i.e. adding atoms and bonds at each step) forms a DST. One could also argue that training LSTM to produce the distribution of interest, like in [1], is also a DST because the fine-tuned LSTM encodes the distribution of many molecules and is differentiable with respect to the distribution it encodes. + +Despite these flaws, it is a solid contribution, which is likely to be useful for the community. Thank you for your submission, and it is my pleasure to recommend acceptance. + +For the camera-ready please: (a) include a well-tuned GraphGA (implementing different tradeoffs of diversity and fitness), (b) include LSTM as implemented in Guacamole as baseline, (c) discuss much more clearly novelty of the work. Additionally please ensure that other baselines are not hampered by limit on number of atoms they can add. + +References: + +[1] Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks, Segler et al, [https://arxiv.org/abs/1701.01329](https://arxiv.org/abs/1701.01329)",ICLR2022, +0CafPDEql,1576800000000.0,1576800000000.0,1,rJgLlAVYPr,rJgLlAVYPr,Paper Decision,Reject,"This paper presents White Box Network (WBN), which allows for composing function blocks from a given set of functions to construct a target function. The main idea is to introduce a selection layer that only selects one element of the previous layer as an input to a function block. The reviewers were unanimous in their opinion that the paper is not suitable for publication at ICLR in its current form. There were significant concerns about the clarity in writing, and reviewers have provided detailed discussion should the authors wish to improve the paper.",ICLR2020, +H1ljxjoQeN,1544960000000.0,1545350000000.0,1,r1gnQ20qYX,r1gnQ20qYX,Presentation and Evaluation concerns remain,Reject,"This paper presents an approach that combines rule lists with prototype-based neural models to learn accurate models that are also interpretable (both due to rules and the prototypes). This combination is quite novel, the reviewers and the AC are unaware of prior work that has combined them, and find it potentially impactful. 
The experiments on the healthcare application were appreciated, and it is clear that the proposed approach produces accurate models, with much fewer rules than existing rule learning approaches. + +The reviewers and AC note the following potential weaknesses: (1) there are substantial presentation issues, including the details of the approach, (2) unclear what the differences are from existing approaches, in particular, the benefits, and (3) The evaluation lacked in several important aspects, including user study on interpretability, and choice of benchmarks. + +The authors provided a revision to their paper that addresses some of the presentation issues in notation, and incorporates some of the evaluation considerations as appendices into the paper. However, the reviewer scores are unchanged since most of the presentation and evaluation concerns remain, requiring significant modifications to be addressed.",ICLR2019,4: The area chair is confident but not absolutely certain +NjuVN1_KUWj,1610040000000.0,1610470000000.0,1,h2EbJ4_wMVq,h2EbJ4_wMVq,Final Decision,Accept (Poster),"This work describes a system for collaborative learning in which several agents holding data want to improve their models by asking other agents to label their points. The system preserves confidentiality of queries using MPC and also throws in differentially private aggregation of labels (taken from the PATE framework). It provides expriments showing computational feasibility of the system. The techniques use active learning to improve the models. + +Overall the ingredients are fairly standard but are put together in a new (to the best of my , admittedly limited, knowledge of this area). This seems like a solid attempt to explore approaches for learning in a federated setting with strong limitations on data sharing.",ICLR2021, +7Te5Wq2-IJ3,1610040000000.0,1610470000000.0,1,3FkrodAXdk,3FkrodAXdk,Final Decision,Reject,"This work studies statistics of ensemble models that capture the prediction diversity between ensemble members. The goal of the work is to identify or construct a metric which is predictive of the holdout accuracy achieved by the ensemble prediction. + +Pros: +* Studies empirically how measures of ensemble diversity relate to ensemble prediction accuracy. +* Proposes improvements to diversity metrics that correlate better with accuracy. + +Cons: +* Unclear/confusing presentation. +* Limited empirical validation that relies mostly on CIFAR-10 results to justify claims +* Some claims made (trend between ensemble diversity and accuracy, Q diversity capturing not capturing negative correlations) are not substantiated. + +All reviewers recommend this paper to be rejected and the authors did not reply to any reviews.",ICLR2021, +Syg7k0PgxN,1544740000000.0,1545350000000.0,1,rkluJ2R9KQ,rkluJ2R9KQ,Not entirely novel but still very interesting approach,Accept (Poster),"This paper is concerned with solving Online Combinatorial Optimization (OCO) problems using reinforcement learning (RL). There is a well-established traditional family of approaches to solving OCO problems, therefore the attempt itself to solve them with RL is very intriguing, as this provides insights about the capabilities of RL in a new but at the same time well understood class of problems. + +The reviewers agree that this approach is not entirely new. While past similar efforts take away some of the novelty of this paper, the reviewers and AC believe that still the setting considered here contains novel and interesting elements. 
+
+All reviewers were unconvinced that this work can provide strong claims about using RL to learn any primal-dual algorithm. This takes away some of the paper's impact, but thanks to the discussion the authors managed to clarify some "hand-wavy" claims and toned down the claims that were not convincing. Therefore, it was agreed that the new revision still provides some useful insight into the RL and primal-dual connection, even without a complete formal connection. 
",ICLR2019,4: The area chair is confident but not absolutely certain
vCz0RHivdzX,1610040000000.0,1610470000000.0,1,TVbDOOr6hL,TVbDOOr6hL,Final Decision,Reject,"The authors suggest a VAE model for causal inference. The approach is motivated by CEVAE (Louizos et al., 2017), which uses a VAE to learn a latent representation of confounding between the treatment, target, and covariates. This paper goes beyond this approach and tries to design generative model architectures that encourage learning disentangled representations between different underlying factors of variation, inspired by Hassanpour & Greiner (2019). 

The reviewers agreed that the topic will be of interest to a large group of readers. While the first version of the paper raised questions about the experimental design, several questions on the architecture design were addressed during the rebuttal period (e.g., deeper architectures). Other improvements were suggested and not adopted (e.g., alternative methods to achieve better disentanglement). The ablation studies seem to suggest that some of the loss terms are not actually needed and that non-probabilistic autoencoders (beta=0) also work well. We recommend aiming at improving the writing quality and the coverage of background material on the proposed architectures and causal factors. 
",ICLR2021, 
KugaNxpndx,1576800000000.0,1576800000000.0,1,r1lF_CEYwS,r1lF_CEYwS,Paper Decision,Accept (Poster),"This paper studies the role of topology in designing adversarial defenses. Specifically, the authors study defense strategies that rely on the assumption that data lies on a low-dimensional manifold, and show theoretical and empirical evidence that such defenses need to build a topological understanding of the data. 

Reviewers were initially positive, but had some concerns pertaining to clarity and the limited experimental setup. After a productive rebuttal phase, reviewers are now mostly in favor of acceptance, thanks to the improved readability and clarity. Despite the small-scale experimental validation, ultimately both reviewers and AC conclude this paper is worthy of publication. ",ICLR2020, 
COFWsEm0w1,1642700000000.0,1642700000000.0,1,Pfj3SXBCbVQ,Pfj3SXBCbVQ,Paper Decision,Reject,"This paper demonstrates the hypothesis that a very small word piece vocabulary (giving a ""quasi character level"" model) outperforms current methods of neural MT in truly low resource scenarios, and provides some auxiliary studies around word piece frequency and domain transfer. It considers LSTM, CNN, and Transformer NMT models. This is useful information for people working in low resource scenarios to know. 

The paper got 3 reviews by people with very strong machine translation expertise. There was a general consensus that the paper was insufficiently aware of prior work on this topic and that the paper had problems in experiment construction which raised issues about the comprehensiveness of the results. 
That is, while this paper adopts an even smaller, more extreme vocabulary, Sennrich and Zhang (2017) already showed that a much smaller subword vocabulary can give much stronger results for low resource MT (while Araabi and Monz questioned whether this was as true for Transformer NMT). Meanwhile, Cherry et al. (2018) and Kreutzer and Sokolov (2018) already argued the benefits of (almost) character-level NMT. On the experimental side, the lack of results on genuinely low-resource scenarios and the comment of Reviewer FBrF that the problem with larger subword vocabularies here may be mainly due to the small corpus size used for constructing the subword vocabulary are both quite important. Moreover, as mainly an MT experimental study, this paper seems better suited to a more specialized audience of MT researchers at an ACL, WMT, AMTA, etc. venue. 

I recommend rejecting this paper as not sufficiently novel, with experiments that need further work, and lacking strong interest to a broader representation learning audience.",ICLR2022, 
Syxwvjn-eV,1544830000000.0,1545350000000.0,1,B1lz-3Rct7,B1lz-3Rct7,Paper decision,Accept (Poster),"Reviewers are in a consensus and recommended to accept after engaging with the authors. Please take reviewers' comments into consideration to improve your submission for the camera ready. 
",ICLR2019,4: The area chair is confident but not absolutely certain
Bkh8HyaHf,1517250000000.0,1517260000000.0,577,Bk_fs6gA-,Bk_fs6gA-,ICLR 2018 Conference Acceptance Decision,Reject,"The authors use a memory-augmented neural architecture to learn to solve combinatorial optimization problems. The reviewers consider the approach worth studying, but find the authors' experimental protocol and baselines flawed. ",ICLR2018, 
_OjMWjdaZ4c,1642700000000.0,1642700000000.0,1,fy_XRVHqly,fy_XRVHqly,Paper Decision,Accept (Poster),"This paper studies the role of positional and relational embeddings for multi-task reinforcement learning with transformer-based policies. The paper is well-motivated, and the experiments show its effectiveness against other competitive methods. In the rebuttal period, the authors resolved most of the reviewers' questions, such as those on novelty and ablation studies. There are still some concerns about the generalizability of this approach to other tasks, and more experiments are needed.",ICLR2022, 
uJmKUA8zVws2,1642700000000.0,1642700000000.0,1,P1zfguZHowl,P1zfguZHowl,Paper Decision,Reject,"The paper proposes to use the Huber and absolute loss for value function estimation in reinforcement learning, and optimizes it by leveraging a recent primal-dual formulation by Dai et al. 

This is a controversial paper. On one hand, it is a well motivated idea to apply robust losses to RL; the paper implemented the idea well by leveraging the saddle point formulation, and empirically demonstrates its advantages in practice. 

On the other hand, the technical novelty of this paper is limited. The idea of the Huber loss and the standard conjugate formulation are straightforward applications of existing techniques (despite being well motivated). 

The authors seem to think that there has been no application of the Huber loss in RL. But existing implementations of RL already use the Huber loss. For example, in the OpenAI baselines (https://openai.com/blog/openai-baselines-dqn/), they said the following: 

""Double check your interpretations of papers: In the DQN Nature paper the authors write: "We also found it helpful to clip the error term from the update [...] to be between -1 and 1.". 
There are two ways to interpret this statement — clip the objective, or clip the multiplicative term when computing gradient. The former seems more natural, but it causes the gradient to be zero on transitions with high error, which leads to suboptimal performance, as found in one DQN implementation. The latter is correct and has a simple mathematical interpretation — Huber Loss. You can spot bugs like these by checking that the gradients appear as you expect — this can be easily done within TensorFlow by using compute_gradients."" + +The authors discussed the first approach above on in the rebuttal, but I am not sure if the authors have considered the second method. If not, it would be worthwhile to discuss and compare with it. + +See also ""Agarwal et al. An Optimistic Perspective on Offline Reinforcement Learning"" and ""Dabney et al. Distributional Reinforcement Learning with Quantile Regression."" + +On the other hand, I have not seen the application of saddle point approach by primal-dual method of Dai on Huber specially. + +It seems that the proposed algorithm is in the end equivalent to MSBE+primal-dual+ (h with softmax output). If it is that simple, I think it would help the readers to explicitly point this out upfront in the beginning (which is an interesting conceptual connection). Because the primal-dual approach need to be approximate h with a neural network, the difference of the two methods is vague in the primal-dual space. + +A side mark: when we say ""an objective for which we can obtain *unbiased* sample gradients"", i think that the gradient estimator of the augmented Lagrange is unbiased; the gradient estimates of MHBE and MABE are still biased. + +Overall, it is a paper with a well motivated and valuable contribution, but limited in terms of technical depth and novelty.",ICLR2022, +2ntjJ4kyH3,1576800000000.0,1576800000000.0,1,H1g8p1BYvS,H1g8p1BYvS,Paper Decision,Reject,"This paper proposes to address the issue of biases and artifacts in benchmark datasets through the use of adversarial filtering. That is, removing training and test examples that a baseline model or ensemble gets wright. + +The paper is borderline, and could have flipped to an accept if the target acceptance rate for the conference were a bit higher. All three reviewers ultimately voted weakly in favor of it, especially after the addition of the new out-of-domain generalization results. However, reviewers found it confusing in places, and R2 wasn't fully convinced that this should be applied in the settings the authors suggest. This paper raises some interesting and controversial points, but after some private discussion, there wasn't a clear consensus that publishing it as is would do more good than harm.",ICLR2020, +2RAVb1ofHsQ,1610040000000.0,1610470000000.0,1,IX3Nnir2omJ,IX3Nnir2omJ,Final Decision,Accept (Poster),"This paper analyses the signal propagation through residual architectures; then suggests a scaling method which, together with weight standardization, allows to train such networks to high accuracy with batch-norm; it demonstrates that the method performs better than previous methods (Fixup, SkipInit), and can be used on more advanced architectures. + +The reviewers initially had several concerns, but after the author's revision, these concerns were addressed and most reviewers recommended acceptance. One reviewer did not respond, but I think these concerns were addressed. 
I think it will help to further convince the readers of the usefulness of the method if the authors would check the sensitivity to the learning rate with the current method and compare with other methods (SkipInit, Fixup, BN). The reason I'm suggesting this is that I think one of the main reasons BN is still in popular use is that it commonly tends to make training more robust to changes in hyper-parameters, such as the learning rate (while other methods, like SkipInit and Fixup, require more hyper-parameter tuning). 

Overall, the analysis and the suggested method seem useful, especially at small batch sizes, and the writing is mostly clear, so I recommend acceptance. 

",ICLR2021, 
fyzCUtyBNRu,1610040000000.0,1610470000000.0,1,GzMUD_GGvJN,GzMUD_GGvJN,Final Decision,Reject,"While the general idea of the paper is certainly interesting and highly relevant, there is consensus that the paper cannot be published in the current form. 
+There were serious concerns about
+- the correctness and generality of the method
+- clarity of presentation
+- experimental evaluation
+
+The authors graciously accepted the feedback; we wish them all the best in thoroughly revising and resubmitting the paper.",ICLR2021, 
i5IG-Z8iRd,1576800000000.0,1576800000000.0,1,ryeEr0EFvS,ryeEr0EFvS,Paper Decision,Reject,"This paper proposes a modification to GCNs that generalizes the aggregation step to multiple levels of neighbors, such that, in theory, the new class of models has better discriminative power. The main criticism raised is that there is a lack of sufficient evidence to distinguish this work's theoretical contribution from that of Xu et al. Two reviewers also pointed out concerns around the experimental results and suggested including more recent state-of-the-art (SOTA) results. While the authors disagree that the contributions of their work are incremental, the reviewers' concerns are good samples of the general readers of this paper: general readers may also read this paper as incremental. 
We highly encourage authors to take another cycle of edits to better distinguish their work from others before future submissions. +",ICLR2020, +1oDRR8VrrQo,1610040000000.0,1610470000000.0,1,23ZjUGpjcc,23ZjUGpjcc,Final Decision,Accept (Poster),"The presented idea is aligned with past work using multiple experts or multiple sources for transfer. However, it is positioned uniquely and cleverly in that the approach is developed with scalability in mind. Within this setting, the paper is convincing. Although the approach does not come with strong backing theory, it is intuitive and seems to work well. During the discussions phase, the authors have clarified some questions that made the paper convincing, even if it is a relatively heuristic approach. The results are strong if one is concerned with both quantitative performance and efficiency, a combination of objectives very often encountered in practice. Overall, it is expected that this idea can stimulate further research along those lines, especially since this paper is very nice and easy to read.",ICLR2021, +v8ZVBwsy_,1576800000000.0,1576800000000.0,1,BkxpMTEtPB,BkxpMTEtPB,Paper Decision,Accept (Poster)," The paper proposes a neural network architecture to address the problem of estimating a sparse precision matrix from data, which can be used for inferring conditional independence if the random variables are gaussian. The authors propose an Alternating Minimisation procedure for solving the l1 regularized maximum likelihood which can be unrolled and parameterized. This method is shown to converge faster at inference time than other methods and it is also far more effective in terms of training time compared to an existing data driven method. + +Reviewers had good initial impressions of this paper, pointing out the significance of the idea and the soundness of the setup. After a productive rebuttal phase the authors significantly improved the readibility and successfully clarified the remaining concerns of the reviewers. This AC thus recommends acceptance. ",ICLR2020, +Sye7irOggE,1544750000000.0,1545350000000.0,1,BJfOXnActQ,BJfOXnActQ,Good proposal for incorporating class dependencies in few-shot learning.,Accept (Poster),"The reviewers think that incorporating class conditional dependencies into the metric space of a few-shot learner is a sufficiently good idea to merit acceptance. The performance isn’t necessarily better than the state-of-the-art approaches like LEO, but it is nonetheless competitive. One reviewer suggests incorporating a pre-training strategy to strengthen your results. In terms of experimental details, one reviewer pointed out that the embedding network architecture is quite a bit more powerful than the base learner and would like some additional justification for this. They would also like more detail on the computing the MAML gradients in the context of this method. Beyond this, please ensure that you have incorporated all of the clarifications that were required during the discussion phase.",ICLR2019,4: The area chair is confident but not absolutely certain +bLVrSlxedBM,1642700000000.0,1642700000000.0,1,bpUHBc9HCU8,bpUHBc9HCU8,Paper Decision,Reject,"The paper studies a robust GNN against adversarial attacks on both graph structure and node features. 
+The reviewers agree that the paper needs to improve in terms of novelty and more technical detail to meet the ICLR standard.",ICLR2022, 
wsLrlzCPQzd,1642700000000.0,1642700000000.0,1,js62_xuLDDv,js62_xuLDDv,Paper Decision,Accept (Poster),"This paper investigates an important problem, i.e., the fairness of the learned representation in deep metric learning, which is relatively under-explored by the research community. Observing that the existing metric learning approaches become less fair when trained on an imbalanced dataset, the authors propose finDML to benchmark previous methods on multiple imbalanced datasets with three newly proposed metrics. 
Further, a PARADE module is adapted to this problem to tackle the fairness issue. 

The paper is meticulously written, of good structure, and well motivated by experimental findings. The authors had a deep and thorough discussion with reviewers, through which the mixed preliminary ratings became all positive, with most concerns well addressed. AC found no ground for rejection and thus recommended acceptance. 
Authors shall integrate all response material into the next revision.",ICLR2022, 
B1slX1TBz,1517250000000.0,1517260000000.0,65,B1ZvaaeAZ,B1ZvaaeAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper explores the training of CNNs which have reduced-precision activations. By widening layers, it shows less of an accuracy hit on ILSVRC-12 compared to other recent reduced-precision networks. R1 was extremely positive on the paper, impressed by its readability and the quality of comparison to previous approaches (noting that results with 2-bit activations and 4-bit weights matched FP baselines). This seems very significant to me. R1 also pointed out that the technique used the same hyperparameters as the original training scheme, improving reproducibility/accessibility. R1 asked about application to MobileNets, and the authors reported some early results showing that the technique also worked with smaller networks/architectures designed for low-memory hardware. R2 was less positive on the paper, with the main criticism being that the overall technical contribution of the paper was limited. They were also concerned that the paper seemed to motivate based on reducing memory footprint, but the results were focused on reducing computation. R3 liked the simplicity of the idea and the comprehensiveness of the results. Like R2, they thought the paper was of limited novelty. In their response to R3, the authors defended the novelty of the paper. I tend to side with the authors in that very few papers target quantization at no accuracy loss. Moreover, the paper targets training, which also receives much less attention in the model compression / reduced precision literature. Is the architecture really novel? No. But does the experimental work investigate an important tradeoff? 
Yes.",ICLR2018, +SyvX4yprf,1517250000000.0,1517260000000.0,319,B1hcZZ-AW,B1hcZZ-AW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This is a meta-learning approach to model compression which trains 2 policies using RL to reduce the capacity (computational cost) of a trained network while maintaining performance, such that it can be effectively transferred to a smaller student network. The approach has similarities to recently proposed methods for architecture search, but is significantly different. The paper is well written and the experiments are clear and convincing. One of the reviews was unacceptable; I am not considering it (R1).",ICLR2018, +SkHsmyaSf,1517250000000.0,1517260000000.0,207,B1Yy1BxCZ,B1Yy1BxCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Pros: ++ Nice demonstration of the equivalence between scaling the learning rate and increasing the batch size in SGD optimization. + +Cons: +- While reporting convergence as a function of number of parameter updates is consistent, the paper would be more compelling if wall-clock times were given in some cases, as that will help to illustrate the utility of the approach. +- The paper would be stronger if additional experimental results, which the authors appear to have at hand (based on their comments in the discussion) were included as supplemental material. +- The results are not all that surprising in light of other recent papers on the subject. +",ICLR2018, +cyp1tLXNNAC,1642700000000.0,1642700000000.0,1,ds8yZOUsea,ds8yZOUsea,Paper Decision,Accept (Poster),"The paper addresses a few very important points on sequential latent-variable models, and introduce a different view on meta-RL. Even though the methods that this paper poses are incremental, it is such a hot-debated topic that I would prefer to see this published now.",ICLR2022, +B8_e7jdS_as,1610040000000.0,1610470000000.0,1,3SV-ZePhnZM,3SV-ZePhnZM,Final Decision,Accept (Poster),"There was quite some variance in opinion on this paper, with some reviewers commenting on problems with clarity and experimental evaluation. The authors rebuttals improved the reviewer opinions slightly. The rebuttal and accompanying revisions are convincing, and the new experimental results are convincing and also very much appreciated. This is one of the first papers taking a comprehensive look at incremental, few-shot classification AND regression. Despite some problems with clarity (which were well-addressed in rebuttal and revisions), the paper is original and presents novel ideas about incremental few-shot learning. + +Pros: consideration of both few-shot classification and regression, ablation study well-executed and convincing. + +(remaining) Cons: some minor problems with clarity - please take reviewer comments on board when preparing the camera ready version.",ICLR2021, +RMDYTxVXKY,1576800000000.0,1576800000000.0,1,Syx1DkSYwB,Syx1DkSYwB,Paper Decision,Accept (Poster),Congratulations on getting your paper accepted to ICLR. Please make sure to incorporate the reviewers' suggestions for the final version.,ICLR2020, +HkaasGIOx,1486400000000.0,1486400000000.0,1,SJQNqLFgl,SJQNqLFgl,ICLR committee final decision,Reject,The authors agree with the reviewers that this manuscript is not yet ready.,ICLR2017, +tKENN3Y375R,1610040000000.0,1610470000000.0,1,JeweO9-QqV-,JeweO9-QqV-,Final Decision,Reject,"Four reviewers have reviewed this paper. After rebuttal, the reviewers' recommendations were borderline. Rev. 
4 remains concerned about the relation between second-order approaches in CV and second-order filters. Indeed, there exists a connection, although it is perhaps subtle in its nature, and equally concerning is the connection with general polynomial filters in many GCN papers. As other reviewers point out, MixHop and Jumping Knowledge also allow multi-hop designs. More importantly, APPNP and SGC networks allow multiple hops. From that point of view, the proposed approach is rather a recap of existing observations and contributions. Finally, even Rev. 2 has indicated that the paper is perhaps 'average' after checking the comments of other reviewers. Therefore, at this point, the paper is slightly below the acceptance threshold. 
",ICLR2021, 
BkgwK8e8l4,1545110000000.0,1545350000000.0,1,SJgNwi09Km,SJgNwi09Km,"Interesting approach to completely learn a structured prior, but reaching worse likelihood ",Accept (Poster),"A well-written paper that proposes an original approach for learning a structured prior for VAEs, as a latent tree model whose structure and parameters are simultaneously learned. It describes a well-principled approach to learning a multifaceted clustering, and is shown empirically to be competitive with other unsupervised clustering models. 
Reviewers noted that the approach reached a worse log-likelihood than a regular VAE (which it should be able to find as a special case), hinting towards potential optimization difficulties (a local minimum?). This would benefit from a more in-depth analysis. 
+But reviewers appreciated the gain in interpretability and the insights from the model, and unanimously agreed that the paper was an interesting novel contribution worth publishing. 
",ICLR2019,3: The area chair is somewhat confident
ZhMBPvNvpMg,1642700000000.0,1642700000000.0,1,KxbhdyiPHE,KxbhdyiPHE,Paper Decision,Accept (Spotlight),"This work proposed a method for encouraging an agent to show altruistic behaviour towards another agent (the leader) without having access to the leader's reward function. The basic idea is based on the hypothesis that having the ability to reach many future states (called choice) is useful for the leader agent, no matter what its reward function is. The altruistic agent learns a policy that maximizes the choice of the leader agent. The paper defines three notions of choice, and evaluates them on four environments. 

The reviewers believe that this work attempts to solve an important problem, proposes a novel approach, and performs reasonably good experiments. The reviewers are all on the positive side at the end of the discussion phase. 
Therefore, I recommend acceptance of the paper. I also suggest a spotlight presentation for this work because of the novelty of the problem, which might be of interest to other researchers. 

The authors have already made some revisions to their paper (including adding a new environment). I encourage them to consider any remaining comments from reviewers in their final version.",ICLR2022, 
B__IQtCudX,1642700000000.0,1642700000000.0,1,kamUXjlAZuw,kamUXjlAZuw,Paper Decision,Reject,"This paper establishes guarantees for the generalization of fairness-aware learning in binary classification under PAC-learning and a more practical asymptotic framework. The paper is nicely written, and the theorems and proofs are well organized. However, the novelty of the contribution seems to be insufficient. A future version of the paper may benefit from additional theoretical results or more diverse experiments.",ICLR2022, 
bbPBQAVHk5T,1642700000000.0,1642700000000.0,1,y1PXylgrXZ,y1PXylgrXZ,Paper Decision,Accept (Poster),"Note: This meta review is written by the SAC, but it's synced with the AC. 

Summary (adopted from Reviewer wCmR): This paper presents a modification of monotone deep equilibrium layers that allows computing bounds on the output via the IBP algorithm. This also allows training a certifiably robust DEQ model with competitive performance. 

Initial reviews were mixed, but post rebuttal the opinions generally improved. Reviewer wCmR intended to increase their score slightly (6 to 7) and Reviewer KViU also mentioned that their opinion improved. Reviewer 7ZJs maintained their score and, during the discussion phase, made many arguments against acceptance. One of those was about Tarski's theorem, which was deemed not so important by the AC and also KViU. Another concern was about experimental results, to which KViU agreed, and this remains the main concern for now. 
+ +Most reviewers agree that the work is interesting and is a good step, but then utility of the new modification and significance of the results remains a question. It is likely that the work may be useful in the future, and as there is an overall increase in the opinion, I believe that it is okay to accept the paper. + +I encourage the authors to take the comments of the reviewers into account, and clearly mention the issues raised in the paper. + +SAC",ICLR2022, +OGc9WjBRYt7,1610040000000.0,1610470000000.0,1,8pz6GXZ3YT,8pz6GXZ3YT,Final Decision,Reject,"Even though the authors revised the problem formulation, the paper seems not ready for publication. The assumptions are still too strong (The learning algorithm assumes knowledge of the sparsity mask). The proof technique also heavily relies on Zhong et al'17 without properly highlighting the difference. ",ICLR2021, +uQeKpqqQTtnE,1642700000000.0,1642700000000.0,1,v-27phh2c8O,v-27phh2c8O,Paper Decision,Reject,"This paper proposes to use evolutionary methods to learn auxiliary loss functions, demonstrating superior performance vs. typical auxiliary losses previously proposed in the RL literature. + +Demonstrating that it is possible to learn auxiliary losses by evolution, both for pixel and state representations, that help train significantly faster (even on new environments) is definitely a meaningful contribution, as acknowledge by the majority of reviewers. + +Although many of the original reviews' concerns were addressed by the authors during the discussion period, two major ones were only partially answered, both related to the limited empirical evaluation of the proposed approach (which is crucial for such a contribution that aims to demonstrate an improvement over existing related techniques): +1. The limited set of environments used for evaluation (and in particular the lack of partially observable environments) +2. The fact that the baseline being compared to was CURL, which the paper describes as ""the state-of-the-art pixel-based RL algorithm"", while reviewers mentioned DrQ and RAD as two more recent (and better) algorithms that were known well ahead of the ICLR submission deadline (note that the more recent DrQ-v2 is now even better). Since the data augmentation techniques used by these algorithms help shape the internal representation, like auxiliary losses do, it would have been important to validate that the proposed technique could be useful when plugged on top of such baselines. + +The authors did try their best to address these major concerns during the rebuttal period, but the discussion between reviewers and myself came to the conclusion that this wasn't quite convincing enough yet. I encourage the authors to investigate these points in more depth in a future version of this work so as to make the empirical validation stronger (NB: the links provided in the last comment by authors on Nov. 30th didn't work, but this wasn't the main factor in the decision).",ICLR2022, +BJmhiMUOx,1486400000000.0,1486400000000.0,1,HyEeMu_xx,HyEeMu_xx,ICLR committee final decision,Reject,"The program committee appreciates the authors' response to concerns raised in the reviews. Authors have conducted additional experiments and provided comparisons to other existing models. However, reviewer scores are not leaning sufficiently towards acceptance. + + The effectiveness of this approach on realistic data still remains unclear in the context of existing approaches. 
I agree that the reported improvement on Visual Genome over the baseline is non-trivial. But evaluating an existing state-of-the-art VQA approach (for instance) would help better place the performance of this approach in perspective relative to state-of-the-art. + + Incorporating reviewer comments, and more convincing demonstration of the model's capabilities on realistic data will help make the paper stronger.",ICLR2017, +mZw18Hduq4A,1610040000000.0,1610470000000.0,1,oXQxan1BWgU,oXQxan1BWgU,Final Decision,Reject,"The paper proposes a variant of MAML for meta-learning on tasks with a hierarchical tree structure. The proposed algorithm is evaluated on synthetic datasets, and it compares favorably to MAML. The reviewers identified several significant weaknesses, including: (1) the experimental evaluation is limited, and it only includes small synthetic datasets; (2) the proposed algorithm is incremental over MAML. The reviewers agreed that the paper cannot be accepted in its current form. I recommend reject.",ICLR2021, +Skxr7xuVyV,1543960000000.0,1545350000000.0,1,HkM3vjCcF7,HkM3vjCcF7,decision,Reject,"The paper presents a multi-scale extension of the hourglass network. As the reviewers point out, the paper is below ICLR publication standard due to low novelty (i.e., multi-scale extension is not a new idea) and significance (i.e., not a significant performance gain against the state-of-the-art method or other baselines).",ICLR2019,5: The area chair is absolutely certain +qcRF5_EapXS,1642700000000.0,1642700000000.0,1,gJcEM8sxHK,gJcEM8sxHK,Paper Decision,Accept (Poster),"The authors explore the hypothesis of whether grounded representations can be leaved from text only. They show that a language model trained with relatively little data can make conceptual domains such as color to a grounded world representation such as RGB coordinates. The paper was positively received by the reviewers, specifically after a fruitful discussion to further clarify the points that the authors were making and their conclusions. The authors have already edited some parts of the paper, I ask them to go back and include other points that the reviewers have made. I recommend this paper for acceptance, it will generate good discussion and ideas at ICLR.",ICLR2022, +Ef-EP3BSjoj,1642700000000.0,1642700000000.0,1,bwq6O4Cwdl,bwq6O4Cwdl,Paper Decision,Accept (Poster),"This paper provide an explanation why contrastive learning methods like SimSiam avoid collapse without negative samples. As the authors claimed, this is indeed a timely work for understanding the recent success in self-supervised learning (SSL). The key idea in this submission is to decomposes the gradient into a center vector and residual vector which respectively correspond to de-centering and de-correlation. Such an explanation is interesting and novel. The empirical results are solid and convincing. During the rebuttal stage, the concerns from the reviewers are well resolved, and the writing of the new version is significantly better than the original one.",ICLR2022, +ObW8wf3kb_X,1610040000000.0,1610470000000.0,1,piLPYqxtWuA,piLPYqxtWuA,Final Decision,Accept (Poster),"This paper presents a number of techniques to improve the existing non-autoregressive end-to-end TTS model -- FastSpeech. These techniques include replacing the teacher forcing with ground truth targets and using a variation adaptor to introduce auxiliary information such as duration, energy and pitch. 
The experiments show that the proposed Fastspeech 2 model is faster in training compared to the existing FastSpeech model and meanwhile can still achieve high quality synthesized speech. The work reported in the paper is essentially about system improvement over FastSpeech but has it value in the speech community given the current interest in non-autoregressive rapid TTS. On the other hand, concerns are also raised regarding the complexity of the pipeline and the significance of the novelty. The authors' rebuttal is good and has addressed most of the concerns. Overall, this is an interesting paper and can be accepted. ",ICLR2021, +HeE6xIIgJA,1576800000000.0,1576800000000.0,1,S1g_t1StDB,S1g_t1StDB,Paper Decision,Reject,"Two reviewers are borderline and one recommends rejection. The main criticism is the simplicity of language, scalability to a more complex problem, and questions about experiments. Due to the lack of stronger support, the paper cannot be accepted at this point. The authors are encouraged to address the reviewer's comments and resubmit to a future conference.",ICLR2020, +tumHqHLIlx,1576800000000.0,1576800000000.0,1,SklkDkSFPB,SklkDkSFPB,Paper Decision,Accept (Poster),"Two reviewers recommend acceptance. One reviewer is negative, however, does not provide reasons for rejection. The AC read the paper and agrees with the positive reviewers. in that the paper provides value for the community on an important topic of network compression.",ICLR2020, +TC4VKQJDVN,1576800000000.0,1576800000000.0,1,rylfl6VFDH,rylfl6VFDH,Paper Decision,Reject,"This paper introduces a new adaptive variational dropout approach to balance accuracy, sparsity and computation. + +The method proposed here is sound, the motivation for smaller (perhaps sparser) networks is easy to follow. The paper provides experiments in several data-sets and compares against several other regularization/pruning approaches, and measures accuracy, speedup, and memory. The reviewers agreed on all these points, but overall they found the results unconvincing. They requested (1) more baselines (which the authors added), (2) larger tasks/datasets, and (3) more variety in network architectures. The overall impression was it was hard to see a clear benefit of the proposed approach, based on the provided tables of results. + +The paper could sharpen its impact with several adjustments. The results are much more clear looking at the error vs speedup graphs. Presenting ""representative results"" in the tables was confusing, especially considering the proposed approach rarely dominated across all measures. It was unclear how the variants of the algorithms presented in the tables were selected---explaining this would help a lot. In addition, more text is needed to help the reader understand how improvements in speed, accuracy, and memory matter. For example in LeNet 500-300 is a speedup of ~12 @ 1.26 error for BB worth-it/important compared a speedup of ~8 for similar error for L_0? How should the reader think about differences in speedup, memory and accuracy---perhaps explanations linking to the impact of these metrics to their context in real applications. I found myself wondering this about pretty much every result, especially when better speedup and memory could be achieved at the cost of some accuracy---how much does the reduction in accuracy actually matter? Is speed and size the dominant thing? I don't know. + +Overall the analysis and descriptions of the results are very terse, leaving much to the reader to figure out. 
For example (fig 2 bottom right). If a result is worth including in the paper it's worth explaining it to the reader. Summary statements like ""BB and DBB either achieve significantly smaller error than the baseline methods, or significant speedup and memory saving at similar error rates."" Is not helpful where there are so many dimensions of performance to figure out. The paper spends a lot of time explaining what was done in a matter of fact way, but little time helping the reader interpret the results. + +There are other issues that hurt the paper, including reporting the results of only 3 runs, sometimes reporting median without explanation, undefined metrics like speedup ,%memory (explain how they are calculated), restricting the batchsize for all methods to a particular value without explanation, and overall somewhat informal and imprecise discussion of the empirical methodology. + +The authors did a nice job responding to the reviewers (illustrating good understanding of the area and the strengths of their method), and this could be a strong paper indeed if the changes suggested above were implemented. Including SSL and SVG in the appendix was great, but they really should have been included in the speedup vs error plots throughout the paper. This is a nice direction and was very close. Keep going!",ICLR2020, +YlJKamiek-e,1642700000000.0,1642700000000.0,1,1DUwCRNAbA,1DUwCRNAbA,Paper Decision,Reject,"In this paper, the authors present an investigation of the impact of demographics on the peer review outcomes of ICLR. This is an important topic, as the demographics of ICLR and similar conferences are seriously skewed and may cause some people to feel excluded. The authors look into this complex problem with extensive manual annotations and analyses. + +The main weakness of this paper is that it is observational, and while the results are interesting, it is difficult to take away a clear and convincing message for the future. Part of the reason is that the whole problem is quite complex, and the hypotheses that are presented and tested in this paper reveal relatively shallow findings. Compared to the NeurIPS experiments which are carefully designed, these are not causal (see one of the reviewers' comments), so it is difficult to draw conclusions beyond correlations. + +In summary, the results are interesting, and despite some of the reviewers' concerns, I would not exclude this paper because of the topic being irrelevant to the cfp, but I think the paper needs a more clear and convincing message.",ICLR2022, +sxfizitYCk,1642700000000.0,1642700000000.0,1,mk8AzPcd3x,mk8AzPcd3x,Paper Decision,Reject,"The authors propose a new methods for graph shortest distance embedding method called BCDR based on betweenness centrality. Then they show that the method is competitive both theoretically than experimentally with existing work. + +After a discussion with the reviewers and after considering the nice changes in the paper and explanation in the rebuttal we agree that the paper contains some very interesting ideas but it is not probably ready for publication. The comparison with previous works is, in fact, still a bit limited and it should be extended. In addition the algorithm should also be tested on larger datasets.",ICLR2022, +a5NsjlB6rht,1610040000000.0,1610470000000.0,1,p3_z68kKrus,p3_z68kKrus,Final Decision,Reject,"The paper investigates the average stability of kernel minimal norm interpolating predictors. 
The main result +establishes an upper bound on a particular notion of average stability for which it is well-known that it +can be used to bound the generalization error. This upper bound holds for all interpolating predictors +from the RKHS, but it is minimized by the minimal norm predictor. + +While at first glance this result looks highly interesting, a closer look reveals that the significance of the results +crucially depends on the quality of the derived upper bound. Here two reviewers raised concerns, since it is +by no means clear that even the optimized upper bound produces meaningful bounds on the generalization +performance. The authors tried to address these concerns in their response and promised to update their +paper accordingly. As a result, they added a paragraph on page 8. Unfortunately, this paragraph remains extremely +vague, in particular if it comes to the more interesting case of non-linear kernels. Here, the authors briefly refer to +a paper by El Karoui but no details are given. However, looking at El Karoui's paper it is anything but obvious whether +the results of that paper lead to reasonable upper bounds on the average stability for a sufficiently general class +of distributions. +As a result, I view the paper under review to be premature since it remains unclear if the observed optimality of the minimal norm solution is a real feature or just an artifact due to an upper bound that is simply too loose to make any conclusion. + + ",ICLR2021, +XB8VO-mw0Ew,1642700000000.0,1642700000000.0,1,ajOSNLwqssu,ajOSNLwqssu,Paper Decision,Reject,"Reviewers agreed that taking into account the secondary structure in addition to the amino acid sequence, although not new in bioinformatics, may be a good idea in the context of deep generative models of peptides. On the other hand, all reviewers also agreed that the experimental results do not allow concluding about the potential benefit of the method, i.e., whether it is likely to produce potent AMPs (and whether it does it better than existing methods). Indeed, the proposed computational criteria can not replace a proper experimental validation, and it is not clear whether a ""better method"" on the computational criteria will be ""better"" in the real world. Second, the results on the computational criteria are not convincing: regarding the physical properties, it remains debatable to claim that a method is good if it outputs many AMPs that fulfill the criterion, while less than 7% of the true AMPs do; and regarding the computational prediction of being an AMP, the proposed method is outperformed by existing ones. In conclusion, we consider that the paper is not ready for publication at ICLR, since there is no significant methodological novelty nor significant experimental results if this is an application paper, and we encourage the authors to consider a publication with wet lab experiments to demonstrate the relevance of the method.",ICLR2022, +NLMgo5Iywkm,1642700000000.0,1642700000000.0,1,WcZUevpX3H3,WcZUevpX3H3,Paper Decision,Reject,"The reviewers had remarkably consistent feedback about this paper. They appreciated the formulation of the federated learning problem with architectures having both shared and private (personalized) components. On the other hand, they felt the experiments were insufficient to prove the effectiveness of the method, and had several suggestions in terms of tasks and datasets. 
They also felt that it's hard to assess whether the existence of private/personalized components is warranted without visualizing the difference between architectures. Overall, the reviewers had good feedback that could strengthen the paper.",ICLR2022, +hWcG2AqOCYVp,1642700000000.0,1642700000000.0,1,i7FNvHnPvPc,i7FNvHnPvPc,Paper Decision,Reject,"This paper studies the transferability of adversarial attacks in deep neural networks. In particular, it proposes the reverse adversarial perturbation (RAP) method to boost attack transferability by flattening the landscape of the loss function. + +The reviewers acknowledge the strengths of the paper, which include the effectiveness of the simple RAP method proposed and the extensive experimentation presented. + +However, a number of outstanding concerns still remain. Some of them include the limited technical novelty of the paper, insufficient theoretical justification of the proposed method, lack of grounded justification for the link between flatness of the loss landscape and model generalization in the specific context of attack transferability, similarity of the optimization problem to some existing work, and potential difficulties of the min-max attack generation problem, among others. + +As it stands, this is a borderline paper that is reasonably good, but not great. Addressing the outstanding concerns will make the paper more ready for publication in ICLR.",ICLR2022, +Bkgjce41xE,1544660000000.0,1545350000000.0,1,B1xJAsA5F7,B1xJAsA5F7,after revisions the reviewers reached a consensus on accepting the paper,Accept (Poster),"The revisions made by the authors convinced the reviewers to all recommend accepting this paper. Therefore, I am recommending acceptance as well. I believe the revisions were important to make since I concur with several points in the initial reviews about additional baselines. It is all too easy to add confusion to the literature by not including enough experiments. ",ICLR2019,5: The area chair is absolutely certain +mT6r2axivF,1576800000000.0,1576800000000.0,1,Syx_f6EFPr,Syx_f6EFPr,Paper Decision,Reject,"This was a difficult paper to decide, given the strong disagreement between reviewer assessments. After the discussion, it became clear that the paper tackles some well-studied issues while neglecting to cite some relevant works. The significance and novelty of the contribution were directly challenged, yet I could not see a convincing case presented to mitigate these criticisms. The paper needs to do a better job of placing the work in the context of the existing literature, and establishing the significance and novelty of its main contributions.",ICLR2020, +rylbEV8VeE,1545000000000.0,1545350000000.0,1,r1GbfhRqF7,r1GbfhRqF7,A good paper but short reviews,Accept (Poster),"This paper proposes a new kernel learning framework for change point detection by using a generative model. The reviewers agree that the paper is interesting and useful for the community. One of the reviewers had some issues with the paper but those were resolved after the rebuttal. The other two reviewers have short reviews and somewhat low confidence, so it is difficult to tell how this paper stands among others that exist in the literature. Overall, given the consistent ratings from all the reviewers, I believe this paper can be accepted. 
",ICLR2019,2: The area chair is not sure +rkeqKBDSkV,1544020000000.0,1545350000000.0,1,Bkg6RiCqY7,Bkg6RiCqY7,a useful and influential finding,Accept (Poster),"Evaluating this paper is somewhat awkward because it has already been through multiple reviewing cycles, and in the meantime, the trick has already become widely adopted and inspired interesting follow-up work. Much of the paper is devoted to reviewing this follow-up work. I think it's clearly time for this to be made part of the published literature, so I recommend acceptance. (And all reviewers are in agreement that the paper ought to be accepted.) + +The paper proposes, in the context of Adam, to apply literal weight decay in place of L2 regularization. An impressively thorough set of experiments are given to demonstrate the improved generalization performance, as well as a decoupling of the hyperparameters. + +Previous versions of the paper suffered from a lack of theoretical justification for the proposed method. Ordinarily, in such cases, one would worry that the improved results could be due to some sort of experimental confound. But AdamW has been validated by so many other groups on a range of domains that the improvement is well established. And other researchers have offered possible explanations for the improvement. +",ICLR2019,5: The area chair is absolutely certain +HkS0LkaSf,1517250000000.0,1517260000000.0,896,r1BRfhiab,r1BRfhiab,ICLR 2018 Conference Acceptance Decision,Reject,"All of the reviewers have found some aspects of the formulation interesting, but they raised concerns regarding the practical use of the experimental setup. +",ICLR2018, +SJJgrJ6Sz,1517250000000.0,1517260000000.0,486,BJMuY-gRW,BJMuY-gRW,ICLR 2018 Conference Acceptance Decision,Reject,"Though the general direction is interesting and relevant to ICLR, the novelty is limited. As reviewers point out it is very similar to Le & Zuidema (2015), with few modifications (using LSTM word representations, a different type of pooling). However, it is not clear if they are necessary as there is no direct comparison (e.g., using a different type of pooling). Overall, though the submission is generally solid, it does not seem appropriate for ICLR. + ++ solid ++ well written +- novelty limited +- relation to Le & Zuidema is underplayed",ICLR2018, +d8vmM9LwzD,1576800000000.0,1576800000000.0,1,ryeQmCVYPS,ryeQmCVYPS,Paper Decision,Reject,"The reviewers wondered about the practical application of this method, given that the performance was lower. The reviewers were also surprised by some of your claims and wanted you to explore them more deeply. + +On the positive side, the reviewers found your experiments to be very thorough. You also performed additional experiments during the rebuttal period. We hope that those experiments will help you to build a better paper as you work towards publishing this work.",ICLR2020, +SA1ZReZ0_IL,1642700000000.0,1642700000000.0,1,in1ynkrXyMH,in1ynkrXyMH,Paper Decision,Reject,"The reviewers all appreciated the novel concept behind the work. I agree with this, I think the principles behind the work are novel and interesting, and I would encourage the authors to improve the validation of this method and publish it in the future. 
+ +However, reviewers also raised a number of issues with the current paper: (1) the evaluation appears a bit preliminary, and could be improved significantly with additional datasets and more ablations/comparisons; (2) it's not clear if the improvements from the method are especially significant; (3) the writing could be improved (I do see that the authors made a significant number of changes and improved parts of the paper in response to reviewer concerns to a degree). Probably the writing issues could be fixed, but the skepticism about the experiment results seems harder to address, and while I recognize that the authors made an effort to point to some existing ablations in the paper that do address parts of what the reviewers raised, I do think that on balance the experimental results leave the validation of the work as somewhat borderline. + +While less important for the decision, I found that the paper is somewhat overselling the contribution in the opening -- while the particular concept of using gradients as features in this way is interesting, similar ideas have been proposed in the past, and the paper would probably be better if it were more clearly positioned in the context of prior work rather than trying to present a new ""framework"" like this. It kind of feels like it's biting off too much in the opening, and then delivering a comparatively more modest (but novel and interesting!) technical component.",ICLR2022, +yl5g2OHr_Ej,1642700000000.0,1642700000000.0,1,k6F-4Bw7LpV,k6F-4Bw7LpV,Paper Decision,Reject,"The paper proposes a new perspective on the generalization performance of interpolating classifiers based on the entire joint distribution of their inputs and outputs. It conjectures that, when conditioned on certain subgroups, the output distribution matches the distribution of true labels. The conjecture is investigated empirically on a number of datasets and models, and proved to hold for a simple nearest neighbor model. + +This paper generated varying responses from the reviewers and a detailed discussion. One main concern focused on whether the feature calibration conjecture is actually surprising, given standard expectations about generalization from learning theory. Indeed, from the discussion and the paper itself, it seems the authors conceived of classical generalization as a statement about whether train performance $\approx$ test performance, whereas one reviewer remarked that ""what it really talks about is concentration of measure."" I agree with the importance of this distinction in general, though it is perhaps less relevant in the current setting of modern interpolating classifiers, for which so little about generalization is understood in the first place. In particular, the empirical observations of varied forms of good generalization behavior for overparameterized models are likely to be interesting to the community, regardless of whether this behavior might be expected in the large sample limit. + +As such, this is a very borderline paper, with many good arguments both for and against acceptance. 
After a detailed discussion among the chairs, it was decided that the current version is just shy of the acceptance threshold, but I would strongly encourage the authors to address the main reviewer concerns and resubmit a revised manuscript to a future venue.",ICLR2022, +#NAME?,1576800000000.0,1576800000000.0,1,S1lxKlSKPH,S1lxKlSKPH,Paper Decision,Accept (Poster),"The paper proposes a simple and effective way to stabilize training by adding a consistency term to the discriminator. Given the stochastic augmentation procedure $T(x)$, the loss is just a penalty on $D$. The main unsolved question is why it helps to make the discriminator ""smoother"" in the consistency case for a standard GAN (since, typically, no constraints are enforced). Nevertheless, at the moment this is a working heuristic that gives a new SOTA, and that is the main strength. The reviewers all agree to accept, and so do I.",ICLR2020, +#NAME?,1610040000000.0,1610470000000.0,1,0DALDI-xyW4,0DALDI-xyW4,Final Decision,Reject,"The paper studies a high-order discretization of the ODE corresponding to Nesterov's accelerated method, as introduced by Su-Boyd-Candes. The main claim of the paper is that the more complex discretization scheme leads to a method that is more stable and faster. However, the theoretical claims do not seem sufficiently supported, and the experimental results are insufficient to judge the usefulness of the proposed approach. Thus, the reviewers could not recommend acceptance, and I concur. The authors are advised to revise the paper to provide more theoretical and experimental evidence for the usefulness/competitiveness of the proposed approach, and resubmit to a different venue. ",ICLR2021, +HkgMMeVZxN,1544790000000.0,1545350000000.0,1,S1x4ghC9tQ,S1x4ghC9tQ,Original paper,Accept (Oral),The reviewers agree that this is a novel paper with a convincing evaluation.,ICLR2019,4: The area chair is confident but not absolutely certain +yGPWiiQWckN,1642700000000.0,1642700000000.0,1,RjMtFbmETG,RjMtFbmETG,Paper Decision,Reject,"This paper proposes a new softmax-like operator, to be used instead of eps-greedy or softmax in Q-learning algorithms. There has been some previous work in this direction, most notably Mellowmax, but the proposed operator is more computationally efficient, and there is some experimental evidence that it improves DQN performance. +The reviews were mixed, with two mildly positive reviewers (6), who found the work interesting, and two negative reviewers (3,5), who raised issues about the impact of the work when taken as part of a larger RL algorithm, and about the generality of the work w.r.t. other RL algorithms like policy gradients. During the discussion, the reviewers did not reach an agreement. +My decision to reject the paper is based on the following: while the idea is novel, and the contraction analysis is appropriate, the main interest to the community in such an idea is either experimental - can it be used to push the state of the art in RL algorithms? or theoretical - can we glean new theoretical insights using this method? In its current presentation, there is not enough evidence in the paper to support either of these. 
+ +I encourage the authors to either dig deeper into the experimental evaluation and produce more convincing results, or dig deeper into the theory and show some theoretical benefit of Resmax.",ICLR2022, +nSlr6M4Zfsj,1642700000000.0,1642700000000.0,1,nRCS3BfynGQ,nRCS3BfynGQ,Paper Decision,Reject,"This work proposes to extend the invariance/equivariance properties of GNNs by focusing on distance-preserving and angle-preserving transformations, given respectively by the Euclidean and Conformal group. Preliminary experiments are reported that demonstrate the advantage of such architectures. +Reviewers found this work generally interesting, tackling an important problem and proposing a valid solution. However, they also raised important concerns, namely the relatively minor novelty relative to recent models (such as EGNN), as well as the lack of convincing real-world experiments that would validate the modeling assumptions. Taking all these considerations into account, the AC recommends rejection at this time, and encourages the authors to address the points raised by reviewers in a revision.",ICLR2022, +_LwcM0laOv,1576800000000.0,1576800000000.0,1,BylQSxHFwr,BylQSxHFwr,Paper Decision,Accept (Poster),"Reviewer #1 noted that he wishes to change his review to weak accept post rebuttal, but did not change his score in the system. Presuming his score is weak accept, then all reviewers are unanimous for acceptance. I have reviewed the paper and find that the results appear to be clear, but the magnitude of the improvement is modest. I concur with the weak accept recommendation. ",ICLR2020, +U-goOjAqjHi,1642700000000.0,1642700000000.0,1,Oxdln9khkxv,Oxdln9khkxv,Paper Decision,Reject,"All reviewers suggested rejection of the paper. This is based on concerns regarding novelty of results, clarity of presentation, simplicity of conducted experiments, and missing ablation studies (and several other points raised in the reviews). The authors also did not submit a rebuttal. Hence I am recommending rejection of the paper.",ICLR2022, +HyeEw79Vl4,1545020000000.0,1545350000000.0,1,S1zk9iRqF7,S1zk9iRqF7,Advances in differentially-private data generation,Accept (Poster),"This paper improves upon the PATE-GAN framework for differentially-private synthetic data generation. They eliminate the need for public data samples for training the GAN, by providing a distribution which can be sampled from instead. + +The reviewers were unanimous in their vote to accept.",ICLR2019,4: The area chair is confident but not absolutely certain +HkeloZBJgN,1544670000000.0,1545350000000.0,1,ryxxCiRqYX,ryxxCiRqYX,Interesting new perspective on deep learning,Accept (Poster),"This paper relates deep learning to convex optimization by showing that the forward pass through a dropout layer, linear layer (either convolutional or fully connected), and a nonlinear activation function is equivalent to taking one τ-nice proximal gradient descent step on a convex optimization objective. The paper shows (1) how different activation functions correspond to different proximal operators, (2) that replacing Bernoulli dropout with additive dropout corresponds to replacing the τ-nice proximal gradient descent method with a variance-reduced proximal method, and (3) how to compute the Lipschitz constant required to set the optimal step size in the proximal step. 
The practical value of this perspective is illustrated in experiments that replace various layers in ConvNet architectures with proximal solvers, leading to performance improvements on CIFAR-10 and CIFAR-100. The reviewers felt that most of their concerns were adequately addressed in the discussion and revision, and that the paper should be accepted.",ICLR2019,5: The area chair is absolutely certain +Hye-Eey7lV,1544900000000.0,1545350000000.0,1,ByfyHh05tQ,ByfyHh05tQ,Consensus is accept,Accept (Poster),"After a healthy discussion between reviewers and authors, the reviewers' consensus is to recommend acceptance to ICLR. The authors thoroughly addressed reviewer concerns, and all reviewers noted the quality of the paper, methodological innovations and SotA results.",ICLR2019,5: The area chair is absolutely certain +FlzeyEDd2Cr,1642700000000.0,1642700000000.0,1,O2s9k4h0x7L,O2s9k4h0x7L,Paper Decision,Reject,"This to me looks like quality work not yet adequately developed, and thus is borderline work. The authors seem to have achieved a good result: equalling SotA SEAL (although, one reviewer did preliminary experiments and could not match this) with a sophisticated algorithm using a variety of Bayesian tricks, a more scalable algorithm, and one potentially adapted to further tasks. However, not all of these impressive feats are adequately demonstrated in this paper, though many had parts included in the rewrite. So I'd say the paper needs a rewrite and more focussed experimental work to broaden the presentation of empirical performance, for instance to node classification. +I certainly appreciated the use of IBP and Dirichlet models within the system, so would love to see the work further developed. +The reviewers agreed on several aspects: (1) more experimental work, for instance on better and larger benchmark data, (2) better presentation and discussion of the theory, (3) better discussion of the motivation for the model (as per reviewer D8S8), oftentimes linked to the ablation study to support this, which you have done some of, and (4) additional connections to recent related work in graph representation learning on link prediction +The authors have done a good job of addressing many of the reviewers' concerns, ultimately lifting the paper from Reject to Borderline Negative, but I think more work is needed.",ICLR2022, +S5FRMZhf59,1576800000000.0,1576800000000.0,1,BygzbyHFvB,BygzbyHFvB,Paper Decision,Accept (Spotlight),"The paper proposes a new algorithm for adversarial training of language models. This is an important research area and the paper is well presented, has great empirical results and a novel idea. ",ICLR2020, +3h0VKFtfvNV,1610040000000.0,1610470000000.0,1,yeeS_HULL7Z,yeeS_HULL7Z,Final Decision,Reject,"This paper proposes to use context-based metric learning, where an attention/Transformer-based mechanism is used to incorporate neighborhood information for deep learning-based metric learning. This was initially demonstrated on two simpler datasets, although larger ones were added during the rebuttal. On the whole, reviewers appreciated the simplicity and intuition behind the idea, but the consensus among all of the reviewers found several aspects lacking, including: 1) clarity of the descriptions in the paper, 2) novelty compared to existing work, especially that of Set Transformer for clustering, 3) lack of convincing results compared to baselines, or at least analysis/justification for negative results. 
While the reviewers appreciated the authors' rebuttal and experiments, it did not address many of these concerns. The idea is interesting and seems to hold some promise, so the authors are encouraged to refine these aspects in order to fully explore this idea and submit to a future venue. ",ICLR2021, +8YUT0JYT7JU,1610040000000.0,1610470000000.0,1,TR-Nj6nFx42,TR-Nj6nFx42,Final Decision,Accept (Poster),"This paper gives a new PAC-Bayesian generalization error bound for graph neural networks (GCN and MPGNN). The bound improves the previously known Rademacher complexity based bound given by Garg et al. (2020). In particular, its dependency on the maximum node degree and the maximum hidden dimension is improved. + +This paper gives an interesting improvement on the generalization analysis of GNNs. The writing is clear, where its connection to existing work and its technical contribution are well discussed. +The biggest concern is its technical novelty. Indeed, the proof follows the out-line of Neyshabur et al. (2017). Given that the technical novelty would be a bit limited, however, the analysis should properly deal with the complicated structure specific to GNNs which makes the analysis more difficult than usual CNN/MLP and requires subtle and careful manipulations. +In addition to that, the improvement of the generalization bound is valuable for the literature (while the improvement seems a bit minor for graphs with small maximum degree). + +For these reasons, I recommend acceptance for this paper.",ICLR2021, +myzUxy6lvs,1576800000000.0,1576800000000.0,1,ryxUMREYPr,ryxUMREYPr,Paper Decision,Reject,"This paper studies the problem of mode collapse in GANs. The authors present new metrics to judge the model's diversity of the generated faces. The authors present two black-box approaches to increasing the model diversity. The benefit of using a black box approach is that the method does not require access to the weights of the model and hence it is more easily usable than white-box approaches. However, there are significant evaluation problems and lack of theoretical and empirical motivation on why the methods proposed by the paper are good. The reviewers have not changed their score after having read the response and there is still some gaps in evaluation which can be improved in the paper. Thus, I'm recommending a Rejection.",ICLR2020, +sCLrkQPh2LY,1610040000000.0,1610470000000.0,1,YTyHkF4P03w,YTyHkF4P03w,Final Decision,Reject,"This work appears to be a promising start to a research direction. However, as the reviewers noted, the work does not compare to alternative approaches and the presentation of the work overall is incomplete.",ICLR2021, +SkAcsG8de,1486400000000.0,1486400000000.0,1,S1di0sfgl,S1di0sfgl,ICLR committee final decision,Accept (Poster),"This extension to RNNs is clearly motivated, and the details of the proposed method are sensible. The paper would have benefitted from more experiments such as those in Figure 5 teasing out the representations learned by this model.",ICLR2017, +Ndn2zhMfK7,1576800000000.0,1576800000000.0,1,Bkx29TVFPr,Bkx29TVFPr,Paper Decision,Reject,"The paper proposes an implicit function approach to learning the modes of multimodal regression. The basic idea is interesting, and is clearly related to density estimation, which the paper does not discuss. 
+ +Based on the reviews and the fact that the authors did not submit a helpful rebuttal, I recommend rejection.",ICLR2020, +FHYKL83VmV,1576800000000.0,1576800000000.0,1,BJluxREKDB,BJluxREKDB,Paper Decision,Accept (Poster),"This paper proposes a new method to learning heuristics for quantified boolean formulas through RL. The focus is on a method called backtracking search algorithm. The paper proposes a new representation of formulas to scale the predictions of this method. + +The reviewers have an overall positive response to this paper. R1 and R2 both agree that the paper should be accepted, and have given some minor feedback to improve the paper. R3 initially was critical of the paper, but the rebuttal helped to clarify their doubt. They still have one more comment and I encourage the authors to address this in the final version of the paper. + +R3 meant to increase their score but somehow this is not reflected in the current score. Based on their comments though, I am assuming the scores to be 6,8,6 which makes the cut for ICLR. Therefore, I recommend to accept this paper.",ICLR2020, +odPEBe7Qow,1642700000000.0,1642700000000.0,1,_SJ-_yyes8,_SJ-_yyes8,Paper Decision,Accept (Poster),"The paper addresses various improvements in visual continuous RL, based on a previous RL algorithm (DrQ). As the reviewers point out, the main contribution of the paper is of empirical nature, demonstrating how several different choices relative to DrQ significantly improve data efficiency and wall-clock computation, such that several control problems of the DeepMind control suite can be solved more efficiently. The average rating for the paper is above the acceptance threshold, and some reviewers increased their rating after there rebuttal. While a mostly empirically motivated papers is always a bit more controversial, the paper may nevertheless stimulated an interesting discussion at ICLR that will be beneficial for the community, and should thus be accepted.",ICLR2022, +Byg9QIWlxN,1544720000000.0,1545350000000.0,1,SkfMWhAqYQ,SkfMWhAqYQ,Meta Review,Accept (Poster),"This paper presents an approach that relies on DNNs and bags of features that are fed into them, towards object recognition. The strength of the papers lie in the strong performance of these simple and interpretable models compared to more complex architectures. The authors stress on the interpretability of the results that is indeed a strength of this paper. + +There is plenty of discussion between the first reviewer and the authors regarding the novelty of the work as the former point out to several related papers; however, the authors provide relatively convincing rebuttal of the concerns. + +Overall, after the long discussion, there is enough consensus for this paper to be accepted to the conference.",ICLR2019,4: The area chair is confident but not absolutely certain +4mgcBIeQ1gv,1610040000000.0,1610470000000.0,1,Jf24xdaAwF9,Jf24xdaAwF9,Final Decision,Reject,"The reviewers have arrived at the consensus that this is a paper with an interesting idea, both novel and well-explained, but not quite backed up with sufficient empirical evidence. Like them, I think there is a lot of potential in modular methods for continual learning, and I know these are challenging advances to demonstrate. 
So I encourage you to persist, iterate and submit a stronger version of this paper in the future!",ICLR2021, +AbdGevkyPC,1576800000000.0,1576800000000.0,1,Bke6vTVYwH,Bke6vTVYwH,Paper Decision,Reject,The paper combines graph convolutional networks with noisy label learning. The reviewers feel that novelty in the work is limited and there is a need for further experiments and extensions. ,ICLR2020, +rktmafIOe,1486400000000.0,1486400000000.0,1,HkNEuToge,HkNEuToge,ICLR committee final decision,Reject,"This paper proposes a variant of convolutional sparse coding with unit norm code vectors using cosine distance to evaluate reconstructions. The performance gains over baseline networks are quite minimal and demonstrated on limited datasets, therefore this work fails to demonstrate practical usefulness, while the novelty of the contribution is too slight to stand on its own merit.",ICLR2017, +p3FCnyfAX,1576800000000.0,1576800000000.0,1,SkgWeJrYwr,SkgWeJrYwr,Paper Decision,Reject,"In this paper the authors propose a wrapper feature selection method that selects features based on 1) redundancy, i.e. the sensitivity of the downstream model to feature elimination, and 2) relevance, i.e. how the individual features impact the accuracy of the target task. The authors use a combination of the redundancy and relevance scores to eliminate the features. + +While acknowledging that the proposed model is potentially useful, the reviewers raised several important concerns that were viewed by AC as critical issues: +(1) all reviewers agreed that the proposed approach lacks theoretical justification or convincing empirical evaluations in order to show its effectiveness and general applicability -- see R1’s and R2’s requests for evaluation with more datasets/diverse tasks to assess the applicability and generality of the proposed model; see R1’s, R4’s concerns regarding theoretical analysis; +(2) all reviewers expressed concerns regarding the technical issue of combining the redundancy and relevance scores -- see R4’s and R2’s concerns regarding the individual/disjoint calibration of scores; see R1’s suggestion to learn to reweigh the scores; +(3) experimental setup requires improvement both in terms of clarity of presentation and implementation -- see R1’s comment regarding the ranker model, see R4’s concern regarding comparison with a standard deep learning model that does feature learning for a downstream task; both reviewers also suggested to analyse how autoencoders with different capacity could impact the results. +Additionally R1 raised a concern regarding relevant recent works that were overlooked. +The authors have tried to address some of these concerns during rebuttal, but an insufficient empirical evidence still remains a critical issue of this work. To conclude, the reviewers and AC suggest that in its current state the manuscript is not ready for a publication. We hope the reviews are useful for improving and revising the paper. +",ICLR2020, +VEEw1BHzKNN,1642700000000.0,1642700000000.0,1,gaYko_Y2_l,gaYko_Y2_l,Paper Decision,Reject,"Existing methods for graph clustering usually use node/edge information, but ignore graph-level information. This paper proposes incorporating graph-level labels into graph clustering and formulating the new problem as weakly supervised graph clustering. The paper further proposes Gaussian Mixture Graph Convolutional Network (GMGCN) framework for the task. Experimental results on several datasets demonstrate the effectiveness of the method. 
+ +The authors are very active in answering questions by the reviewers. They have successfully addressed some of the issues. However, there are still questions that remain unaddressed. The submission is not of the quality of ICLR papers. + +Strength +* A new method is proposed. +* The proposed methods outperform baseline models on the given datasets and synthetic datasets. + +Weakness +* The explanations are not clear enough. Although the authors provide detailed responses to the reviews, the problems indicated by the reviewers are still not well addressed. +* The proposed method seems to be too complicated. +* It is not clear why the proposed method works. +* The problem studied might not be realistic. + +---- +Here is a summary of the reviewers' final comments. + +* Reviewer oDis slightly increased the score。 + +* Reviewer r2ym says“I read responses to my concerns and others, but except for some clarifying statements and notations, authors' responses are not convincing enough. Also, while I now understand the concept of proposed work better than before, I do not think that it is explained and presented well enough.” + +* Reviewer inpd says “would like to keep my original score”.",ICLR2022, +SykJHyaSM,1517250000000.0,1517260000000.0,473,SkFvV0yC-,SkFvV0yC-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a variant of network morphism (Wei et al., 2016) for dynamically growing deep neural networks. There are some novel contributions (such as OptGD for finding a morphism given the parent network layer). However, in the current form, the experiments mostly focus on comparisons against fixed network structure (but this doesn't seem like a strong baseline, given Wei et al.'s work), so the paper should provide more comparisons against Wei et al. (2016) to highlight the contribution of this work. In addition, the results will be more convincing if the state-of-the-art performance can be demonstrated for large-scale problems (such as ImageNet classification). ",ICLR2018, +23yNBHvPFrD,1610040000000.0,1610470000000.0,1,edku48LG0pT,edku48LG0pT,Final Decision,Reject,"This paper proposed an MCMC sampler that combines HMC and neural network based proposal distribution. It is an improvement over L2HMC and [Titsias & Dellaportas, 2019], with the major innovation being that, the proposed normalizing flow-based proposal is engineered such that the density of the proposal $q(x'|x)$ is tractable. Experiments are conducted on synthetic distributions, Bayesian logistic regression and deep energy-based model training. + +While reviewers are overall happy about the novelty of the approach, some clarity issues have been raised in some of the reviewers' initial reviews. Also concerns on the evaluation settings, including the missing evaluation metric such as ESS/second, are also raised by the reviewers. The revision addressed some of the clarity issues, but some experimental evaluation issues still exist (e.g. comparing with L2HMC in terms of ESS/second), and the replaced MALA baseline results make the improvement of the proposed approach less clear. + +I personally find the proposed approach as a very interesting concept. However I also agree with the reviewers that more experimental studies need to be done in order to understand the real gain of the approach. 
",ICLR2021, +SJlUjT3gxV,1544760000000.0,1545350000000.0,1,r1lFIiR9tQ,r1lFIiR9tQ,Rejection: interest idea but empirical results are too week ,Reject,"The paper proposes a new method for training generative models by minimizing general f-divergences. The main technical idea is to optimize f-divergence between joint distributions which is rightly observed to be the upper bound of the f-divergence between the marginal distributions and address the disjoint support problem by convolving the data with a noise distribution. The basic ideas in this work are not completely novel but are put together in a new way. + +However, the key weakness of this work, as all the reviewer noticed, is that the empirical results are too week to support the usefulness of the proposed approach. The only quantitive results are in table 2, which is only a simple Gaussian example. It essential to have more substantial empirical results for supporting the new algorithm. +",ICLR2019,4: The area chair is confident but not absolutely certain +ZmXK-yH3sC,1576800000000.0,1576800000000.0,1,B1eyA3VFwS,B1eyA3VFwS,Paper Decision,Reject,"This paper introduces an FFT-based loss function to enforce physical constraints in a CNN-based PDE solver. The proposed idea seems sensible, but the reviewers agreed that not enough attention was paid to baseline alternatives, and that a single example problem was not enough to understand the pros and cons of this method.",ICLR2020, +34fTue2NkT,1576800000000.0,1576800000000.0,1,HyxFF34FPr,HyxFF34FPr,Paper Decision,Reject,"The paper proposes a method for object detection by predicting category-specific object probability and category-agnostic bounding box coordinates for each position that's likely to contain an object. The proposed idea is interesting and the experimental results show improvement over RetinaNet and other baselines. However, in terms of weakness, (1) conceptually speaking it's unclear whether the proposed method is a big departure from the existing frameworks; and (2) although the authors are claiming SOTA performance, the proposed method seems to be worse than other existing/recent work. Some example references are listed below (more available here: https://paperswithcode.com/paper/foveabox-beyond-anchor-based-object-detector). + +[1] Scale-Aware Trident Networks for Object Detection +https://arxiv.org/abs/1901.01892 + +[2] GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond +https://arxiv.org/abs/1904.11492 + +[3] CBNet: A Novel Composite Backbone Network Architecture for Object Detection +https://arxiv.org/abs/1909.03625 + +[4] EfficientDet: Scalable and Efficient Object Detection +https://arxiv.org/abs/1911.09070 + +References [3] and [4] are concurrent works so shouldn't be a ground of rejection per se, but the performance gap is quite large. Compared to [1] and [2] which have been on arxiv for a while (+5 months) the performance of the proposed method is still inferior. Despite considering that object detection is a very competitive field, the conceptual/technical novelty and overall practical significance seem limited for ICLR. For a future submission, I would suggest that a revision of this paper being reviewed in a computer vision conference, rather than ML conference. +",ICLR2020, +0d3M2WhcMLf6,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,This paper proposed a transformer based routing network which removes the constraints in the original routing network such as the depth of a network. 
Multi-Task learning (MTL) based on routing has been an interesting topic in the deep learning research community. Our reviewers have serious concerns on the experiments. The presented empirical results do not seem to be able to sufficiently support the claims in this paper. Comparing with SOTA MTL methods is needed to make the proposed method convincing.,ICLR2022, +QZvyI5FFo8Z,1610040000000.0,1610470000000.0,1,VyENEGiEYAQ,VyENEGiEYAQ,Final Decision,Reject,"The paper attempts to make transformers more scalable for longer sequences. In this regards, authors propose a clustering-based attention mechanism, where only tokens attends to other tokens in the same cluster. This design reduces memory requirements and allows more information mixing than simple local windows. Using the proposed approach, new state-of-the-art performance is obtained on Natural Questions long answer, although marginal. However, reviewers raised numerous concerns. First, the novelty of the paper compared to prior work like reformer or routing transformer which also conceptually does clustering is not resolved. Second, the claim that k-means yields a more balanced/stable clustering than LSH is not well established. Finally, why clustering, i.e. attention between similar vectors is better than dissimilar or randomly chosen vectors or does is it even as expressive is not clear. Thus, unfortunately I cannot recommend an acceptance of the paper in its current form to ICLR.",ICLR2021, +y8aJq1wyHEa,1642700000000.0,1642700000000.0,1,oj2yn1Q4Ett,oj2yn1Q4Ett,Paper Decision,Accept (Poster),"In this paper, the authors study the decentralized empirical risk minimization problem with Reproducing Kernel Hilbert Space. I found the problem formulation and the solution quite interesting. The authors also answered the main comments of the reviewers. Even though part of the work is incremental, I feel that there is enough merit to accept this paper.",ICLR2022, +B1EsHJprG,1517250000000.0,1517260000000.0,639,BJjBnN9a-,BJjBnN9a-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper received borderline negative scores: 5,6,4. + +The authors response to R1 question about the motivations was ""...thus can achieve similar classification results with much smaller network sizes. This translates into smaller memory requirements, faster computational speeds and higher expressivity."" If this is really the case, then some experimental comparison to compression methods (e.g. Song Han's PhD work at Stanford) is needed to back up this. + +R4 raises issues with the experimental evaluation and the AC agrees with them that they are disappointing. In general R4 makes some good suggestions for improving the paper. + +The author's rebuttal also makes the general point that the paper should be accepted as it contains ideas, that these are sufficient alone: ""We strongly believe that with some fine-tuning it could achieve considerably better results, however we also believe that this is not the point in a first submission..."". The AC disagrees with this. Ideas are cheap. *Good ideas*, i.e. those that work, as in get good performance on standard benchmarks are valuable however. The reason for having benchmarks is to give some of objective way of seeing if an idea has any merit to it. So while the reviewers and the AC accept that the paper has some interesting ideas, this is not enough for warrant acceptance. 
",ICLR2018, +vrRG-GqSNUI,1642700000000.0,1642700000000.0,1,DfMqlB0PXjM,DfMqlB0PXjM,Paper Decision,Accept (Spotlight),"A multi-scale hierarchical variational autoencoder based technique is developed for unsupervised image denoising and artefact removal. The method is shown to achieve state of the art performance on several datasets. Further, the multi-scale latent representation leads to an interpretable visualization of the denoising process. + +The reviewers unanimously recommend acceptance.",ICLR2022, +tOJklmrQ67U,1642700000000.0,1642700000000.0,1,UTdxT0g6ZuC,UTdxT0g6ZuC,Paper Decision,Reject,"In this work the authors consider the automatic selection of time-series forecasting model (and hyperparameters) based on historical data. It adopts a conventional feature-based meta-learning approach. Experimental results show an improved performance over the considered baselines. + +The reviewers appreciated the clarifications provided by the authors, but a number of concerns were unresolved. For instance, questions remained regarding the dataset collection, the baselines against which the proposed method was compared to (which were considered too weak) and the large number of missing details in the presentation of the method. Based on this the reviewers concluded that the paper could not be accepted in its current form and would require a major revision.",ICLR2022, +nqDjYYUPchm,1610040000000.0,1610470000000.0,1,M71R_ivbTQP,M71R_ivbTQP,Final Decision,Reject,"Overall, this seems like a neat idea and well-done work. Main principle is to extract a very sparse net that does a good job at locally ""explaining"" a given example. The NeuroChains idea does this with a diffentiable sparse objective. I think this work is well-positioned and has nice properties: (1) retains a very small percentage of ""filters"", (2) it appears that all the selected filters are actually needed/useful (3) there are some generalization properties wrt to unseen samples that are close to the sample of interest. + +I appreciate that the authors responded with very detailed rebuttals to the concerns of the reviewers. I'm still worried, like AnonReviewer4, about the generalization around local regions though the follow-up experiments satisfy me for the most part. There is a genuine concern that while this method has the *potential* to produce useful outputs that could be useful for downstream experts to analyze the underlying network, the paper itself doesn't really show this. In other words, while I agree that on the technical side of things, the work passes the bar, it's not clear that the work passes the bar from the impact side of things. + +This did make for a genuinely a borderline case in terms of decisions and unfortunately this work landed on the reject side this time around.",ICLR2021, +HJF8Ny6BG,1517250000000.0,1517260000000.0,359,rk4Fz2e0b,rk4Fz2e0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper was perceived as being well written, but the technical contribution was seen as being incremental and somewhat heuristic in nature. Some important prior work was not discussed and more extensive experimentation was recommended. +However, the proposed approach of partitioning the graph into sub graphs and a schedule alternating between intra and inter graph partitions operations has some merit. 
+ +The AC recommends inviting this paper to the Workshop Track.",ICLR2018, +rBuAEYnPDHv,1642700000000.0,1642700000000.0,1,rdBuE6EigGl,rdBuE6EigGl,Paper Decision,Reject,"This paper considers augmenting LSTM language models with a form of residual connection that adds and additional feed forward layer before the softmax that integrates the output of the recurrent cell with the input embedding. This architectural variation is evaluated on the standard Penn Treebank and Wikitext-2 language modelling tasks and shown to lead to lower perplexities on the test sets, particularly when dynamic evaluation is used. + +The reviewers agree that the proposed addition is well motivated, however they also observe that there has been substantial work in language modelling on various forms of residual and skip connections and it is not clear how this work relates to that body of work. The authors have provided some additional comparisons during the discussion, however the reviewers feel that further evaluation and analysis is needed. There was also some additional confusion about the varying hyperparamter tuning protocols employed in the different evaluations. The author’s have clarified this in their response so that it is clearer how the different results were obtained. + +Overall this paper presents an promising initial result, but it would benefit from more complete evaluation, analysis, and hyperparameter tuning. This could include ablation studies and analysis to shed more light on what the proposed architectural addition is contributing, how this relates to other varieties of residual connection, and it’s positive interaction with dynamic evaluation. It would also be useful to include a tuned model with a comparison to previously reported Wikitext-2 results.",ICLR2022, +ka03bK43qM,1576800000000.0,1576800000000.0,1,SJxTZeHFPH,SJxTZeHFPH,Paper Decision,Reject,"The paper investigates the effect of focal loss on calibration of neural nets. + +On one hand, the reviewers agree that this paper is well-written and the empirical results are interesting. On the other hand, the reviewers felt that there could be better evaluation of the effect of calibration on downstream tasks, and better justification for the choice of optimal gamma (e.g. on a simpler problem setup). + +I encourage the others to revise the draft and resubmit to a different venue. +",ICLR2020, +L02LTmHQpM8,1610040000000.0,1610470000000.0,1,GVNGAaY2Dr1,GVNGAaY2Dr1,Final Decision,Reject,"This paper proposes a method for collaborative multi-agent learning and ad-hoc teamwork. The paper includes extensive empirical results across multiple environments (including one of known outstanding high difficulty) and repeatedly performs favourably in comparison to a suitable set of state of the art methods. The proposed method is motivated by theoretical analysis, which was considered interesting but its connection to the method in the initial paper was weak. + +Overall, there are remaining concerns which have not been fully addressed in the discussion phase. The authors' responses and discussion with the reviewers should be utilised to improve the material's presentation and to clarify the theory-empirical connection in future revisions of the paper.",ICLR2021, +IfT3DTalruX,1642700000000.0,1642700000000.0,1,7ADMMyZpeY,7ADMMyZpeY,Paper Decision,Reject,"In this paper, authors introduce two properties of feature representations, namely local alignment and local congregation, and show how these properties can be predictive of downstream performance. 
The paper has a heavier focus on providing theoretical statements using these properties but authors also empirically evaluate their suggested method. + +**Strong Points**: +- The paper is well-written and easy to follow. + + +- The proposed concepts (local alignment and local congregation) are intuitive. + + +- The theoretical statements and their proofs are correct. + + +- The proposed metric shows some advantage against a few baselines. + + +- Prior work on feature representations and transferability are discussed. + + +**Weak Points**: + + +- **The connections to prior work on K-nearest neighbors and linear classifiers are not properly discussed.** This is very important because authors assume that the network that outputs the feature representations is trained on a different data and they reduced the analysis to that of a binary linear classifier. Hence, all classical learning theory results on binary classifiers apply in this setting. Furthermore, KNN methods and analysis can be simply applied on the features as well. In light of this and the lack of discussion on this matter, the significance of the theoretical and empirical results are not clear. + + +- **The main proposed properties could be improved further**. It looks like the defined properties (local alignment and local congregation) could be improved by merging them into one property about separability of data? The current properties are sensitive to scaling which is undesirable given that classification performance is invariant to scaling of the features. It seems like local congregation is mostly capturing the scale so some normalized version of local alignment might be able to capture the main property of interest. + + +- **The theoretical results in their current form are not very significant.** One limiting factor on the theoretical results is that since the analysis is done only on the classification layer, it does not say anything about the relationship of the upstream and downstream tasks. But perhaps the most important limitation is that the properties are defined based on the downstream task distribution as opposed to downstream training data. That makes it difficult to measure them in practical settings where we have a limited number of data points. Classical results on learning theory avoid this and only use measures that depend on the given training set. + + +- **The empirical evaluation could benefit from stronger baselines** Authors mentioned ""We therefore consider only baselines that make minimal assumptions about the pre-trained feature representation and the target task"" and hence avoided comparing to many prior methods. However, I think the appropriate approach would be to compare the performance of the proposed method to strong baselines but then explain how they differ in terms of their assumptions, etc. Moreover, there are other simple heuristic baselines to consider, eg. K-NN (which is not computationally expensive in the few-shot settings) or a classifier that is trained by initializing it to be the sum of feature vectors in the first class (assuming binary classification) minus sum of feature vectors in the second class and doing a few SGD updates on it. Therefore, I believe authors could improve the empirical section significantly by taking these suggestions into account. + + +**Final Decision Rationale**: + +This is a borderline paper. 
While the paper has a nice combination of theoretical and empirical contributions, both theoretical and empirical contributions have a lot of room for improvement (and a clear path to get there) as pointed above. In particular, I believe having either strong theoretical contributions or strong empirical contributions would have been enough for acceptance and I hope authors would take the above suggestions into account and submit the improved version of this work again!",ICLR2022, +rJ6gr1prf,1517250000000.0,1517260000000.0,499,SJzMATlAZ,SJzMATlAZ,ICLR 2018 Conference Acceptance Decision,Reject,"After careful consideration, I think that this paper in its current form is just under the threshold for acceptance. Please note that I did take into account the comments, including the reviews and rebuttals, noting where arguments may be inconsistent or misleading. + +The paper is a promising extension of RCC, albeit too incremental. Some suggestions that may help for the future: + +1) Address the sensitivity remark of reviewer 2. If the hyperparameters were tuned on RCV1 instead of MNIST, would the results across the other datasets remain consistent? + +2) Train RCC or RCC-DR in an end-to-end way to gauge the improvement of joint optimization over alternating, as this is one of the novel contributions. + +3) Discuss how to automatically tune \lambda and \delta_1 and \delta_2. These may appear in the RCC paper, but it's unclear if the same derivations hold when going to the non-linear case (they may in fact transfer gracefully, it's just not obvious). It would also be helpful for researchers building on DCC.",ICLR2018, +rkVKhzIdl,1486400000000.0,1486400000000.0,1,HycUbvcge,HycUbvcge,ICLR committee final decision,Reject,"This is largely a clearly written paper that proposes a nonlinear generalization of a generalized CCA approach for multi-view learning. In terms of technical novelty, the generalization follows rather straightforwardly. Reviewers have expressed the need to clarify relationship and provide comparisons to existing proposals for combining deep learning with CCA. As such the paper has been evaluated to be borderline. The proposed method appears to yield significant gains on a speech dataset, though comparisons on other datasets appear to be less conclusive. Some basic baselines as missing, e.g., concatenating views and running a deep model, or using much older nonlinear extensions of CCA such as kernel CCA (e.g. accelerated via random features, and combined with deep representations underneath).",ICLR2017, +XsYrmeMO2bw,1642700000000.0,1642700000000.0,1,OcKMT-36vUs,OcKMT-36vUs,Paper Decision,Accept (Poster),"The paper provides a thorough study of the evolution of Hessian depending on a wide variety of aspects such as initialization, architectural choices, and common training heuristics. The paper makes a number of interesting observations. Some of them are not really new but overall, the experimental evaluation of the paper makes it a valuable resource for the community. + +The reviewers are overall quite positive. One reviewer notes that more investigation of the behavior of batch-normalization is required. I encourage the author to address this concern in the final manuscript. There is a lot of recent work on batch-normalization that might be worth discussing, e.g.: +Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs +Jonathan Frankle, David J. Schwab, Ari S. 
Morcos + +Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks +Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi + +A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent +Yongqiang Cai, Qianxiao Li, Zuowei Shen",ICLR2022, +1S-faS-wI8O,1610040000000.0,1610470000000.0,1,dV19Yyi1fS3,dV19Yyi1fS3,Final Decision,Accept (Poster),"Quantization is an important practical problem to address. The proposed method, which quantizes a different random subset of weights during each forward pass, is simple and interesting. The empirical results on RoBERTa and EfficientNet-B3 are good, in particular, for int4 quantization. During the rebuttal, the authors further included quantization results on ResNet which were suggested by the reviewers. This additional experiment is important for comparing this proposed approach with the existing methods which do not have quantization results on the models in this paper. ",ICLR2021, +nx10142g4r,1610040000000.0,1610470000000.0,1,xpx9zj7CUlY,xpx9zj7CUlY,Final Decision,Accept (Oral),"The reviewers agree that this is an interesting and original paper that will be of interest to the ICLR community, and is likely to lead to follow-up work.",ICLR2021, +ir875TV2Bj,1576800000000.0,1576800000000.0,1,BygWRaVYwH,BygWRaVYwH,Paper Decision,Reject,"The reviewers agree that the technical innovations presented in this paper are not great enough to justify acceptance. The authors correctly point out to the reviewers that the ICLR CFP states that the topics of ""implementation issues, parallelization, software platforms, hardware"" are acceptable. I would point out that most papers in these spaces describe *technical innovations* that enable improvements in ""parallelization, software platforms, hardware"" rather than implementations of these improvements. However, it is certainly true that a software package is an acceptable (although less common) basis for a publication, provided it is sufficiently unique and impactful. After pointing this out to the reviewers and collecting opinions, the reviewers do not feel the combined technical and software contributions of this paper are enough to justify acceptance. ",ICLR2020, +OMUQW7bOXqY,1642700000000.0,1642700000000.0,1,OJm3HZuj4r7,OJm3HZuj4r7,Paper Decision,Accept (Poster),"It is important to have good, stable, and trustworthy algorithms. Though I am unconvinced that the C-DQN algorithm proposed here is the final word (and I suppose this is not controversial, and the authors might agree), the ideas presented here are sufficiently interesting to be disseminated and discussed more widely. All reviewers recommended accepting the paper, and I'll follow their lead. + +That said, the paper can still be improved, and the authors are encouraged to carefully consider the feedback provided by the reviewers. In particular, it is good to be clear about which parts are principled, and which parts are somewhat heuristic or arbitrary, and could therefore presumably be improved in future work. In fact, doing so clearly could make the paper _more_ rather than less impactful. 
+ +In any case, it seems good to include this paper at the conference, to highlight the questions and partial answers given here, and to inspire more discussion.",ICLR2022, +Bk3yr1TBz,1517250000000.0,1517260000000.0,484,rJg4YGWRb,rJg4YGWRb,ICLR 2018 Conference Acceptance Decision,Reject,"A version of GCNs of Kipf and Welling is introduced with (1) no non-linearity; (2) a basic form of (softmax) attention over neighbors, where the attention scores are computed as the cosine of the endpoints' representations (scaled with a single learned scalar). There is a moderate improvement on Citeseer, Cora, and Pubmed. + +Since the use of gates with GCNs / graph neural networks is becoming increasingly common (starting perhaps with GGSNNs of Li et al., ICLR 2016) and using attention in graph neural networks is also not new (see reviews and comments for references), the novelty is very limited. In order to make the submission more convincing, the authors could: (1) present results on harder datasets; (2) carefully evaluate against other forms of attention (i.e. previous work). + +As it stands, though it is interesting to see that such a simple model performs well on the three datasets, I do not see it as an ICLR paper. + +Pros: +-- a simple model that achieves results close to / on par with the state of the art + +Cons: +-- limited originality +-- either results on harder datasets and/or evaluation against other forms of attention (i.e. previous work) are needed",ICLR2018, +XtFN2kHAMlc,1642700000000.0,1642700000000.0,1,_PlNmPOsUS9,_PlNmPOsUS9,Paper Decision,Reject,"The paper proposes a new method to train ensembles of classifiers that are robust to adversarial attacks, in particular black-box transfer-based ones. This is achieved by enforcing the outputs of early layers of different members of the ensemble to have, on average, gradients with low cosine similarity, which should in turn create different decision boundaries. For this, the authors design a specific loss function, PARL, to be minimized at training time. Two reviewers gave a score of 6 while two reviewers gave a score of 3. The main concerns are: 1) the unclear meaning of taking the sum of the gradients of different neurons, and why the similarity of that quantity across models is a proxy for the similarity of the decision boundaries; 2) gaps in the experiments, in particular the omission of a simpler baseline such as individual robust classifiers. 
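To help readers picture concern 1), here is a minimal sketch (ours, not the authors' released code) of what a diversity loss in the spirit of PARL might look like. The `early` accessor and the `h.sum()` reduction are assumptions on our part — the latter is one possible reading of the "sum of the gradients of different neurons" that the reviewers found unclear:

```python
import torch
import torch.nn.functional as F

def gradient_diversity_penalty(models, x, early=lambda m, inp: m.features(inp)):
    """Average pairwise (squared) cosine similarity between input-gradients
    of the early-layer outputs across ensemble members; minimizing it pushes
    the members' local decision geometry apart."""
    x = x.clone().requires_grad_(True)
    grads = []
    for m in models:
        h = early(m, x)                      # hypothetical early-layer accessor
        g, = torch.autograd.grad(h.sum(), x, create_graph=True)
        grads.append(g.flatten(1))           # one gradient vector per example
    total, pairs = x.new_zeros(()), 0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            total = total + F.cosine_similarity(grads[i], grads[j], dim=1).pow(2).mean()
            pairs += 1
    return total / max(pairs, 1)
```

Under the description above, such a penalty would be minimized jointly with each member's ordinary classification loss during training.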
The reviewers with positive scores also did not champion the paper; thus, these main concerns should be thoroughly addressed in a revision, and the paper cannot be accepted to ICLR for now.",ICLR2022, +B1sjjGLdl,1486400000000.0,1486400000000.0,1,S1jmAotxg,S1jmAotxg,ICLR committee final decision,Accept (Poster),"This paper will make a positive contribution to the conference, especially since it is one of the first to look at stick-breaking as it applies to deep generative models.",ICLR2017, +ERrsLj55s0g,1642700000000.0,1642700000000.0,1,ZgrmzzYjMc4,ZgrmzzYjMc4,Paper Decision,Reject,"This paper studies the problem of choosing the best cloud provider for a task. The problem is formulated as a bandit and solved using the CloudBandit algorithm. The algorithm is compared to several baselines, such as SMAC, and performs well. The evaluation is done on 60 different multi-cloud configuration tasks across 3 public cloud providers, and the authors intend to share this benchmark with the public. + +This paper has four borderline reject reviews. All reviewers agree that it studies an important problem and that the promised multi-cloud optimization dataset could spark more research in the area of cloud optimization. The weaknesses of the paper are that it is not technically strong and that the quality of the new dataset is not clear from its description. In the end, the scores of this paper are not good enough for acceptance. Therefore, it is rejected.",ICLR2022, +fSRh0ePzeP,1642700000000.0,1642700000000.0,1,eqRTPB134q0,eqRTPB134q0,Paper Decision,Reject,"The paper formally studies the problem of partial identifiability when inferring a reward function from a given data source (e.g., expert demonstrations or trajectory preferences). To formally characterize this ambiguity in a data source, the paper proposes considering the infinite-data limit, which bounds the reward information recoverable from a source. Furthermore, this ambiguity is then studied in the context of different downstream tasks, as recovering an exact reward function may not be necessary for a given task. The paper is primarily theoretical, and the results provide a unified view of the problem of partial identifiability in reward learning for different sources and downstream tasks. + +Overall, the reviewers acknowledged the importance of the problem setting and found the results promising. There is quite a bit of spread in the reviewers' final assessment of the paper, with ratings of 8, 8, 3, 3 (note: one of the reviewers with a rating of 3 has low confidence). 
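To make "partial identifiability" concrete for readers, here is a standard worked example — classical potential-based shaping from Ng et al. (1999), not an example taken from the paper under review — of two reward functions that no amount of data about optimal behavior can distinguish:

```latex
% Potential-based shaping: for any potential \Phi : S \to \mathbb{R},
% shifting the reward by \gamma \Phi(s') - \Phi(s) changes the optimal
% action-value function only by a state-dependent constant, so the
% greedy (optimal) policy is unchanged at every state.
\[
R'(s,a,s') \;=\; R(s,a,s') + \gamma\,\Phi(s') - \Phi(s)
\quad\Longrightarrow\quad
Q^{*}_{R'}(s,a) \;=\; Q^{*}_{R}(s,a) - \Phi(s),
\]
\[
\text{hence}\quad \arg\max_a Q^{*}_{R'}(s,a) \;=\; \arg\max_a Q^{*}_{R}(s,a)
\quad \text{for all } s \in S.
\]
```

A downstream task that only needs the optimal policy tolerates this whole equivalence class; a task that needs the reward's magnitudes does not — which is exactly the kind of source/task interplay the paper formalizes.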
The authors' responses did help in discussions; however, a few of the concerns raised by reviewers still remained. The key issues relate to the general accessibility of the paper and the lack of concrete examples to highlight the proposed theoretical framework. At the end of the discussions, several reviewers (including those with an overall positive rating) shared concerns about the paper's accessibility. With this, unfortunately, the paper stands as borderline. Nevertheless, this is exciting and potentially impactful work, and we encourage the authors to incorporate the reviewers' feedback when preparing a future revision of the paper.",ICLR2022, +taGTaJSXjdE,1642700000000.0,1642700000000.0,1,RNf9AgtRtL,RNf9AgtRtL,Paper Decision,Reject,"This paper studies improving continuous control. The paper suggests a practical, beneficial combination approach that does well in the presented experiments. It also provides an overview of, and comparison across, several recent insights in RL. While both contributions are valuable, multiple reviewers had concerns that the paper has limitations on both fronts. In particular, the proposed ensemble approach is quite simple (though valuable), and reviewers generally felt that this raised their expectations for the strength of the empirical results, which were not yet met. The reviewers provided a lot of detailed feedback that may be useful in revising the contribution.",ICLR2022, +Td640Jvki6h,1610040000000.0,1610470000000.0,1,QzKDLiosEd,QzKDLiosEd,Final Decision,Reject,"The paper introduces a new step size rule for the extragradient/mirror-prox algorithm, building upon and improving the results of Bach & Levy for deterministic convex-concave setups. The proposed adaptation of EG/mirror-prox -- dubbed AdaProx in the submitted paper -- has the rate interpolation property, which means that it provides order-optimal rates for both smooth and nonsmooth problems, without any knowledge of the problem class or the problem parameters for the input instance. The paper also demonstrates that the same algorithm can handle certain barrier-based problems, using regularizers based on the Finsler metric. + +The consensus of the reviews was that the theory presented in the paper is solid and interesting. The main concerns shared by a subset of the reviews were regarding the practical usefulness of the proposed method. In particular, the method exhibits large constants in the convergence bounds and cannot handle stochastic setups. Further, the empirical evidence provided in the paper was deemed insufficient to demonstrate the algorithm's competitiveness on learning problems. 
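For context on the family of methods under discussion, here is a textbook-style sketch of extragradient with an AdaGrad-type step size, in the spirit of Bach & Levy's universal mirror-prox; the precise AdaProx rule in the submission may differ, and `diameter` and `g0` are hypothetical tuning constants:

```python
import numpy as np

def adaptive_extragradient(F, x0, diameter=1.0, g0=1.0, steps=1000):
    """Extragradient with an adaptive step size. F is the monotone operator
    (e.g., the stacked gradient field of a convex-concave saddle problem)."""
    x = np.asarray(x0, dtype=float).copy()
    acc = g0                                  # accumulated squared operator norms
    avg = np.zeros_like(x)
    for t in range(1, steps + 1):
        eta = diameter / np.sqrt(acc)
        x_lead = x - eta * F(x)               # extrapolation (leader) step
        g = F(x_lead)
        x = x - eta * g                       # update step uses the leader's value
        acc += float(g @ g)                   # adapt the step to observed norms
        avg += (x_lead - avg) / t             # running average of the leader points
    return avg                                # the averaged iterate carries the rate guarantee
```

The "rate interpolation" property mentioned above means that the accumulated norms in `acc` automatically produce the optimal step-size schedule for smooth and nonsmooth instances alike.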
If possible, the authors are advised to provide more convincing empirical results in a revised version, or, alternatively, to tone down the claims regarding the practical performance of the method.",ICLR2021, +rJ-sH1Trf,1517250000000.0,1517260000000.0,636,SJa1Nk10b,SJa1Nk10b,ICLR 2018 Conference Acceptance Decision,Reject,"The paper received mixed reviews with scores of 5 (R1), 5 (R2), 7 (R3). All three reviewers raise concerns about the lack of comparisons to other methods. The rebuttal is not compelling on this point. There are quite a few methods that could be used for this application available (often with source code) and should be compared to, e.g. DenseNets (Huang et al.). Given that the proposed method isn't in of itself hugely novel, a thorough experimental evaluation is crucial to the justifying the approach. The AC has closely looked at the rebuttal and the paper and feels that it cannot be accepted for this reason at this time. ",ICLR2018, +YzhrRI64Xc2P,1642700000000.0,1642700000000.0,1,DXRwVRh4i8g,DXRwVRh4i8g,Paper Decision,Reject,"The authors present a method for creating a curriculum for goal-conditioned reinforcement learning. In particular, they propose to use reachability traces to define a sequence of sub-goals that aid learning. During the review process, the reviewers mentioned the novelty of the proposed approach and the intuitive explanations provided by the authors. However, the reviewers also pointed out that the experiments could be more thorough, errors in the theoretical justification of the method as well as simplicity of the evaluation environments, among others. Some of the reviewers increased their score after the authors' rebuttal but it was not enough to advocate for acceptance of the paper. I encourage the authors to incorporate reviewers' feedback in the next version of the paper.",ICLR2022, +8IaY1uTJ5R3,1610040000000.0,1610470000000.0,1,zCu1BZYCueE,zCu1BZYCueE,Final Decision,Reject,"This paper studies the important problem of efficiently identifying good hyperparameters for convolutional neural networks. The proposed approach is based on using an SVD of unfolded weight tensors to build a response surface that can be optimized with a dynamic tracking algorithm. Reviewers raised a number of concerns which were not fully addressed in rebuttals and lead me to recommend rejecting this work. In particular: focus on single hyperparameter (learning rate) made it unclear whether the proposed approach could actually be used for other hyperparameters or to jointly optimize combination of hyperparameters, empirical improvements even for learning rate are not strong and baselines are weak, and concern that initial success early in training (5 epochs) may not lead to generalization late in training. Additionally, there were several concerns around the clarity of the presentation, which I also found hard to follow: how is KG related to information-theoretic metrics, why is the particular form of averaging across layers reasonable, and how is it related to other generalization / performance metrics? With additional experiments on other hyperparameters (for example L2 regularization), I think the work would be greatly strengthened. ",ICLR2021, +SJIsVkTrM,1517250000000.0,1517260000000.0,425,SJyfrl-0b,SJyfrl-0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The authors addressed the reviewers concerns but the scores remain somewhat low. 
+The method is not super novel, but it is an incremental improvement over existing approaches.",ICLR2018, +Td640Jvki6h,1610040000000.0,1610470000000.0,1,QzKDLiosEd,QzKDLiosEd,Final Decision,Reject," +The paper presents a side-channel attack in a scenario where the attacker is able to place a induction sensor near the power cable of the victim's GPU. The authors train a neural network to analyse the magnetic flux measured by the sensor to recover the structure (layer type and layer parameters) of the target neural network. The authors also show that for a wide range of target network structure, by training a network with the inferred structure, they produce adversarial examples as effective as a white box attack. + +The points raised by the reviewers were the following: 1) the result that this type of side-channel attack works is interesting, 2) the practicality of the attack is unclear because the attacker needs hardware access to the victim's GPU, 3) the ML contribution is not really clear and a venue on cyber-security might be more appropriate. + +Side-channel attacks on deep neural networks can be of relevance to ICLR (as pointed to by the authors by the ICLR papers/submissions on system side-channel attacks). Nonetheless, I tend to agree with R1 and R2 that the ML contribution is limited (either in terms of application of ML or methodology), and the concerns of practicality of the approach make me lean towards rejection.",ICLR2021, +y74Ry1hL2,1576800000000.0,1576800000000.0,1,ryx0nnEKwH,ryx0nnEKwH,Paper Decision,Reject,The paper proposes a novel mechanism to reduce the skewness of the activations. The paper evaluates their claims on the CIFAR-10 and Tiny Imagenet dataset. The reviewers found the scale of the experiments to be too limited to support the claims. Thus we recommend the paper be improved by considering larger datasets such as the full Imagenet. The paper should also better motivate the goal of reducing skewness.,ICLR2020, +#NAME?,1610040000000.0,1610470000000.0,1,uRKqXoN-Ic9,uRKqXoN-Ic9,Final Decision,Reject,"All reviewers except for AnonReviewer1 were in favour of accept. AnonReviewer1 was strongly in favour of reject, but AnonReviewer2 argued against some of AnonReviewer1's opinion. The authors also gave a coherent, well argued statement of their contribution. Nevertheless, there are some improvements still needed. + +Position: the scope of the uncertainty estimation to Dirichlet based uncertainty estimation techniques was limited. + +Sticking to Dirichlet-based uncertainty is limited, although the coverage of methods within the Dirichlet-based family is OK but could be improved. Note (from AnonReviewer1's comments) Joo Chung and Seo, ICML 2020, is one paper that should be included and Chan, Alaa, and van der Schaar, ICML2020 is also relevant. While its not about adversial attacks it covers a related idea with a good technique. Finally, these papers cite Ovadia, Fertig Ren etal. NeurIPS 2019, which is an excellent summary of calibration and estimation under shift, not exactly adversarial attacks but surely related. The big winner is deep ensembles (Lakshminarayanan etal, NeurIPS 2017). I think using deep ensembles directly would be a good complement to the Dirichlet methods in this paper. +Note, also, the authors already included additional works mentioned by AnonReviewer2. + +Critique: The authors proposed a robust training strategy but this didn't lead to uniform improvement. + +Position: The scope of the adversarial attacks is limited. 
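To make the "off-the-shelf baseline" mentioned in the decision above concrete, here is a minimal sketch of plain EM for a k-component mixture of linear regressions; this is our own minimal version (shared noise variance, bias folded into a constant column of X), not code from the paper:

```python
import numpy as np

def em_mixture_linear_regression(X, y, k=2, iters=100, var0=1.0, seed=0):
    """Soft EM for y ~ mixture of k linear regressors with shared noise var."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(k, d))           # per-component regression weights
    pi = np.full(k, 1.0 / k)              # mixing proportions
    var = var0
    for _ in range(iters):
        # E-step: responsibilities from Gaussian residual likelihoods
        # (shared variance, so its log-det term cancels across components)
        resid = y[None, :] - W @ X.T                          # (k, n)
        logp = np.log(pi)[:, None] - 0.5 * resid**2 / var
        logp -= logp.max(axis=0, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=0, keepdims=True)
        # M-step: weighted least squares per component
        for j in range(k):
            Xw = X * R[j][:, None]
            A = Xw.T @ X + 1e-8 * np.eye(d)                   # small ridge for stability
            W[j] = np.linalg.lstsq(A, Xw.T @ y, rcond=None)[0]
        pi = R.mean(axis=1)
        var = float((R * (y[None, :] - W @ X.T) ** 2).sum() / n)
    return W, pi, var
```

As the decision notes, what this baseline cannot do by itself is choose k, which is where any genuinely "open-ended" contribution would have to go beyond it.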
+ +The attacks covered are a good though basic range. But because these show problems, the argument is that more sophisticated attacks do not need to be studied. + +Position: The datasets covered is limited. + +Certainly, there are problems with extending experiments to text data. But the argument is that if things don't work well for the smaller datasets given, then that is still a problem, so why bother extending the evaluation to larger datasets. + +Arguably, the latter two positions have been addressed by the authors, but not the first two. This makes the paper marginal. +So this is a good publishable paper, but comparatively marginal.",ICLR2021, +HJgQZ6UggV,1544740000000.0,1545350000000.0,1,rJg4J3CqFm,rJg4J3CqFm,An interesting word embedding method,Accept (Poster)," ++ An interesting and original idea of embedding words into the (very low dimensional) Wasserstein space, i.e. clouds of points in a low-dimensional space ++ As the space is low-dimensional (2D), it can be directly visualized. ++ I could imagine the technique to be useful in social / human science for data visualization, the visualization is more faithful to what the model is doing than t-SNE plots of high-dimensional embeddings ++ Though not the first method to embed words as densities but seemingly the first one which shows that multi-modality / multiple senses are captured (except for models which capture discrete senses) ++ The paper is very well written + +- The results are not very convincing but show that embeddings do capture word similarity (even when training the model on a small dataset) +- The approach is not very scalable (hence evaluation on 17M corpus) +- The method cannot be used to deal with data sparsity, though (very) interesting for visualization +- This is mostly an empirical paper (i.e. an interesting application of an existing method) + +The reviewers are split. One reviewer is negative as they are unclear what the technical contribution is (but seems a bit biased against empirical papers). Another two find the paper very interesting. + + + + +",ICLR2019,4: The area chair is confident but not absolutely certain +LoiEz4H2V0w,1610040000000.0,1610470000000.0,1,cP2fJWhYZe0,cP2fJWhYZe0,Final Decision,Reject,"The reviewers generally feel that the phenomenon discovered in this paper is relevant and could be very important when considering interpretability. However, there are still a number of remaining concerns. The reviewers are not convinced by the human study - they feel there is structure in the SIS’s such that a human trained on these images with an abstract category (i.e., without being told their real-world label) could potentially successfully learn to classify them. There is also a concern that SIS is model-based, that is, the inductive biases of the model (shape, color, etc.) could be leaking information into the SIS image. Finally, there should be some stronger evidence that this represents a serious practical problem for the community. Are there instances where current interpretable approaches break down because of this phenomenon? + +One suggestion to potentially strengthen the human experiment: you could try training a denoising autoencoder on the full images, removing 95% of the pixels at random. Then, given an SIS, use the denoising autoencoder to reconstruct the image and then provide that to a human subject. The question is: how much information about the image as a whole is preserved in the SIS (when combined with an appropriate inductive bias)? 
+",ICLR2021, +SJ9xEJaHM,1517250000000.0,1517260000000.0,281,B1nZ1weCZ,B1nZ1weCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper contains an interesting way to do online multi task learning, by borrowing ideas from active learning and comparing and contrasting a number of ways on the arcade learning environment. Like the reviewers, I have some concerns about using the target scores and I think more analysis would be needed to see just how robust this method is to the choice/distribution of target scores (the authors mention that things don't break down as long as the scores are ""reasonable"", but that's not a particularly empirical nor precise statement). + +My inclination is to accept the paper, because of the earnest efforts made by the authors in understanding how DUA4C works. However, I do agree that the paper should have a larger focus on that: basically Section 6 should be expanded, and the experiments should be rerun in such a way that the setup for DUA4C is more ""favorable"" (in terms of hyper-parameter optimization). If there's any gap between any of the proposed methods and DUA4C, then this would warrant further analysis of course (since it would mean that there's an advantage to using target scores). ",ICLR2018, +JBfANLtQkT,1642700000000.0,1642700000000.0,1,UeE41VsK1KJ,UeE41VsK1KJ,Paper Decision,Reject,"Two reviewers increased their scores after considering the responses from the authors, and all reviewers are somewhat positive. However, the increased scores are still 6 only, and as the area chair, I have concerns about the foundations of this research. + +The authors write ""there is no off-the-shelf baseline that can automatically disentangle the data from different domains in the open-ended regression setting."" This is not true for the standard situation of a mixture of regression lines, as in Section 4.1. Completely standard EM (not necessarily hard EM) will solve this problem, as long as the individual lines (sinusoids etc.) can be represented easily by the EM components. Another standard method that should be another baseline is a mixture-of-experts neural network. + +One thing that EM cannot handle is learning the number of components in a mixture model, as in learning the _k_ in _k_-means. To the extent that ""open-ended"" here refers to a new approach for this problem, with mathematical guarantees, it is interesting. But this point of view needs more explanation. + +The paper begins ""A hallmark of general intelligence is the ability of handling open-ended environments, which roughly means complex, diverse environments with no manual task specification."" If there is one aspect of natural environments that is crucial and fundamental, it is the presence of noise. However, starting theoretically with Definition 1 and empirically with Section 4.1, the authors work in a world of deterministic functions. This mismatch undermines the conceptual significance of the paper. + +Perhaps because of the universality of noise, the authors do not present a real-world dataset or task for which the OSL method is directly natural or applicable. 
Rather, they impose restricted specifications on datasets such as MNIST and introduce metrics that are novel, hence hardly natural, undermining the empirical significance of the paper.",ICLR2022, +xrNte9sEYFd,1642700000000.0,1642700000000.0,1,nxcABL7jbQh,nxcABL7jbQh,Paper Decision,Accept (Poster),"The authors proposed a new loss function for end-to-end edge detection to overcome the label imbalance and edge thickness problems. Overall, the proposed VT appears to be a useful representation for boundary detection. Though similar to DT, VT outperforms DT by a large margin and is more robust to noise. One reviewer recommends acceptance, two others recommend marginal acceptance. The main issues have been adequately addressed in the rebuttal.",ICLR2022, +jy1VFQU7Ro5,1610040000000.0,1610470000000.0,1,NTEz-6wysdb,NTEz-6wysdb,Final Decision,Accept (Poster),"The paper attempts to improve retrieval in open domain question answering systems, which is a very important problem. In this regards, the authors propose to utilize cross-attention scores from a seq2seq reader models as signal for training retrieval systems. This approach overcomes typical low amount of labelled data available for retriever model. The reviewers reached a consensus that the proposed approach are interesting and novel. The proposed approach establish new state-of-the-art performance on three QA datasets, although the improvements over previous methods are marginal. Overall, reviewers agree that the paper will be beneficial to the community and thus I recommend an acceptance to ICLR. +",ICLR2021, +SygwPQgLlE,1545110000000.0,1545350000000.0,1,Syeben09FQ,Syeben09FQ,Revise and resubmit,Reject,"All reviewers still argue for rejection for the submitted paper. The AC thinks that this paper should be published at some point, but for now it is a ""revise and resubmit"".",ICLR2019,4: The area chair is confident but not absolutely certain +H1fYnGU_g,1486400000000.0,1486400000000.0,1,ry18Ww5ee,ry18Ww5ee,ICLR committee final decision,Accept (Poster),"This paper presents a simple strategy for hyperparameter optimization that gives strong empirical results. The reviewers all agreed that the paper should be accepted and that it would be interesting and useful to the ICLR community. However, they did have strong reservations about the claims made in the paper and one reviewer stated that their accept decision was conditional on a better treatment of the related literature. + + While it is natural for authors to argue for the advantages of their approach over existing methods, some of the claims made are unfounded. For example, the claim that the proposed method is guaranteed to converge while ""methods that rely on these heuristics are not endowed with any + theoretical consistency guarantees"" is weak. Any optimization method can trivially add this guarantee by adopting a simple strategy of adding a random experiment 1/n of the time (in fact, SMAC does this I believe). This claim is true of random search compared to gradient based optimization on non-convex functions as well, yet no one optimizes their deep nets via random search. Also, the authors claim to compare to state-of-the-art in hyperparameter optimization but all the comparisons are to algorithms published in either 2011 or 2012. Four years of continued research on the subject are ignored (e.g. methods in Bayesian optimization for hyperparameter tuning have evolved considerably since 2012 - see e.g. 
the work of Miguel Hern‡ndez Lobato, Matthew Hoffman, Nando de Freitas, Ziyu Wang, etc.). It's understood that it is difficult to compare to the latest literature (and the authors state that they had trouble running recent algorithms), but one can't claim to compare to state-of-the-art without actually comparing to state-of-the-art. + + Please address the reviewers concerns and tone down the claims of the paper.",ICLR2017, +DszXX7DVNB,1610040000000.0,1610470000000.0,1,Hf3qXoiNkR,Hf3qXoiNkR,Final Decision,Accept (Poster),"This paper considers the problem of learning models for NLP tasks that are less reliant on artifacts and other dataset-specific features that are unlikely to be reliable for new datasets. This is an important problem because these biases limit out-of-distribution generalization. Prior work has considered models that explicitly factor out known biases. This work proposes using an ensemble of weak learners to implicitly identify some of these biases and train a more robust model. The work shows that weak learners can capture some of the same biases that humans identify, and that the resulting trained model is significantly more robust on adversarially designed challenge tasks while sacrificing little accuracy on the test sets of the original data sets. + +The paper's method is useful, straightforward, and intuitively appealing. The experiments are generally well conducted. Some of the reviewers raised questions about evaluating on tasks with unknown biases. The authors addressed these concerns in discussion and we encourage them to include this in the final version of the paper using the additional page.",ICLR2021, +HkIsQkpSG,1517250000000.0,1517260000000.0,208,HyMTkQZAb,HyMTkQZAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This clearly written paper extends the Kronecker-factored approximate curvature optimizer to recurrent networks. Experiments on Penn Treebank language modeling and training of differentiable neural computers on a repeated copy task show that the proposed K-FAC optimizers are stronger than SGD, Adam, and Adam with layer normalization. The most negative reviewer objected to a lack of theoretical error bounds on the approximations made, but the authors successfully argue that obtaining such bounds would require making assumptions that are likely to be violated in practice, and that strong empirical performance on real tasks is sufficient justification for the approximations. + +Pros: ++ ""Completes"" K-FAC training by extending it to recurrent models. ++ Experiments show effects of different K-FAC approximations. + +Cons: +- The algorithm is rather complex to implement. +",ICLR2018, +PjpKgeyDsAI,1642700000000.0,1642700000000.0,1,u7UxOTefG2,u7UxOTefG2,Paper Decision,Reject,"The authors question the assumption that the epistemic uncertainty provided by Bayesian neural networks should be useful for out of distribution detection. They start their analysis in the infinite width limit so as to be able to understand how the induced kernels in a Gaussian process behave. The paper also discusses the potential tradeoffs between generalization and detection. Overall, the paper presents some facts that, while not surprising, (Reviewer fGuy), are helpful in questioning the default assumption. 
Overall, though, the combination of the lack of surprise with the multi-part, somewhat loosely connected message reduces the quality of the submission.",ICLR2022, +ryIHSkTHz,1517250000000.0,1517260000000.0,559,BJDEbngCZ,BJDEbngCZ,ICLR 2018 Conference Acceptance Decision,Reject,The paper studies the global convergence for policy gradient methods for linear control problems. Multiple reviewers point out strong concerns about the novelty of the results.,ICLR2018, +Tg46szRUykv,1642700000000.0,1642700000000.0,1,XctLdNfCmP,XctLdNfCmP,Paper Decision,Accept (Poster),"This paper proposes a model to predict the spatiotemporal dynamics of physical simulations on irregular meshes. The observations are modeled as a sequence of graph representations, each graph corresponding to a snapshot of the observation sequence at time t. This model uses two components, a graph encoder-decoder to compress the observations and an autoregressive transformer to model the dynamics. The two components are trained sequentially. At inference time, given an initial hidden state infered from the data and some additional conditional information, a sequence of states is predicted in an auto-regressive manner in the hidden space, each state of the sequence is then decoded to produce a prediction in the original observation space. The originality of the model lies in the encoder-decoder graph and in the use of a transformer for the prediction. Tests are performed on three fluid dynamics simulation data sets. + +All reviewers pointed out some original contributions in the proposed method, in particular the use of transformers in the learned hidden space. In the rebuttal, the authors provided substantial additional results and further details and explanations. Their responses led two reviewers to increase their scores. All reviewers ultimately agree that the paper presents interesting results and conclusions.",ICLR2022, +gS9EpKTVdNU,1610040000000.0,1610470000000.0,1,cB_mXKTs9J,cB_mXKTs9J,Final Decision,Reject,"The authors propose an algorithm that learns sparse patterns of images that are highly predictive of a target class, even if added to a non-target class. The reviewers agree that the algorithm is novel, is tested on a wide array of experiments, and the paper well written. + +Unfortunately, it seems that some of the main claims, such as DNNs trained on clean data ""learn abstract shapes along with some texture"", resort to qualitative evaluation of the few examples shown in the paper. Furthermore, two reviewers were concerned with how one particular design choice in the algorithm might bias the authors' claims. In particular, pointed out that the patterns learned are highly to the initial canvas used, which is not necessarily strongly motivated. + +As these two issues are integral parts of the paper, I hesitate to recommend Acceptance at this point. That said, the approach looks very promising and I hope the authors continue to pursue this idea.",ICLR2021, +9QgSRC8fY8Lc,1642700000000.0,1642700000000.0,1,PLDOnFoVm4,PLDOnFoVm4,Paper Decision,Accept (Spotlight),"This paper presents an extension of the Predictive State Representation (PSRs) to multi-agent systems, with a dynamic interaction graph represents each agent’s predictive state based on its “neighborhood” agents. Three types of agent networks are considered: static complete graphs (all agents affect all others experience); static non-complete graphs (only some agents affect one another); and dynamic non-complete graphs (agents affect one another in a time varying way). 
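To visualize the three interaction regimes just described, a toy encoding as adjacency masks may help; this is purely illustrative on our part — the paper's actual formalism over predictive states is much richer:

```python
import numpy as np

def interaction_masks(n, edges=None, time_edges=None, T=1):
    """Build boolean masks for: (a) static complete, (b) static non-complete,
    and (c) dynamic non-complete agent-interaction graphs over n agents."""
    complete = np.ones((n, n), dtype=bool) ^ np.eye(n, dtype=bool)  # all pairs, no self-loops
    static = np.zeros((n, n), dtype=bool)
    for i, j in (edges or []):                # only some agents interact
        static[i, j] = static[j, i] = True
    dynamic = np.zeros((T, n, n), dtype=bool)
    for t, i, j in (time_edges or []):        # interactions vary with time t
        dynamic[t, i, j] = dynamic[t, j, i] = True
    return complete, static, dynamic
```

In the paper's terms, each agent's predictive state is then conditioned only on the neighbors its (possibly time-varying) mask admits.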
A number of theoretical results are presented, including PAC bounds for the approximations in the framework. The paper also contains a number of experiments that clearly show the advantages of the proposed technique over some related methods. + +The reviewers unanimously agree that this is a strong paper, with a solid theoretical and empirical analysis.",ICLR2022, +B101E1THM,1517250000000.0,1517260000000.0,271,SyzKd1bCW,SyzKd1bCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This is an interesting and well-written paper introducing two unbiased gradient estimators for optimizing expectations of black box functions. LAX can handle functions of both continuous and discrete random variables, while RELAX is specialized to functions of discrete variables and can be seen as a version of the recently introduced REBAR with its concrete-relaxation-based control variate replaced by (or augmented with) a free-form function. The experimental section of the paper is adequate but not particularly strong. If Q-prop is the most similar existing RL approach, as is state in the paper, why not include it as a baseline? It would also be good to see how RELAX performs at optimizing discrete VAEs using just the free-form control variate (instead of combining it with the REBAR control variate).",ICLR2018, +5_wSqFwkfxc,1610040000000.0,1610470000000.0,1,NrN8XarA2Iz,NrN8XarA2Iz,Final Decision,Reject,"This paper presents an approach to reward shaping in RL centred on the question of how to select between different shaping signals. As such this is an interesting research direction that could make important contributions in the area. +Generally the reviewers felt that the paper is too preliminary in its current form. There were several questions raised around problems with the technical formulation. It was also felt that the experiments could be more rigourous to fully validate the claims of the paper.",ICLR2021, +fJW8r359dn,1576800000000.0,1576800000000.0,1,BkxREeHKPS,BkxREeHKPS,Paper Decision,Reject,"This paper proposes to reduce the number of variational parameters for mean-field VI. A low-rank approximation is used for this purpose. Results on a few small problems are reported. + +As R3 has pointed out, the main reason to reject this paper is the lack of comparison of uncertainty estimates. I also agree that, recent Adam-like optimizers do use preconditioning that can be interpreted as variances, so it is not clear why reducing this will give better results. + +I agree with R2's comments about missing the ""point estimate"" baseline. Also the reason for rank 1,2,3 giving better accuracies is unclear and I think the reasons provided by the authors is speculative. + +I do believe that reducing the parameterization is a reasonable idea and could be useful. But it is not clear if the proposal of this paper is the right one. Due to this reason, I recommend to reject this paper. However, I highly encourage the authors to improve their paper taking these points into account.",ICLR2020, +SJgMH6SfxN,1544870000000.0,1545350000000.0,1,HkGmDsR9YQ,HkGmDsR9YQ,meta-review,Reject,"The authors have presented an empirical study of generalization and regularization in DQN. They evaluate generalization on different variants of Atari games and show that dropout and L2 regularization are beneficial. The paper does not contain any major revelations, nor does it propose new algorithms or approaches, but it is a well-written and clear demonstration, and it would be interesting to the deep RL community. 
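For concreteness, the two regularizers that the study finds beneficial amount to something like the following in a standard Q-network; the layer sizes and coefficients here are hypothetical, not the paper's settings:

```python
import torch.nn as nn
import torch.optim as optim

class QNet(nn.Module):
    """Q-network with dropout between hidden layers; L2 regularization is
    applied through the optimizer's weight decay."""
    def __init__(self, obs_dim, n_actions, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.body(x)

net = QNet(obs_dim=128, n_actions=18)
opt = optim.Adam(net.parameters(), lr=1e-4, weight_decay=1e-5)  # the L2 term
```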
However, the reviewers did not feel that the paper met the bar for publication at ICLR because the experiments were not more comprehensive, which would be expected for an empirical study. The AC will side with the reviewers but hopes that the authors will expand their study and resubmit to another venue in the future.",ICLR2019,5: The area chair is absolutely certain +TwpIHryJTFJ,1610040000000.0,1610470000000.0,1,T6AxtOaWydQ,T6AxtOaWydQ,Final Decision,Accept (Poster),"Three reviewers recommended an acceptance (rating 7) while R1 deviated much from them (rating 3). After reading R1's concerns carefully and the authors' rebuttal, I found some of the criticisms to be invalid. The authors provided a satisfactory response, addressing concerns and clarifying potential misunderstandings. Because R1 did not update the review after the rebuttal period, I am assuming the concerns have been adequately addressed. The three other reviewers all unanimously agreed that this paper tackles a timely topic, proposes a simple and effective approach, and shows convincing empirical results. I concur with the reviewers' recommendations.",ICLR2021, +BZpXolzI16N,1610290000000.0,1610470000000.0,1,53WS781RzT9,53WS781RzT9,Final Decision,Reject,"This work analyses the impact of mini-batch size on the variance of the gradients during SGD, in the context of linear models. It shows an inverse relationship between the variance of the gradient and the batch size for such models, under certain assumptions. Reviewers generally agree that the work is theoretically sound. However, all reviewers believe that the contributions of this work are limited. This concern was not adequately addressed during the discussion phase and led to the ultimate decison to reject.",ICLR2021, +XpXgxYs-Rc,1576800000000.0,1576800000000.0,1,rJgE9CEYPS,rJgE9CEYPS,Paper Decision,Reject,"This paper proposes discriminability distillation learning (DDL) for learning group representations. The core idea is to learn a discriminability weight for each instance which are a member of a group, set or sequence. The discriminability score is learned by first training a standard supervised base model and using the features from this model, computing class-centroids on a proxy set, and computing the iter and intra-class distances. A function of these distance computations are then used as supervision for a distillation style small network (DDNet) which may predict the discriminability score (DDR score). A group representation is then created through a combination of known instances, weighted using their DDR score. The method is validated on face recognition and action recognition. + +This work initially received mixed scores, with two reviewers recommending acceptance and two recommending rejection. After reading all the reviews, rebuttals, and discussions, it seems that a key point of concern is low clarity of presentation. During the rebuttal period, the authors have revised their manuscript and interacted with reviewers. One reviewer has chosen to update their recommendation to weak acceptance in response. The main unresolved issues are related to novelty and experimental evaluation. Namely, for novelty comparison and discussion against attention based approaches and other metric learning based approaches would benefit the work, though the proposed solution does present some novelty. For the experiments there was a suggestion to evaluate the model on more complex datasets where performance is not already maxed out. 
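Returning briefly to the method itself: one plausible reading (ours, not necessarily the authors') of the centroid-based discriminability score described above is sketched below — an instance close to its own class centroid and far from the nearest other centroid gets a high weight. The exact distance function and normalization in the paper may differ:

```python
import torch
import torch.nn.functional as F

def ddr_scores(feats, labels, centroids):
    """feats: (n, d) features, labels: (n,) long, centroids: (C, d).
    Returns a per-instance discriminability weight in (0, 1)."""
    feats = F.normalize(feats, dim=1)
    cents = F.normalize(centroids, dim=1)
    d = torch.cdist(feats, cents)                         # (n, C) distances
    intra = d.gather(1, labels.view(-1, 1)).squeeze(1)    # distance to own centroid
    d_other = d.scatter(1, labels.view(-1, 1), float("inf"))
    inter = d_other.min(dim=1).values                     # nearest other-class centroid
    return torch.sigmoid(inter - intra)                   # larger = more discriminable
```

A group representation would then be a weighted combination of instance features using these scores, as the description above indicates.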
The authors have provided such experiments during the rebuttal period. + +Despite the slight positive leanings post rebuttal, the ACs have discussed this case and determine the paper is not ready for publication.",ICLR2020, +FdwJq_Jl-P,1576800000000.0,1576800000000.0,1,rkxDoJBYPB,rkxDoJBYPB,Paper Decision,Accept (Poster),The submission presents an approach that leverages machine learning to optimize the placement and scheduling of computation graphs (such as TensorFlow graphs) by a compiler. The work is interesting and well-executed. All reviewers recommend accepting the paper.,ICLR2020, +BJlTjLcglV,1544750000000.0,1545350000000.0,1,B1xVTjCqKQ,B1xVTjCqKQ,Trainable Image Compressed Sensing with solid empirical results,Accept (Poster),"This paper studies deep convolutional architectures to perform compressive sensing of natural images, demonstrating improved empirical performance with an efficient pipeline. +Reviewers reached a consensus that this is an interesting contribution that advances data-driven methods for compressed sensing, despite some doubts about the experimental setup and the scope of the theoretical insights. We thus recommend acceptance as poster. ",ICLR2019,4: The area chair is confident but not absolutely certain +9qZORetR6g,1610040000000.0,1610470000000.0,1,O1pkU_4yWEt,O1pkU_4yWEt,Final Decision,Reject,"This paper tackles an important problem and includes experiments on a new domain (Russian documents vs English documents). Unfortunately, all reviewers agree that this paper lacks novelty for publication in its current state. Additional details and clarifications to the proposed approach, notably through a more thorough performance analysis, would improve the significance of the paper.",ICLR2021, +iepMdWY-ZP,1576800000000.0,1576800000000.0,1,rylJkpEtwS,rylJkpEtwS,Paper Decision,Accept (Poster),"This paper develops the notion of the arrow of time in MDPs and explores how this might be useful in RL. All the reviewers found the paper thought provoking, well-written, and they believe the work could have significant impact. The paper does not fit the typical mold: it presents some ideas and uses illustrative experiments to suggest the potential utility of the arrow without nailing down a final algorithm or make a precise performance claim. Overall it is a solid paper, and the reviewers all agreed on acceptance. + +There are certainly weaknesses in the work, and there is a bit of work to do to get this paper ready. R2 had a nice suggestion of a baseline based on simply learning a transition model (its described in the updated review)---please include it. The description of the experimental methodology is a bit of a mess. Most of the experiments in the paper do not clearly indicate how many runs were conducted or how errorbars where computed or what they represent. It is likely that only a handful of runs were used, which is surprising given the size of some of the domains used. In many cases the figure caption does not even indicate which domain the data came from. All of this is dangerously close to criteria for rejection; please do better. + +Readability is also known as empowerment and it would be good to discuss this connection. In general the paper was a bit light on connections outlining how information theory has been used in RL. I suggest you start here (http://www2.hawaii.edu/~sstill/StillPrecup2011.pdf) to improve this aspect. Finally, the paper has a very large appendix (~14 oages) with many many more experiments and theory. 
I am still not convinced that the balance is quite right. This is probably a journal or long arxiv paper. Maybe this paper should be thought of as a nectar version of a longer standalone arxiv paper. + +Finally, relying on effectiveness of random exploration is no small thing and there is a long history in RL of ideas that would work well, given it is easy to gather data that accurately summarizes the dynamics of the world (e.g. proto-value, funcs). Many ideas are effective given this assumption. The paper should clearly and honestly discuss this assumption, and provide some arguments why there is hope.",ICLR2020, +OJb5t91n2CT,1642700000000.0,1642700000000.0,1,ZkC8wKoLbQ7,ZkC8wKoLbQ7,Paper Decision,Accept (Spotlight),"The paper analyzes the learning behavior of deep networks inside RL algorithms, and proposes an interesting hypothesis: that many of the observed difficulties in deep RL methods stem from _capacity loss_ of the trained network (that is, the network loses the ability to adapt quickly to fit new functions). As the paper points out, some of these difficulties have popularly been attributed to other causes (such as difficulties in exploration) or to less-specific causes (such as reward sparsity: the paper proposes that capacity loss mediates observed problems due to sparsity). + +The paper investigates its hypothesis two ways: first by attempting to measure how capacity varies over time during training of existing deep RL methods, and second by proposing a new regularizer to attempt to preserve capacity. These experiments are set up well, and their results are convincing — while there is likely no perfect way to measure or preserve capacity, the methods chosen here make sense. + +This is a strong paper: it proposes a creative, appealing, and interesting hypothesis about an important problem (difficulties in training deep RL methods), and conducts a well-designed evaluation. We expect and hope that it will inspire interesting follow-on work. + +We thank the authors for their thorough and helpful participation in the discussion period, including updates to improve the clarity of the paper.",ICLR2022, +sZViNxICvy,1576800000000.0,1576800000000.0,1,rkxUfANKwB,rkxUfANKwB,Paper Decision,Reject,"The paper proposes All SMILES VAE which can capture the chemical properties of small molecules and also optimize the structures of these molecules. The model achieves significantly performance improvement over existing methods on the Zinc250K and Tox21 datasets. + +Overall it is a very solid paper - it addresses an important problem, provides detailed description of the proposed method and shows promising experiment results. The work could be a landmark piece, leading to major impacts in the field. However, given its potential, the paper could benefit from major revisions of the draft. Below are some suggestions on improving the work: +1. The current version contains a lot of materials. It tries to strike the balance between machine learning methodology and details of the application domain. But the reality is that the lack of architecture details and some sloppy definitions of ML terms make it hard for readers to fully appreciate the methodology novelty. + +2. There is still room for improvement in experiments. As suggested in the review, more datasets should be used to evaluate the proposed model. Since it is hard to provide theoretic analysis of the proposed model, extensive experiments should be provided. + +3. The complexity analysis is not fully convincing. 
Some fair comparison with the alternative approaches should be provided. + +In summary, it is a paper with big potentials. The current version is a step away from being ready for publication. We hope the reviews can help improve the paper for a strong publication in the future. ",ICLR2020, +c-MLNnzDIas,1610040000000.0,1610470000000.0,1,V1N4GEWki_E,V1N4GEWki_E,Final Decision,Reject,"The paper shows empirically that training unstructured sparse networks from random initialization performs poorly as sparse NNs have poor gradient flow at initialization. Besides, the authors argue that sparse NNs have poor gradient flow during training. They show that DST based methods achieving the best generalization have improved gradient flow. Moreover, they find the LTs do not improve gradient flow, rather their success lies in re-learning the pruning solution they are derived from. I read the paper and the reviewers discussed the rebuttal. Although all the reviewers found the rebuttal helpful and they all agree that the paper is decently well written and has some clear value, the majority believes that further observations are required for making the paper and its hypothesis convincing. There are also some recent related work on initialization of pruned networks, e.g. by rescaling their weights at initialization. I believe, adding the discussion of such related techniques and making the connection to existing work will greatly strengthen the paper and provides more evidence to support its claims. + + +",ICLR2021, +R1RbU_QSTF5,1642700000000.0,1642700000000.0,1,e42KbIw6Wb,e42KbIw6Wb,Paper Decision,Accept (Poster),"This paper proposes an elegant approach to object detection where an encoder network reads in an image and a decoder network outputs coordinate and category information via a sequence of textual tokens. This method does away with several object detection specific details and tricks such as region proposals and ROI pooling. The paper received positive reviews from all reviewers who agreed that this formulation of object detection was novel and provided a new perspective that may transfer to other computer vision tasks. One common concern amongst reviewers was the slow inference time due to the sequential nature of the decoder -- and this concern was a central point of discussion between the authors and reviewers. My takeaway from this discussion is that this model is certainly slower than traditional computer vision models that can generate boxes in parallel. The slowdown however, is image dependent. Less cluttered environments require shorter output sequences. Moreover, such a model can easily be applied to concept localization, e.g. ""Locate the horses"", in which cases one can expect fewer objects of the desired category, and hence acceptable inference speeds. Importantly, the contributions of this paper are noteworthy in spite of the proposed architecture having the drawback of being slow. Given this, I recommend accepting this paper for its merits.",ICLR2022, +EHzmTUSkkyN,1610040000000.0,1610470000000.0,1,ajOrOhQOsYx,ajOrOhQOsYx,Final Decision,Accept (Poster),"Reviewers generally agree that the main result of the paper, which generalizes the classical Wigner-Eckart Theorem and provides a basis for the space of G-steerable kernels for any compact group G, is a significant result. There are also several concerns +that need to be addressed. R4 notes that the use of the Dirac delta function (e.g. Theorem C.7) is informal and mathematically imprecise and needs to be fixed. 
R1 notes that it would be helpful to at least describe how this general formulation can be applied in machine learning. + +Presentation and accessibility: the current version of the paper will be accessible to only a small part of the machine learning audience, i.e. those already with advanced knowledge in mathematics and/or theoretical physics, in particular in representation theory. If the authors aim to make it more accessible, the writing would need to be substantially improved.",ICLR2021, +PpdNEoBky,1576800000000.0,1576800000000.0,1,BJeRykBKDH,BJeRykBKDH,Paper Decision,Reject,"The paper proposes combining paired attention with co-attention. The reviewers have remarked that the paper is will written and that the experiments provide some new insights into this combination. Initially, some additional experiments were proposed, which were addressed by the authors in the rebuttal and the new version of the paper. However, ICLR is becoming a very competitive conference where novelty is an important criteria for acceptance, and unfortunately the paper was considered to lack the novelty to be presented at ICLR.",ICLR2020, +Zzxik0JS8Uc,1642700000000.0,1642700000000.0,1,4N-17dske79,4N-17dske79,Paper Decision,Accept (Poster),"The authors propose a method for associative learning as an alternative to back propagation based learning. The idea is to interesting. The coupling between layers are broken down into local loss functions that can be updated independently. The targets are projected to previous layers and the information is preserved using an auto-encoder loss function. The projections from the target side are then compared with the projections from input side using a bridge function and a metric loss. The method is evaluated on text and image classification tasks. The results suggest that this is a promising alternative to back propagation based learning. + +Pros ++ A novel idea that seems promising ++ Evaluated on text and image classification tasks and demonstrated utility + +Cons +- The impact of the number of additional parameters and the computation is not clarified (even though epoch's are lower) + +The authors utilized the discussion period very well, running additional experiments that were suggested (especially ablation studies). They also clarified all the questions that were raised. In all, the paper has improved substantially from the robust discussion.",ICLR2022, +ryB4ryaHM,1517250000000.0,1517260000000.0,544,H1DGha1CZ,H1DGha1CZ,ICLR 2018 Conference Acceptance Decision,Reject,"meta score: 4 + +This paper proposes an activation function, called displaced ReLU (DReLU), to improve the performance of CNNs that use batch normalization. + +Pros + - good set of experiments using CIFAR, with good results + - attempt to explain the approach using expectations +Cons + - theoretical explanations are not so convincing + - limited novelty + - CIFAR is relatively limited set of experiments + - does not compare with using bn after relu, which is now well-studied and seems to address the motivation of this paper (and thus questions the conclusions)",ICLR2018, +Tx3pQzw9HUI,1642700000000.0,1642700000000.0,1,EFgzhSJYIj6,EFgzhSJYIj6,Paper Decision,Reject,"The paper studies network architecture search in the context of reinforcement learning. In particular it applies the DARTS method to the Procgen RL benchmark, and conducts extensive experimental evaluations. 
It identifies a number of issues that could potentially prevent DARTS from working well in the RL setting (such as nonstationarity and high variance), but in the end shows good performance without needing to modify DARTS substantially. + +The reviewers agreed that a key strength of the paper is in its experiments. But they also identified a weakness in novelty: if a paper's main contribution is to combine two previously well-explored ideas (in this case, RL and DARTS) then there is a high bar for the quality of exposition and positioning, and the reviewers did not feel that this bar was met. (Though the authors' updates during the rebuttal period did help substantially with clarity and relationship to other methods -- thank you for these!) + +Recommended decision: while the paper makes a worthwhile contribution, it does not in its current form rise to the level of novelty and general interest that is needed for publication in ICLR.",ICLR2022, +BkC0sG8ug,1486400000000.0,1486400000000.0,1,S1jmAotxg,S1jmAotxg,ICLR committee final decision,Accept (Poster),"This paper will make a positive contribution to the conference, especially since it is one of the first to look at stick-breaking as it applies to deep generative models. The paper will make a positive contribution to the conference.",ICLR2017, +HylP_0YggN,1544750000000.0,1545350000000.0,1,B1eCCoR5tm,B1eCCoR5tm,meta-review,Reject,"The paper proposes a data augmentation technique to ensemble classifiers. +Reviewers pointed to a few concerns, including a lack of novelty, a lack +of proper comparison with state-of-the-art models or other data augmentation +approaches. +Overall, all reviewers recommended to reject the paper, and I concur with them.",ICLR2019,4: The area chair is confident but not absolutely certain +hUvk37qwLOc,1610040000000.0,1610470000000.0,1,GvqjmSwUxkY,GvqjmSwUxkY,Final Decision,Reject,"This paper defines a truly unsupervised image translation scenario. Namely, there are no parallel images or domain labels. To achieve robust performance in this scenario, the authors use 1) clustering and 2) generator-discriminator structure to map images from different domains and generate images for target domains. + +In all, all the reviewers agree that this definition of unsupervised image translation is interesting. However, there are also several concerns for the real-world practical application and empirical results. Unlike unsupervised text translation whose target language is known, the truly unsupervised image translation is difficult to make sense without identifying what is the target domain. This limits the contribution of this paper to some specific tasks instead of more general tasks. For the empirical results, the selection of data and the hyperparameter K do not convince the reviewers. + +",ICLR2021, +i6tPMhtoDQ,1610040000000.0,1610470000000.0,1,cP5IcoAkfKa,cP5IcoAkfKa,Final Decision,Accept (Poster),"This paper proposes techniques to lower the barrier to run large scale simulations under resource (compute) constraints. The key idea is to do batch simulation and policy learning on 1 or more GPUs without sacrificing the fps rate for rendering (~20k fps on 1GPU). The proposed setup and methods are evaluated on the point navgiation tasks on two environments namely Gibson and Matterport3D. One of the key ideas for rendering is to render a big tile of images for separate instantiations of the environment in parallel. This gives big speeds up to rendering and policy optimization. + +${\bf Pros}$: +1. 
Large number of FPS with smaller compute budget. Large scale Deep RL research has been difficult to democratize due to the need for big compute budgets. Although this paper is heavier on the engineering side, I think it can greatly accelerate research and therefore seems like a good fit for the ICLR community to consider. + +2. The paper and proposed steps are clearly written and justified. + +${\bf Cons}$: +1. This method is limited to environments where the observation space follows a particular structure. This is perhaps the biggest limitation of this approach but the underlying assumptions are reasonable and quite a few realistic environments will fall into this category. + + +During the rebuttal and discussion period, R2 raised concerns about ablations but was satisfied with the author's response. R5 raised concerns about other prior work - CuLE (Dalton et al., NeurIPS 2020). However, CuLE is concurrent work and does not tackle 3D simulation rendering as done in this paper. For these reasons I believe the paper does not have any big red flags or pending concerns. ",ICLR2021, +h3QgGdLjF,1576800000000.0,1576800000000.0,1,H1xauR4Kvr,H1xauR4Kvr,Paper Decision,Reject,"In this work, the authors develop a method for providing frequentist confidence intervals for a range of deep learning models with coverage guarantees. While deep learning models are being used pervasively, providing reasonable uncertainty estimates from these models remains challenging and an important open problem. Here, the authors argue that frequentist statistics can provide confidence intervals along with rigorous guarantees on their quality. They develop a jack-knife based procedure for deep learning. The reviews for this paper were all borderline, with two weak accepts and two weak rejects (one reviewer was added to provide an additional viewpoint). The reviewers all thought that the proposed methodology seemed sensible and well motivated. Among the cited issues, major topics of discussion were the close relation to related work (some of which is very recent, Giordano et al.) and that the reviewers felt the baselines were too weak (or weakly tuned).
The reviewers ultimately did not seem convinced enough by the author rebuttal to raise their scores during discussion and there was no reviewer really willing to champion the paper for acceptance. Unfortunately, this paper falls below the bar for acceptance. It seems clear that there is compelling work here and addressing the reviewer comments (relation to related work, i.e. Robbins, Giordano, and stronger baselines) would make the paper much stronger for a future submission.",ICLR2020, +ZgxdwIFUigQ,1610040000000.0,1610470000000.0,1,eBHq5irt-tk,eBHq5irt-tk,Final Decision,Reject,"The authors re-state Mackay's definition of effective dimensionality and describe its connections to posterior contraction in Bayesian neural networks, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces. The authors claim the effective dimensionality leads to a richer understanding of the interplay between parameters and functions in deep neural network models. In their experiments the authors show that effective dimensionality compares favourably to alternative norm- and flatness-based generalization measures. + +Strengths: + +1 - The authors include a description of how to compute a scalable approximation to the effective dimensionality using the Lanczos algorithm and Hessian vector products. + +2 - The authors include some novel experimental results showing the effective dimensionality with respect to changes in width and depth. These results are informative in how changes in depth and width affect this metric in a different way. The same for the experiments with the double descent curve. + +Weaknesses: + +1 - For some reason the authors seem to have taken the concept of effective dimensionality from David Mackay's approximation to the model evidence in neural networks and ignored all the extra terms in such approximation. It is currently unclear why there is a need to do this and focus only on the effective dimensionality. Almost all the experiments that the authors describe could have been done using a similar approximation to Mackay's model evidence. It is unclear why there is a need to focus just on a part of Mackay's approximation. The fact that the authors state that the effective dimensionality is only meaningful for models with low train loss seems indicative that David Mackay's approximation to the model evidence would be a better metric.
+ +2 - With the exception of the experiments for changes in the effective dimensionality as a function of the depth and width and the double descent curve, all the other experiments and results are expected and not new to anyone familiar with David Mackay's work. + +3 - The experiments on depth and width are for only one dataset and may not be representative in general. The authors should consider other additional datasets. + +The authors should improve the paper, including a justification for using only the effective dimensionality and not David Mackay's approximation to the model evidence. They should also strengthen the experiments by comparing with David Mackay's approximation to the model evidence and should consider additional datasets as mentioned above.",ICLR2021, +RVatYJBi8b,1610040000000.0,1610470000000.0,1,9MdLwggYa02,9MdLwggYa02,Final Decision,Reject,"This submission proposes a variant of population based training (PBT) for hyperparameter selection/evolution, aimed at addressing drawbacks of existing variants (e.g. the coupling of the choice of checkpoint with the choice of hyperparameters). Reviewers generally agreed that the paper is interesting and covers an important topic, and the evaluation does show improvements over existing PBT variants. On the other hand they also raised a few important issues: + +1. The `hoptim` library is claimed as a primary contribution of the work, but it is not clear from the manuscript what benefits this library offers over existing software. When claiming a library as a main contribution, it is helpful to provide a more thorough description of the software and its benefits, and/or ideally a link (anonymized for review) to the software. The authors did respond by providing a brief description of the benefits of the library, mitigating this issue somewhat. However, it's still difficult to discern how/whether to weigh the open source library as a main contribution of the paper. + +2. The evaluation is not very convincing: the differences are small and error margins are not provided for the neural network-based experiments, meaning that any differences could be due to noise.
The authors fairly point out that it is difficult to perform multiple runs of these experiments as the resource requirements are large, and they have done 20 runs of the Rosenbrock experiment with smaller compute requirements. But the reviewers were not convinced that the Rosenbrock experiment reflects the method's application to neural network hyperparameter selection; the problems are too different. The submission would be significantly stronger if it included results over multiple runs of an ""intermediate"" sized experiment on a problem involving a neural network demonstrating that ROMUL outperforms competing approaches by a statistically significant margin. + +3. The proposed approach is ultimately heuristic. This is not necessarily a problem if there are strong empirical results demonstrating the efficacy of the proposed heuristic, but in this case the empirical results didn't convince (see point 2). + +Given these concerns raised by reviewers, the submission is not quite ready for ICLR. I hope the authors will consider resubmitting the paper after improving it based on the reviewers' feedback.",ICLR2021, +xDEYoeEDJF,1610040000000.0,1610470000000.0,1,awOrpNtsCX,awOrpNtsCX,Final Decision,Reject,"This paper proposes a new kind of CNN that convolves on deformable regions and cooperates with the Poisson equation to determine the deformable regions. Experiments on texture segmentation look promising. + +Pros: +1. The paper is well written and easy to follow. +2. The idea is interesting and the reviewers liked it. +3. The experiments on texture segmentation are promising. + +Cons: +1. Actually, convolution on non-rectangular regions is not new, in contrast to the authors' claim and reviewers' belief, although the authors may argue that the mechanisms of determining the region for convolution are different and the CNNs are used for different tasks. See, e.g., + +Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei: Deformable Convolutional Networks. ICCV 2017: 764-773 + +and other papers by the same first author. So the AC would discount the novelty of the paper. + +2. As most of the reviewers commented, the experiments on texture segmentation were insufficient. Although extra experiments were added (thanks to the authors' effort in doing this), the reviewers actually still deemed that they were not convincing enough, e.g., should compare with more state-of-the-art methods. Reviewers #2 both confirmed this issue in confidential comments. + +Although Reviewer #1 increased his/her score, the final average score is still below the threshold.
So the AC decided to reject the paper.",ICLR2021, +WGwJKN_Z1Jn,1610040000000.0,1610470000000.0,1,q_S44KLQ_Aa,q_S44KLQ_Aa,Final Decision,Accept (Poster),"The reviewers and AC liked the basic idea of how this paper improves on ALISTA, and the initial scores were high. Because the contributions rely quite a lot on empirical demonstrations, the reviewers asked for more experiments, changes to experiments, and timing results. The revision and rebuttal addressed most of these requests.
The multipath channel estimation problem was interesting though outside the scope of the AC and reviewers' expertise, so it is hard to evaluate how helpful the method is in that particular setting.",ICLR2021, +JiVDqKJzfDN,1610040000000.0,1610470000000.0,1,zcOJOUjUcyF,zcOJOUjUcyF,Final Decision,Reject,"The paper investigates an active learning strategy for speeding up the convergence of SSL deep learning algorithms. When the SSL objective could learn a good approximation of the optimal model, the proposed method efficiently converges to the result with a few queries. The main idea is that when the eigenvalues of the NTK are large, the convergence rate is faster. The proposed algorithm maximizes the smallest eigenvalue of the NTK. An empirical investigation is also reported. +The reviewers appreciated the general idea, but questioned the actual execution of this paper in terms of both experimental comparison and (lack of) supportive theoretical results.
I would like to encourage the authors to consider improving their paper along one of these two lines. Unfortunately, as it currently stands, this paper is not ready for publication.",ICLR2021, +v4kQn_KMSMc,1610040000000.0,1610470000000.0,1,xVzlFUD3uC,xVzlFUD3uC,Final Decision,Reject,"This paper studies the behavior of SGD for linear models fit with the squared Euclidean loss. There are three main results: + +The first result (Sec. 4) studies the behavior where instead of regularizing the objective, Gaussian noise is added to the inputs. The main result is a sufficient condition for how the learning rate and noise can jointly change over time in order for SGD on the MSE error with noisy input to asymptotically converge to the same solution as regular gradient descent without noisy input. + +The second result (Sec. 5) is slightly more general in that it considers the case where the noise can be an isotropic Gaussian where the variance changes over time. Again, a result is given for how the learning rate interacts with the data in order to asymptotically converge to the unregularized solution. This is first studied in Thm. 5.1, then assuming power-law decay in the noise in Thm. 5.2. It should be emphasized that though these are asymptotic guarantees, the results give asymptotic *rates* of convergence. In my opinion this is a significant strength of the results that was not emphasized by the reviewers. + +The third result (Sec. 6) studies SGD for least-squares linear models where the stochasticity is due to data subsampling only. The fraction of subsampled data may change over time. + +The primary sentiment from reviewers was that the mathematical complexity of the paper meant that they could not understand it or give a fair review. (More on this below.) For this reason, and because the overall reviews are somewhat borderline, I read the paper in detail. A specific concern raised by two reviewers was that the paper first presents a very general framework but then studies very restricted specific problems. Some reviewers felt that the paper was very well-written, while others felt it was poorly written. There are no experiments. + +For my part, I mostly concur that the paper is well-written (albeit quite technical). However, I agree with the concern from reviewers that the technical results all concern extremely restricted settings, and it's not clear what value the extremely general setup brings. I also find the title of the paper a bit puzzling. For specifics of the results, the practical value of Sections 4 and 5 is unclear.
It's well-known that adding data noise is exactly equivalent to adding ridge regularization when doing linear regression. But ridge regularized linear regression would be a non-stochastic problem. So what is the value of studying the convergence rates in this case? The paper never makes this clear. + +I have concerns about the exponential convergence rate in Thm. 6.1. The paper claims that an exponential convergence rate for SGD has been extensively studied. I do not believe this is true. In general SGD does not have an exponential convergence rate. There are modified methods like SAG that achieve this on finite data sets, but that's not what's studied here. The paper cites two papers: The first is Bottou et al. (2018). This is a lengthy review, with no specific reference given. I am familiar with it and also spent time searching but could not find a specific result. Ma et al. (2018) is also cited. This indeed gives an exponential convergence rate but assuming that at the optimum the loss for all datapoints is zero! No such assumption is made in the submitted paper, and the issue is not further discussed. This is cause for grave concern.",ICLR2021, +QBNt8I0zAs,1576800000000.0,1576800000000.0,1,HygiDTVKPr,HygiDTVKPr,Paper Decision,Reject,"The paper proposes an interesting idea to keep a very simple form for piecewise-linear RNN, but separate units into two types, one of which acts as memory. The ""memory"" units are penalized towards the linear attractor parameters, i.e.
making elements of $A$ close to 1 and the off-diagonal of $W$ close to $0$. +The benchmarks are presented that confirm the efficiency of the model. +The reviewer opinions were mixed; one ""1"", one ""3"" and one ""6""; Reviewer 1 is far too negative and some of his claims are not very constructive, the ""positive"" review is very short. Finally, the last reviewer raised a question about the actual quality of the results. This is not addressed. Although there is a motivation for such partial regularization, the main practical question is how many ""memory neurons"" are needed. I looked through the paper - this is addressed only in the supplementary, where the value of $M_{reg}$ is mentioned (=0.5 M). For $M_{reg} = M$ it is the L2 penalty; what happens if the fraction is 0.1, 0.2, ... and more? A very crucial hyperparameter (and of course, smart selection of it cannot be worse than L2RNN). This study is lacking. In my opinion, one can also introduce weights and sparsity constraints on them (in order to detect the number of ""memory"" neurons more or less automatically). Although I feel this paper has potential, it is still not ready for publication and could be significantly improved.",ICLR2020, +QGMBIDoC9y,1576800000000.0,1576800000000.0,1,B1g79grKPr,B1g79grKPr,Paper Decision,Reject,"The paper addresses a video generation setting where both initial and goal state are provided as a basis for long-term prediction.
The authors propose two types of models, sequential and hierarchical, and obtain interesting insights into the performance of these two models. Reviewers raised concerns about evaluation metrics, empirical comparisons, and the relationship of the proposed model to prior work. + +While many of the initial concerns have been addressed by the authors, reviewers remain concerned about two issues in particular. First, the proposed model is similar to previous approaches with sequential latent variable models, and it is unclear how such existing models would compare if applied in this setting. Second, there are remaining concerns on whether the model may learn degenerate solutions. I quote from the discussion here, as I am not sure this will be visible to authors [about Figure 12]: ""now the two examples with two samples they show have the same door in the middle frame which makes me doubt the method learn[s] anything meaningful in terms of the agent walking through the door but just go to the middle of the screen every time.""",ICLR2020, +BkhvLk6rf,1517250000000.0,1517260000000.0,808,SJPpHzW0-,SJPpHzW0-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper defines a new measure of influence and uses it to highlight important features. The definition is novel however, the reviewers have concerns regarding its significance, novelty and a thorough empirical comparison to existing literature is missing.",ICLR2018, +S1cYhfIOg,1486400000000.0,1486400000000.0,1,Sy4tzwqxe,Sy4tzwqxe,ICLR committee final decision,Reject,"This paper is both time and topical in that it forms part of the growing and important literature of ways of representing approximate posterior distributions for variational inference that do need need a known or tractable density. The reviewers have identified a number of areas that when addressed will improve the paper greatly. These include: more clearer and structured introduction of methods, with the aim of highlighting how this work adds to the existing literature; careful explanation of the term wild variational inference, especially in context of alternative terms and more on relation to existing work; experiments on higher-dimensional data, much greater than 54 dimensions, to help better understand the advantages and disadvantages of this approach. It if for these reasons that the paper at this point is not yet ready for acceptance at the conference.",ICLR2017, +bzV3dNAsPw,1576800000000.0,1576800000000.0,1,Hkls_yBKDB,Hkls_yBKDB,Paper Decision,Reject,"The authors consider the problem of program induction from input-output pairs. +They propose an approach based on a combination of imitation learning from +an auto-curriculum for policy and value functions and alpha-go style tree search. +It is a applied to inducing assembly programs and compared to ablation +baselines. + +This paper is below acceptance threshold, based on the reviews and my own +reading. +The main points of concern are a lack of novelty (the proposed approach is +similar to previously published approaches in program synthesis), missing +references to prior work and a lack of baselines for the experiments.",ICLR2020, +BkeSQTGLdg,1486400000000.0,1486400000000.0,1,H1_EDpogx,H1_EDpogx,ICLR committee final decision,Reject,"This paper is well motivated and clearly written, and is representative of the rapidly growing interdisciplinary area of hardware-software co-design for handling large-scale Machine Learning workloads. 
In particular, the paper develops a detailed simulator of SSDs with onboard multicore processors so that ML computations can be done near where the data resides. + + Reviewers are however unanimously unconvinced about the potential impact of the simulator, and more broadly the relevance to ICLR. The empirical section of the paper is largely focused on benchmarking logistic regression models on MNIST, which reviewers find underwhelming. It is conceivable that the results reflect performance on real hardware, but the ICLR community would at least expect to see realistic deep learning workloads on larger datasets such as Imagenet, where scalability challenges have been thoroughly studied. Without such results, the impact of the contribution is hard to evaluate and the claimed gains are a bit of a leap of faith. + + The authors make several good points in their response about the paper - that their method is expected to scale, that high quality simulations can give insights that can inform hardware manufacturing, and that their approach complements other hardware and algorithmic acceleration strategies. They are encouraged to resubmit the paper with a stronger empirical section, e.g., benchmarking training and inference of Inception-like models on ImageNet.",ICLR2017, +qYp1XTbCV8,1576800000000.0,1576800000000.0,1,HyljzgHtwS,HyljzgHtwS,Paper Decision,Reject,"Three reviewers recommend rejection. After a good rebuttal, the first reviewer is more positive about the paper yet still feels the paper is not ready for publication.
The authors are encouraged to strengthen their work and resubmit to a future venue.",ICLR2020, +r1xsyihZe4,1544830000000.0,1545350000000.0,1,rk4Qso0cKm,rk4Qso0cKm,Paper decision,Accept (Poster),"Reviewers are in a consensus and recommended to accept after engaging with the authors. Please take reviewers' comments into consideration to improve your submission for the camera ready. +",ICLR2019,4: The area chair is confident but not absolutely certain +8Ntd842F0C,1610040000000.0,1610470000000.0,1,dOcQK-f4byz,dOcQK-f4byz,Final Decision,Accept (Poster),"The paper shows that standard transformers can be trained to generate satisfying traces for Linear Temporal Logic (LTL) formulas. To establish this, the authors train a transformer on a set of formulas, each paired with a single satisfying trace generated using a classical automata-theoretic solver. It is shown that the resulting model can generate satisfying traces on held-out formulas and, in some cases, scale to formulas on which the classical solver fails. + +The reviewers generally liked the paper. While the transformer model is standard, the use of deep learning to solve LTL satisfiability is novel. Given the centrality of LTL in Formal Methods, the paper is likely to inspire many follow-up efforts. There were a few concerns about the evaluation; however, I believe that the authors' comments address the most important of them. Given this, I am recommending acceptance. Please add the new experimental results (about out-of-distribution generalization) to the final version of the paper. ",ICLR2021, +r1geNGL21V,1544480000000.0,1545350000000.0,1,HkxCenR5F7,HkxCenR5F7,Many modifications to VAEs with little justification,Reject,"This paper heavily modifies standard time-series-VAE models to improve their representation learning abilities. However, the resulting model seems like an ad-hoc combination of tricks that lose most of the nice properties of VAEs. The resulting method does not appear to be useful enough to justify itself, and it's not clear that the same ends couldn't be pursued using simpler, more general, and computationally cheaper approaches.",ICLR2019,4: The area chair is confident but not absolutely certain +ALiv6LD7hj8,1610040000000.0,1610470000000.0,1,0Hj3tFCSjUd,0Hj3tFCSjUd,Final Decision,Reject,"Before the discussion phase nearly all reviewers had doubts about the comparison of the current work with state-of-the-art works (notably Yan et al., 2020, RetroXpert, and GraphRETRO). The authors then compared with these works and emphasized that these works rely on hand-crafted features. They argue that the fairest comparison is the one where each method uses the same sort of features during train/test time. This is because in certain real world settings we may not have accurate estimates of such features (e.g., atom mappings, templates, reaction centers). However, in the revised version of the paper the authors did not adhere to this concept of fair comparison in Table 4 of Appendix A.4. Here their method uses reaction centers as input while baselines do not. While the authors claimed that the comparison here was designed to show how reaction centers provided as input improved performance, this doesn't seem like a good way to show it: to isolate the improvement due to reaction center inputs you should fix everything else, i.e., the rest of the method. 
+ +Apart from the above contradiction, I buy the arguments of reviewers that distinguishing between methods that use hand-crafted features and those that do not is not a meaningful distinction. One can apply atom-mapping or reaction center discovery algorithms as data preprocessing before applying other methods. Ablation studies where such preprocessing is added or removed are interesting, but it is completely fair for any method to use such preprocessing before applying their method; it is up to the modeller. + +I would have argued for acceptance had the authors either (a) just included results from SOTA methods (one, RetroXpert, was published 1 month after the ICLR submission deadline), and/or (b) reran their approach with such preprocessing. However, the authors ended up hurting the submission by emphasizing a difference between using handcrafted features and not, then contradicting their experimental setup in Table 4. + +This is a good paper, but I agree it is not ready to be accepted at ICLR. I recommend the authors do the following: (a) use any preprocessing they want for their method and compare with the state-of-the-art, (b) if they want they can run their method without any preprocessing as an interesting ablation study, (c) remove Table 4 (as (b) already does this type of ablation study), (d) describe recent work through the lens of EBM, (e) resubmit to a strong ML conference. The new submission will be much stronger.",ICLR2021, +RnUD1qipu,1576800000000.0,1576800000000.0,1,HyetFnEFDS,HyetFnEFDS,Paper Decision,Reject,"This paper proposes an approach for architecture search by framing it as a differentiable optimization over directed acyclic graphs. While the reviewers appreciated the significance of architecture search as a problem and acknowledged that the paper proposes a principled approach for this problem, there were concerns about lack of experimental rigor, and limited technical novelty over some existing works. ",ICLR2020, +B1aaEk6rf,1517250000000.0,1517260000000.0,458,HyFaiGbCW,HyFaiGbCW,ICLR 2018 Conference Acceptance Decision,Reject,"Both R1 and R2 suggested that Conceptors (Jaeger, 2014) had previously explored learning transformations in the context of reservoir computing. The authors acknowledged this in their response and added a reference. The main concern raised by the reviewers was lack of novelty and weak experiments (both the MNIST and depth maps were small and artificial). The authors acknowledged that it was mainly a proof of concept type of work. R1 and R2 also rejected the claim of biological plausibility (and this was also acknowledged by the authors). Though the authors have taken great care to respond in detail to each of the reviewers, I agree with the consensus that the paper does not meet the acceptance bar.",ICLR2018, +ry2GnGLux,1486400000000.0,1486400000000.0,1,SyJNmVqgg,SyJNmVqgg,ICLR committee final decision,Invite to Workshop Track,"The authors propose a meta-learning algorithm which uses an RL agent to selectively filter training examples in order to maximise a validation loss. There was a lot of discussion about proper training/validation/test set practices. The author's setup seems to be correct, but the experiments are quite limited. Pro - interesting idea, very relevant for ICLR. Con - insufficient experiments.
This is a cool idea which could be a nice workshop contribution.",ICLR2017, +ji0Rswb2-a,1610040000000.0,1610470000000.0,1,YMsbeG6FqBU,YMsbeG6FqBU,Final Decision,Reject,"This paper introduces a new algorithm to solve games, more or less similar (in the general idea, yet the differences are interesting) to CFR. The concept is to sample from past policies to generate trajectories and update sequentially (via regret matching). + +The three reviewers gave rather lukewarm reviews, with possible suggestions of improvements (that were more or less declined by the authors for those proposed by Rev3 and Rev4; the added material focuses more on the clarity of the text than on the content itself). + +I have also read the paper, and find it quite difficult to assess. At the end, it is not clear to a reader whether ARMAC is the new state of the art, or just a ""variant"" of CFR that will soon be forgotten. The performances do not seem astonishing (at least against NFSP) and even though DREAM might not be satisfactory to the authors (EDIT POST DISCUSSION: actually, DREAM is a valid competitor and must be included in the comparative study), it would have been nice to provide some comparison. Maybe the issue is the writing of the paper, which could and should be improved so that it is clearer what the different building blocks of ARMAC are (and their respective importance). + +If ARMAC is the new state of the art, then I am sure the authors will be able to clearly illustrate it in a forthcoming revision (maybe with more experiments, as suggested by Rev2). Unfortunately, for the moment, I do not think this paper is mature enough for ICLR.",ICLR2021, +7ADOwXXGc2s,1610040000000.0,1610470000000.0,1,AM0PBmqmojH,AM0PBmqmojH,Final Decision,Reject,"The authors propose to approximate the kernel matrix used in the Sinkhorn algorithm by a combination of sparse + low rank approximation. To do so, the authors propose to compute a low rank approximation of a sparsified (thresholded below a certain value to be 0) kernel matrix using Nyström, and then correct it by adding back the true entries at non-sparse entries, after removing those obtained from the approximation. This results in a matrix whose application then results in sparse + low-rank. + +The first version of the paper contained mostly experimental evidence, which was deemed a bit short by some reviewers. The authors have added theoretical material on the way. Although I believe these are worthy additions, as AC, I do not feel comfortable accepting the paper as of now, because I believe these additions were not properly reviewed. I understand this must be disappointing for the authors, who have sprinted to add new content during the rebuttal phase, but I hope they agree that the rebuttal process is not here to handle entirely new sections, but rather to improve existing parts. In particular, that section should be reviewed by reviewers knowledgeable on low rank kernel factorization, something I did not see in the pool of reviewers. I also believe the paper still has a few shortcomings. Taken together, I therefore recommend a re-submission.
ideas to improve the paper + +- the authors claim to use Nyström on a sparsified matrix (see eq. 4). The sparsified kernel is no longer positive definite. I would like the authors to comment on this. I understand Nyström could be used naively without any psd-ness guarantees, but I think a heads-up is needed. There are, furthermore, several local/global factorizations of kernel matrices available out there (e.g. MEKA, https://www.jmlr.org/papers/v18/15-025.html), the main difference here being that the product by such approximation must be guaranteed to be positive for it to work in the Sinkhorn algorithm. I would expect bounds in expectation to break down sometimes, and therefore result in ""catastrophic"" failures (i.e. nan's). I think that an algorithm that claims to improve or replace another one, and which has such blind spots, needs such additional experiments (I have read the Limitations section in Appendix B, something more precise would improve the paper). I understand these were not part of the original Nyström paper for Sinkhorn, but since this is an increment over that previous work, therefore lacking a bit its originality, more knowledge needs to be contributed. + +- For instance, since the authors write an entire paragraph on this (Appendix B), I am surprised that there is no direct mention of the fact that a sparse Sinkhorn may simply *not* converge, because it may not satisfy the fully indecomposable property required of matrices for Sinkhorn's algorithm to converge. + +- I don't think that users have the various identities (14,15) in mind when they think about ""backpropagating"" through Sinkhorn. What is typically needed is to compute the differentiable properties of the regularized OT matrix and/or of the regularized OT cost w.r.t. *point locations* (i.e. x_i). The statement ""LCN-Sinkhorn works flawlessly with automatic backpropagation"" is misleading in the sense that it ignores that problem altogether. Since so many extensions of OT today rely on that differentiability, the section, as it is written now, is problematic. + +- several methods claim to be faster or more efficient than Sinkhorn to solve OT. Either these methods display faster theoretical convergence (e.g. by using acceleration) or display faster practical convergence (e.g. heavy ball variants) using synthetic, controlled datasets. Using synthetic data helps highlight relevant regimes for regularization parameters, including those where LSE Sinkhorn may converge but LCN does not work, or vice-versa. I understand that the authors wanted to use real data, but it would be great to clarify whether that setup was used because LCN works better there (in which case this becomes more of a paper at the intersection of OT and word embeddings) or because this happened to be the first and only example the authors thought of.",ICLR2021, +fG9nu4rfLpD,1610350000000.0,1610470000000.0,1,HkUfnZFt1Rw,HkUfnZFt1Rw,Final Decision,Reject,"This paper studies various graph measures in depth. The paper was reviewed by three expert reviewers who complimented the ease of understanding because of clear writing. But they also expressed concerns for limited novelty, theoretical justification, and unrealistic setting.
The authors are encouraged to continue research, taking into consideration the detailed comments provided by the reviewers.",ICLR2021, +YtH1GgizWo3,1610040000000.0,1610470000000.0,1,xFYXLlpIyPQ,xFYXLlpIyPQ,Final Decision,Reject,"The paper studies the problem of learning the step size of gradient descent for quadratic loss. Interesting theoretical results are presented, which formally support the empirically observed problems of exploding/vanishing gradients, as well as another result showing that if meta-learning is done based on the validation performance, optimal performance can be achieved for a simple linear regression task. + +On the negative side, there are several issues which preclude publication of the paper in its current stage: + +1. The claims in the text seem to be much stronger than what is actually proved. +2. The contributions are not properly connected to the literature (e.g., the relation to Metz et al. 2013 is not properly discussed). +3. Not mentioned in the reviews, but the paper does not explore the connection to similar results coming from online learning/sequential optimization. Recently there has been a surge of papers analyzing meta-learning from an online learning perspective; as an example, Khodak et al. (2019) presents an adaptive step-size tuning with guarantees for a much more general problem setting. It could also be interesting to explore if the exploding gradient problem is also related to issues with mirror descent as described in Section 4.1 of Orabona and Pal (2018). +4. The presentation in the main text does not provide enough insight about the results, as too much material is relegated to the appendix. +5.
The presentation is often imprecise; it is somewhat questionable (though it is a matter of taste) if the informal theorems are useful (why call them theorems?), but Corollary 1 is not indicated to be informal, yet it is hard to interpret formally. There are other issues such as the statements of Theorems 5 and 6 where conditional expectations are used without explicitly showing the conditions, high probability bounds are stated although the error probability never appears, etc. +6. It is not clear how meta-learning helps in Theorem 6 compared to methods adaptively tuning the step size (as a recent work, see, e.g., Joulani et al. 2020 and the references therein). + + + +M. Khodak, M.-F. Balcan, A. Talwalkar. Adaptive Gradient-Based Meta-Learning Methods. NeurIPS 2019. +F. Orabona, D. Pal. Scale-free online learning. Theoretical Computer Science 716, 50-69, 2018. +P. Joulani, A. Raj, A. Gyorgy, C. Szepesvari. A simpler approach to accelerated optimization: iterative averaging meets optimism. ICML 2020. + +",ICLR2021, +kHlw7oPBZQs,1610040000000.0,1610470000000.0,1,vsU0efpivw,vsU0efpivw,Final Decision,Accept (Poster),"Shapley values are an important approach in extracting meaning from trained deep neural networks, and the paper proposes an innovative approach to address inefficiencies in post-processing to compute Shapley values, by instead incorporating their computation into training.
There was a robust discussion of this paper, and the authors' comments and changes substantially strengthened the paper and the reviewers' view of it, to the point that all reviewers now recommend acceptance. Some lingering concerns remain that the authors should continue to work to address. Is the method of computing Shapley values used as the baseline in the paper really state-of-the-art, or artificially weak? The empirical results were methodologically sound but not as strong as one might expect or hope. These concerns detract somewhat from enthusiasm, but nevertheless the paper yields an innovation to a widely-used approach to one of the most pressing current research problems. The reviewers had a number of smaller suggestions that should also be incorporated, including more significance testing and reporting of the resulting p-values.",ICLR2021,
+kPMp7dDc27,1610040000000.0,1610470000000.0,1,hbzCPZEIUU,hbzCPZEIUU,Final Decision,Reject,"This paper introduces a method for hierarchical classification with deep networks. The idea is interesting and, as far as I know, novel: namely, the authors add a regularizer to the last layer in order to enforce a hierarchical structure onto the classifiers. The idea of placing spheres (with a fixed radius) around each classifier and forcing the child-classifiers to lie on these spheres is quite clever.
+The reviewers have pointed out some concerns with this paper. Some had to do with terminology (which the authors should fix but which is no big deal), but the main weaknesses are the experimental results and the ablation study. The reviewers were not convinced that the optimization in the Euclidean space wouldn't be sufficient. A more thorough ablation study could help here.
+
+This is the kind of paper that I really want to see published eventually, but right now it isn't quite ready yet. If you make one more iteration (in particular adding a stronger ablation study) it should be a strong submission to the next conference. Good luck!",ICLR2021,
+GOVqFc_tGlQ,1642700000000.0,1642700000000.0,1,y1faDxZ_-0a,y1faDxZ_-0a,Paper Decision,Reject,"This manuscript proposes an extension of semi-supervised learning to the federated setting. The contributions include a thorough evaluation of performance and some method extensions.
+
+There are four reviewers. One reviewer points out a name leakage issue in the code that was missed and suggests desk-rejection. The area chair has chosen not to desk-reject the paper. The three other reviewers agree that the manuscript addresses an interesting and timely issue -- indeed, label acquisition is a significant issue in federated learning. Three reviewers agree to reject the paper -- raising concerns about novelty compared to existing methods, some details of the evaluation, and some lack of clarity. The authors provide a good rebuttal addressing many of these issues. 
However, the reviewers remain unconvinced that the method is sufficiently novel after the reviews and discussion. The authors are encouraged to address the highlighted concerns in a future submission of this work.",ICLR2022,
+A5Y2Zp-pkh,1610040000000.0,1610470000000.0,1,8YFhXYe1Ps,8YFhXYe1Ps,Final Decision,Reject,"All the reviewers agree that the paper presents an interesting idea, and the main concern raised by the reviewers was the clarity of the paper. I believe that the authors have improved the presentation of the paper after the rebuttal; however, I still believe that the paper would require another round of reviews before being ready for publication, in order to properly assess its contributions. ",ICLR2021,
+OYMnCO123b,1576800000000.0,1576800000000.0,1,HklXn1BKDH,HklXn1BKDH,Paper Decision,Accept (Poster),"The paper presents a method for visual robot navigation in simulated environments. The proposed method combines several modules, such as a mapper, global policy, planner, and local policy, for point-goal navigation. The overall approach is reasonable and the pipeline can be modularly trained. The experimental results on navigation tasks show strong performance, especially in generalization settings. ",ICLR2020,
+BygEH5ra14,1544540000000.0,1545350000000.0,1,S1xLZ2R5KQ,S1xLZ2R5KQ,Lack of comparison with recent baselines,Reject,"This paper proposes a framework of image restoration by searching for a MAP estimate in a trained GAN subject to a degradation constraint. Experiments on MNIST show good performance in restoring the images under different types of degradation.
+
+The main problem, as pointed out by R1 and R3, is that there is a rich literature of image restoration methods, including several recent works that also utilized GANs, but the authors failed to compare with any of those baselines in the experiments. Additional experiments on natural images would provide more convincing evidence for the proposed algorithm.
+
+The authors argue that the restoration tasks in the experiments are too difficult for TV to work. It would be great to provide actual experiments to verify the claim.",ICLR2019,5: The area chair is absolutely certain
+NHCgao-sUr2,1642700000000.0,1642700000000.0,1,UTTrevGchy,UTTrevGchy,Paper Decision,Reject,"This paper proposes InfoMax Termination Critic (IMTC), a new approach for learning option termination conditions with the aim of discovering more diverse options. IMTC relies on a scalable approximation of the gradient of a mutual information objective with respect to the termination function parameters.
+
+Reviewers liked the motivation and the simplicity of the approach. While there were some initial concerns regarding the similarity of IMTC and VIC, the authors did a good job of clarifying the differences and providing additional results in the rebuttal. While two reviewers raised their scores based on the rebuttal, this left reviewers split on whether to accept or reject the paper.
+
+Given that the paper's main contributions are evaluated empirically, I based my decision on the strength of the evaluation. The main claim in the paper is that IMTC significantly improves the diversity of the learned options when combined with intrinsic control methods like VIC and RVIC. The main supporting evidence of this claim is a visualization of the option policies and termination probabilities reached by VIC and RVIC. There are several issues with this comparison:
+* This is a poor visualization of the kind of option diversity the paper aims to obtain. 
Given that mutual information based objectives used by VIC, RVIC and IMTC aim to optimize diversity in the final states reached by the options, visualizing the distribution of final states or the trajectories produced by the options is more meaningful. +* The VIC and RVIC baselines are evaluated with a fixed option termination probability of 0.1 which biases the comparison in favor of IMTC because IMTC is able to choose when and where to terminate while VIC and RVIC with random termination get to control neither. Using fixed option duration with MI-based option discovery methods like VIC, DIAYN and RVIC is more standard and is known to produce options with very clear terminal state clusters which are well-separated for different options. Fixed option duration allows VIC and RVIC precise control of where they will terminate since option duration is fixed, hence it should have been included in the comparison. +* As mentioned in point 2 above, it is well established that VIC tends to learn options with well-clustered end states, especially in simple gridworld domains like in Figure 3 (see VIC, DIAYN and RVIC papers). The authors seem to obtain different qualitative results raising questions. + +Overall, I don’t think the qualitative experiments show that IMTC is able to improve the diversity of options discovered by VIC or RVIC due to issues with how the experiments are done (random option duration for VIC and RVIC) and how the results are presented (visualizing action probabilities instead of final states). Given these concerns and the split among the reviewers I recommend rejecting the paper in its current form.",ICLR2022, +rJLRNy6HG,1517250000000.0,1517260000000.0,465,r154_g-Rb,r154_g-Rb,ICLR 2018 Conference Acceptance Decision,Reject,"Overall the reviewers appear to like the ideas in this paper, though this is some disagreement about novelty (I agree with the reviewer who believes that the top-level search can very easily be interpreted as an MDP, making this very similar to SMDPs). The reviewers generally felt that the experimental results need to more closely compare with some existing techniques, even if they're not exactly for the same setting.",ICLR2018, +RUsqeh9ypf4,1610040000000.0,1610470000000.0,1,ot9bYHvuULl,ot9bYHvuULl,Final Decision,Reject,"Given two data measures in R^d, this paper proposes to use a NN to augment the representation of each data point found in these measures with additional coordinates. The measures are then compared using the sliced Wasserstein distance on these augmented representations. Because this augmentation is injective by design (the original vectors are part of the new representation) simple metric properties are kept. The authors propose to learn in a robust/adversarial way these augmentations. They propose simple experiments to illustrate that idea. + +Although I found the idea interesting, I think it falls short of acceptance at ICLR. I agree with the sentiment of other reviewers 1 and 2 that defining another variant of robust/NN inspired variant of the W distance is interesting, but at this point the readership of the conference expects more than simple experiments on toy data and hard to interpret GAN results. I think there is value in the draft as it stands now, but that more efforts are needed to convince this variant is scalable / useful for other downstream tasks (e.g. W barycenters, or other easier to interpret W problems in lower dimensions). 
+ +minor comments +- as it stands, equation 2 is wrong if you do not add more conditions on the cost function d(.,.). +- "" the idea of SWD by projecting distributions onto hypersurfaces rather than hyperplane"" -> this is wrong, the projection is done onto lines or curves, not hyperplanes or hypersurfaces. +",ICLR2021, +SJgASuyt1V,1544250000000.0,1545350000000.0,1,B1GSBsRcFX,B1GSBsRcFX,meta-review,Reject,"The paper proposes an interesting data-dependent regularization method for orthogonal-low-rank embedding (OLE). Despite the novelty of the method, the reviewers and AC note that it's unclear whether the approach can extend other settings with multi-class or continuous labels or other loss functions. ",ICLR2019,4: The area chair is confident but not absolutely certain +LfOSrjQrssL,1610040000000.0,1610470000000.0,1,ztMLindFLWR,ztMLindFLWR,Final Decision,Reject,"The paper explores the representation power of GNNs, in particular, studying the bottleneck and improving expressiveness with new aggregators, which are analyzed theoretically. This issue was highlighted in previous works, but the merit of this paper is a constructive analysis. + +The reviewers were overall not enthusiastic and raised a few concerns: +- Not enough context is provided about related work, in particular, the early work of Corso et. al. +- Insufficiently convincing experiments + +While the authors provided an elaborate rebuttal and extended the experimental section to address experiment concerns raised by most of the reviewers, the final evaluation was still lukewarm. Given that the conference has a very high bar and there have been many very good submissions on graphs, we find the paper not quite above the bar and hence have no choice but to recommend rejection with a heavy heart. The authors should be commended on their efforts and are encouraged to seek publication elsewhere. + +",ICLR2021, +UccjW6ik1Tf,1610040000000.0,1610470000000.0,1,U7-FJu0iE3t,U7-FJu0iE3t,Final Decision,Reject,"This paper introduces the idea of cascading decision trees. The reviewers agree that this is a potentially novel and valuable idea, but they also agree that the paper fall short in execution. The paper would be substantially strengthened with more theoretical analysis, more discussion of why cascading decision trees are useful, and most importantly substantially more empirical evaluation, especially with more data sets and more baselines for comparison.",ICLR2021, +ByxYRHTWeE,1544830000000.0,1545350000000.0,1,Syx_Ss05tm,Syx_Ss05tm,Paper decision,Accept (Poster),"Reviewers mostly recommended to accept after engaging with the authors. I have decided to reduce the weight of AnonReviewer3 because of the short review. Please take reviewers' comments into consideration to improve your submission for the camera ready. +",ICLR2019,4: The area chair is confident but not absolutely certain +1Sqk5ygBVmA,1642700000000.0,1642700000000.0,1,R2AN-rz4j_X,R2AN-rz4j_X,Paper Decision,Reject,"The paper proposes an empirical study on the effect of various types of output layers of deep neural networks in different scenarios of continual learning. The authors draw several insights, such as ways of selecting the best output layer depending on type of scenario and a description of the different sources of performance drop (forgetting, interference, and projection drifts). The paper proposes different ways of mitigating catastrophic forgetting: a weight normalization layer, two masking strategies, and a variant of NMC using median vectors. 
+ +The paper presented a detailed experimental setup covering a large number of scenarios of continual learning: incremental, Lifelong Learning and Mixed Scenario. This was highlighted by Reviewer BTLN, and the AC agrees. + +The main point of criticism for the work is the lack of novelty and the low significance of the findings. These were highlighted by all four reviewers. + +Perhaps the aspect limiting significance is the fact that the feature extractors are assumed to be fixed, which is unlike most interesting settings in continual learning. This was mentioned by reviewers BTLN, uN9P, e1ZF. It is unclear whether the findings provided in this work would generalize to that setting. On that note, Reviewer e1ZF points out that not adapting the feature extractor could be the source of some inconsistencies observed. Studying this further would improve the work. + +Overall, all four reviewers recommend rejecting the paper. The AC agrees with this decision and encourages the authors to consider extending the analysis to situations where the feature extractor is not fixed.",ICLR2022, +ADzYVehUJUi,1642700000000.0,1642700000000.0,1,T_p2GaXuGeA,T_p2GaXuGeA,Paper Decision,Reject,"The reviewers all generally find the paper both well-motivated in addressing an important challenge as well as well-written. However, there's quite a bit of hesitation around whether the proposed metric is convincing enough as an approach to measure local calibration. + +Reviewer 76PS and 784d's concerns around the choice of feature map and associated hyperparameters remain unaddressed, and I agree with their concern. There is no clear understanding of what constitutes a ""good"" feature map, which makes the metric quite difficult to use whether as a benchmark of ML methods or for general application. I recommend the authors use the reviewers' feedback to enhance their preprint should they aim to submit to a later venue.",ICLR2022, +lBwhMeUBZEc,1642700000000.0,1642700000000.0,1,shpkpVXzo3h,shpkpVXzo3h,Paper Decision,Accept (Spotlight),"This paper proposes Adam and Momentum optimizers, where the optimizer state variables are quantized to 8bit using block dynamics quantization. These modifications significantly improve the memory requirements of training models with many parameters (mainly, NLP models). These are useful contributions which will enable training even larger models than possible today. All reviewers were positive.",ICLR2022, +vwetc35iBu,1576800000000.0,1576800000000.0,1,HJluEeHKwH,HJluEeHKwH,Paper Decision,Reject,"This paper proposes a differentiable version of CEM, allowing CEM to be used as an operator within end-to-end training settings. The reviewers all like the idea -- it is simple and should be of interest to the community. Unfortunately, the reviewers also are in consensus that the experiments are not sufficiently convincing. We encourage the authors to expand the empirical analysis, based on the reviewer's specific comments, and resubmit the paper to a future venue.",ICLR2020, +HylyXR-llE,1544720000000.0,1545350000000.0,1,BJzbG20cFQ,BJzbG20cFQ,novel high-performing model; thorough experimental analysis and discussion; clarity could be improved,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +- The problem is well-motivated and related work is thoroughly discussed +- The evaluation is compelling and extensive. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. 
Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- Very dense. Clarity could be improved in some sections. + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +No major points of contention. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers reached a consensus that the paper should be accepted. +",ICLR2019,4: The area chair is confident but not absolutely certain +VqyyNWdNL,1576800000000.0,1576800000000.0,1,r1lIKlSYvH,r1lIKlSYvH,Paper Decision,Reject,"This manuscript investigates the posterior collapse in variational autoencoders and seeks to provide some explanations from the phenomenon. The primary contribution is to propose some previously understudied explanations for the posterior collapse that results from the optimization landscape of the log-likelihood portion of the ELBO. + +The reviewers and AC agree that the problem studied is timely and interesting, and closely related to a variety of recent work investigating the landscape properties of variational autoencoders and other generative models. However, this manuscript also received quite divergent reviews, resulting from differences in opinion about the technical difficulty and importance of the results. In reviews and discussion, the reviewers noted issues with clarity of the presentation and sufficient justification of the results. There were also concerns about novelty. In the opinion of the AC, the manuscript in its current state is borderline, and should ideally be improved in terms of clarity of the discussion, and some more investigation of the insights that result from the analysis.",ICLR2020, +bgMqqD6tkI,1576800000000.0,1576800000000.0,1,Hyg96gBKPS,Hyg96gBKPS,Paper Decision,Accept (Poster),"This paper extends previous models for monotonic attention to the multi-head attention used in Transformers, yielding ""Monotonic Multi-head Attention."" The proposed method achieves better latency-quality tradeoffs in simultaneous MT tasks in two language pairs. + +The proposed method is a relatively straightforward extension of the previous Hard and Infinite Lookback monotonic attention models. However, all reviewers seem to agree that this paper is a meaningful contribution to the task of simultaneously MT, and the revised version of the paper (along with the authors' comments) addressed most of the raised concerns. + +Therefore, I propose acceptance of this paper.",ICLR2020, +Vuw0KsxRSug,1610040000000.0,1610470000000.0,1,KtH8W3S_RE,KtH8W3S_RE,Final Decision,Accept (Poster),Reviewers agree that the paper excels in providing a principle pipeline that combines CNNs and GPs with a Poisson-Gamma distribution to provide a generic approach for multiresolution modelling of tumour mutation rates. As a whole such combination of techniques addresses a key challenge in computational biology that also scales to large datasets. 
,ICLR2021, +SJlncKTMlV,1544900000000.0,1545350000000.0,1,SyeLno09Fm,SyeLno09Fm,Well-motivated idea but execution and analysis is not convincing,Reject,"This work proposes to use the MAML meta-learning approach in order to tackle the typical problem of insufficient demonstrations in IRL. + +All reviewers found this work to contain a novel and well-motivated idea and the manuscript to be well-written. The combination of MAML and MaxEnt IRL is straightforward, as R2 points out, however the AC does not consider this to be a flaw given that the main novelty here is the high-level idea rather than the technical details. + +However, all reviewers agree that for this paper to meet the ICLR standards, there has to be an increase in rigorousness through (a) a more close examination of assumptions, sensitivity of parameters and connections to imitation learning (b) expanding the experimental section.",ICLR2019,5: The area chair is absolutely certain +DUB3Q0ubbnV,1642700000000.0,1642700000000.0,1,ziRLU3Y2PN_,ziRLU3Y2PN_,Paper Decision,Accept (Poster),"This paper proposes a new wavelet-based model to represent textures. The model incorporates a wide range of statistics, by computing covariances between rectified wavelets coefficients, at different scales, phases and positions. The model can synthesize textures that have a similar quality to state-of-the-art texture models using CNN structure. Qualitative results are shown to demonstrate the effectiveness of the model. The paper studies an important problem in computer vision and neuroscience, which is texture modeling. However, many important related works are missing. After rebuttal, three of four reviewers champion accepting the work because the proposed wavelet-based texture model, which produces competitive synthesis with much less parameters than the CNN-based model, will be beneficial to the fields of computer vision and neuroscience. One reviewer has critical comments on this paper because the paper lacks a comparison again more recent works both quantitatively and qualitatively. However, during rebuttal, the authors expressed their disagreement with it and pointed out that the goal of the paper is to bridge the gap between the classical work of Portilla and Simoncelli (2000), and the CNN-based models and to find what statistics are needed to describe the geometric structures in natural textures. Their discussion didn't reach an agreement after rebuttal. After an internal discussion, AC recommends accepting the paper but urges the authors to improve their paper by taking into account all the suggestions from reviewers, especially include the discussion or comparison with those related works mentioned in the rebuttal.",ICLR2022, +Bkzf71pBG,1517250000000.0,1517260000000.0,85,H1WgVz-AZ,H1WgVz-AZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The submission modifies the SPEN framework for structured prediction by adding an inference network in place of the usual combinatorial optimization based inference. The resulting architecture has some similarity to a GAN, and significantly increases the speed of inference. + +The submission provides links between two seemingly different frameworks: SPENs and GANs. By replacing inference with a network output, the connection is made, but importantly, this massively speeds up inference and may mark an important step forward in structured prediction with deep learning. 
",ICLR2018, +HkgzA3qze4,1544890000000.0,1545350000000.0,1,H1MzKs05F7,H1MzKs05F7,The paper has intriguing ideas but requires more work,Reject,"This paper suggests that adversarial vulnerability scales with the dimension of the input of neural networks, and support this hypothesis theoretically and experimentally. + +The work is well-written, and all of the reviewers appreciated the easy-to-read and clear nature of the theoretical results, including the assumptions and limitations. (The AC did not consider the criticisms raised by Reviewer 3 justified. The norm-bound perturbations considered here are a sufficiently interesting unsolved problem in the community and a clear prerequisite to solving the broader network robustness problem.) + +However, many of the reviewers also agreed that the theoretical assumptions - and, in particular, the random initialization of the weights - greatly oversimplify the problem. Reviewers point out that the lack of data dependence and only considering the norm of the gradient considerably limit the significance of the corresponding theoretical results, and also does not properly address the issue of gradient masking. ",ICLR2019,5: The area chair is absolutely certain +NDZd-r7WJyn,1610040000000.0,1610470000000.0,1,1OP1kReyL56,1OP1kReyL56,Final Decision,Reject,"While the paper has merits, the experiments are lacking in important respects: I agree with Reviewer 1 that it is a serious problem that the approach is not evaluated on truly low-resource languages - since a significant pivot-to-target language bias is to be expected (as also suggested by Reviewer 2). I also agree with the sentiment that the work is not properly baselined, without considering alternative ways of using the pivot language development data. I also agree with Reviewer 3 that the 1:1 assumption is limiting, given that multi-source transfer has been de facto standard since 2011 (see, e.g., work by McDonald, Søgaard, Cohen, etc.). I’m also a little worried about using dev data for unlabelled data, since this is data from the exact same sample as the test data. In practice, dev data will be biased, and artificially removing this bias will lead to overly optimistic results. ",ICLR2021, +SJAwYKJaDC,1576800000000.0,1576800000000.0,1,HJx-3grYDB,HJx-3grYDB,Paper Decision,Accept (Poster),"The paper extends recent value function factorization methods for the case where limited agent communication is allowed. The work is interesting and well motivated. The reviewers brought up a number of mostly minor issues, such as unclear terms and missing implementation details. As far as I can see, the reviewers have addressed these issues successfully in their updated version. Hence, my recommendation is accept.",ICLR2020, +HylMSdDexV,1544740000000.0,1545350000000.0,1,HyGh4sR9YQ,HyGh4sR9YQ,Improvement needed.,Reject,"This paper presents an empirical study of the applicability of genetic algorithms to deep RL problems. Major concerns of the paper include: 1. paper organization, especially the presentation of the results, is hard to follow; 2. the results are not strong enough to support that claims made in this paper, as GAs are currently not strong enough when compared to the SOTA RL algorithms; 3. Not quite clear why or when GAs are better than RL or ES; Lack of insights. Overall, this paper cannot be accepted yet. 
+",ICLR2019,5: The area chair is absolutely certain +d63t9IH680,1576800000000.0,1576800000000.0,1,rylCP6NFDB,rylCP6NFDB,Paper Decision,Reject,"The paper pursues an interesting approach, but requires additional maturation. The experienced reviewers raise several concerns about the current version of the paper. The significance of the contribution was questioned. The paper missed key opportunities to evaluate and justify critical aspects of the proposed approach, via targeted ablation and baseline studies. The quality and clarity of the technical exposition was also criticized. The comments submitted by the reviewers should help the authors strengthen the paper. ",ICLR2020, +shxyC0wba,1576800000000.0,1576800000000.0,1,BkxoglrtvH,BkxoglrtvH,Paper Decision,Reject,"This paper investigates the properties of deep neural networks as they learn, and how they may relate to human visual learning (e.g. how learning develops across regions of the infant brain). The paper received three reviews, all of which recommended Weak Reject. The reviewers generally felt the topic of the paper was very interesting, but overall felt that the insights that the paper revealed were relatively modest, and had concerns about the connections between DNN and human learning (e.g., the extent to which DNNs are biologically plausible -- including back propagation, batch normalization, random initialization, etc. -- and whether this matters for the conclusions of the present study). In response to comments, the authors undertook a significant revision to try to address these points of confusion. However, the reviewers were still skeptical and chose to keep their Weak Reject scores. + +The AC agrees with reviewers that investigations of the similarity -- or not! -- between infant and deep neural networks is extremely interesting and, as the authors acknowledge, is a high risk but potentially very high reward research direction. However, in light of the reviews with unanimous Weak Reject decisions, the AC is not able to recommend acceptance at this time. I strongly encourage authors to continue this work and submit to another venue; this would seem to be a perfect match for CogSci conference, for example. We hope the reviews below help authors to improve their manuscript for this next submission.",ICLR2020, +60weOraUqt,1576800000000.0,1576800000000.0,1,S1efAp4YvB,S1efAp4YvB,Paper Decision,Reject,"The paper addresses interpretability in the video data domain. The authors study and compare the saliency maps for 3D CNNs and convolutional LSTMs networks, analysing what they learn, and how do they differ from one another when capturing temporal information. To search for the most informative part in a video sequence, the authors propose to adapt the meaningful perturbations approach by Fong & Vedaldi (2017) to the video domain using temporal mask perturbations. +While all reviewers and AC acknowledge the importance and potential usefulness of studying and comparing different generative models in continual learning, they raised several important concerns that place this paper below the acceptance bar: +(1) in an empirical study paper, an in-depth analysis and insightful evaluations are required to better understand the benefits and shortcomings of the available and proposed models (R5 and R2). 
Specifically: +(i) providing a baseline comparison to assess the benefits of the proposed approach -- please see R5’s suggestions on the baseline methods; +(ii) analyzing how the proposed approach can elucidate meaningful differences between 3D CNNs and LSTMs (R5, R2). The authors discussed in their rebuttal some of these questions, but a more detailed analysis is required to fully understand the benefits of this study. +(2) R5 and R2 raised an important concern that the temporal mask generation developed in this work is grounded on the generation of the spatial masks, which is counterintuitive when analysing the temporal dynamics of the NNs - see R5’s suggestions on how to improve. +Also R5 has raised concerns regarding the qualitative analysis of the Grad-CAM visualizations. Happy to report that the authors have addressed these concerns in the rebuttal, namely reporting the results in Table 2 and providing an updated discussion. R1 has raised a concern about the importance of the sub-sampling in the CNN framework, which was partially addressed in the rebuttal. +To conclude, the AC suggest that in its current state the manuscript is not ready for a publication and needs a major revision before submitting for another round of reviews. We hope the reviews are useful for improving and revising the paper. +",ICLR2020, +F3qSjZNAzwc,1610040000000.0,1610470000000.0,1,7I12hXRi8F,7I12hXRi8F,Final Decision,Accept (Poster),"This paper proposes an approach to learn the causal structure underlying a dataset with acyclicity and other structure constraints, and then used the inferred structure to compute partial causal effects. The authors show that, on simulated data, the proposed method outperforms others in the literature. The manuscript also contains an analysis of real-world data that describes the causal effects of the lockdown of cities in the Hubei province (China) to reduce the spread of COVID-19. + +Overall, the reviewers think that this is a well structured and written paper. From a novelty viewpoint, the main contribution consists in formalising the causal contribution of mediators, as the method for computing the causal structure is based on a small modification to previous literature. + +The main concern raised by the reviewers were on the experimental evaluation. Some of these concerns were addressed by the authors during rebuttal, whilst some on the number of nodes remained. We encourage the authors to consider these concerns in the final version of the manuscript.",ICLR2021, +QJCufRw2QB,1642700000000.0,1642700000000.0,1,XhMa8XPHxpw,XhMa8XPHxpw,Paper Decision,Reject,"The paper investigates the performance of low-precision Stochastic Gradient Langevin Dynamics (SGLD). While similar low-precision techniques have been widely used in optimization, much less is known for Markov Chain Monte Carlo (MCMC) methods. The paper develops a new quantization function to make SGLD suitable for low-precision setups and argues for its use in deep learning. + +The main concerns among the reviewers were related to the paper presentation (separation and comparison between optimization and sampling), comparison to Dalalyan-Karagulyan'19 and overview of this work, technical depth, and numerical experiments. The authors have adequately responded to the reviewers' comments and addressed them to the extent possible. However, there was ultimately not enough support to lead this paper to acceptance. + +I find low-precision sampling a worthy topic of study and the contributions of the paper are interesting. 
The authors are encouraged to revise the paper based on the reviewers' comments, more clearly highlight the contributions, and resubmit.",ICLR2022, +CofXT5DkYhM,1642700000000.0,1642700000000.0,1,MeMMmuWRXsy,MeMMmuWRXsy,Paper Decision,Reject,"Meta Review of Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models + +This work investigates a recurrent latent space planning model for robotic control from pixels, but unlike some previous work such as Dreamer and RNN+VAE-based World Models, they use a simpler contrastive loss for next-observation prediction. They presented results on the DM-control suite (from pixels) with distracting background settings. All reviewers (including myself) agree that this is a well-written paper, with clear explanation of their approach. The main weaknesses of the approach are on the experimental side (see review responses to author’s rebuttal by skrV and cjX3). Another recommendation from me is to strengthen the related work section to clearly position the work to previous work - there is clear novelty in this work, but this should be done to avoid confusion. The positive sign is that in the discussion phase, even the very critical cjX3, had increased their score and acknowledged the novelty from previous related work. In the current state, I cannot recommend acceptance, but I’m confident that with more compelling experiments recommended by the reviewers, and better positioning of the paper to previous work, I believe that this paper will surely be accepted at a future ML conference or journal. I’m looking forward to seeing a revised version of this paper for publication in the future.",ICLR2022, +qZJuFWGOZ2s,1610040000000.0,1610470000000.0,1,0N8jUH4JMv6,0N8jUH4JMv6,Final Decision,Accept (Spotlight),"The paper introduces convex reformulations of problems arising in the training of two and three layer convolutional neural networks with ReLU activations. These formulations allow shallow CNNs to be training in time polynomial in the number of data samples, neurons and data dimension (albeit exponential in filter lengths). These problems are regularized in different ways (L2 regularization for two layers, L1 regularization for three layers), providing new insights into the connection between architectural choices and regularization. The paper also provides experiments showing convex training of neural networks on small datasets. + +Pros and cons: + +[+] The theoretical results show that globally optimal training of shallow CNNs can be achieved in time fully polynomial, i.e., polynomial in the number of data samples, neurons and data dimension. This is significant theoretical progress, since the corresponding results for fully connected neural networks require time exponential in the rank of the data matrix. There is, however, an exponential dependence on the filter length (or the rank of the patch matrix). In particular, the computational complexity is proportional to $(nK/r_c)^{3r_c}$, where $n$ is the number of data points. While CNNs do use relatively small filters, this becomes prohibitive even when $r_c$ is a moderate constant. E.g., the experiments use filters of length $3$. Here, the comments of the reviewers about generalization may be appropriate; perhaps experiments that evaluate the performance of these networks in terms of generalization may show the disadvantages of using very small filters. 
+ +[+] The work provides interesting and rigorous insights into the relationship between architecture and implicit regularization, with different network architectures leading to different regularizers (L1, L2, nuclear). Developing these insights for deeper architectures could lead to important insights even in situations where the convex relaxation is challenging to solve efficiently. + +[+] Although the theoretical results require overparameterization, in the sense that strong duality holds when the number of filters is large relative to the number of data points, the authors convincingly argue that this degree of overparameterization is commensurate with, or even smaller than, the degree of overparameterization present in many experimental/theoretical works in the literature. + +[+/-] The paper is mathematically precise and is written in a rigorous fashion, but is occasionally heavy on notation. The paper could be more impactful on empirical work on neural networks if it could provide more intuition about how the various forms of equivalent regularization arise from different architectures. + +All three reviewers express appreciation for the paper’s fresh insights into global optimization of shallow CNNs and the connection between architectural choices and regularization. The AC recommends acceptance. +",ICLR2021, +QkPzyN_Bu,1576800000000.0,1576800000000.0,1,S1xKd24twB,S1xKd24twB,Paper Decision,Accept (Poster),"The authors present a simple alternative to adversarial imitation learning methods like GAIL that is potentially less brittle, and can skip learning a reward function, instead learning an imitation policy directly. Their method has a close relationship with behavioral cloning, but overcomes some of the disadvantages of BC by encouraging the agent via reward to return to demonstration states if it goes out of distribution. The reviewers agree that overcoming the difficulties of both BC and adversarial imitation is an important contribution. Additionally, the authors reasonably addressed the majority of the minor concerns that the reviewers had. Therefore, I recommend for this paper to be accepted.",ICLR2020, +ASh91iFZ1,1576800000000.0,1576800000000.0,1,r1xCMyBtPS,r1xCMyBtPS,Paper Decision,Accept (Poster),"This paper proposes a method to improve alignments of a multilingual contextual embedding model (e.g., multilingual BERT) using parallel corpora as an anchor. The authors show the benefit of their approach in a zero-shot XNLI experiment and present a word retrieval analysis to better understand multilingual BERT. + +All reviewers agree that this is an interesting paper with valuable contributions. The authors and reviewers have been engaged in a thorough discussion during the rebuttal period and the revised paper has addressed most of the reviewers concerns. + +I think this paper would be a good addition to ICLR so I recommend accepting this paper.",ICLR2020, +B_JqXFdQ8H,1576800000000.0,1576800000000.0,1,rygixkHKDH,rygixkHKDH,Paper Decision,Accept (Talk),"This paper investigates the use non-convex optimization for two dictionary learning problems, i.e., over-complete dictionary learning and convolutional dictionary learning. The paper provides theoretical results, associated with empirical experiments, about the fact that, that when formulating the problem as an l4 optimization, gives rise to a landscape with strict saddle points and as such, they can be escaped with negative curvature. As a result, descent methods can be used for learning with provable guarantees. 
All reviewers found the work extremely interesting, highlighting the importance of the results, which constitute ""a solid improvement over the prior understandings on over-complete DL"" and ""extends our understanding of provable methods for dictionary learning"". This is an interesting submission on non-convex optimization, and as such of interest to the ML community of ICLR. I'm recommending this work for acceptance.",ICLR2020,
+0jv8EVbrESe,1610040000000.0,1610470000000.0,1,ox8wgFpoyHc,ox8wgFpoyHc,Final Decision,Reject,"The paper introduces some good ideas, but I don't think it is quite there in terms of a method to be recommended for publication. I think it is mostly reasonably written (I do not agree with the comment of a 'complete rewrite') but there are indeed some passages for improvement (for instance, an equation such as y = σ−1[Q0(t, x)] + εH(t, x), Section 2, needs comments, as the left-hand side is discrete and the right-hand side is continuous and unbounded).
+
+My main concern is the disregard for identification. Some citations are unclear (the second-to-last paragraph in Section 4 cites a few papers in identification that have little to do with the problem here, which is proxy-based. The papers cited don't even mention latent variables at all). As stated, the split into three sets of variables as suggested by Figure 1 is just an idealization: there is no reason at all they can be identified, and actually the theory where just Zc is considered imposes a lot of restrictions on when we can possibly identify Zc (see e.g., Miao et al. 2018, Biometrika, https://arxiv.org/pdf/1609.08816.pdf ). I know that some papers like Louizos et al. play fast and loose with identification too, but at least the Z_c structure they aim at has been studied elsewhere (like the Miao et al. paper), while the approach here, like the Zhang et al. paper cited, may be leading researchers to an unfruitful path. This, combined with the relative modesty of the novelty, is the primary reason for my recommendation. 
I do think the paper can be improved in a productive way by investigating it from the point of view of either i) the theoretical justification for identification; ii) or from a more empirical direction with much experimentation on the different ways the structured latent space is capturing confounding (the target learning aspect of it is pretty much orthogonal to this).",ICLR2021, +xMTwCvuu1Vw,1610040000000.0,1610470000000.0,1,6FqKiVAdI3Y,6FqKiVAdI3Y,Final Decision,Accept (Poster),"The paper presents a decomposition of the value function in the context of CCDA. + +Most reviewers find this paper clear and well written, although one reviewer suggests to change the paper structure. + +The method presented in this paper is simple and well justified by a theoretical section. Experiments on several domains, including Starcraft 2 micro-management tasks, are supporting the claims of that section. After some reviewers pointed out that the tabular setup is not useful in practice, the authors have extended the empirical and theoretical results to a more general setup. + +Some reviewers point out that some theoretical results may not be directly related to the experimental findings. In particular, reviewer 3 does not support a central claim of the paper, and find that CDM is misleading and not provably representing the core problem. +In general, reviewer 3 does not support acceptance of this paper, but I still believe this paper should be accepted based on the other reviews (clearly in favour of acceptance). I hope that the authors and reviewer 3 will be able to further discuss and reach understanding, which hopefully should lead to fruitful results.",ICLR2021, +IKhyUAM9aew,1642700000000.0,1642700000000.0,1,9u5E8AFudRx,9u5E8AFudRx,Paper Decision,Reject,"The paper introduces GANGSTR, an agent that performs goal-directed exploration both individually and ""socially"", with suggestions from a partner. It builds a graph of different configurations of a 5-block manipulation domain, and navigates this graph. The theoretical motivations for this algorithm are solid, and the direction is interesting. However, the results are less than convincing. In particular, as was mentioned in the discussion, it is not clear how this algorithm would generalize beyond the very simple 5-block manipulation domain. While having a simple benchmark has the advantage that you can explore it in depth, it also might obscure problems with the algorithm, unless complemented by a set of other benchmarks. It therefore seems that the paper is not ready for publication yet.",ICLR2022, +Cgke5r68hw_,1610040000000.0,1610470000000.0,1,QtTKTdVrFBB,QtTKTdVrFBB,Final Decision,Accept (Spotlight)," +This paper proposes an efficient attention mechanism linear in time and space using random features. +The approach has some similarities with the simultaneous ICLR 2021 +submission ""Rethinking Attention with Performers"", with a key difference of a gating +mechanism present in this work, motivated by recency bias. This paper is a +valuable contributions to the efficient attention research topic. The reviewers +appreciate the experiments and the in-depth analysis. I recommend acceptance. + +A noteworthy concern brought up in the discussion period has to do with whether the attention mechanism dominates the feed-forward computations in the neural network, and how much this is architecture-specific. The authors provide TPU timings, but I encourage the authors to add a discussion and timings of relative performance of feed-forward vs. 
attention layers that covers GPU and CPU optimizers as well.",ICLR2021, +3th7PGgro8,1642700000000.0,1642700000000.0,1,kWuBTQmkO8_,kWuBTQmkO8_,Paper Decision,Reject,"This paper generalize the idea of Mixup-based data augmentation for regression. Compared to classification for which Mixup was used, the paper argues that in regression the linearity assumption only holds within specific data or label distances. The paper thus proposes MixRL to select suitable pairs using k-nearest neighbor in a batch for mixup. The selection policy is trained with meta-learning by minimizing the validation-set loss. The approach provides consistent but small improvement over mixup on several datasets. Reviewers have also suggested discussion and comparison with more baselines, such as respective method using other (lower-variant) gradient estimators (e.g., gumbel-softmax), and using local input/output kernels for data selection, etc.",ICLR2022, +HygoWg7-x4,1544790000000.0,1545350000000.0,1,B1MhpiRqFm,B1MhpiRqFm,Meta-review,Reject,"Pros: +- a method that obtains convergence results using a using time-dependent (not fixed or state-dependent) softmax temperature. + +Cons: +- theoretical contribution is not very novel +- some theoretical results are dubious +- mismatch of Boltzmann updates and epsilon-greedy exploration +- the authors seem to have intended to upload a revised version of the paper, but unfortunately, they changed only title and abstract, not the pdf -- and consequently the reviewers did not change their scores. + +The reviewers agree that the paper should be rejected in the submitted form.",ICLR2019,4: The area chair is confident but not absolutely certain +ItQvqTseuZw,1642700000000.0,1642700000000.0,1,UvNXZgJAOAP,UvNXZgJAOAP,Paper Decision,Reject,"The paper proposed a sharp attention mechanism in the context of image to sequence modeling. It seeks to build a “clear” alignment from the attention in order to improve the performance of the task. I don’t think there is a general consensus in the research community that the “clear” or “hard” attention performs better than the vanilla “soft” attention. Therefore, experiments become the key in justifying such motivation (and the model). However, as all the reviewers point out, the experiments in this paper are not satisfying. The numbers are far from the current mainstream results (Reviewer mESb). The experiments are done on relatively small datasets/tasks (Reviewer 1PRC) and the comparisons aren’t strictly speaking fair (Reviewer 5Hay). I think this alone is enough reason for the rejection of this paper. + +Additionally, on the algorithmic side, the novelty of this mechanism is not that high, given the existence of work such as the hard attention (Xu et al, 2015) and variational attention (Deng et al, 2018). It is also an unanswered question how such a mechanism can be introduced into the modern architectures that use attention (e.g. the multi-head attention). The authors did not respond to the questions of the reviewers.",ICLR2022, +rJgflZ32J4,1544500000000.0,1545350000000.0,1,HkljioCcFQ,HkljioCcFQ,metareview,Accept (Poster),"The paper proposes a new attentional pooling mechanism that potentially addresses the issues of simple attention-based weighted averaging (where discriminative parts/frames might get disportionately high attentions). A nice contribution of the paper is to propose an alternative mechanism with theoretical proofs, and it also presents a method for fast recurrent computation. 
The experimental results show that the proposed attention mechanism improves over prior methods (e.g., STPN) on THUMOS14 and ActivityNet1.3 datasets. In terms of weaknesses: (1) the computational cost may be quite significant. (2) the proposed method should be evaluated over several tasks beyond activity recognition, but it’s unclear how it would work. + +The authors provided positive proof-of-concept results on weakly supervised object localization task, improving over CAM-based methods. However, CAM baseline is a reasonable but not the strongest method and the weakly-supervised object recognition/segmentation domains are much more competitive domains, so it's unclear if the proposed method would achieve the state-of-the-art by simply replacing the weighted-averaging-attentional-pooling with the proposed attention mechanism. In addition, the description on how to perform attentional pooling over images is not clearly described (it’s not clear how the 1D sequence-based recurrent attention method can be extended to 2-D cases). However, this would not be a reason to reject the paper. + +Finally, the paper’s presentation would need improvement. I would suggest that the authors give more intuitive explanations and rationale before going into technical details. The paper starts with Figure 1 which is not really well motivated/explained, so it could be moved to a later part. Overall, there are interesting technical contributions with positive results, but there are issues to be addressed. +",ICLR2019,3: The area chair is somewhat confident +HkOwX1TrG,1517250000000.0,1517260000000.0,159,rkZvSe-RZ,rkZvSe-RZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper studies a defense against adversarial examples that re-trains convolutional networks on adversarial examples constructed to attack pre-trained networks. Whilst the proposed approach is not very original, the paper does present a solid empirical baseline for these kinds of defenses. In particular, it goes beyond the ""toy"" experiments that most other studies in this space perform by experimenting on ImageNet. This is important as there is evidence suggesting that defenses against adversarial examples that work well on MNIST/CIFAR do not necessarily transfer well to ImageNet. The importance of the baseline method studied in this paper is underlined by its frequent application in the recent NIPS competition on adversarial examples.",ICLR2018, +exjcoZ_ODkS4,1642700000000.0,1642700000000.0,1,i4qKmHdq6y8,i4qKmHdq6y8,Paper Decision,Reject,"This paper studies a learning scenario in which there exist 2 classes of examples: ""predictable"" and ""noise"". Learning theory is provided for this setting and a novel algorithm is devised that identifies predictable examples and makes predictions at the same time. A more practical algorithm is devised as well. Results are supported by experiments. + +Reviewers have raised a number of concerns (ranging from how realistic this settings is to missing references). Overall they found this work interesting and relevant to ML community and appreciate the effort that authors have put in in their thoughtful response. 
However, after a thorough deliberation, the conference program committee decided that the paper is not sufficiently strong in its current form to be accepted.",ICLR2022,
+KuLk5umrx,1576800000000.0,1576800000000.0,1,Bke13pVKPS,Bke13pVKPS,Paper Decision,Reject,"This paper proposes a GA-based method for optimizing the loss function a model is trained on to produce better models (in terms of final performance). The general consensus from the reviewers is that the paper, while interesting, dedicates too much of its content to analyzing one such discovered loss (the Baikal loss), and that the experimental setting (MNIST and Cifar10) is too basic to be conclusive. It seems this paper can be so significantly improved with some further and larger-scale experiments that it would be wrong to prematurely recommend acceptance. My recommendation is that the authors consider the reviewer feedback, run the suggested further experiments, and are hopefully in the position to submit a significantly stronger version of this paper to a future conference.",ICLR2020,
+GcuNvXL4OL,1610040000000.0,1610470000000.0,1,3eNrIs9I78x,3eNrIs9I78x,Final Decision,Reject,"This paper proposes a method to update the learning rate dynamically by increasing it in areas with higher sharpness and decreasing it otherwise. This would then hopefully lead to escaping sharp valleys and better generalization. The authors further provide some related theoretical results and several experiments to show the effectiveness of their models.
+
+All reviewers find the proposed method well-motivated, novel and interesting. The paper is well-written and easy to follow. However, both the theoretical results and the empirical evaluations could be improved significantly:
+
+1- The theoretical results as they stand provide little to no insight about the algorithm and, unfortunately, the authors do not discuss the insights from the theoretical results adequately in the paper. See, e.g., R1's comments about this.
+
+2- Given that the theoretical results are not strong, thoroughness in the empirical evaluation is important, and unfortunately the current empirical results are not convincing. In particular, there are two main areas to improve:
+
+a) Based on Appendix D, the choice of hyper-parameters seems to have been made in an arbitrary way and all models are forced to use the same hyper-parameters. This way, the choice of hyper-parameters could potentially favor one method over the other. A more principled approach is to tune hyper-parameters separately for each method.
+
+b) It looks like the choice of #epochs has been made in an arbitrary way. 
For all experiments, it would be much more informative to have a figure similar to the left panel of Fig. 4 but with much more #epochs so that reader can clearly see if the benefit of SALR would disappear with longer training or not. + +c) Based on the current results, SALR's performance is on par with that of Entropy-SGD on CIFAR-100 and WP and there is a very small gap between them on CIFAR-10 and PTB. I highly recommend adding ImageNet results to make the empirical section stronger. The other option is to compare against other methods in fine-tuning tasks. That is, take a checkpoint of a trained model on ImageNet and compare SALR with other methods on several fine-tuning tasks. + +Given the above issues, my final recommendation is to reject the paper. I want to thank authors for engaging with reviewers during the discussion period and adding several empirical results to the revision. I hope authors would address the above issues as well and resubmit their work.",ICLR2021, +5FFyZU4_R4U,1642700000000.0,1642700000000.0,1,P1QUVhOtEFP,P1QUVhOtEFP,Paper Decision,Accept (Poster),"This paper proposes loss functions to encode topological priors during data embedding, based on persistence diagram constructions from computational topology. The paper initially had some expositional issues and technical questions, but the authors did an exceptional job of addressing them during the rebuttal period----nearly all reviewers raised their scores (or intended to but didn't update the numbers on their original reviews). + +The AC is willing to overlook some of the remaining questions. For example, concerns that topology isn't well known in the ICLR community (8muq) are partially addressed by the improved exposition (and it's OK to have technically sophisticated papers so long as some reviewers were able to evaluate them). And, future work can address scalability of the algorithm, which indeed does seem to be a challenge here (ey6b). + +In the final ""camera ready,"" the authors are encouraged to address any remaining comments and to consider adding experiments/discussion regarding scalability to larger datasets.",ICLR2022, +EuhsP8HLY4v,1610040000000.0,1610470000000.0,1,LtgEkhLScK3,LtgEkhLScK3,Final Decision,Reject,"The paper studies mixture of expert policies for reinforcement learning agents, focusing on the problem of policy gradient estimation. The paper proposes a new way to compute the gradient, apply it to two reinforcement learning algorithms, PPO and SAC, and demonstrate it in continuous MuJoCo environments, showing results that are comparable to or slightly exceeds unimodal policies. The main issue raised by multiple reviewers is novelty. Mixture of expert models have been widely studied in the context of reinforcement learning, and while the paper proposes a new method for the gradient computation, a more suitable format, as pointed out by Reviewer 2, could be to ground the paper around the proposed gradient estimator, and compare, both analytically and empirically, it to existing alternatives. Therefore, I recommend rejecting this submission.",ICLR2021, +iLf3K8pue,1576800000000.0,1576800000000.0,1,SkeBBJrFPH,SkeBBJrFPH,Paper Decision,Reject,"This paper suggests that datasets have a strong influence on the effects of attention in graph neural networks and explores the possibility of transferring attention for graph sparsification, suggesting that attention-based sparsification retains enough information to obtain good performance while reducing computational and storage costs. 
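(One simple instantiation of attention-based sparsification -- not necessarily the paper's exact rule -- keeps each node's top-k incoming edges by attention weight; the names below are illustrative choices of mine:

    from collections import defaultdict

    def sparsify(edges, att, k):
        # edges: list of (src, dst) pairs; att: matching attention scores
        incoming = defaultdict(list)
        for (i, j), a in zip(edges, att):
            incoming[j].append((a, (i, j)))
        keep = []
        for j in incoming:
            keep += [e for _, e in sorted(incoming[j], reverse=True)[:k]]
        return keep

)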
+ +Unfortunately I cannot recommend acceptance for this paper in its present form. Some concerns raised by the reviewers are: the analysis lacks theoretical insights and does not seem to be very useful in practice; the proposed method for graph sparsification lacks novelty; the experiments are not thorough to validate its usefulness. I encourage the authors to address these concerns in an eventual resubmission. +",ICLR2020, +BkeRa8xzeN,1544840000000.0,1545350000000.0,1,BygqBiRcFQ,BygqBiRcFQ,Unanimously accept for ICLR publication.,Accept (Poster),"The paper gives an extension of scattering transform to non-Euclidean domains by introducing scattering transforms on graphs using diffusion wavelet representations, and presents a stability analysis of such a representation under deformation of the underlying graph metric defined in terms of graph diffusion.  + +Concerns of the reviewers are primarily with what type of graphs is the primary consideration (small world social networks or point cloud submanifold samples) and experimental studies. Technical development like deformation in the proposed graph metric is motivated by sub-manifold scenarios in computer vision, and whether the development is well suitable to social networks in experiments still needs further investigations. + +The authors make satisfied answers to the reviewers’ questions. The reviewers unanimously accept the paper for ICLR publication.",ICLR2019,5: The area chair is absolutely certain +in1UlSwa-l,1642700000000.0,1642700000000.0,1,oapKSVM2bcj,oapKSVM2bcj,Paper Decision,Accept (Oral),"All reviewers agree that this paper is a useful and valuable contributions to ML engineering. + - insightful analysis .. highly user friendly operator design + - ""useful and I can see it having large adoption in the community of scientific computing"" ... "" + - ""Personally I tend to buy these advantages of einops"" ... ""However, there is a lack of solid empirical study to validate the effectiveness and efficiency of the design"" + - ""a useful and appealing new coding tool."" + +The negative reviewers appear fixated on the (true) observation that the paper does not look like a conventional ICLR paper, thati it ""reads like a technical blog"", and ""lacks rigour"". +I belive it is fair and measured to state that these reviews may be considered to exhibit aspects of gatekeeping: requiring more ""mathiness"" that does not help the paper, or more ""rigour"" through user studies that are in fact less valuable than the reviewers' own observations ""I could see myself..."", ""I tend to buy..."". + +This is a paper about design, not about models or algorithms (although the algorithmic work is good). It is about the design of tools that we all use, and about the decisions and thought processes that led to that design. A reviewer decries ""many non-rigorous claims"". These are claims about the usability of existing systems, and mostly appear in the discussion and footnotes, as the authors note in rebuttal. Of course, one could have run user studies to back up each claim, but I am just as convinced by the examples shown in the paper. It matters not to me what some users corralled into a user study thought. It matters what I and my colleagues will think, and I am now sure to recommend einops to colleagues. I would not have met it had the paper not been submitted to ICLR, and hence I am certain it should be accepted, so more can see that we care not just about mathiness, but actually enabling progress in our field. 
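To make the design point concrete, here is a minimal sketch of the kind of rewrite einops enables, using only the public rearrange/reduce API (the tensor names are mine, not the paper's):

    from einops import rearrange, reduce
    # (batch, channels, height, width) -> (batch, tokens, channels),
    # with the axis bookkeeping documented by the pattern itself
    tokens = rearrange(images, 'b c h w -> b (h w) c')
    # global average pooling, intent visible at the call site
    pooled = reduce(images, 'b c h w -> b c', 'mean')

The equivalent chain of reshape/permute calls forces the reader to reconstruct that bookkeeping by hand.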
+ +The job of a conference like ICLR is to expose researchers and practitioners in machine learning to ideas and techniques that may advance their research and practice. Programming, and the translation of mathematical ideas to efficient computer code, are fundamental to all of machine learning, and hence programming models are very much suitable for presentation to an ICLR audience. I am certain that this paper, and the technology it describes, are more important to ICLR readers than knowing that if module A is co-trained with module B, then combined with compression C, the SOTA on some arbitrary benchmark is increased by 0.31 +/- 1.04. + +Reviewer gRMH says ""there is no code"", but the code has been in the open for three years; it is an accident of our misapplication of the principles of blind review that the reviewer felt they could not search for the code, and that the authors felt they could not bring to bear the evidence that three years of real-world usage have brought. + +Reviewers say the work is just an extension of einsum, while noting that the extension is useful and nontrivial. Yes, it is an extension, and the paper's examples show how it yields more compact code that is also more readable and maintainable. + +I could add more examples, but in short, I tend to side with the authors' response at almost every point. At the same time, the final version of the paper has been strengthened by this dialectic, and I expect further strenghtening through exposure to the ICLR community. + +To the authors: Listing 1 is useful, but should be in an appendix. Instead, add examples of ellipses on P5, and show more inline examples in general. The paper would be strengthened by another pass over the English -- after the decision is made I would be happy to volunteer to help.",ICLR2022, +s603BDuqe7,1610040000000.0,1610470000000.0,1,rd_bm8CK7o0,rd_bm8CK7o0,Final Decision,Reject,"This paper proposed Q-value-weighted regression approach for improving the sample efficiency of DRL. It is related to recent papers on advantage-weighted regression methods for RL. The approach is interesting, intuitive, and bears merits. Developing a simple yet sample-efficient algorithm using weighted regression would be a critical contribution to the field. The work has the potential to make an impact, if it has all the necessary ingredients of a strong paper. + +However, reviewers raised a few issues that have to be addressed before the paper can be accepted. As some reviewers pointed out, there seem to be unaddressed major issues from previous submissions. Novelty appears limited, especially because the proposed approach is very similar to recent works (e.g., AWR). The experiment section lacks comparison to recent similar algorithms, and the available comparisons appear to be not strong enough to justify merits of the proposed algorithm. Theorem 1 requires an unrealistic state-determines-action assumption for the replay buffer. Although the authors made an effort to justify this assumption, it remains very problematic and rules out most randomized/exploration algorithms.",ICLR2021, +nKybGpU3_5,1576800000000.0,1576800000000.0,1,SylzhkBtDB,SylzhkBtDB,Paper Decision,Accept (Poster),"Many existing approaches in multi-task learning rely on intuitions about how to transfer information. This paper, instead, tries to answer what does ""information transfer"" even mean in this context. Such ideas have already been presented in the past, but the approach taken here is novel, rigorous and well-explained. 
+ +The reviewers agreed that this is a good paper, although they wished to see the analysis conducted using more practical models. + +For the camera ready version it would help to make the paper look less dense. +",ICLR2020, +ibZvP_44y5K,1610040000000.0,1610470000000.0,1,5B8YAz6W3eX,5B8YAz6W3eX,Final Decision,Reject,"Dear authors, + +I took your concerns into account, and I also understand the whole crazy situation around the COVID-19. Many of the reviewers have families (e.g., in US, many kids are now homeschooled, and there are no good daycare solutions as well). I do not plan to list all the good parts of the paper and list weaknesses that are already mentioned and visible to you. Hence, let me focus on my concerns about this paper (and I hope you could find them interesting and they will help you to improve your paper). + ++ I personally find the use of 2nd order method in DNN a way to improve many inefficiencies of ADAM/SGD, .... and using diagonal scaling is one way to do it. + +-- I personally find some sections not very motivated and explanatory. E.g. Section 3.2 is just telling half of the story and is missing some details to give the reader the full understanding. + +-- The fact that B_t is not necessary >0, it makes intermediate sense to use some kind of \max\{B_i, \sigma\} to have the ""scaling"" to be $\succ 0$. +Note that there are also SR1 methods that would guarantee the matrix to be not necessary pd. + +-- Your main motivation was non-convex problems, but the only theorem in the main paper was for convex loss only, right? In this case, I guess there is no issue with B_t to have some coordinates <0, right? + +Overall, I find the topic interesting and would like to see an updated paper in some of the top ML venues, but right now I cannot recommend it for acceptance! + +",ICLR2021, +S1ePcf8rg4,1545070000000.0,1545350000000.0,1,rJxcHnRqYQ,rJxcHnRqYQ,unconvincing,Reject,"This paper proposed a LBPNet for character recognition, which introduces the LBP feature extraction into deep learning. Reviewers are confused on implementation and not convinced on experiments. The only score 6 reviewer is also concerned ""Empirically weak, practical advantage wrt to literature unclear"". Only evaluating on MNIST/SVHN etc is not convincing to demo the effectiveness of the proposed method.",ICLR2019,4: The area chair is confident but not absolutely certain +QUTbISHtkeA,1610040000000.0,1610470000000.0,1,gBpYGXH9J7F,gBpYGXH9J7F,Final Decision,Reject,"The discussion with the expert reviewers reached the consensus that the paper lacks in novel *technical* contributions, and as such it does not meet the bar for a theory-oriented paper at ICLR.",ICLR2021, +aSgIS9tbTLT,1642700000000.0,1642700000000.0,1,uVXEKeqJbNa,uVXEKeqJbNa,Paper Decision,Accept (Poster),"This paper introduces the Stiffness-aware neural network (SANN) for improving numerical stability in Hamiltonian neural networks. To this end, the authors introduce the stiffness-aware index (SAI) to classify time intervals into stiff and non-stiff portions, and propose to adapt the integration scheme accordingly. + +The paper initially received three weak accept and one weak reject recommendations. The main limitations pointed out by reviewers relate to missing references from the literature, assumptions behind the proposed approach (e.g. structure of the mass matrix, separable Hamiltonian), and clarifications on experiments including additional baselines and hyper-parameter settings. 
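As a reading aid, the stiffness-aware switching can be sketched as follows; the stiffness proxy, threshold, and one-dimensional integrators below are illustrative assumptions of mine, not the authors' design:

    def step(y, lam, dt, threshold=10.0):
        # dy/dt = lam * y; route stiff intervals to an implicit update
        if abs(lam) > threshold:              # crude stand-in for the SAI
            return y / (1.0 - lam * dt)       # implicit Euler: stable when stiff
        return y * (1.0 + lam * dt)           # explicit Euler: cheap otherwise
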
+The rebuttal did a good job in answering reviewers' concerns: RiTTU increased his rating to a clear accept, and RMYXe increased his rating to weak accept. +Eventually, there is a consensus among reviewers to accept the paper. + +The AC's own readings confirmed the reviewers' recommendation. The method is straightforward yet effective, and the paper is well written. The effectiveness of the proposed approach is shown in different contexts. Since several complex systems exhibit chaotic characteristics, the paper brings a meaningful contribution to the community.",ICLR2022, +0WutUIw9X-O,1642700000000.0,1642700000000.0,1,x3F9PuOUKZc,x3F9PuOUKZc,Paper Decision,Reject,"The paper describes an interesting approach to predicting continuous closed surface segmentations from discretized image data using a wavelet output representation. This is an interesting idea with a lot of potential. Unfortunately, the paper currently suffers from major weaknesses which we encourage the authors to address. + +1. While the idea of generating a continuous output representation of a segmentation is technically interesting, what are some applications where this is actually useful? +2. The ground truth annotations in the datasets evaluated are implicitly quite variable. The annotations are not made to sub-pixel precision and there is likely to be large multi-pixel variability across different annotations of the same image. This makes a poor problem to demonstrate the need and potential of a sub-pixel accurate segmentation algorithm. + +I encourage the authors to find applications and datasets where reliable sub-pixel ground truth annotations exist, and to demonstrate that their approach to generating sub-pixel segmentations is superior to appropriate baselines which also predict sub-pixel segmentations.",ICLR2022, +ZS3Tk2gIwv,1610040000000.0,1610470000000.0,1,Rd138pWXMvG,Rd138pWXMvG,Final Decision,Accept (Poster),"This is an interesting, controversial paper that contributes to an ongoing debate in Bayesian deep learning. + +Bayesian inference with artificially “cooled” posteriors (e.g., trained with Langevin dynamics with down-weighted noise) was recently found to outperform over both point estimation and fully-Bayesian treatments (Wenzel et al., 2020). This paper proposes a new explanation for these observed phenomena in terms of a data curation mechanism that popular benchmark data sets such as CIFAR underwent. The analysis boils down to an evidence overcounting/undercounting argument and takes into account that curated data sets only contain data points for which all labelers agreed on a label. The authors claim that, when modeling the true generative process of the data, the cold posterior effect (partially) vanishes. + +The paper is well-written and provides a consistent analysis by modeling the data curation mechanism in terms of an underlying probabilistic graphical model of the labeling mechanism. Unfortunately, several observed phenomena of (Wenzel et al., 2020) remain unexplained by the theoretical arguments, e.g., the fact that “very cold” (T --> 0) posteriors don’t hurt performance, or the observation that the optimal temperature seems to depend on the model capacity. 
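(For reference, the tempered posterior under discussion has the standard form

    p_T(\theta \mid D) \propto \exp\{-U(\theta)/T\}, \qquad U(\theta) = -\textstyle\sum_i \log p(y_i \mid x_i, \theta) - \log p(\theta),

with T = 1 the Bayes posterior and T < 1 the cold regime of Wenzel et al. (2020); as I read it, the curation argument changes the effective likelihood term rather than calling for retuning T.)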
While the proposed explanation doesn’t capture the full picture (upon which both authors and reviewers agree), the paper’s focus on the data curation process, supported extensive experiments, gives a partial explanation and provides an interesting perspective that will spur further discussion and should be of broad interest to the Bayesian deep learning community. +",ICLR2021, +By5u7yaHf,1517250000000.0,1517260000000.0,174,HkNGsseC-,HkNGsseC-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper received scores of 8 (R1), 6 (R2), 6 (R3). R1's review is brief, and also is optimistic that these results demonstrated on ConvACs generalize to real convnets. R2 and R3 feel this might be a potential problem. R2 advocates weak accept and given that R1 is keen on the paper, the AC feels it can be accepted. + +",ICLR2018, +3LF_oNwwOU,1610040000000.0,1610470000000.0,1,CGFN_nV1ql,CGFN_nV1ql,Final Decision,Reject,"This paper investigates a non-attentive architecture of Tacotron 2 for TTS where the attention mechanism is replaced by a duration predictor. The authors show that this change can significantly improve the robustness. In addition, the authors propose two evaluation metrics for TTS robustness, namely, unaligned duration ratio (UDR) and word deletion rate (WDR), which appear to be novel to the TTS community. The proposed non-attentive architecture yields good MOS scores in the experiments. + +Overall, the paper is well written but the reviewers commented on the technical novelty of the work as it is essentially an improvement within the Tacotron 2 framework. There is also a lack of comparative study with other existing frameworks with similar techniques. Although the authors put together a detailed rebuttal to address the comments, in the end the above two major concerns remain. ",ICLR2021, +B1k73zLdx,1486400000000.0,1486400000000.0,1,ryZqPN5xe,ryZqPN5xe,ICLR committee final decision,Reject,"The method was developed to provide an alternative for fine-tuning by augmenting a pre-trained network with new capacity. The differential from other related methods is low, and the evaluated baselines were not well-chosen, so this is not a strong submission.",ICLR2017, +PVyB_HNMUP-,1642700000000.0,1642700000000.0,1,w7Nb5dSMM-,w7Nb5dSMM-,Paper Decision,Reject,"The reviewers agree that this is an interesting treatise on some relationships between SGD fine tuning and evolutionary algorithms. All reviewers have requested some experimental validation or demonstration of the theory developed in this paper, which is not currently included. Whilst the computational requirements (and time required) may be long, this will significantly assist the many readers of the paper and save them from having to run such an experiment many times themselves. The reviewers provided a number of suggestions of how this might be done. The reviewers also highlighted a number of specific improvements that can be made to the writing of the paper.",ICLR2022, +rkx2ElI-lN,1544800000000.0,1545350000000.0,1,ryxDjjCqtQ,ryxDjjCqtQ,"Interesting work, with unclear motivation and relation to previous work",Reject,"The paper studies RL based on data with confounders, where the confounders can affect both rewards and actions. The setting is relevant in many problems and can have much potential. This work is an interesting and useful attempt. However, reviewers raised many questions regarding the problem setup and its comparison to related areas like causal inference. 
While the author response provided further helpful details, the questions remained among the reviewers. Therefore, the paper is not recommended for acceptance in its current stage; more work is needed to better motivate the setting and clarify its relation to other areas. + +Furthermore, the paper should probably discuss its relation to (1) partially observable MDP; and (2) off-policy RL.",ICLR2019,4: The area chair is confident but not absolutely certain +60wBNwWgB,1576800000000.0,1576800000000.0,1,HylA41Btwr,HylA41Btwr,Paper Decision,Reject,The paper is proposed a rejection based on majority reviews.,ICLR2020, +dISKwXCdav9I,1642700000000.0,1642700000000.0,1,bVkRc9NDHcK,bVkRc9NDHcK,Paper Decision,Reject,"This paper presents a steganographic approach called Variable Length Variable Quality Audio Steganography (VLVQ) that encodes variable length audio data inside images with varying quality trade-offs. However, according to the reviewers, the proposal made in this paper is not novel enough, there are many details missing in the paper, and the experimental study is far from comprehensive and conclusive. Afte the reviewers provided their comments, the authors did not submit their rebuttals. Therefore, as a result, we do not think the paper is ready for publication at ICLR.",ICLR2022, +NacLPIr1fbd,1642700000000.0,1642700000000.0,1,VNXYZjGcsty,VNXYZjGcsty,Paper Decision,Reject,"The paper proposes a new approach for linked-view clustering based on chained non-negative matrix factorization. Reviewers highlighted that paper proposes a novel and interesting approach to an important problem. However, reviewers raised also significant concerns regarding clarity of presentation (motivation, general approach, contributions, scope) as well as the experimental evaluation. Reviewers raised also concerns regarding justification of the approach being a novel paradigm. After author response and discussion, all reviewers and the AC agree that the paper is not yet ready for publication due to the aforementioned issues.",ICLR2022, +oHNuQ1t2SM,1576800000000.0,1576800000000.0,1,B1x9ITVYDr,B1x9ITVYDr,Paper Decision,Reject,"After reading the author's response, all the reviwers still think that this paper is a simple extension of gradient masking, and can not provide the robustness in neural networks.",ICLR2020, +VlhYRE9-3DW,1642700000000.0,1642700000000.0,1,0uZu36la_y4,0uZu36la_y4,Paper Decision,Reject,"The paper looks at the worst-class adversarial error for multi-class classification problems. The question is given a certain level of adversarial error on average, is it possible that some classes have adversarial error significantly worse than average? And if so, is this a problem? I agree with the authors that there are applications where such an imbalance could be problematic; other than the examples provided by the authors I can also think of this being important from a point of view of fairness, depending on what exactly the class labels represent. The reviewers have raised the question of low accuracies reported in the empirical results compared to the state of the art on those datasets for adversarial learning. I share these concerns -- especially it's worth understanding whether more accurate models also have such an imbalance, or whether this imbalance is a result of incomplete training or models that are not representationally powerful enough. 
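(For concreteness, the quantity at stake is the worst-class adversarial risk -- in the usual notation, something like

    \max_{k} \; \Pr_{(x,y) \sim D \mid y = k} \big[ \exists\, \delta,\ \|\delta\| \le \epsilon :\ f(x+\delta) \ne k \big],

versus the class-averaged risk that standard adversarial training targets; this formalization is my gloss, not a quote from the paper.)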
While I agree with the authors that 'state of the art' results' are not required for ICLR submissions, especially those making conceptual contriubtions, in this case I think further experiments may be needed in addition to addressing the other questions raised in the reviews. The authors acknowledge that they have made significant revisions in response to the reviews, but I think that would require a fresh review cycle.",ICLR2022, +ryX4hfUux,1486400000000.0,1486400000000.0,1,H1fl8S9ee,H1fl8S9ee,ICLR committee final decision,Accept (Poster),"Despite it's initial emphasis on policy search, this paper is really about a learning method for Bayesian neural networks, which it then uses in a policy search setting. Specifically, the authors advocate modeling a stochastic system using a BNN trained to minimize alpha-divergence with alpha=0.5 (this involves a great deal of approximation to make computationally tractable). They then use this in the policy search setting. + + The paper is quite clear, and proposes a nice approach to learning BNNs. The algorithmic impact honestly seems fairly minor (the idea of using different alpha-divergences instead of KL divergence has been considered many times in the content of general variational approximations), but combining this with the policy search setting, and reasonable examples of industrial control, together these all make this a fairly strong paper. + + Pros: + + Nice derivation of alternative variational formulation (I'll still call it variational even though it uses alpha-divergence with alpha=0.5) + + Good integration into policy search setting + + Nice application to industrial control systems + + Cons: + - Advance from the algorithmic standpoint seems fairly straightforward, if complicated to make tractable (variational inference using alpha-divergence is not a new idea) + - The RL components here aren't particularly novel, really the novelty is in the learning",ICLR2017, +kofGORi9z0,1576800000000.0,1576800000000.0,1,ryx1wRNFvB,ryx1wRNFvB,Paper Decision,Accept (Poster),"This paper proposes to explore nonnormal matrix initialization in RNNs. Two reviewers recommended acceptance and one recommended rejection. The reviewers recommending acceptance highlighted the utility of the approach, its potential to inspire future work, and the clarity and quality of writing and accompanying experiments. One reviewer recommending weak acceptance expressed appreciation of the quality of the rebuttal and that their concerns were largely addressed. The reviewer recommending rejection was primarily concerned with the novelty of the method. Their review suggested the inclusion of an additional citation, which was included in a revised version for the rebuttal but not with a direct comparison of results. On the balance, the paper has a relatively high degree of support from the reviewers, and presents an interesting and potentially useful initialization in a clear and well-motivated way.",ICLR2020, +SJgv4LxQlN,1544910000000.0,1545350000000.0,1,B1lwSsC5KX,B1lwSsC5KX,Meta-Review,Reject,"This paper studies memorization properties of convnets by testing their ability to determine if an image/set of images was used during training or not. The experiments are reported on large-scale datasets using high-capacity networks. 
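(For readers unfamiliar with such membership tests, the minimal instantiation is a loss-threshold rule in the spirit of Yeom et al. (2018); the sketch below is generic and not necessarily the paper's protocol:

    def is_member(model, x, y, loss_fn, tau):
        # flag an example as 'seen in training' when its loss is suspiciously low
        return loss_fn(model(x), y) < tau

with tau calibrated on known held-out data.)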
+ +While acknowledging that the proposed model is potentially useful, the reviewers raised several important concerns that were viewed by AC as critical issues: +(1) more formal justifications are required to assess the scope and significance of this work contributions -- see very detailed comments by R2 about measuring networks capacity to memorize and the role of network weights and depth as studied in MacKay,2002. In their response the authors acknowledged they didn’t take into account network weights and depth but strived at an empirical evaluation scenario. +(2) writing and presentation clarity of the paper could be substantially improved – see very detailed comments by R3 and also R2; +(3) empirical evaluations and effect of the negative set used for training are not well explained and analysed (R2, R3). + +AC can confirm that all three reviewers have read the author responses and have contributed to the final discussion. +AC suggests, in its current state the manuscript is not ready for a publication. We hope the reviews are useful for improving and revising the paper. +",ICLR2019,5: The area chair is absolutely certain +XDvh4JSfLuT,1610040000000.0,1610470000000.0,1,Hw2Za4N5hy0,Hw2Za4N5hy0,Final Decision,Reject,"The authors’ feedback has not fully addressed the reviewers’ concerns and the reviewers think that the paper is not ready for the publication. The authors should consider the following issues for the future submission: + +1) The concern from Reviewer 1: if a local device receives very little data but its data come from a mixture component with large weight, its gradient will likely be biased (due to the lack of data) but will still dominate others (due to its large mixture weight). + +2) Numerical experiments are not consistent with theoretical results. The theory is for convex but experiments are with non-convex loss. The response from authors does not resolve this issue. + +3) Notation is confusing and changing throughout. We strongly suggest the authors revise carefully this and make it clear. + +Although the experimental results are potential, we would like the authors to revise it carefully by addressing the reviewers’ concerns and further improve it by considering theoretical results for non-convex in order to submit to the next venues. +",ICLR2021, +xabwLlZC05w,1610040000000.0,1610470000000.0,1,GtCq61UFDId,GtCq61UFDId,Final Decision,Reject,The reviewers still have several concerns about the paper after the author feedback stage: the novelty of the paper is not sufficient; the experimental results are not very encouraging. We encourage the authors fixing these issues in the next revision.,ICLR2021, +Hy_97WZ-ttJ,1642700000000.0,1642700000000.0,1,NPJ5zWk_IQj,NPJ5zWk_IQj,Paper Decision,Reject,"The paper proposes an algorithm for unsupervised skill transfer between robots with different kinematics. Integral to the approach is the idea that while the robots differ, they may use similar strategies to perform similar tasks. Without access to paired data, the paper formulates the problem of learning correspondences between robots as one of matching skill distributions across robots. Drawing insights from work in machine translation, the paper proposes an unsupervised objective that encourages the model to learn to align the distribution over skill sequences. Experimental results demonstrate the ability to use learned skill correspondences to support transfer across different robots in different domains. 
+ +As several reviewers point out, the problem of learning to transfer skills across robots with different kinematic designs from video demonstrations raises a number of interesting challenges that are relevant to the robotics and learning communities. Among them, a fundamental contribution of the paper is the ability to learn skill correspondences in an unsupervised manner based on unlabeled demonstrations. The approach by which this is achieved (i.e., using distribution matching) is sensible and clearly described. While the reviewers agree on the significance of the research problem, they raise a few key concerns regarding the initial submission. Among them are questions about the nature and extent of the domain variations that the method can handle (e.g., between robots with different degrees-of-freedom); the significance of the contributions; and how this work is situated in the context of existing approaches to robot skill learning. Several reviewers question the definition of morphological variation and comment that these variations may violate the assumption that task strategies are similar across designs. The authors provided detailed feedback to each of the reviews, which helps to clarify several of these concerns. Unfortunately, several reviewers did not respond to multiple requests to update their reviews. The one who did decided to maintain their score. + +The paper tackles an important problem in robot learning and the work has the potential to have significant impact on the way in which robots acquire new skills. The original submission together with the author responses suggest that there is are solid contributions here. The authors are strongly encouraged to revisit the discussion of the approach to more clearly convey the novelty of the approach and to consider experimental evaluations that better support these claims.",ICLR2022, +rkx9icGgg4,1544720000000.0,1545350000000.0,1,Bkf1tjR9KQ,Bkf1tjR9KQ,limited novelty,Reject,"The paper describes an architecture search method which optimises multiple objectives using a genetic algorithm. All reviewers agree on rejection due to limited novelty compared to the prior art; while the results are solid, they are not ground-breaking to justify acceptance based on results alone. +",ICLR2019,5: The area chair is absolutely certain +Ei-OeJhS67b,1610040000000.0,1610470000000.0,1,nCY83KxoehA,nCY83KxoehA,Final Decision,Reject,"The paper proposes a method for using multiple word embeddings in structured prediction tasks. The reviewers shared the concerns that the method seems rather specific to this use case and the empirical improvements do not justify the complexity of the approach. They also questioned the definition of the method as ""architecture search"" vs a particular ensembling method. +Finally, I think the authors should provide more discussion of why using all the embeddings (in the sense of bias-variance tradeoffs). + +",ICLR2021, +2A8xLMkbDxi,1610040000000.0,1610470000000.0,1,fgX9O5q0BT,fgX9O5q0BT,Final Decision,Reject,"This paper studies the role of “noise injection” in GANs with tools from Riemannian geometry, and derives a new noise injection approach that aims to learn a fuzzy coordinate system to model non-Euclidean geometry. The new noise injection approach is shown to improve over StyleGANv2 noise injection on lower-resolution 128x128 FFHQ, LSUN, and 32x32 CIFAR-10 images. 
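(For orientation: the baseline noise injection at issue perturbs intermediate features roughly as

    x_\ell \leftarrow x_\ell + \psi_\ell \odot \varepsilon_\ell, \qquad \varepsilon_\ell \sim \mathcal{N}(0, I),

with learned per-channel scales \psi_\ell, and the paper generalizes the geometry of this perturbation; this schematic is my reading, not the authors' notation.)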
 + +Some reviewers found the experimental results a “considerable improvement on DCGAN and StyleGANv2” (R3) and “extensive and convincing” (R2), while others had concerns about the experimental setup using lower-resolution images (R1, R4). While reviewers were mostly positive about the experimental wins of the paper, there was confusion (R3) and several concerns (R4) around the theory and the relationship between the theory and the practical noise injection algorithm. I additionally had several concerns about the presentation and the relation to prior work on generative models. Thus, in its current state, I cannot recommend this paper for acceptance. Below I highlight concerns that should be addressed in future revisions. + + +1. My biggest concern is the tremendous gap between the theoretical claims and the practical implementation. When training a GAN with the new form of noise injection, does it learn the skeleton and fuzzy equivalence relationships you claim? The paper is missing any kind of toy experiment showing that training a GAN with fuzzy reparameterization discovers these relationships or coordinates. Such an experiment would greatly strengthen the paper and help to answer the question of why this new method works (i.e., that it is not just more parameters, a slightly better architecture, or better hyperparameters, as mentioned by R3 and R4). There is also no discussion of what happens theoretically when you have multiple layers of fuzzy reparameterization, and the claim that StyleGAN2’s noise injection limits to Euclidean geometry is false in this case (and thus StyleGAN2’s noise injection can also overcome the “adversarial dimension trap”). + +2. Theoretical setting: As mentioned by R4, there is much prior work on the difficulties of fitting a lower-dimensional model manifold to a higher-dimensional data manifold (e.g., WGAN). Theorem 1 highlights the impossibility of exactly fitting the data manifold with (smooth) neural networks, but the resulting solution of increasing the dimensionality of the latent space is well known and commonly used (e.g., StyleGAN). The paper also doesn’t discuss the alternative of *approximately* fitting the data manifold with a lower-dimensional structure, which is what is often studied in practice. + + +3. Clarity: The term “noise injection” is overloaded in the literature, and the current presentation does not sufficiently describe the method. There is also no discussion of “instance noise”, another solution to this problem, which adds noise to the inputs of the discriminator to yield finite f-divergences (Sonderby et al., 2016; Roth et al., 2017). The work on instance noise is closely related to the approach here, but it only adds noise at the output of the generator, not at all levels. +There is also no discussion of how adding noise is simply expanding the generative model with additional latent variables, a standard approach that is often discussed in the context of hierarchical generative models. The authors mention the relation to the reparameterization trick in VAEs, but argue that they are doing something fundamentally different. However, modern VAE architectures (IAF-VAE, Very Deep VAE) use a very similar form of modulation at multiple levels in the hierarchy. + + +4. Experiments: There are no error bars in the experimental results, and many results are presented in a new experimental setting defined by the authors (lower resolution than in prior work, even if using prior code). 
Rerunning experiments in more standard settings on full resolution images would greatly improve the confidence that the new noise injection strategy is effective. +",ICLR2021, +xdlA-kgqOmY,1642700000000.0,1642700000000.0,1,g5tANwND04i,g5tANwND04i,Paper Decision,Accept (Poster),"The paper presents an asymptotic analysis of the convergence of the last iterate of mSGD and Adagrad. This result extends previous work providing stronger results under weaker assumptions. Even if these topics received less attention from the community, they are key problems in stochastic optimization. + +The reviewers and I had several doubts about the proofs and relation with previous work. However, the rebuttal phase essential acted as a minor revision process. In fact, the authors fixed all the issues, convincing the reviewers (and me) that the results are novel, correct, and interesting. + +For the above reasons, I recommend the acceptance of this paper.",ICLR2022, +Y77alNuL58Q,1642700000000.0,1643410000000.0,1,NrkAAcMpRoT,NrkAAcMpRoT,Paper Decision,Reject,"This was a somewhat unusual submission in that the authors tried to motivate their paper by pointing to a separate anonymous manuscript. However, the authors didn't seem to want to confirm they would merge the manuscripts when asked about this. It was thought that in fairness the submitted manuscript should be judged on its own. After discussion, it was agreed that the submitted paper on its own, did not generate enough enthusiasm to merit acceptance.",ICLR2022, +BJk6z16HM,1517250000000.0,1517260000000.0,25,B1zlp1bRW,B1zlp1bRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper is generally very strong. I do find myself agreeing with the last reviewer though, that tuning hyperparameters on the test set should not be done, even if others have done it in the past. (I say this having worked on similar problems myself.) I would strongly encourage the authors to re-do their experiments with a better tuning regime.",ICLR2018, +HyuCifIdx,1486400000000.0,1486400000000.0,1,S1j4RqYxg,S1j4RqYxg,ICLR committee final decision,Reject,"The approach/problem seems interesting, and several reviewers commented on this. However, the experimental evaluation is quite preliminary and the paper would be helped a lot with a connection to a motivating application. All of the reviewers pointed out that the work is not written in the usual in the scope of ICLR papers, and putting these together at this time it makes sense to reject the paper.",ICLR2017, +5O5cXPy72qE,1642700000000.0,1642700000000.0,1,kOtkgUGAVTX,kOtkgUGAVTX,Paper Decision,Reject,"The paper addresses the question of skill discovery in reinforcement learning: can we (without supervision) discover behaviors so that later (when supervision is available via a reward signal) we can learn faster? The paper proposes a new contrastive loss that an agent can optimize for this purpose, based on a decomposition of mutual information between skills and transitions. The reviewers praised the extensive experimental evaluation and good empirical results, as well as the analysis of failure modes of related algorithms. + +Unfortunately, there appeared to be errors in the derivation and implementation. (These include typos in derivations that made them difficult to follow, as well as uploaded code that didn't match the experimental results.) While the authors claim to have fixed all of them, the reviewers were not all completely convinced by the end of the discussion period. 
In any case, these errors caused confusion during review; so, whether the errors are fixed or not, it seems clear that there hasn't been time for a full evaluation of the corrected derivations and code. For this reason, it seems wise to ask that this paper be reviewed again from scratch before being published.",ICLR2022, +RIr5ppA-phw,1642700000000.0,1642700000000.0,1,0q0REJNgtg,0q0REJNgtg,Paper Decision,Reject,"One of the four reviewers failed to engage in discussion, two acknowledged the author's response and paper revision without changing their scores, and one reviewer engaged in considerable discussion resulting in a score increase to a weak accept. No reviewer gave the paper a strong endorsement. I do appreciate the large effort that the authors put into revising their paper and addressing reviewers concerns. However, major post-submission revision puts an inappropriate burden on reviewers. In any case, there is not strong support for this paper even from the one heavily engaged reviewer.",ICLR2022, +9Jdn9nq4Ui1,1610040000000.0,1610470000000.0,1,PObuuGVrGaZ,PObuuGVrGaZ,Final Decision,Accept (Poster)," +This paper studies the effect of label smoothing on knowledge-distillation. A previous work on this topic (Muller et al.) has claimed that label smoothing can hurt the performance of the student model in knowledge-distillation. The rationale behind this argument is that label smoothing erases information encoded in the labels. This work shows that such claimed effect does not necessarily happen. Specifically, by a comprehensive study on image classification, binary neural networks, and neural machine translation, the authors show that label smoothing can be compatible with knowledge distillation. However, they conclude that label smoothing will lose its effectiveness with long-tailed distribution and increased number of classes. + +Overall ratings of this paper are all on the positive side, and R2 finding this paper an important step toward understanding the interaction between knowledge-distillation and label smoothing. I concur with the reviewers about the importance of this research direction and I think this submission provides a reasonable empirical evidence to change our earlier perspectives. I recommend accept. + +While the paper specifically studies the effect of label smoothing on knowledge-distillation, I think providing a bigger context and reviewing some of the recent demystifying efforts on understanding knowledge-distillation could allow paper to communicate with a broader audience. I hope this can be accommodated in the final version. +",ICLR2021, +ryZfaGU_e,1486400000000.0,1486400000000.0,1,B1gtu5ilg,B1gtu5ilg,ICLR committee final decision,Accept (Poster),"The paper proposes a model for multi-view learning that uses a triplet loss to encourage different views of the same object to be closer together then the views of two different objects. The technical novelty of the model is somewhat limited, and the reviewers are concerned that experimental evaluations are done exclusively on synthetic data. The connections to human perception appear interesting. Earlier issues with missing references to prior work and comparisons with baseline models appear to have been substantially addressed in revisions of the paper. 
We strongly encourage the authors to further revise their paper to address the remaining outstanding issues mentioned above.",ICLR2017, +SJl6aLrWxV,1544800000000.0,1545350000000.0,1,SJMO2iCct7,SJMO2iCct7,borderline - but leaning to reject because of reviewer reservations,Reject,The reviewers in general like the paper but has serous reservations regarding relation to other work (novelty) and clarity of presentation. Given non-linear state space models is a crowded field it is perhaps better that these points are dealt with first and then submitted elsewhere.,ICLR2019,4: The area chair is confident but not absolutely certain +fXzZXDk3DJ6,1610040000000.0,1610470000000.0,1,ijVgDcvLmZ,ijVgDcvLmZ,Final Decision,Reject,"Although the paper presents some interesting ideas, in general the reviewers agree that the paper lacks clear results and is not an easy read. The paper proposes a factorisation of value functions, a topic that has received quite some attention in the literature (e.g. QPLEX), and it seems that their is not sufficient innovation in the proposed method in the paper. There are also a number of claims in the paper (e.g. partial observability etc.) with which some of the reviewers disagree, and should be discussed more carefully in a revised version of the article, that all in all seems to need more work.",ICLR2021, +Bk2e8tklE,1544690000000.0,1545350000000.0,1,ryeoxnRqKQ,ryeoxnRqKQ,the only favorable review does not make a convincing argument to accept the paper,Reject,"Although one review is favorable, it does not make a strong enough case for accepting this paper. Thus there is not sufficient support in the reviews to accept this paper. + +I am recommending rejecting this submission for multiple reasons. + +Given that this is a ""black box"" attack formalized as an optimization problem, the method must be compared to other approaches in the large field of derivative-free optimization. There are many techniques including: Bayesian optimization, (other) evolutionary algorithms, simulated annealing, Nelder-Mead, coordinate descent, etc. Since the method of the paper does not use anything about the structure of the problem it can be applied to other derivative-free optimization problems that had the same search constraint. However, the paper does not provide evidence that it has advanced the state of the art in derivative-free optimization. + +The method the paper describes does not need a new name and is an obvious variation of existing evolutionary algorithms. Someone facing the same problem could easily reinvent the exact method of the paper without reading it and this limits the value of the contribution. + +Finally, this paper amounts to breaking already broken defenses, which is not an activity of high value to the community at this stage and also limits the contribution of this work. +",ICLR2019,5: The area chair is absolutely certain +r1l-kGSWgE,1544800000000.0,1545350000000.0,1,rkx1m2C5YQ,rkx1m2C5YQ,Borderline - but missing clarity,Reject,A lot of work has appeared recently on recurrent state space models. So although this paper is in general considered favorable by the reviewers it is unclear exactly how the paper places itself in that (crowded) space. So rejection with a strong encouragement to update and resubmission is encouraged. 
,ICLR2019,4: The area chair is confident but not absolutely certain +TyauNVHbmyk,1642700000000.0,1642700000000.0,1,o6dG7nVYDS,o6dG7nVYDS,Paper Decision,Reject,"This paper provides a learning theoretic account of domain generalization in which domains themselves are treated as data, generated from some domain generating distribution. All of the reviewers were positive about this approach and found it interesting. There were, however, a couple of critiques raised by reviewers that lead me to recommend that it is rejected: + +- the theory provided in this paper does not remotely apply to the datasets that are used in the experiments. While, I agree with one of the author responses that DG benchmarks exist with many domains, DomainBed has very few domains, and it's not clear that their theory is a remotely satisfactory account of the experimental results presented in the paper. +- Despite some back and forth on the wording and positioning of the paper, I think the writing still does not give enough credit to worst-case analyses of DG.",ICLR2022, +kP5i2Czt9P,1576800000000.0,1576800000000.0,1,SJxbHkrKDH,SJxbHkrKDH,Paper Decision,Accept (Poster),"The paper proposes a curriculum approach to increasing the number of agents (and hence complexity) in MARL. + +The reviewers mostly agreed that this is a simple and useful idea to the MARL community. There was some initial disagreement about relationships with other RL + evolution approaches, but it got resolved in the rebuttal. Another concern was the slight differences in the environments considered by the paper compared to the literature, but the authors added an experiment with the unmodified version. + +Given the positive assessment and the successful rebuttal, I recommend acceptance.",ICLR2020, +u3CswdMisIk,1642700000000.0,1642700000000.0,1,TKrlyiqKWB,TKrlyiqKWB,Paper Decision,Reject,"This paper extends prototypical classification networks to handle class hierarchies and fairness. New neural architecture is proposed and experimental results in support of it are presented. + +Unfortunately, reviewers found that paper in its current for is not sufficiently strong to be accepted at ICLR. Authors have made a significant attempt to clarify and improve the paper in their response. However, reviewers believe that contributions and motivation can be clarified further. We encourage authors to improve their work according to the specific suggestions made by the reviewers and resubmit.",ICLR2022, +SJZcByTHM,1517250000000.0,1517260000000.0,621,rJbs5gbRW,rJbs5gbRW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper appears unfinished in many ways: the experiments are preliminary, the paper completely ignored a large body of prior work on the subject, and the presentation needs substantial improvements. The authors did not provide a rebuttal. + +I encourage the authors to refrain from submitting unfinished papers such as this one in the future, as it unnecessarily increases the load on a review system that is already strained.",ICLR2018, +a-UEJJcQhoN,1610040000000.0,1610470000000.0,1,jXe91kq3jAq,jXe91kq3jAq,Final Decision,Accept (Poster),"This paper proposes a unified model-based framework for high-level skill learning and composition through hierarchical RL. The proposed approach combines high-level planning in a low dimensional space with low-level skill learning, where each low-level skill is a policy conditioned on the high-level task. The low-level policies are learned by using a mutual information objective. 
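(Concretely, such skill-conditioning objectives typically maximize a mutual-information term of the form

    I(s; z) = H(z) - H(z \mid s),

estimated with a learned discriminator q(z \mid s), in the spirit of DIAYN; the paper's exact objective may differ, so treat this as a generic gloss.)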
The proposed approach is evaluated on locomotion tasks, and is shown to be overall more data efficient than alternative baselines. +The reviewers agree that this work is original and sufficiently empirically motivated for acceptance. Two reviewers were concerned by the experimental setup and the transfer setting that are somehow too simple, but the authors fixed these issues in the improved version based on the feedback.",ICLR2021, +rk-xB1THf,1517250000000.0,1517260000000.0,488,H113pWZRb,H113pWZRb,ICLR 2018 Conference Acceptance Decision,Reject,"The authors provide an extension to GCNs of Kipf and Welling in order to incorporate information about higher order neighborhoods. The extension is well motivated (and though I agree that it is not trivial modification of the K&W approach to the second order, thanks to the authors for the clarification). The improvements are relatively moderate. + +Pros: +-- The approach is well motivated +-- The paper is clearly written +Cons: +-- The originality and impact (as well as motivation) are questioned by the reviewers +",ICLR2018, +2vnlfwGmByZ,1610040000000.0,1610470000000.0,1,Kz42iQirPJI,Kz42iQirPJI,Final Decision,Reject,"The paper proposes a sequential meta-learning method over few-shot sequential domains, which meta learns both model parameters and learning rate vectors to capture task-general representations. + +Reviewers raised many insightful and constructive comments. The main themes are as follows: +- The problem setting needs further motivation and clarifications, to make it more realistic and applicable. +- The novelty is relatively weak, e.g. the approach is too simple, and learning the learning rate is a common trick. +- The method needs great effort for better presentation and justification. The current presentation simply lists several equations in a dense way without detailed explanation. Some main claims such as mitigating catastrophic forgetting are not elaborated extensively. + +AC scanned through the paper and agreed with the reviewers' main points. Authors' rebuttal in general did not address these concerns to the satisfaction. For example, even after revision, the readability of this paper is not good enough. The authors are encouraged to perform a thorough revision.",ICLR2021, +wmjX6QIqw2,1576800000000.0,1576800000000.0,1,HklQYxBKwS,HklQYxBKwS,Paper Decision,Accept (Poster),"The paper considers representational aspects of neural tangent kernels (NTKs). More precisely, recent literature on overparametrized neural networks has identified NTKs as a way to characterize the behavior of gradient descent on wide neural networks as fitting these types of kernels. This paper focuses on the representational aspect: namely that functions of appropriate ""complexity"" can be written as an NTK with parameters close to initialization (comparably close to what results on gradient descent get). + +The reviewers agree this content is of general interest to the community and with the proposed revisions there is general agreement that the paper has merits to recommend acceptance.",ICLR2020, +9fd8l5BHRSN,1610040000000.0,1610470000000.0,1,eHg0cXYigrT,eHg0cXYigrT,Final Decision,Reject,"The paper proposes to use conditional GANs to generate protein sequence with respect to GO molecular functions. The idea is nice. But the reviewers find that there are many things that are not clear. For example, some sentences, phrase, the model and experiments pointed by the reviewers that should be rigorous described. The technical contribution is also limited. 
The author are encouraged to revise the paper according to the comments.",ICLR2021, +o1bwAPcZr5T,1642700000000.0,1642700000000.0,1,FmBegXJToY,FmBegXJToY,Paper Decision,Accept (Poster),"The paper evaluates the generalization capabilities of model-based agents, in particular, MuZero, compared with model-free agents. Reviewers agree that the paper is well-written and the topic is interesting. The ablation study is especially interesting, as it disentangles the effect of different algorithmic components. Some concerns are raised about the significance of this work, as the scope is limited to an empirical study and the results are not necessarily very surprising. + +Since the paper presents clear results on an important and relevant topic, I recommend acceptance.",ICLR2022, +fst_BZHbGLT,1610040000000.0,1610470000000.0,1,nzLFm097HI,nzLFm097HI,Final Decision,Reject,The reviewers are in agreement that this paper could benefit further improvement. There are several areas: novelty of the proposed approach and evaluation on real-world datasets (beyond just CLEVR).,ICLR2021, +vzrdjChkjRJ,1642700000000.0,1642700000000.0,1,BwPaPxwgyQb,BwPaPxwgyQb,Paper Decision,Accept (Poster),"Dear Authors, + +The paper was received nicely and discussed during the rebuttal period. +There is consensus among the reviewers that the paper should be accepted: + +- This paper does contribute solidly to a timely topic of theoretical understanding of sparisty recovery with deep unroling. +- The original version had very limited experiments and only synthetic ones, which raised concerns about whether the setting is motivated and whether the algorithm works on actual real data. The revision fixed that to an extent with some experiments on real data. + +Yet, there are still some concerns that we suggest to be tackled for the final version: +- The capacity analysis is carried out inside a strongly convex regime while the algorithm is advocated for nonconvex sparsity recovery (see, e.g., the Discussion at the end of Section 2.1 ); +- The analysis is relatively loosely connected to the adopted fist-order optimization procedure; +- While the depth of network plays a role in the upper bound of Equation (15), its real impact on generalization gap looks quite limited. + +The above are just suggestions to be looked more carefully, but there are not necessary. + +The current consensus is that the paper deserves publication. + +Best +AC",ICLR2022, +AkrvZGFC0y,1576800000000.0,1576800000000.0,1,S1e1EAEFPB,S1e1EAEFPB,Paper Decision,Reject,"This paper proposes a new mechanism to visualize the latent space of a neural network. The idea is simple and the paper includes several experiments to test the effectiveness of the method. However, the method bears similarity to previous work and the evaluation does not sufficiently show quantitative improvements over other introspection techniques. The reviewers found this was a substantial problem and for this reason the paper is not ready for publication. The paper should improve its discussion of prior work and better establish its place in this regard.",ICLR2020, +mygBatm-N5,1576800000000.0,1576800000000.0,1,Skxw-REFwS,Skxw-REFwS,Paper Decision,Reject," + + +The paper presents a semi-supervised data streaming approach. The proposed architecture is made of a layer-wise k-means structure (more specifically a epsilon-means approach, where the epsilon is adaptively defined from the distortion percentile). 
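(A minimal sketch of the epsilon-means step just described -- my reading, with illustrative constants:

    import numpy as np

    def assign(x, centroids, eps, lr=0.05):
        # match a patch to its nearest centroid, or open a new cluster
        if centroids:
            d = [float(np.linalg.norm(x - c)) for c in centroids]
            j = int(np.argmin(d))
            if d[j] <= eps:
                centroids[j] += lr * (x - centroids[j])  # Short Term Memory update
                return j
        centroids.append(x.copy())
        return len(centroids) - 1

The per-layer mechanics are described next.)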
Each layer is associated with a scope (patch dimensions); each patch of the image is assigned to its nearest cluster center (or a new cluster is created if needed); new cluster centers are adjusted to fit the examples (Short Term Memory); clusters that have been visited sufficiently many times are frozen (Long Term Memory). Each cluster is associated with a label distribution from the labelled examples. The label for each new image is obtained by a vote of the clusters and layers. + +Some reviews raise some issues about the robustness of the approach, and its sensitivity w.r.t. hyper-parameters. Some claims ("the distribution associated to a class may change with time") are not experimentally confirmed; it seems that in such a case, the LTM size might grow over time; a forgetting mechanism would then be needed to enforce the tractability of classification. + +Some claims (the mechanism is related to how animals learn) are debatable, as noted by Rev#1; see hippocampal replay. + +The area chair thinks that a main issue with the paper is that Unsupervised Progressive Learning is considered to be a new setting ("none of the existing approaches in the literature are directly applicable to the UPL problem"), preventing the authors from comparing their results with baselines. + +However, after a short bibliographic search, some related approaches exist under another name: +* Incremental Semi-supervised Learning on Streaming Data, Pattern Recognition 88, Li et al., 2018; +* Incremental Semi-Supervised Learning from Streams for Object Classification, Chiotellis et al., 2018; +* Online data stream classification with incremental semi-supervised learning, Loo et al., 2015. + +The above approaches seem able to at least accommodate the Uniform UPL scenario. I therefore encourage the authors to consider some of the above as baselines and provide a comparative validation of STAM.",ICLR2020, +bkansA7DU5,1610040000000.0,1610470000000.0,1,_XYzwxPIQu6,_XYzwxPIQu6,Final Decision,Reject,"I thank the authors both for going the extra mile in doing further experiments for their response, and making the efforts to synthesize the main comments and concerns of the reviewers. + +Overall, I'm pretty sympathetic to the idea that syntactic and semantic representations should be very helpful to learning sentence embeddings. They provide a form of scaffolding. But a reviewer notes and I think anyone will admit in 2020 that contextual language models like BERT also provide much of this scaffolding, and it falls to the paper's authors to provide convincing evidence that using external parsers is valuable and necessary in this quest. In this, the current paper seems to fall somewhat short. + +Pros: + - Clearly and honestly written paper + - Good exploration of value of constituency & dependency parse representations + - Exploits recent work in contrastive learning + +Cons: + - Insufficient novelty + - Experimental comparisons not well controlled – too much apples and oranges. + - No comparisons of inference speed tradeoffs + - Value of exploiting explicit syntax is too much assumed rather than explored + - It's not established that use of explicit syntax really delivers versus alternatives such as contextual language models + +Several of the reviewers felt that this paper was a fairly limited extension of L & L 2018, without any clearly novel contribution.
The issue of comparability in results is complicated. There is a reason to move to a new standard corpus, rather than privileged people passing around archived copies of the old BooksCorpus, and I think your additional experiments show the results are "near enough," but there would still be much more archival value in a new paper having a set of comparable results on a new corpus. The big question of whether to do this or use BERT is better addressed in your additional experiments presenting a random projection of BERT to a comparable higher dimensional space. But unfortunately these results further weaken the clarity of the case for needing to head in the direction of this paper rather than just using a large pre-trained contextual LM.",ICLR2021, +mrfnjqG8qks,1642700000000.0,1642700000000.0,1,UJ9_wmscwk,UJ9_wmscwk,Paper Decision,Reject,"This paper revisits the problem of influence maximization and suggests using graph neural networks to estimate an upper bound on the influence, which can then be used to find good seed sets. The paper gives a variety of experimental evidence that the methods improve on various algorithms in the literature. There was a wide variation in opinions. Some reviewers felt that the overall idea was not particularly novel, as methods that combine graph embeddings and reinforcement learning to solve influence maximization have already been proposed in the literature. Additionally some reviewers felt that the experiments were missing important comparisons, particularly to learning-based methods, without which it is difficult to argue that these methods really do advance the state of the art.",ICLR2022, +7qEgKGt4aXS,1642700000000.0,1642700000000.0,1,gciJWCp3z1s,gciJWCp3z1s,Paper Decision,Reject,"This paper makes a key observation that the gradient-based method becomes more likely to suffer from poor local optima in multi-agent reinforcement learning (MARL) with more agents particularly in the offline setting. The paper proposes the use of a zeroth order optimization method to avoid local optima. Specifically, it samples multiple actions and regularizes the policy to get closer to the optimal action among those. The use of such a zeroth order method to avoid poor local optima is not particularly new, although finding it effective in MARL, with the empirical support, is valuable. The main discussion point was the insufficiency of experimental support, and the additional experiments during the discussion have addressed the original concerns of the reviewers to some extent. Overall, given the limited novelty and insufficiency of support (either theoretical or empirical), the paper is slightly below the borderline.",ICLR2022, +O-ahCgHPq_e,1642700000000.0,1642700000000.0,1,f9AIc3mEprf,f9AIc3mEprf,Paper Decision,Reject,"This paper introduces an ImageNet-scale benchmark UIMNET for uncertainty estimation of deep image classifiers and evaluates prior works under the proposed benchmark. Two reviewers suggest rejection, and one reviewer suggests acceptance. In the discussion period, the authors did not provide any response to many of the reviewers' concerns, e.g., weak baselines, weak novelty, and lack of justification for the current design. Hence, given the current status, AC recommends rejection.",ICLR2022, +HTRUGSmwEiN,1642700000000.0,1642700000000.0,1,GsH-K1VIyy,GsH-K1VIyy,Paper Decision,Accept (Poster),"The authors give an effective framework PRIME to tackle the challenges of automating hardware design optimization. This problem is of importance to the community. Overall, the reviewers thought the paper gave a nice clean approach to the problem and that the community would be interested in these results.",ICLR2022, +SyeloZtfgE,1642700000000.0,1642700000000.0,1,QKEkEFpKBBv,QKEkEFpKBBv,Paper Decision,Reject,"All reviewers concur that the paper has promise, but fails to deliver on that promise.
The idea of learning potentials based on DNNs is appreciated, but the evaluation of the contribution is considered lacking by all reviewers. In addition, reviewers note that the training is not differentiable, which the rebuttal acknowledges is future work. + +I do not reject the paper simply for failing to beat a deep learning baseline, but for having chosen applications which do not even test the paper's hypotheses: reviewers note that the models are tree-structured, so loopy BP is not tested, despite the revised paper's claim that "the inference strategy is compatible with graphs containing cycles".",ICLR2022, +MNyyGBIoLWD,1642700000000.0,1642700000000.0,1,Y8KfxdZl-rI,Y8KfxdZl-rI,Paper Decision,Reject,"The paper proposes a new approach for weakly supervised learning, based on conditional normalizing flows. Reviewers generally found the paper to have an interesting, novel proposal with empirical promise. However, some concerns were raised: to name a few, + +(1) _Clarity._ Several reviewers found portions of the technical content hard to follow, e.g., the description of constraints in Sec 4. + +(2) _Scalability compared to data programming._ One reviewer was unsure of how the present approach compares in terms of inference time and/or accuracy to a two-stage data programming approach. + +(3) _Infeasibility of sampling from Equation 2._ One reviewer suggested the paper discuss and compare to a simpler baseline, which is to perform rejection sampling from the constraint set. + +(4) _Suitability of point cloud problem._ One reviewer was unsure of whether the point cloud problem, considered as an experimental setting in this paper, is reflective of weakly supervised learning. + +(5) _Practicality of knowing weak labeler error rates._ The paper assumes knowledge of the weak labeler error rates in constructing constraints. Some reviewers raised concerns on the practical viability of this assumption. + +For point (2), the relevant reviewer was not convinced following the discussion. The suggestion is to treat LLF as a label model, which serves as input to a non-MC predictor. The question then is what the predictive performance of this combined approach looks like, as opposed to the LLF's themselves. + +For point (3), the response clarified that the number of constraints might make rejection sampling infeasible. This appears to be true, but it is suggested that the paper at a minimum discuss this, and ideally also clarify claims about the general-purpose need for the proposed approach (since in some cases one might be able to do rejection sampling). + +For point (4), the discussion was somewhat inconclusive. It is suggested that the authors explicitly discuss some of the points brought up in the response. + +For point (5), while the assumption is not wholly uncommon in the literature, it would be better for the authors to perform some sensitivity analysis against misspecification of the error rates. + +Overall, the paper has some interesting ideas that are well worth exploring. The present execution appears to have some scope for improvement, with the reviews providing a range of suggestions of areas of the paper that could be made clearer or strengthened.
The paper would be best served by incorporating these comments and undergoing a fresh review.",ICLR2022, +TZRyep0tV4b,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"The paper contributes to the understanding of out-of-distribution detection by showing that binary discrimination between in- and out-distribution examples 'is equivalent to several different formulations of the out-of-distribution detection problem'. The paper shows this in an asymptotic setup based on studying likelihood ratios for distinguishing in-distribution examples from out-of-distribution examples. The paper also provides numerical results showing that a simple baseline based on binary classification works well. + +The paper got very mixed responses ranging from strong accept to reject: +- Reviewer YhZ7 (recommending 3: reject) raises several important concerns, specifically that the paper doesn't explain the significance of its contributions adequately, that experiments are not thorough enough (for example that only one out-of-distribution dataset is considered), and that to train a binary classifier one needs to have sufficiently many out-of-distribution examples. +The authors argued in response that the purpose of the paper is to provide an understanding of existing methods that are often empirically driven, made revisions to the exposition, and point out that they actually evaluate on six/seven out-of-distribution test sets. +After discussion, the reviewer is still concerned that the paper states 'We show that when training the binary discriminator between in- and out-distribution together with a standard classifier on the in-distribution in a shared fashion, the binary discriminator reaches state-of-the-art OOD detection performance' as a contribution and that this claim is not supported by the results in the paper. The authors say they are happy to drop this particular statement and emphasize that their contribution is that a binary classifier can be a useful tool for OOD detection. The reviewer is not satisfied by this response, as the reviewer feels that this makes the contribution much less impactful. + +- Reviewer iH61 (recommending 6: marginally above, initially reject) pointed out that the significance of one of the contributions is limited, since the claims resemble the ones by Thulasidasan et al. [2021] and Mohseni et al. [2020], and initially recommends rejection. The authors respond that those two papers only aim at good performance, but do not unify existing approaches, as the paper under review does. The reviewer slightly raised their score, but again points out that the previous works already show that a binary discriminator performs well. + +- Reviewer Lwwq (recommending 10: strong accept) appreciates the unification of different methods and votes for strong acceptance. The reviewer also points out that he/she is not an expert in the field, and thus this reviewer's rating should be taken with care. + +- Reviewer YRfA (recommending 8: accept) points out that the authors make notable progress towards a better understanding of OOD methods, but is concerned about what problem the authors are trying to solve and its significance, and states that he/she cannot judge the importance of the paper. + +- Reviewer vYWv (recommending 6: marginally above, initially recommending reject) finds that the paper provides helpful insights to connect methods for OOD detection tasks, and weakly recommends acceptance. + +The reviewers' opinions on this paper vary significantly.
Initially, a major selling point of the paper was that 'the binary discriminator reaches state-of-the-art OOD detection performance', but after discussion, the authors and reviewers agree that this statement is not supported by experiments, and the idea of using a binary discriminator is also not new, and thus everyone agrees that this statement should be removed. +This leaves as the major contribution an improved understanding of a variety of methods, and casting them as versions of a binary classifier. +This by itself would be sufficient to carry a paper; however, the stated equivalence is rather weak as it is based on an asymptotic analysis, and in the asymptotic regime, out-of-distribution detection is rather trivial because the distributions are given. This also explains why in the paper's experiments all the methods that are asymptotically related behave quite differently in practice. + +I do not recommend this paper for acceptance. I've read the paper and I've thought for quite a while about it and its reviews. I have also discussed the paper with a colleague who works actively on out-of-distribution detection, since I'm not an expert on this topic myself. While in general I find it very valuable to unify and to understand existing out-of-distribution algorithms better, I don't see how the particular interpretation provided by the paper is impactful, since it is unclear how the connection drawn in an asymptotic setup for Bayes classifiers actually extends to concrete OOD detection algorithms, which operate in the finite-sample regime.",ICLR2022, +Sygjczi-xE,1544820000000.0,1545350000000.0,1,rkxraoRcF7,rkxraoRcF7,metareview,Reject,"This is a proposed method that studies learning of disentangled representations in a relatively specific setting, defined as follows: given two datasets, one unlabeled and another that has a particular factor of variation fixed, the method will disentangle the factor of variation from the others. The reviewers found the method promising, with interesting results (qual & quant). + +The weaknesses of the method as discussed in the reviews and after: + +- the quantitative results with weak supervision are not a big improvement over beta-vae-like methods or mathieu et al.
+- a red flag of sorts to me is that it is not very clear where the gains are coming from: the authors claim to have done a fair comparison with the various baselines, but they introduce an entirely new encoder/decoder architecture that was likely (involuntarily, but still) tuned more to their method than others. +- the setup as presented is somewhat artificial and less general than it could be (however, this was not a major factor in my decision). It is easy to get confused by the kind of disentangled representations that this work is aiming to get. + +I think this has the potential to be a solid paper, but at this stage it's missing a number of ablation studies to truly understand what sets it apart from the previous work. At the very least, there is a number of architectural and training choices in Appendix D -- like the 0.25 dropout -- that require more explanation / empirical understanding and how they generalize to other datasets. + +Given all of this, at this point it is hard for me to recommend acceptance of this work. I encourage the authors to take all this feedback into account, extend their work to more domains (the artistic-style disentangling that they mention seems like a good idea) and provide more empirical evidence about their architectural choices and their effect on the results.",ICLR2019, +#NAME?,1642700000000.0,1642700000000.0,1,HRL6el2SBQ,HRL6el2SBQ,Paper Decision,Reject,"The paper proposes to use intra-class mixup supplemented with angular margin to improve OOD detection. + +Strengths: ++ Simple idea ++ Experiments on multiple datasets (although mostly focused on image benchmarks) + +Weaknesses: +- Justification for the idea could be improved. It'd be nice to understand when we expect this to (not) work. +- Differences from prior work "Angle-based outlier detection in high-dimensional data" could be better explained. + +While the paper has some interesting contributions, the reviewers and I feel that the current version falls short of the acceptance threshold. I encourage the authors to revise and resubmit to a different venue.",ICLR2022, +QScSo_rdNA,1610040000000.0,1610470000000.0,1,qcKh_Msv1GP,qcKh_Msv1GP,Final Decision,Reject,"This paper presents a pre-training strategy for learning graph representations using a graph-to-subgraph contrastive learning objective that also simultaneously discovers motifs. Pre-training for graph representation learning is an important research topic and this work presents a unique solution leveraging the fact that graphs sharing a lot of motifs should be similar to one another. The approach is novel and interesting, and the ability to simultaneously identify motifs is highly desirable. The results are promising, showing that the proposed approach, when pretrained on the ogbn-molhiv molecule dataset, worked well for several downstream chemical property prediction tasks. + +However, the paper is not without weaknesses and the reviewers noticed several of them. There are many parts of the system, the graph segmenter, which relies on spectral clustering (on the affinity matrix), the EM style clustering component to extract the motifs based on the subgraphs, the sampling loss based on the subgraph-to-motif similarity, and the graph-to-subgraph contrastive learning loss. These parts are tied together through different mechanisms and the training procedure becomes very confusing.
It is unclear which parts are updated on the backpropagation path from which loss, and what choices are decided offline (i.e., not integrated into the backpropagation). This presents great difficulty in understanding and probably using/building on the method. The paper has improved some aspects of its presentation during the review/discussion process, but the training/optimization procedure of the current version still appears quite opaque, and the reviewers heavily relied on the back and forth discussion to understand what is really going on. + +Another concern is that the intuition behind some aspects of the approach and the connections between different components of the approach are a bit difficult to get/digest in places. The intuition behind graph to subgraph contrastive learning appeared weak to the reviewer. It would be desirable to see a direct comparison to the subgraph-to-subgraph version. The connection between the motif discovery and the representation learning can be somewhat lost as we try to keep the many moving parts straight in mind. For these reasons, the paper, in its current form, cannot be accepted.",ICLR2021, +5XXJl5iMZ,1576800000000.0,1576800000000.0,1,Hkex2a4FPr,Hkex2a4FPr,Paper Decision,Reject,"This paper proposes to use contrastive predictive coding for self-supervised learning. The proposed approach is shown empirically to be more effective than existing self-supervised learning algorithms. While the reviewers found the experimental results encouraging, there were some questions about the contribution as a whole, in particular the lack of theoretical justification.",ICLR2020, +AbMzpa7823,1576800000000.0,1576800000000.0,1,HkxZigSYwS,HkxZigSYwS,Paper Decision,Reject,"This paper proposes a method for generating text examples that are adversarial against a known text model, based on modifying the internal representations of a tree-structured autoencoder. + +I side with the two more confident reviewers, and argue that this paper doesn't offer sufficient evidence that this method is useful in the proposed setting. I'm particularly swayed by R1, who raises some fairly basic concerns about the value of adversarial example work of this kind, where the generated examples look unnatural in most cases, and where label preservation is not guaranteed. I'm also concerned by the fact, which came up repeatedly in the reviews, that the authors claimed that using a tree-structured decoder encourages the model to generate grammatical sentences—I see no reason why this should be the case in the setting described here, and the paper doesn't seem to offer evidence to back this up.",ICLR2020, +hk-xB1Trz? no,"",ICLR2019, +B1gZR6YgxN? "",ICLR2019,"This work proposes to improve trust region policy search (TRPO) by using normalizing flow policies. This idea is a straightforward combination of two existing techniques and is not super surprising in terms of novelty. In this case, really strong experiments are needed to support the work; this, unfortunately, is not the case. For example, it was noticed by the reviewers that the Mujoco TRPO experiments do not use the best implementation of TRPO, which makes it difficult to judge the strength of the work compared with state of the art.",ICLR2019, +vzrdjChkjRJ? "The paper proposes a model and a training mechanism for multimodal generation.
The reviews are generally positive: they praise the generality of the method, the extensive experimental evaluation, and the good empirical results. Overall, no major concerns were raised, and all reviewers recommend acceptance. + +A couple of concerns remain, in my view: +- The method is generally heuristic, and intuitively rather than theoretically motivated. This is compensated of course by the empirical evaluation, which is thorough. +- The paper could be better written. The reviewers suggested some minor improvements which were implemented in the updated version, but I believe there is room for further improvement. + +Due to the above concerns, I consider the rating of reviewer #3 (10: Top 5% of accepted papers, seminal paper) to be unjustifiably high. On balance, however, I'm happy to recommend acceptance. + +Message to the authors: + +In the abstract you write: "a simple generic model that can beat highly engineered pipelines". Please be aware that the word "beat" evokes competition, winners and losers, so it's not appropriate in the context of scientific evaluation. Please consider replacing it with something neutral, such as "a simple generic model that can perform better than ...".",ICLR2021, +WtPwjWZsBJX,1642700000000.0,1642700000000.0,1,O5Wr-xX0U2y,O5Wr-xX0U2y,Paper Decision,Reject,"This paper proposes a risk-sensitive actor critic reinforcement learning (RL) method that optimizes the policy with respect to a dynamic (iterated) expectile risk measure. The expectile risk measure has the elicitability property and can be expressed as the minimizer of an expected scoring function, which is exploited in the critic update. The proposed approach is applied to option pricing and hedging. + +A main point of discussion was the applicability and effectiveness of the proposed method beyond particular financial problems. The original submission was indeed specialized to particular financial tasks. The authors have rewritten the paper in a way that it claims to propose a risk-sensitive RL method for the general MDP with finite horizon. This, however, leaves the question regarding the advantages of the proposed approach over existing methods of risk-sensitive RL, including those that work with non-coherent (dynamic) risk measures (since coherence is often not needed in domains outside finance).",ICLR2022, +Bke8K55QeE,1642700000000.0,1642700000000.0,1,6PlIkYUK9As,6PlIkYUK9As,Paper Decision,Reject,"The paper studied an interesting yet challenging problem in active learning and provided an intuitive heuristic for selecting informative subset(s) of training examples. The reviewers generally find the paper well presented and highlight the clarity of the exposition of the issues of existing query heuristics, especially for training deep models with class-imbalanced data.
+ +However, there are shared concerns among all reviewers about whether the existing experiments sufficiently justify the practical significance of the proposed heuristic (Reviewer 4ATq: missing comparison against important baselines such as Gal et al 2017; Reviewer Cp2k: ablations of class and boundary balancing; Reviewer yngU: lack of comparison to SOTA and ablation for important hyperparameters; Reviewer oEcZ: lack of comparison against SOTA). Reviewers also point out that the approximation guarantee is against an algorithm that is optimal wrt a somewhat ad-hoc objective, which makes the theoretical components of the paper not as significant. Given the above concerns, the paper does not appear to be ready for acceptance at the current stage.",ICLR2022, +aPQQYnrGKU,1642700000000.0,1642700000000.0,1,adjl32ogfqD,adjl32ogfqD,Paper Decision,Reject,"The paper considers the Equitable and Optimal Transport (EOT) problem which arises in fair division of goods and multi-resource allocation. The resulting problem is a linear program, which is polynomial-time solvable; however, the existing polynomial-time solvers either do not scale well with the dimension or are dual methods with entropic regularization for which it is unclear how to extract a primal solution. The paper shows how to extract a primal solution and also provides complexity analysis of a recently proposed projected alternating minimization method (PAM). The paper further provides a Nesterov accelerated variant of PAM. + +Overall, the paper is a meaningful contribution and was considered borderline. On one side, EOT seems like an interesting problem, the paper is well-presented, and the provided complexity results are technically sound. On the other hand, the reviewers felt that the EOT problem was not motivated enough, that the techniques for proving the results were mostly standard, and that the numerical experiments were insufficient. Even though the authors provided additional numerical experiments, I did not find the responses regarding motivation (particularly in the context of ML applications) and technical novelty convincing enough. The paper could have gone in either direction, but as there was ultimately no particularly strong support from any of the reviewers, I recommend rejection.
The authors are advised to carefully revise the paper and resubmit.",ICLR2022, +MNyy? +D6nAvVPC_jb,1576800000000.0,1576800000000.0,1,HylznxrYDr,HylznxrYDr,Paper Decision,Reject,"This paper presents FinBERT, a BERT-based model that is further trained on a financial corpus and evaluated on Financial PhraseBank and Financial QA. The authors show that FinBERT slightly outperforms baseline methods on both tasks. + +The reviewers agree that the novelty is limited and this seems to be an application of BERT to a financial dataset. There are many cases when it is okay to not present something entirely novel in terms of model as long as a paper still provides new insights on other things. Unfortunately, the new experiments in this paper are also not convincing. The improvements are very minor on small evaluation datasets, which makes the main contributions of the paper not enough for a venue such as ICLR. + +The authors did not respond to any of the reviewers' concerns.
I recommend rejecting this paper.",ICLR2020, +7x_6G9OVWG,1576800000000.0,1576800000000.0,1,B1xRGkHYDS,B1xRGkHYDS,Paper Decision,Reject,"This paper addresses the challenge of time complexity in aggregating neighbourhood information in GCNs. As we aggregate information from larger hops (deeper neighbourhoods) the number of nodes can increase exponentially thereby increasing time complexity. To overcome this, the authors propose a sampling method which samples nodes layer by layer based on bidirectional diffusion between layers. They demonstrate the effectiveness of their approach on 3 large benchmarks. + +While the ideas presented in the paper were interesting the reviewers raised some concerns which I have summarised below: + +1) Novelty: The reviewers felt that the techniques presented were not very novel and are very similar to one existing work as pointed out by R4 +2) Writing: The writing needs to be improved. The authors have already made an attempt towards this but it could be improved further +3) Comparisons with baselines: R4 has raised some concerns about the settings/configurations used for the baseline methods. In particular, the results for the baseline methods are lower than those reported in the original papers. I have read the authors' rebuttal for this but I am not completely convinced about it. I would suggest that the authors address this issue in subsequent submissions + +Based on the above reasons, I recommend that the paper cannot be accepted.",ICLR2020, +h365St-Jmu8,1642700000000.0,1642700000000.0,1,kcwyXtt7yDJ,kcwyXtt7yDJ,Paper Decision,Accept (Poster),"This paper proposes to leverage topological structure between domains, expressed as a graph, towards solving the domain adaptation problem. + +Reviewer n4Lk thought the ideas were interesting, appreciated the theoretical analysis and indicated that the experiments were "well thought out". The reviewer asked for more detail on Lemma 4.1 and suggested that a proof be provided for Proposition 4.1. They asked for more justification on why the change of task for the discriminator from classification to generation would improve performance. The authors responded to these comments, clarifying the proof of Lemma 4.1 in the appendix. They clarified that proposition 4.1 can be derived from Corollary 4.3 or Corollary 4.4. On the point of classical vs. enhanced discriminator the authors provided additional experiments. + +Reviewer rNQp commented that the method was easy to follow and noted the theoretical and empirical analysis. They expressed some concern that previous work on graph-based domain adaptation was inadequately addressed. Like reviewer n4Lk they seemed unconvinced that the proposed graph discriminator was an improvement over past SOTA and questioned its novelty.
In terms of claims about novelty and competitiveness relative to previous works, I would have liked to see the reviewer make specific references rather than criticize in general terms. The authors responded to the reviewer, adding a recent entropy-based method (SENTRY) to the experiments and showed that their method outperformed this ICCV 2021 work by a large margin. They responded to the reviewer's remarks about the original discriminator and variants, pointing out that this was already established in the paper. They used the other reviews to dispute the claim of lack of novelty. + +Reviewer uDYW felt that the work was novel and interesting. Like rNQp they thought the paper was clear. They questioned the practical advantage over baselines. The authors responded to the reviewer's question about using a data graph. They responded to the question about parameter tuning and computational cost. They addressed the question about limited improvements in real-world datasets. + +I had some difficulty motivating the reviewers to engage in the discussion and acknowledge the authors' response. The authors also politely attempted to nudge the reviewers to consider their updated results. In my opinion, the author responses have addressed most of the reviewer concerns and I don't see any critical issues remaining. Therefore I think that this paper should be accepted as a poster.",ICLR2022, +kF8rX0M39,1610040000000.0,1610470000000.0,1,kmqjgSNXby,kmqjgSNXby,Final Decision,Accept (Poster),"The paper is about the use of autoregressive dynamics models in the context of offline model-based reinforcement learning.
+After reading the authors' responses and the other reviews, the reviewers agree that this paper has several strengths (well written, easy to follow, the approach is novel and simple to implement, the empirical evaluation is well executed and the results are reproducible) and it deserves acceptance. +The authors need to update their manuscript by taking into consideration all the suggestions provided by the reviewers (clarifications and additional empirical comparisons).",ICLR2021, +H1xS9IubeN,1642700000000.0,1642700000000.0,1,O5Wr? "The paper studies an important newly identified problem in continual learning of rapid adaptation, and proposes the use of a generate-and-test method to continually inject random features alongside SGD, enabling better learning on non-stationary data streams. +Unfortunately the paper remained borderline in the discussions. While reviewers liked the overall research direction and contributions, they also agreed the paper in its current form would still benefit from deeper insights into the proposed method and stronger empirical evidence. +Experiments cover broad applications including RL, but don't seem to give very clear advantages over other weight regularization schemes, and other metrics of quality could be added. We appreciate that the authors have added additional experiments testing it both for the two important regimes of under- and over-parameterized networks, though those can be expanded. +We are sorry that this good paper remained narrowly below the bar in this case, and hope the detailed feedback helps to strengthen the paper for a future occasion.",ICLR2022, +QScSo? "The paper studies the convergence of a primal-dual algorithm on a special min-max problem in WGAN where the maximization is with respect to linear variables (linear discriminator) and minimization is over non-convex generators. Experiments with both simulated and real world data are conducted to show that the algorithm works for WGANs and multi-task learning. + +The major concern of reviewers lies in that the linear discriminator assumption in WGAN is too restrictive for the general non-convex mini-max saddle point problem in GANs. Linear discriminator implies that the maximization part in the min-max problem is concave, and it is thus not surprising that under this assumption the paper converts the original problem to a non-convex optimization instance and proves its first order convergence with the descent lemma. This technique however can't be applied to the general non-convex saddle point problem in GANs. The experimental studies are also not strong enough. Therefore, the current version of the paper is proposed as a borderline lean reject.",ICLR2019, +"The main goal of the submission is to figure out a way to produce less "noisy" saliency maps. The RectGrad method uses some thresholding during backprop, like Guided Backprop. The visuals of the proposed method are good, but the reviewers rightfully point out that evaluating whether the proposed method is any good is not obvious.
The ROAR/KAR results are perhaps not telling the whole story (and the authors claim that RectGrad is not expected to get a high ROAR score, but I would like to see this developed more in a further version of this work). + +Generally, I feel like there was a healthy back and forth between authors and R3 on the main concerns of this work. I agree that the mathematical justification for RectGrad seems not fully developed. Given all of these concerns, at this point I cannot support acceptance of this work at ICLR.",ICLR2019, +"This paper proposes a method to do zero-shot ICD coding, which involves assigning natural language labels (ICD codes) to input text. This is an important practical problem in healthcare, and it is not straightforward to solve, because many ICD codes have none or very few training examples due to the long distribution tail. The authors adapt a GAN-based technique previously used in vision to solve this problem. All of the reviewers agree that the paper is well written and well executed, and that the results are good. However, the reviewers have expressed concerns about the novelty of the GAN adaptation step, and left this paper very much borderline based on the scores it received. Due to the capacity restrictions I therefore have to recommend rejection; however, I hope that the authors resubmit elsewhere.",ICLR2020, +"The reviewers in general found the paper approachable, well written and clear. They noted that the empirical observation of mode collapse in active learning was an interesting insight. However, all the reviewers had concerns with novelty, particularly in light of Lakshminarayanan et al., who also train ensembles to get a measure of uncertainty. An interesting addition to the paper might be some theoretical insight about what the model corresponds to when one ensembles multiple models from MC Dropout. One reviewer noted that it's not clear that the ensemble is capturing the desired posterior. + +As a note, I don't believe there is agreement in the community that MC dropout is state-of-the-art in terms of capturing uncertainty for deep neural networks, as argued in the author response (and the abstract). To the contrary, I believe a variety of papers have improved over the results from that work (e.g. see experiments in Multiplicative Normalizing Flows from over a year ago).",ICLR2019, +"This paper presents an algorithm for combining various feature types when training recurrent networks. The features are handled by modifying the update rules and cell states based on the features' type -- dense, sparse, static, w/ decay, etc. + +Strengths +- The model handles each feature according to its type and handles cell state and transitions appropriately. +- Extends earlier work to handle more feature types, like sparse features. + +Weaknesses +- Limited novelty. Models similar to various aspects of the proposed system have been presented in prior works. For example: TLSTM, which the authors use as a baseline. Although some components are novel, like the treatment of sparse features, contributions, in my opinion, are not sufficient to be accepted at ICLR. +- Presentation: Confusing and not enough information for reproducing results; multiple reviewers raised concerns about presentation of the feature types and experimental results. There were suggestions to improve, which the authors did consider during revision, but some concerns still remain. + +In the end, the reviewers agreed about the limited novelty of this work, given existing literature. The recommendation, therefore, is to reject the paper.",ICLR2019, +"This was one of the more controversial submissions to this area, and there was extensive discussion over the merits and contributions of the work. The paper also benefitted from ICLRs open review system as additional researchers chimed in on the paper and the authors resubmitted a draft. The authors did a great job responding and updating the work and responding to criticisms. In the end though, even after these consideration, none of the reviewers strongly supported the work and all of them expressed some reservations. + + Pros: + - All agree that the work is extremely clear, going as far as saying the work is "very well written" and "easy to understand". + - Generally there was a predisposition to support the work for its originality particularly due to its "methodological contributions", even going so far as saying it would generally be a natural accept. + + Cons: + - There was a very uncommonly strong backlash to the claims made by the paper, particularly the first draft, but even upon revisions. One reviewer even said this was an "excellent example of hype-generation far before having state-of-the-art results" and that it was "doing a disservice to our community since it builds up an expectation that the field cannot live up to". This does not seem to be an isolated reviewer, but a general feeling across the reviews. Another faulted "the toy-ness of the evaluation metric" and the way the comparisons were carried out. + - A related concern was a feeling that the body of work in operations research was not fully taken account of in this work, noting "operations research literature is replete with a large number of benchmark problems that have become standard to compare solver quality".
The authors did fix some of these issues, but not to the point that any reviewer stood up for the work.",ICLR2017, +"The paper considers representational aspects of neural tangent kernels (NTKs). More precisely, recent literature on overparametrized neural networks has identified NTKs as a way to characterize the behavior of gradient descent on wide neural networks as fitting these types of kernels. This paper focuses on the representational aspect: namely that functions of appropriate "complexity" can be written as an NTK with parameters close to initialization (comparably close to what results on gradient descent get). + +The reviewers agree this content is of general interest to the community and with the proposed revisions there is general agreement that the paper has merits to recommend acceptance.",ICLR2020, +"This paper proposes a method called approximate empirical Bayes to learn both the weights and hyperparameters. Reviewers have had mixed feelings about this paper. Reviewers agree that the novelty of this paper is limited since AEB is already a well known method (in fact, iterative conditional modes is a well known algorithm). Unfortunately, the paper completely ignores the huge literature on this topic; the previous reference to use AEB is from McInerney (2017). + +Another issue is that the paper seems to be unaware of any issues that this type of approach might have.
Here is a reference that discusses some problems with this type of approach: +"Deterministic Latent Variable Models and their Pitfalls" +Max Welling, Chaitanya Chemudugunta, Nathan Sutter, 2008 + +The experiments presented in the paper are interesting, but they do not really do a good job of assessing why the method works well here even though in theory it should not be as good as the exact empirical Bayes method. + +This paper does not meet the bar for acceptance at ICLR and therefore I recommend rejecting this paper.",ICLR2019, +"Given two distributions, source and target, the paper presents an upper bound on the target risk of a classifier in terms of its source risk and other terms comparing the risk under the source/target input distribution and target/source labeling function. In the end, the bound is shown to be minimized by the true labeling function for the source, and at this minimum, the value of the bound is shown to also control the "joint error", i.e., the best achievable risk on both target and source by a single classifier. + +The point of the analysis is to go beyond the target risk bound presented by Ben-David et al. 2010 that is in terms of the discrepancy between the source and target and the performance of the source labeling function on the target or vice versa, whichever is smaller. Apparently, concrete domain adaptation methods "based on" the Ben-David et al. bound do not end up controlling the joint error. After various heuristic arguments, the authors develop an algorithm for unsupervised domain adaptation based on their bound in terms of a two-player game. + +Only one reviewer ended up engaging with the authors in a nontrivial way. This review also argued for (weak) acceptance. Another reviewer mostly raised minor issues about grammar/style and got confused by the derivation of the "general" bound, which I've checked is ok. The third reviewer raised some issues around the realizability assumption and also asked for better understanding as to what aspects of the new proposal are responsible for the improved performance, e.g., via an ablation study. + +I'm sympathetic to reviewer 1, even though I wish they had engaged with the rebuttal. I don't believe the revision included any ablation study. I think this would improve the paper. I don't think the issues raised by reviewer 3 rise to the level of rejection, especially since their main technical concern is due to their own confusion. Reviewer 2 argues for weak acceptance. However, if there was support for this paper, it wasn't enough for reviewers to engage with each other, despite my encouragement, which was disappointing.",ICLR2020, +"The authors have presented a simple yet elegant model to learn grid-like responses to encode spatial position, relying only on relative Euclidean distances to train the model, and achieving a good path integration accuracy. The model is simpler than recent related work and uses a structure of 'disentangled blocks' to achieve multi-scale grids rather than requiring dropout or injected noise. The paper is clearly written and it is intriguing to get down to the fundamentals of the grid code. On the negative side, the section on planning does not hold up as well and makes unverifiable claims, and one reviewer suggests that this section be replaced altogether by additional analysis of the grid model. Another reviewer points out that the authors have missed an opportunity to give a theoretical perspective on their model.
Although there are aspects of the work which could be improved, the AC and all reviewers are in favor of acceptance of this paper.",ICLR2019, +"The paper studies a dropout variant, called fraternal dropout. The paper is somewhat incremental in that the proposed approach is closely related to expectation linear dropout. Having said that, fraternal dropout does improve a state-of-the-art language model on PTB and WikiText2 by ~0.5-1.7 perplexity points. The paper is well-written and appears technically sound. + +Some reviewers complain that the authors could have performed a more careful hyperparameter search on the fraternal dropout model. The authors appear to have partly addressed those concerns, which frankly, I don't really agree with either. By doing only a limited hyperparameter optimization, the authors are putting their "own" method at a disadvantage. If anything, the fact that their method gets strong performance despite this disadvantage (compared to very strong baseline models) is an argument in favor of fraternal dropout.",ICLR2018, +"This paper presents a defense scheme for adversarial attacks, called self-supervised online adversarial purification (SOAP), by purifying the adversarial examples at test time. The novelty of this work is in its incorporation of self-supervised representation learning into adversarial defense through purification via optimizing an auxiliary self-supervised loss. This is done by jointly training the model on a self-supervised task while it is learning to perform the target classification task in a multi-task learning setting. Compared with existing adversarial defense schemes such as adversarial training and purification techniques, SOAP has a lower computation overhead during the training stage. + +**Strengths:** + * It is novel to incorporate self-supervised learning for adversarial purification at test time. + * SOAP's training stage based on multi-task learning incurs low computation overhead compared with the original classification task. + +**Weaknesses:** + * Although the proposed adversarial defense scheme is computationally cheaper than the other existing methods during the training stage, it does incur some overhead during test time. This may be undesirable for some applications in which efficiency during test time is an important factor to consider. + * The choice of a suitable self-supervised auxiliary task is somewhat ad hoc. The performance varies a lot for different auxiliary tasks. + * The experimental evaluation is only based on relatively small and unrealistic datasets even after new experiments on CIFAR-100 have been added by the authors. + +It is said in the paper that SOAP can exploit a wider range of self-supervised signals for purification and hence conceptually can be applied to any format of data and not just images, given an appropriate self-supervised task. However, this claim has not been substantiated in the paper using non-image data. + +Despite some limitations and that some claims still need to be better substantiated, the paper presents some novel ideas which are expected to arouse interest in follow-up work in the adversarial attack and defense research community.",ICLR2021, +"Although the technical novelty is not very high, the finding that long-run Langevin dynamics with a convergently learned model provides comparable defense performance to adversarial training will give some impact to the community.",ICLR2021, +"This paper describes a clever new class of piecewise-linear RNNs that contains a long-time scale memory subsystem. The reviewers found the paper interesting and valuable, and I agree. The four submitted reviews were unanimous in their vote to accept. The theoretical insights and empirical results are impactful and would be suitable for spotlight presentation.",ICLR2021, +"Reviewers like the simplicity of the approach and the fact that it works well.",ICLR2021, +"The authors propose to decompose control in a POMDP into learning a model of the environment (via a VRNN) and learning a feed-forward policy that has access to both the environment and environment model. They argue that learning the recurrent environment model is easier than learning a recurrent policy. They demonstrate improved performance over existing state-of-the-art approaches on several PO tasks. + +Reviewers found the motivation for the proposed approach convincing and the experimental results proved the effectiveness of the method. The authors' response resolved the reviewers' concerns, so as a result, I recommend acceptance.",ICLR2020, +"This work addresses the problem of detecting an adversarial attack. This is a challenging problem as the detection mechanism itself is also vulnerable to attack. The paper proposes asymmetrical adversarial training as a robust solution. This approach partitions the feature space according to the output of the robust classifier and trains an adversarial example detector per partition. The paper demonstrates improvements over state-of-the-art detection techniques. + +All three reviewers recommend acceptance of this work. Some positive points include the paper being well-written with strong experimental evidence. One potential difficulty with the proposed approach is the additional computational cost associated with a per class adversarial attack detector.
The authors have responded to this concern by claiming that the straightforward version of their approach is K times slower (10 in the case of 10 classes), but their integrated version is 2x slower as they only run the detector associated with the example-specific class prediction. We encourage the authors to include a discussion on computational cost in the final version. In addition, there was a community comment about black-box testing which will be of relevance to many in the community. The authors have already provided additional experiments to address this question as well as code to reproduce the new experiment.

Overall, the paper addresses an important problem with a two-step solution of training a robust model and detecting potentially perturbed samples per class. This is a novel solution with comprehensive experiments and we therefore recommend acceptance.",ICLR2020,
D6nAvVPC_jb,1642700000000.0,1642700000000.0,1,YLglAn-USkf,YLglAn-USkf,Paper Decision,Reject,"The paper focuses on the zero-shot capability of BERT-like models. The key contribution boils down to a novel prompting technique that effectively ensembles predictions made for [MASK] tokens inserted at different places.

Reviewers B5Rv and 9k3X voted for rejecting, mostly on the grounds that the contribution is not significant enough for ICLR. In particular, there are already existing works showing that null prompting works, and other works that suggest that using multiple prompts works. While these insights have not been combined before, the combination is to some extent incremental.

On the positive side, the multi-null prompting strategy is a genuinely useful tool. I think it is likely to find applications in different NLP applications as an effective way to generate ensemble predictions. The paper also has many carefully carried out experiments that will likely help guide future efforts in designing effective prompting strategies.

On the whole, I am recommending rejection. I know this is a disappointing result. Thank you for your submission, and I hope the reviews will help improve your paper.",ICLR2022,
-8Bm2Q6j6mm,1610040000000.0,1610470000000.0,1,OtAnbr1OQAW,OtAnbr1OQAW,Final Decision,Reject,"The paper introduces a variant to the option-critic framework that encourages options to display a certain level of ""diversity"", and this is induced by introducing a mutual-information objective between the options and their transitions. The authors conjecture that such a criterion makes options more suitable for exploration.

Overall, reviewers agree that the idea behind the proposed method and the general approach is sound and interesting. Nonetheless, there is general consensus that the current submission suffers from a number of shortcomings that make it unsuitable for acceptance.

Following the detailed comments provided by the reviewers, I strongly encourage the authors to focus on the following dimensions to improve the paper:
1- The current experiments indeed provide a first illustration of how the proposed algorithm works, but they need significant improvement in variety and scope: As pointed out by the reviewers, the current experiments do not cover single-reward challenging exploration benchmarks (such as Montezuma). I agree with the authors that the inductive bias implemented in their algorithm is designed with diversity of goals in mind, but if the main point is to improve exploration, it is natural to expect results in that respect. 
Alternatively, the authors should state more explicitly the type of problems their method is intended to solve from the very beginning of the paper and design experiments accordingly.
2- The initial mutual information objective is simplified across multiple steps and it is unclear how much the approximations impact the original ""semantic"" of the objective.
3- A more thorough comparison with mutual-information-based methods such as DIAYN or VIC is needed. Also, I wonder what the connection is with more goal-based exploration approaches such as GoExplore or SkewFit.",ICLR2021,
WtPwjWZsBJX,1642700000000.0,1642700000000.0,1,O5Wr-xX0U2y,O5Wr-xX0U2y,Paper Decision,Reject,"This paper proposes a risk-sensitive actor-critic reinforcement learning (RL) method that optimizes the policy with respect to a dynamic (iterated) expectile risk measure. The expectile risk measure has the elicitability property and can be expressed as the minimizer of an expected scoring function, which is exploited in the critic update. The proposed approach is applied to option pricing and hedging.

A main point of discussion was the applicability and effectiveness of the proposed method beyond particular financial problems. The original submission was indeed specialized to particular financial tasks. The authors have rewritten the paper in a way that it claims to propose a risk-sensitive RL method for the general MDP with finite horizon. This however leaves the question regarding the advantages of the proposed approach over existing methods of risk-sensitive RL, including those that work with non-coherent (dynamic) risk measures (since coherence is often not needed in domains outside finance).",ICLR2022,
Bke8K55QeE,1544950000000.0,1545350000000.0,1,rylIy3R9K7,rylIy3R9K7,"A primal-dual algorithm for linear discriminator WGANs with first order convergence, as a special non-convex optimization problem.",Reject,"The paper studies the convergence of a primal-dual algorithm on a special min-max problem in WGAN where the maximization is with respect to linear variables (linear discriminator) and minimization is over non-convex generators. Experiments with both simulated and real-world data are conducted to show that the algorithm works for WGANs and multi-task learning.

The major concern of the reviewers lies in that the linear discriminator assumption in WGAN is too restrictive for the general non-convex min-max saddle point problem in GANs. A linear discriminator implies that the maximization part of the min-max problem is concave, and it is thus not surprising that under this assumption the paper converts the original problem to a non-convex optimization instance and proves its first-order convergence with the descent lemma. This technique however cannot be applied to the general non-convex saddle point problem in GANs. The experimental studies are also not strong enough. Therefore, the current version of the paper is rated as a borderline lean reject.",ICLR2019,4: The area chair is confident but not absolutely certain
bkRilP-ExqB,1642700000000.0,1642700000000.0,1,adjl32ogfqD,adjl32ogfqD,Paper Decision,Reject,"This paper studies the stochastic shortest path (SSP) problem with a linear approximation to the transition model. The authors propose a doubling algorithm for regret minimization in this setting and bound its regret. This is a theory paper with no experiments.

This paper received three borderline reviews. All reviewers agreed on its strengths and weaknesses during the discussion. 
The strengths are that the paper is well written and that the results are novel. The weaknesses are that the proposed solution is standard and analyzed using standard tools. The reviewers noted departures from the standard analyses, but these seem to be minor technical issues. Therefore, although well executed, this paper lacks novelty. No reviewer argued for the acceptance of this paper and therefore it is rejected.",ICLR2022,
o3iKFU8Xu,1576800000000.0,1576800000000.0,1,BJeuKnEtDH,BJeuKnEtDH,Paper Decision,Reject,"This work combines style transfer approaches either in a serial or parallel fashion, and shows that the combination of methods is more powerful than isolated methods. The novelty in this work is extremely limited and not offset by insightful analysis or very thorough experiments, given that most results are qualitative. The authors have not provided a public response. Therefore, we recommend rejection.",ICLR2020,
KGr_QNCSSd,1642700000000.0,1642700000000.0,1,n6Bc3YElODq,n6Bc3YElODq,Paper Decision,Reject,"This paper tackles the challenging problem of learning against an opponent that may or may not be simultaneously learning as well. The key contribution of this paper is a learning algorithm that accounts for how the opponents may update their policies from past interactions. The proposed algorithm, MBOM, relies on the environment model to model a hierarchy of opponents using different depths of recursive reasoning (from non-learning agents to deep recursive agents). It is agreed that this paper studies an important problem and shows promise. However, the current results aren't convincing enough. In particular, since there is no theoretical analysis, more empirical validation of the method is expected. The current experiments only consider a single opponent, and it is unclear how well the method works given accumulated errors through the recursion. Future submissions would benefit from additional empirical analysis (e.g., ablations) to help understand when and why MBOM works.",ICLR2022,
vAFRmcIOJI,1576800000000.0,1576800000000.0,1,SJgaRA4FPH,SJgaRA4FPH,Paper Decision,Accept (Poster),"The paper provides methods for training generative models by combining federated learning techniques with differential privacy. The paper also provides two concrete applications for the problem of debugging models. Even though the method in the paper seems to be a standard combination of DP deep learning and federated learning, the paper is well-written and presents interesting use cases.",ICLR2020,
Skx_zGfBlE,1545050000000.0,1545350000000.0,1,H1fevoAcKX,H1fevoAcKX,not convincing,Reject,"This paper proposes new heuristics to prune and compress neural networks. The paper is well organized. However, reviewers are concerned that the novelty is relatively limited. The advantage of the proposed method is marginal on ImageNet, and what makes it effective is not very clear. Therefore, I recommend rejection.",ICLR2019,4: The area chair is confident but not absolutely certain
lhcB_89a2l,1576800000000.0,1576800000000.0,1,ryxn8RNtvr,ryxn8RNtvr,Paper Decision,Reject,"The paper aims to extract the set of features explaining a class from a trained DNN classifier.

The proposed approach relies on LIME (Ribeiro et al. 
2016), modified as follows: i) around a point x, a linearized sparse approximation of the classifier is found (as in LIME); ii) for a given class, the importance of a feature aggregates the relative absolute weight of this feature in the linearized sparse approximations above; iii) the explanation is made of the top features in terms of importance.

This simple modification yields visual explanations that match human perception significantly better than the SOTA competitors.

The experimental setting based on human evaluation via a Mechanical Turk setting is the second contribution of the approach. The feature importance measure is also assessed along a Keep and Retrain mechanism, showing that the approach selects actually relevant features in terms of prediction. Incidentally, it would be good to see the sensitivity of the method to parameter $k$ (in Eq. 1).

As noted by Rev#1, NormLIME is simple (and simplicity is a strength) and it demonstrates its effectiveness on the MNIST data. However, as noted by Rev#4, it is hard to assess the significance of the approach from this dataset alone.

It is understood that the Mechanical Turk-based assessment can only be used with a sufficiently simple problem. However, complementary experiments on ImageNet for instance, e.g., showing which pixels are retained to classify an image as a husky dog, would be much appreciated to confirm the merits and investigate the limitations of the approach.",ICLR2020,
xsyY8RPBSzm,1610040000000.0,1610470000000.0,1,5rc0K0ezhqI,5rc0K0ezhqI,Final Decision,Reject,"We have a very well-informed reviewer who strongly feels that this paper is insufficiently novel, and there was significant further discussion on how the paper might be raised to a publishable level with more empirical results. I will have to side with the more engaged reviewers who feel that the paper should be rejected.",ICLR2021,
KqnNBBz_m6,1576800000000.0,1576800000000.0,1,SJgVU0EKwS,SJgVU0EKwS,Paper Decision,Accept (Poster),"The submission proposes an approach to accelerate network training by modifying the precision of individual weights, allowing a substantial speed-up without a decrease in model accuracy. The magnitude of the activations determines whether an activation will be computed at a high or low bitwidth.

The reviewers agreed that the paper should be published given the strong results, though there were some salient concerns which the authors should address in their final revision, such as how the method could be implemented on GPU and what savings could be achieved.

Recommendation is to accept.",ICLR2020,
QFw9mIxNwM2,1642700000000.0,1642700000000.0,1,QCeFEThVn3,QCeFEThVn3,Paper Decision,Reject,"This paper proposes to use an energy-based model for multi-objective molecular generation. The energy function is parameterized by a relational graph convolutional network (R-GCN) so that it has a permutation invariance property. The model is trained by contrastive divergence and generation is performed by Langevin dynamics. Experiments on single- and multi-objective molecule generation are conducted to verify the effectiveness of the proposed framework. The paper is well-written, and the experiments are comprehensive. The major shortcoming of the paper is its limited novelty, since using an EBM for graph generation is a straightforward application of the existing deep EBM framework. The contribution is marginal.

During the discussion, two of the reviewers pointed out that the contribution is limited and marginal. 
Two reviewers pointed out that the performance gain obtained by the proposed model is marginal and not significant. One reviewer has a concern about the computational cost of MCMC. However, the authors didn't provide a rebuttal to address the concerns raised by the reviewers. Given that all the concerns from the reviewers remain, and that the contribution and performance gain of the work are marginal, the AC recommends rejecting the paper.",ICLR2022,
H_El7jh7d2,1610040000000.0,1610470000000.0,1,_MxHo0GHsH6,_MxHo0GHsH6,Final Decision,Reject,"This paper proposed a method to train quantized supernets which can be directly deployed without retraining. A main concern is the limited novelty: the proposed method looks like a combination of well-known techniques. Experimental results are promising. However, it is not clear if the comparisons are fair and if all the methods are using the same setup. It is desirable to have additional analysis and ablation studies. The writing can also be improved.",ICLR2021,
vCvdqC0Uq3,1576800000000.0,1576800000000.0,1,rJxRmlStDB,rJxRmlStDB,Paper Decision,Reject,"This paper presents a method for curriculum learning based on extracting parallel sentences from comparable corpora (Wikipedia), and continuously retraining the model based on these examples. Two reviewers pointed out that the initial version of the paper lacked references and baselines from methods of mining parallel sentences from comparable corpora such as Wikipedia. The authors have responded at length and included some of the requested baseline results. This changed one reviewer's score but has not tipped the balance strongly enough for considering this for publication.",ICLR2020,
26QlE5mo5ZD,1610040000000.0,1610470000000.0,1,uCQfPZwRaUu,uCQfPZwRaUu,Final Decision,Accept (Spotlight),"The authors propose self-predictive representations (predicting the agent's own future latents with a forward model and data augmentation) as a means of improving the data efficiency of agents. The reviewers found the paper to be compelling (after the authors made adjustments) and pointed out that the method is likely generic and might be widely applicable. Experimental improvements in the work are significant, and the method is well explored.",ICLR2021,
GC8U9XVJdBZ,1642700000000.0,1642700000000.0,1,uB12zutkXJR,uB12zutkXJR,Paper Decision,Reject,"This paper presents an approach for machine learning to fix programming errors via edits to abstract syntax trees. The main contributions are a pretraining scheme based on masking out subtrees and some minor architectural modifications compared to previous work. Reviewers found the paper to contain a significant amount of work, but there are some questions about significance relative to previous work that framed the problem similarly, and about experimental methodology. The authors did a great deal of work in the rebuttal to address many of the experimental methodology questions, but this also introduced substantial unreviewed changes to the model, the pretraining approach, and the experiments. In total, the remaining concerns about significance and the substantial changes lead us to recommend that this paper be revised and resubmitted to the next conference.",ICLR2022,
EqS0nMVew5Q,1642700000000.0,1642700000000.0,1,bmGLlsX_iJl,bmGLlsX_iJl,Paper Decision,Reject,"This paper proposes a data imputation method for MCAR and MAR data by combining EM and normalizing flows. The paper is clearly written. 
The idea is interesting and they show better performance compared to MCFlow and competing methods on ten multivariate UCI datasets and the MNIST and CIFAR-10 image data.

Issues regarding limited novelty compared to MCFlow were raised. Issues regarding the validity of Assumption 2 on the dependencies in the latent space and observation space were also raised.",ICLR2022,
H1gStaMeyE,1543680000000.0,1545350000000.0,1,SkNSOjR9Y7,SkNSOjR9Y7,Missing important references and the proposed algorithm is essentially the well-known REINFORCE estimator,Reject,"The paper is addressing an important problem, but misses many related references (see Reviewer 2's comments for a long list of highly relevant papers).

More importantly, as Reviewer 3 pointed out (with which the AC fully agrees):

""The gradient estimator the paper proposes is the REINFORCE estimator [Williams, ML 1992] re-derived through importance sampling.""

""The equivalence would not be exact if the authors chose the importance distribution to be different than the variational approximation q(z|x), so there still may be room for novelty in their proposal, but in the current draft only q(z|x) is considered.""",ICLR2019,5: The area chair is absolutely certain
HJDUNJaBf,1517250000000.0,1517260000000.0,357,HJYQLb-RW,HJYQLb-RW,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"All the reviewers agree that the paper is studying an important problem and makes a good first step towards understanding learning in GANs. But the reviewers are concerned that the setup is too simplistic and not relevant in practical settings. I recommend that the authors carefully go through the reviews and present the work at the workshop track. This will hopefully foster further discussions and lead to results in more practically relevant settings.",ICLR2018,
_Tg7uKNJ5L6,1610040000000.0,1610470000000.0,1,2KSsaPGemn2,2KSsaPGemn2,Final Decision,Reject,"This paper extends the idea of successor representations. Typically the reward is computed linearly on top of states in this setting, but the authors relax it to have a quadratic form.

${\bf Pros}$:
1. A novel formulation of the successor representation where the reward does not follow the linearity assumption
2. The idea of using a second-order term for the reward branch is interesting and could have meaningful implications for learning and exploration.

${\bf Cons}$:
1. All reviewers agree that the experimental results do not clearly validate the advantage of this method. More work is needed to establish the effects of using this particular reward structure on a wide variety of tasks
2. Both R2 and R4 were unconvinced of the limitations of the linearity assumptions in the original successor representation formulation -- especially in the case when the state is represented by a non-linear function approximator.

The ideas presented in this paper are quite interesting and promising. But more experimental work is needed to show the benefits of this approach.",ICLR2021,
fAMy9BrMX,1576800000000.0,1576800000000.0,1,H1lTUCVYvH,H1lTUCVYvH,Paper Decision,Reject,"While the reviewers appreciated the ideas presented in the paper and their novelty, there were major concerns raised about the experimental evaluation. 
Due to the serious doubts that the reviewers raised about the effectiveness of the proposed approach, I do not think that the paper is quite ready for publication at this time, though I would encourage the authors to revise and resubmit the work at the next opportunity.",ICLR2020,
Up_neGBg1Y,1576800000000.0,1576800000000.0,1,HJl8_eHYvS,HJl8_eHYvS,Paper Decision,Accept (Poster),"The authors introduce an RL algorithm / architecture for partially observable environments. At the heart of it is a filtering algorithm based on a differentiable version of sequential Monte Carlo inference. The inferred particles are fed into a policy head and the whole architecture is trained by RL. The proposed method was evaluated on multiple environments and ablations establish that all moving parts are necessary for the observed performance.

All reviewers agree that this is an interesting contribution for addressing the important problem of acting in POMDPs.

I think this paper is well above the acceptance threshold. However, I have a few points that I would quibble with:
1) I don't see how the proposed resampling is fully differentiable; as far as I understand it, no credit is assigned to the discrete decision of which particle to reuse. Adding a uniform component to the resampling distribution does not make it fully differentiable, see e.g. [Filtering Variational Objectives. Maddison et al]. I think the authors might use a form of straight-through gradient approximation.
2) Just stating that unsupervised losses might incentivise the filter to learn the wrong things, and just going back to the plain RL loss, is not in itself a novel contribution; in extremely sparse reward settings, this will not be satisfactory.",ICLR2020,
0C1SjzhdqrF,1610280000000.0,1610470000000.0,1,EsA9Nr9JHvy,EsA9Nr9JHvy,Final Decision,Reject,"This work seeks to describe the heavy-tail phenomenon observed for deep networks learned with SGD. The work presents a proof of a relationship between curvature, step size, batch size, and a heavy-tailed weight distribution. The proofs assume a quadratic optimization problem and the authors speculate that the results may also be relevant for non-convex deep learning settings. On the positive side, the reviewers agreed that this work is one of the first, if not the first, to try to theoretically describe a poorly understood phenomenon in deep learning. On the less positive side, the reviewers believe that the proofs developed in this paper are for an idealized setting that is too different from the settings under which deep models are trained. As such, even though the authors provide some (somewhat mixed) experimental results to support the claim of relevance to deep learning, the reviewers were not convinced. Given that the stated goal of the work is to attempt to explain this phenomenon in deep models, the majority view is that this work, while promising, needs further development to convincingly claim some relevance to the original phenomenon being studied.",ICLR2021,
SlKuUi-3Q,1576800000000.0,1576800000000.0,1,S1evHerYPr,S1evHerYPr,Paper Decision,Accept (Spotlight),"This paper proposes a meta-RL algorithm that learns an objective function whose gradients can be used to efficiently train a learner on entirely new tasks from those seen during meta-training. Building off-policy gradient-based meta-RL methods is challenging, and had not been previously demonstrated. 
Further, the demonstrated generalization capabilities are a substantial improvement over prior meta-learning methods. There are a couple of related works that are quite relevant (and somewhat similar in methodology) and overlooked -- see [1,2]. Further, we strongly encourage the authors to run the method on multiple meta-training environments and to report results with more seeds, as promised. The contributions are significant and should be seen by the ICLR community. Hence, I recommend an oral presentation.

[1] Yu et al. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
[2] Sung et al. Meta-critic networks",ICLR2020,
giVCZZrPZpt,1642700000000.0,1642700000000.0,1,T8wHz4rnuGL,T8wHz4rnuGL,Paper Decision,Accept (Spotlight),"The paper addresses the problem of inconsistent gradients in multi-task learning, proposing ways to handle both their magnitude and direction. Gradient directions are aligned by introducing a rotation layer between the shared backbone and task-specific branches. Reviewers appreciated the technical approach, highlighting the novelty of the rotation layers in this context. The empirical evaluations are systematic, fair, and insightful, and the presentation is polished. Reviewers unanimously supported accepting the paper.",ICLR2022,
r1lYxf1bl4,1544770000000.0,1545350000000.0,1,BJemQ209FQ,BJemQ209FQ,"Important topic, solid contribution",Accept (Poster),"All reviewers (including those with substantial expertise in RL) were solid in their praise for this paper, which also tackles an interesting application that is much less well studied but deserves attention.",ICLR2019,3: The area chair is somewhat confident
o3xNC7m7aK,1576800000000.0,1576800000000.0,1,H1epaJSYDS,H1epaJSYDS,Paper Decision,Reject,"The paper proposes a method to produce embeddings of discrete objects, jointly learning a small set of anchor embeddings and a sparse transformation from anchor objects to all the others. While the paper is well written and proposes an interesting solution, the contribution seems rather incremental (as noted by several reviewers), considering the existing literature in the area. Also, after discussions the usefulness of the method remains a bit unclear -- it seems some engineering (related to sparse operations) is still required to validate the viability of the approach.",ICLR2020,
5vwcqiJdi31,1642700000000.0,1642700000000.0,1,aisKPsMM3fg,aisKPsMM3fg,Paper Decision,Accept (Poster),"This paper applies deep learning to a problem from OR, namely multistage stochastic optimization (MSSO). The main contribution is a method for learning a neural mapping from MSSO problem instances to value functions, which can be used to warm-start the SDDP solver, a state-of-the-art method for solving MSSO. The method is tested on two typical OR problems, inventory control and portfolio management. The reviewers think that the idea is interesting, the empirical results are impressive, and the paper is well-written. However, there are reservations on its relevance to the ICLR community.",ICLR2022,
bmfxDJGg0G,1642700000000.0,1642700000000.0,1,1L0C5ROtFp,1L0C5ROtFp,Paper Decision,Accept (Oral),"This paper introduces the Filtered-CoPhy method, an approach for learning counterfactual reasoning of physical processes in pixel space. The approach enables forecasting raw videos over long horizons, without requiring strong supervision, e.g. object positions or scene properties. 
The paper initially received one strong accept, one weak accept, and one weak reject recommendation. The main reviewers' concerns relate to clarifications and consolidations in experiments, including stronger baselines, experiments on real data, or more diversity in the datasets. The rebuttal did a good job in answering the reviewers' concerns, especially by providing new experimental results and analysis. Eventually, all reviewers recommended a clear acceptance after the authors' feedback.

The AC's own reading confirmed the reviewers' recommendations. The proposed approach is a meaningful extension of CoPhy for unsupervised prediction at the pixel level. The proposed approach is solid, clearly described, and overcomes important limitations of previous methods. The dataset is also an important outcome for the community. Causality and counterfactual reasoning are of primary importance for the design of effective and explainable AI prediction models: this paper therefore brings an important contribution to the ICLR community.",ICLR2022,
B01tR0Mwk,1576800000000.0,1576800000000.0,1,HJewiCVFPB,HJewiCVFPB,Paper Decision,Reject,"This paper presents a method for improving optimization in multi-task learning settings by minimizing the interference of gradients belonging to different tasks.

While the idea is simple and well-motivated, the reviewers felt that the problem is still not studied adequately. The proofs are useful, but there is still a gap when it comes to practicality.

The rebuttal clarified some of the concerns, but there is still a feeling that (a) the main assumptions of the method need to be demonstrated in a more convincing way, e.g. by extending the experiments to other MTL methods as suggested, and (b) the paper needs to be placed better in the current literature, minimizing the gap between the proofs/underlying assumptions and practical usefulness.",ICLR2020,
SklrfakkxE,1544650000000.0,1545350000000.0,1,rkgsvoA9K7,rkgsvoA9K7,Dirichlet Variational Autoencoder,Reject,"This paper applies the Dirichlet distribution to the latent variables of a VAE in order to address the component collapsing issues for categorical probabilities. The method is clearly presented, and extensive experiments are carried out to prove the advantage against VAEs with other prior distributions.

The main concern of the paper is the limited novelty. The main methodological contribution of this paper is to combine the decomposition of a Dirichlet distribution into Gamma distributions with the approximation of the Gamma component via the inverse Gamma CDF, but both components are common practices.

R3 also points out that the paper is distracted by two different messages the authors try to convey. The presentation and experiments are not designed to provide a cohesive message. The concern is not resolved in the authors' feedback.

Based on the current reviews, this paper does not meet the standard for ICLR publication. Despite the limited novelty in the proposed model, if the paper could be revised to show that a simple modification is good for solving one problem with general applications, it would make a good publication in a future venue.",ICLR2019,4: The area chair is confident but not absolutely certain
O3KLuXQCdG,1642700000000.0,1642700000000.0,1,9Cwxjd6nRh,9Cwxjd6nRh,Paper Decision,Reject,"This paper proposes a method for visualizing representations of neural networks trained with self-supervised learning with conditional denoising diffusion probabilistic models. 
By generating multiple images conditioned on a representation, one can identify what aspects the representation is and is not sensitive to. The proposed method allows for high-fidelity generated images that can be used to compare different self-supervised methods and layers.

Reviewers agreed that the paper proposed reasonable methodology, targeted an interesting problem of understanding what is learned by self-supervised methods, and presented interesting qualitative evaluations. However, there remained concerns about the novelty of the results in comparison to other methods for probing representations (e.g. classification-based), the subjectiveness of interpretation of the qualitative results, and limited quantification of the intuition gained from the visualizations. While the authors have argued that the point of the paper is to showcase the merits of a qualitative visual analysis method, reviewers found that the presented results were insufficient to demonstrate the value of the proposed approach. A number of ideas were discussed with reviewers on how to highlight the value of visualization, which could strengthen the paper in the future. Given the lack of novelty on the conditional generation side, and the limited insight gained from the qualitative results, I cannot recommend this paper for acceptance in its current form.",ICLR2022,
sF4YxvyekWf,1642700000000.0,1642700000000.0,1,4QUoBU27oXN,4QUoBU27oXN,Paper Decision,Reject,"This paper tackles the challenge of continual learning. It approaches the problem by combining a Gaussian Mixture Model (GMM) to model concepts in a latent space and a decoder system to generate new data points for pseudo-rehearsal and maintenance of previous information. When new concepts arrive, the GMM can be updated, with rehearsal serving to prevent forgetting. The authors show competitive results on incremental learning of MNIST and FMNIST.

The scores were mostly below threshold, with one above threshold (5,3,5,6). The reviewers generally agreed the approach was interesting and they appreciated the theoretical treatments. However, there were a number of concerns, the central ones being the lack of clarity and the lack of convincing empirical demonstrations of scalability. The authors attempted to address the concerns, but they were not able to show good performance on larger datasets. They suggested this was due to the complexity of the encoding model, but they were unable to demonstrate this concretely. The reviewers' scores did not change, though, and the consensus was that this paper was not quite ready for publication. Given these considerations, and an average final score of 4.75, a decision of reject was reached.",ICLR2022,
BJPMnGLOx,1486400000000.0,1486400000000.0,1,BJ9fZNqle,BJ9fZNqle,ICLR committee final decision,Reject,"This paper explores a variational autoencoder variant.

ICLR gives authors some respect that other conferences don't. It is flexible about the length of the paper, and allows revisions to be submitted. The understanding should be that authors should in turn treat reviewers with respect. The paper should still be finished. Reviewers can't be expected to read a churn of large revisions. The final paper should be roughly the right length, unless with very good reason.

This paper was clearly not finished, and now is too long, with issues remaining. 
I hope that it will be submitted again, but not until it is actually ready.",ICLR2017,
BEnEpHId9V1,1610040000000.0,1610470000000.0,1,zgGmAx9ZcY,zgGmAx9ZcY,Final Decision,Reject,"This paper investigates an improvement to the direct feedback alignment (DFA) algorithm where the ""backward weights"" are learned instead of being fixed random matrices. The proposed approach essentially applies the technique of DFA to Kolen-Pollack learning. While reviewers found the paper reasonably clear and thought the experiments were acceptable, there were significant concerns about the novelty of the approach and the fact that the proposed approach was a straightforward combination of existing ideas. Further, the paper could have done a better job situating (and applying) the proposed method relative to DFA variants that have been proposed since the original DFA paper came out.",ICLR2021,
ByYoBJTrM,1517250000000.0,1517260000000.0,644,S1fHmlbCW,S1fHmlbCW,ICLR 2018 Conference Acceptance Decision,Reject,"The scores were not favorable: 5, 5, 2. R2 felt the motivation of the paper was inadequate. R3 raised numerous technical points, some of which were addressed in the rebuttal, but not all. R3 continues to have issues with some of the results. The AC agrees with R3's concerns and feels that the paper cannot be accepted in its current form.",ICLR2018,
s-Ujxf_4W,1576800000000.0,1576800000000.0,1,ryxAY34YwB,ryxAY34YwB,Paper Decision,Reject,"This paper proposes a method to leverage the Lead (i.e., the first sentence of an article) in training a model for abstractive news summarization.

Reviewers' initial recommendations ranged from weak reject to weak accept, pointing out the limitations of the paper, including 1) little novelty in modeling, 2) weak evaluation, and 3) lack of deep analysis. After the author rebuttal and revised paper, one of the reviewers increased their score and was leaning toward weak accept.

However, reviewers noted that there was significant overlap with another submission, and we discussed that it would be best to accept one of the two, incorporating the contributions of both papers. Hence, I recommend that this paper not be accepted, and perhaps some of the non-overlapping contents of this paper can be included in the other, accepted paper.

Thank you for submitting this paper. 
It is not clear if there are advantages of the method in other domains where log-det-Jacobians are not necessary relative to existing literature. +",ICLR2021, +5xt7bqjeqCW,1642700000000.0,1642700000000.0,1,IbyMcLKUCqT,IbyMcLKUCqT,Paper Decision,Reject,"This paper shows how constraining the representation to be invariant to augmentation shrinks the hypothesis space to improve generalization more than just introducing additional samples through augmentation. I agree with the reviewers that this is a novel, intuitive, and interesting finding. However, there were many technical and clarity issues with the original submission. These were partially addressed by the authors in the rebuttal. The reviewers appreciated the authors' efforts and commitment in the rebuttal, but my conclusion from our discussion that this paper requires another round of revisions. I hope the authors would follow the reviewer's comments, improve the paper, and re-submit.",ICLR2022, +WsbmlEoft-Q,1610040000000.0,1610470000000.0,1,PQ2Cel-1rJh,PQ2Cel-1rJh,Final Decision,Reject,"This paper got mixed reviews. One for acceptance, one for reject and two borderline. After the rebuttal, AR2 raises the review to borderline. AR1 gives the highest recommendation but did not provide detailed supporting evidence. Other reviews provide comment on the strength and also share the concerns. Most of the concerns concentrate on the motivation (whether the proposed method is violating the objective of knowledge distillation) and the brought additional computation overhead. Also the scope of this paper was defined wider than the actual one. The authors only did experiments for BERT but did not consider and compare with existing KD method. Overall, AC read the paper and also has the similar concerns, the novelty is limited. the brought increase in inference time is violating the KD objective and the scope of this paper was not defined clearly. The authors should improve the submission in these aspects. At its current status, AC does not recommend acceptance. ",ICLR2021, +6MS2k57_Y3,1576800000000.0,1576800000000.0,1,BJlkgaNKvr,BJlkgaNKvr,Paper Decision,Reject,"The paper investigates why adversarial training can sometimes degrade model performance on clean input examples. + +The reviewers agreed that the paper provides valuable insights into how adversarial training affects the distribution of activations. On the other hand, the reviewers raised concerns about the experimental setup as well as the clarity of the writing and felt that the presentation could be improved. + +Overall, I think this paper explores a very interesting direction and such papers are valuable to the community. It's a borderline paper currently but I think it could turn into a great paper with another round of revision. I encourage the authors to revise the draft and resubmit to a different venue. + + ",ICLR2020, +HyxKHNk8xN,1545100000000.0,1545350000000.0,1,SkghN205KQ,SkghN205KQ,Incremental improvement over rank-based training of SPENs ,Reject,"This paper proposes search-guided training for structured prediction energy networks (SPENs). + +The reviewers found some interest in this approach, though were somewhat underwhelmed by the experimental comparison and the details provided about the method. + +R1 was positive and recommends acceptance; R2 and R3 thought the paper was on the incremental side and recommend rejection. Given the space restriction to this year's conference, we have to reject some borderline papers. 
The AC thus recommends that the authors take the reviewers' comments into consideration for a ""revise and resubmit"".",ICLR2019,3: The area chair is somewhat confident
ydF7lzeeCk,1576800000000.0,1576800000000.0,1,HkxwmRVtwH,HkxwmRVtwH,Paper Decision,Reject,"The authors propose an approach to Bayesian deep learning, representing neural network weights as latent variables mapped through a Kronecker-factored Gaussian process. The ideas have merit and are well-motivated. Reviewers were primarily concerned by the experimental validation, and the lack of discussion of and comparisons with related work. After the rebuttal, reviewers still expressed concern regarding both points, with no reviewer championing the work.

One reviewer writes: ""I have read the authors' rebuttal. I still have reservation regarding the gain of a GP over an NN in my original review and I do not think the authors have addressed this very convincingly -- while I agree that in general, sparse GP can match the performance of GP with a sufficiently large number of inducing inputs, the proposed method also incurs extra approximations so arguing for the advantage of the proposed method in term of the accurate approximate inference of sparse GP seems problematic.""

Another reviewer points out that the comment in the author rebuttal about Kronecker-factored methods (Saatci, 2011) for non-Gaussian likelihoods and with variational inference being an open question is not accurate: SV-DKL (https://arxiv.org/abs/1611.00336) and other approaches (http://proceedings.mlr.press/v37/flaxman15.pdf) were specifically designed to address this question, and are implemented in popular packages. Moreover, there is highly relevant additional work on latent variable representations for neural network weights, inducing priors on p(w) through p(z), which is not discussed or compared against (https://arxiv.org/abs/1811.07006, https://arxiv.org/abs/1907.07504). The revision only includes a minor consideration of DKL in the appendix.

While the ideas in the paper are promising, and the generally thoughtful exchanges were appreciated, there is clearly related work that should be discussed in the main text, with appropriate comparisons. With reviewers expressing additional reservations after rebuttal, and the lack of a clear champion, the paper would benefit from significant revisions in these directions.

Note: In the text, it says:
""However, obtaining p(w|D) and p(D) exactly is intractable when N is large or when the network is large and as such, approximation methods are often required.""
One cannot exactly obtain p(D), or the predictive distribution, regardless of N or the size of the network; exact inference is intractable because the relevant integrals cannot be expressed in closed form, since the parameters are mapped through non-linearities, in addition to typically non-Gaussian likelihoods.",ICLR2020,
DhrRYKG7F81,1642700000000.0,1642700000000.0,1,AJg35fkqOPA,AJg35fkqOPA,Paper Decision,Reject,"This submission proposes a new loss function for facial attribute GAN editing and transfer via text inputs. A latent mapping mechanism based on StyleCLIP is used to disentangle the semantic attributes of human faces. The resulting semantic directional decomposition network (SDD-Net) transfers attributes from a reference image to a target, guided by text descriptions. Experiments on the CelebA-HQ dataset show some qualitative results and ablations for the « smile » attribute. 
The main contribution is essentially a loss term that measures latent similarity in the CLIP latent space. Most of the reviewers are not convinced by the approach and have raised several issues. One can question the relevance of the way that text features are used (as a semantic direction). The role of the reference image in attribute transfer is also questionable in the proposed framework. Additionally, the evaluation is not sufficient, in particular to investigate whether the proposed method works on a wide range of attributes. The paper only conducts experiments on the CelebA-HQ dataset; it would be interesting to have experiments on other datasets. The comparison to StyleCLIP is also insufficient, and there are no quantitative experiments to support the authors' claims. We encourage the authors to take into account all these remarks and the reviewers' comments in order to prepare an improved submission for a future conference.",ICLR2022,
HxS2JDJ94a,1576800000000.0,1576800000000.0,1,Skx6WaEYPH,Skx6WaEYPH,Paper Decision,Reject,"The reviewers recommend rejection due to various concerns about novelty and experimental validation. The authors have not provided a response.",ICLR2020,
qBe2DsrRX8,1576800000000.0,1576800000000.0,1,BklRFpVKPH,BklRFpVKPH,Paper Decision,Reject,"The paper proposes to combine RL and imitation learning. It defines a regularized reward function that minimizes the KL distance between the policy and the expert action. The formulation is similar to KL-regularized MDPs, but with the difference that an additional indicator function based on the support of the expert's distribution multiplies the regularization term.

Several issues have been brought up by the reviewers, including:
* Comparison with pre-deep-learning literature on the combination of RL and imitation learning
* Similarity to the regularized MDP framework
* Assumption 1 requiring a stochastic expert policy, contradicting the policy invariance claim
* Difficulty of learning the indicator function of the support of the expert's data distribution

Some of these issues have been addressed, but at the end of the day, one of the expert reviewers was not convinced that the problem of learning an indicator function is going to be easy at all. The reviewer believes that learning such a function requires ""learning a harsh approximation of the density of visits of the expert for every state which is a quite hard task, especially in stochastic environments.""

Another issue is related to the policy invariance under the optimal expert policy. In most MDPs, the optimal policy is not stochastic and does not satisfy Assumption 1, so the optimal policy invariance proof seems to contradict Assumption 1.

Overall, it seems that even though this might become a good paper, it requires some improvements. I encourage the authors to address the reviewers' comments as much as possible.",ICLR2020,
B1ft8yTSM,1517250000000.0,1517260000000.0,826,SJSVuReCZ,SJSVuReCZ,ICLR 2018 Conference Acceptance Decision,Reject,"The proposed conditional variance regularizer looks interesting and the results show some promise. However, as the reviewers pointed out, the connection between the information-theoretic argument provided and the final form of the regularizer is too tenuous in its current form. 
Since this argument is central to the paper, the authors are urged to either provide a more rigorous derivation or motivate the regularizer more directly and place more emphasis on its empirical evaluation.",ICLR2018,
PZD_Lo1pZH,1576800000000.0,1576800000000.0,1,v5WXtSXsVCJ,v5WXtSXsVCJ,Final Decision,Reject,"This work describes a series of strategies for optimizing the training speed of word embeddings (as in word2vec and fastText).

All reviewers appreciate the convincing empirical results, which are without a doubt impressive. Reviewers also mostly agree that speeding up embedding training is important, and there is no doubt that this type of paper is appropriate for ICLR (as clearly highlighted in the CfP).

However, the specific optimization strategies deployed and described here are deemed not to bring novel insight, useful in itself to the community, beyond the software contribution described. The paper seems to mostly serve as documentation of the implementation, limiting its value and impact on further research. The pedagogic value is also limited, as the paper tackles multiple different, eclectic optimizations, a narrative strategy that does not leave room to describe a single one more generally, helping the community find other places to apply it. All in all this leads to a borderline negative assessment, and I cannot recommend acceptance.",ICLR2021,
7nRgl5xilO7,1610040000000.0,1610470000000.0,1,mj7WsaHYxj,mj7WsaHYxj,Final Decision,Reject,"This paper studies the problem of adversarial training for graph neural networks. The proposed method is built on the free training approach, and more specifically FreeLB, with some additional tricks including bias perturbation (for node classification) and unbounded attacks. While these additions are potentially useful, there is only limited investigation into their effect. Putting aside the technical distinctions of the method from prior work, this paper can also be viewed as an empirical study of adversarial training techniques on graph data with various GNN architectures. It is worth noting that overall the conclusions on ""adversarial training"" are positive: we do see consistent improvement over a variety of architectures and tasks. The issue, however, is that it is unclear whether these improvements can be similarly achieved using prior techniques like FreeLB (the ablation is only done on one single task, where biased perturbation is shown to lead to minor improvement). The paper also provides some results showing the effect of the depth of the network as well as different training strategies such as batch norm and dropout with general adversarial training. These results are interesting to see but do seem to be limited in both scope and depth. It appears that the authors have two goals in mind: one is to propose FLAG and demonstrate its usefulness, and the other is to provide a better understanding of how adversarial training works for GNNs in general. Given the limited novelty of FLAG compared to prior methods, the main contribution actually comes from the latter part, which unfortunately is somewhat underdeveloped.",ICLR2021,
Byg2aslWlN,1544780000000.0,1545350000000.0,1,BJfguoAcFm,BJfguoAcFm,Clarification and comparison needed,Reject,"This work proposes a method for learning a Kolmogorov model, which is a binary random variable model that is very similar (or identical) to a matrix factorization model. The work proposes an alternative optimization approach that is again similar to matrix factorization approaches. 
Unfortunately, no discussion or experiments are provided to compare the proposed problem and method with standard matrix factorization; without such a comparison, it is unclear if the proposed model is substantially new or a reformulation of a standard problem. The authors are encouraged to improve the draft to clarify the connection between matrix factorization and standard factor models.",ICLR2019,4: The area chair is confident but not absolutely certain
yi7b3DDmQUui,1642700000000.0,1642700000000.0,1,IcUWShptD7d,IcUWShptD7d,Paper Decision,Accept (Poster),"This submission presents an interesting contribution on differentiable sorting, providing an analysis of monotonicity for these operations.

The reviewers overall argue for acceptance.",ICLR2022,
c2VTHtZCVNa,1642700000000.0,1642700000000.0,1,rFJWoYoxrDB,rFJWoYoxrDB,Paper Decision,Accept (Poster),"This paper makes the important, albeit somewhat unsurprising, finding that cell-based NAS search spaces, and in particular the DARTS search space, include some operations that are much better than others. Reducing the search space to these allows even random architectures to yield good performance, similarly to the findings of ""Designing Network Design Spaces"", https://openaccess.thecvf.com/content_CVPR_2020/html/Radosavovic_Designing_Network_Design_Spaces_CVPR_2020_paper.html

This paper received mostly positive scores (5,6,6,8). While I agree with the negative reviewer that it would be good to study this on other benchmarks as well, I follow the positive reviewers in recommending acceptance. I encourage the authors to fix the remaining typos (there are still many) and to open-source their code. This would increase the paper's impact a lot.

Finally, I would like to ask the authors to avoid portraying the misconception that we don't need large and powerful search spaces. In fact, as already hinted at in Section 6, we *do* need larger and more exciting search spaces in order to discover entirely novel architectures. Also, the multi-objective nature of NAS is not to be undervalued, so the take-away of the paper should *not* be that we should design NAS benchmarks with really small & strong search spaces, but that, given a specific problem and objective, it may be prudent to evaluate whether the whole power of a given NAS search space is needed or whether it can be reduced to its essential parts.",ICLR2022,
rgMPGB5pOvBU,1642700000000.0,1642700000000.0,1,T_8wHvOkEi9,T_8wHvOkEi9,Paper Decision,Reject,"Description of paper content:

The paper studies the problem of achieving coordination among a group of agents in a cooperative, multi-agent task. Coordination graphs reduce the computational complexity of this problem by reducing the joint value function to a sum of local value functions depending on only subsets of agents. In particular, the Q-function of the entire system is ""expanded"" up to second order in agent interactions: Q = \sum_{i \in [n]} q_i + \sum_{(i,j) \in G} q_{ij}, where q_i is a function of the i-th agent's history and current action, and q_{ij} is a function of two agents' histories and current actions. As G does not include higher-order (third and above) terms, the algorithm does not have exponential dependence on the number of agents. If G includes only a subset of pairs of agents, then the computational complexity is reduced to less than quadratic. Since the coordination problem is cooperative, the authors propose a meta-agent (""coordinator"") that selects the graph G in a dynamic (state-by-state) fashion in order to maximize return. 
The optimization problems of the meta-agent and the sub-agents are performed by deep Q-learning. + +Summary of paper discussion: + +The critical comment made by one reviewer was: “Going back on that trend now only to pursue the polynomial-time nature of the running algorithm would in my opinion require far more diverse evaluation examples, backed by a stronger motivation highlighting real-world threats of all the other MARL algorithms taking longer than polynomial time. As is, SOP-CG does not contend amazingly against other MARL algorithms that chose the ""NP-hard? Curse of dimensionality? Fine. We'll approximate, approximate, approximate."" path rather than the ""Polynomial time is our topmost priority; function expressiveness can wait."" path. That leads me back to the question of why pursue polynomial time at the cost of losing both the function expressiveness and the peak performance….” + +Comments from Area Chair: + +Looking at the experiments, the number of agents in the empirical problems is not large. For example, there are 15 agents in ""Sensor."" Any focus on computational complexity at this scale is hard to justify, especially with algorithms that are approximate. It seems favorable at this small scale to use function approximators that can take in all the agents' histories and actions. This obvious baseline is not included in comparisons. It is hard to justify inclusion of this paper in the conference.",ICLR2022, +ry9evkaSz,1517250000000.0,1517260000000.0,927,rkONG0xAW,rkONG0xAW,ICLR 2018 Conference Acceptance Decision,Reject,"This is an interesting paper and addresses an important problem of neural networks with memory constrains. New experiments have been added that add to the paper, but the full impact of the paper is not yet realised, needing further exploration of models of current practice, wider set of experiments and analysis, and additional clarifying discussion.",ICLR2018, +s-Ujxf_4W,1576800000000.0,1576800000000.0,1,ryxAY34YwB,ryxAY34YwB,Paper Decision,Reject,"This paper proposes a method to leverage the Lead (i.e., first sentence of an article) in training a model for abstractive news summarization. + +Reviewers' initial recommendations were weak reject to weak accept, pointing out the limitations of the paper including 1) little novelty in modeling, 2) weak evaluation, and 3) lack of deep analysis. After the author rebuttal and revised paper, one of the reviewers increased the score and were leaning toward weak accept. + +However, reviewers noted that there was significant overlap with another submission, and we discussed that it would be best to accept one of the two, incorporating the contributions of both papers. Hence, I recommend that this paper not be accepted, and perhaps some of the non-overlapping contents of this paper can be included in the other, accepted paper. + +Thank you for submitting this paper. 
I enjoyed reading it.",ICLR2020,
SyEQhGIux,1486400000000.0,1486400000000.0,1,HJgXCV9xx,HJgXCV9xx,ICLR committee final decision,Accept (Poster),"pros:
 - demonstration that using teacher's feedback to improve performance in a dialogue system can be made to work in a real-world setting
 - comprehensive experiments

cons:
 - lack of technical novelty due to prior work
 - not all agree with the RL vs not-RL (pre-built datasets) distinction suggested in the paper with respect to the previous work

Overall, the paper makes a number of practical contributions and evaluations, rather than theoretical novelty.",ICLR2017,
IvfpCpTeDti,1610040000000.0,1610470000000.0,1,Mu2ZxFctAI,Mu2ZxFctAI,Final Decision,Accept (Poster),"The paper proposed weighted-MOCU, a novel objective-oriented data acquisition criterion for active learning. The propositions are well-motivated, and all reviewers find the analysis of the drawbacks of several popular myopic strategies (e.g. ELR tends to get stuck in local optima; BALD tends to be overly explorative) interesting and insightful. Reviewers also appreciate the novelty of the proposed weighting strategy for addressing the convergence issue of MOCU-based approaches. Overall I share the same opinions and believe the paper offers useful insights for the active learning community.

In the meantime, there were shared concerns among several reviewers about the readability (structure and intuition), the lack of empirical results on more realistic active learning tasks, and the limited discussion of the modeling assumptions. Although the rebuttal revision does improve upon many of these points, the authors are strongly encouraged to take the reviews into account, in particular to further strengthen the empirical analysis and discussions, when preparing a revision.",ICLR2021,
SyxnltRxgV,1545020000000.0,1545350000000.0,1,HJMRvsAcK7,HJMRvsAcK7,This is an interesting topic but the reviewers had substantial concerns on the clarity and significance of the contribution.,Reject,"This is an interesting topic, but the reviewers had substantial concerns about the clarity and significance of the contribution.",ICLR2019,4: The area chair is confident but not absolutely certain
B6uxWUN7W7,1610040000000.0,1610470000000.0,1,7ZJPhriEdRQ,7ZJPhriEdRQ,Final Decision,Reject,"This paper is about learning the output noise variance of a VAE and its effect on the generated image quality as measured by FID. The paper argues that the output variance parameter plays an important role and proposes a simple procedure, where a maximum likelihood estimate of the noise variance is computed. Experiments on some standard datasets are provided.
Overall, the paper is well written and has been perceived positively by the reviewers. However, the effect of the observation variance has been analysed in detail by earlier work, in particular Dai and Wipf 2019. The novelty of the current paper is somewhat limited in scope. The paper is somewhat borderline in this respect; a much stronger experimental section would have been helpful.

One key contribution of the work is the empirical comparison of alternative parametrizations of the output noise. Overall, the paper would be stronger if this aspect were analysed in more detail, possibly with careful comparisons against competing methods. Inclusion of controlled experiments (e.g. by adding extra noise to data) to show how precise the noise variance estimation is and how the procedure influences the convergence of other parameters would have made the paper much more impactful. 
+",ICLR2021, +RKo585pH65,1610040000000.0,1610470000000.0,1,8wqCDnBmnrT,8wqCDnBmnrT,Final Decision,Accept (Poster),"This paper presents a zero-shot generation approach by disentangling representations into swappable components (each component corresponding to an attribute) and then conditioning on any desired combination of attributes to do zero-shot synthesis of samples containing those attributes. + +There were some concerns raised in the original reviews which the authors have addressed in the rebuttal and the revised submission. Post the discussion phase, all reviewers see merit in the proposed ideas and unanimously recommend acceptance. Based on my own reading of the paper and the reviews/author responses, I agree with the assessment. ",ICLR2021, +B_VFg72-d-C,1610040000000.0,1610470000000.0,1,FN7_BUOG78e,FN7_BUOG78e,Final Decision,Reject,"Thank you for your submission to ICLR. The reviewers and I unanimously felt, even after some of the clarifications provided, that while there was some interesting element to this work, ultimately there were substantial issues with both the presentation and content of the paper. Specifically, the reviewers largely felt that the precise problem being solved was somewhat poorly defined, and the benefit of the proposed preimage technique wasn't always clear. And while the ACAS system was a nice application, it seems to be difficult to quantify the real benefit of the proposed method in this setting (especially given that other techniques can similarly be used to verify NNs for this size problem). The answer that this paper provides seems to be something along the lines of ""ease of visual interpretation"" of the pre-image conditions, but this needs to be quantified substantially more to be a compelling case.",ICLR2021, +SJxbz6UBxV,1545070000000.0,1545350000000.0,1,BklpOo09tQ,BklpOo09tQ,Needs improvement.,Reject,"While the proposed method is novel, the evaluation is not convincing. In particular, the datasets and models used are small. Susceptibility to adversarial examples is tightly related to dimensionality. The study could benefit from more massive datasets (e.g., Imagenet).",ICLR2019,4: The area chair is confident but not absolutely certain +SyeMxE2Ve4,1545020000000.0,1545350000000.0,1,SklcFsAcKX,SklcFsAcKX,Borderline paper: incremental contribution over recent literature,Reject,"The paper analyzes the interesting problem of image denoising with neural networks by imposing simplifying assumptions on the Gaussianity and independence of the prior. A bound is established from the analysis of (Hand & Voroninksi, 2018) that can be algorithmically achieved through a small tweak to gradient descent. + +Unfortunately, the contribution of this paper is incremental given the recent works of (Hand & Voroninksi, 2018) and (Bora et al., 2017); an opinion the reviewers unanimously shared. Reviewer opinion differed on whether they found the overall contribution to be barely acceptable or simply insufficient. No reviewer detected a major advance, and there seems to be a question of whether the achievement is significant given the strength of the assumptions required to achieve the modest additions. + +After scrutiny, the main theoretical contributions of the paper appear to be a bit overstated. For example, the bound in Theorem 1 is quite weak: it does not establish convergence to a global minimizer (even under the strong assumptions given), but only that Algorithm 1 eventually remains in a neighborhood of the global minimizer. 
It is true that this neighborhood can be made arbitrarily small by increasing the strength of the assumptions made on epsilon and omega, but epsilon remains a constant with respect to iteration count. The subsequent claim that the algorithm achieves a denoising rate of sigma^2 k/n is not an accurate interpretation of Theorem 1, given that this claim would require require (at the very least) that epsilon can be made arbitrarily small, which it cannot be. More precision is required in stating supportable conclusions from the given results. + +The algorithmic motivation itself is rather weak, in the sense that this paper only provides an anecdotal demonstration that there are no spurious critical points beyond the negation of the global minimizer---the theoretical support for this claim already resides in (Hand & Voroninski, 2018). The provenance of such a central observation was not made sufficiently clear in the paper nor in the discussion. + +An additional quibble about the experimental evaluation is that it does not compare to plain gradient descent (or other baseline optimization techniques), which the authors observe almost always works in the scenario considered. It seems that the ""negation tweak"" embedded in Algorithm 1 has no real impact on the experimental results, raising the question of whether the contributions do indeed have any practical import. The descriptions offered in the current paper suggest that a serious algorithmic advantage has yet to be demonstrated in any real experiment. The paper requires a far better evaluation of Algorithm 1 in comparison to standard baseline optimizers, to support the case that the proposed algorithmic tweak has practical significance. + +This paper remained in a weak borderline position after the review and discussion period. In the end, this was a very difficult decision to make, but I think the paper would benefit from further strengthening before it can constitute a solid publication.",ICLR2019,5: The area chair is absolutely certain +S1vdEk6BM,1517250000000.0,1517260000000.0,386,H1I3M7Z0b,H1I3M7Z0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper received generally positive reviews, but the reviewers also had some concerns about the evaluations. + +Pros: +-- An improvement over HashNet, the model ties weights more systematically, and gets better accuracy. +Cons: +-- Tying weights to compress models already tried before. +-- Tasks are all small and/or audio related. +-- Unclear how well the results will generalize for 2D convolutions. +-- HashNet results are preliminary; comparisons with HashNet missing for audio tasks. + +Given the expert reviews, I am recommending the paper to the workshop track. +",ICLR2018, +KXxJ4hX_lo,1576800000000.0,1576800000000.0,1,H1lMogrKDH,H1lMogrKDH,Paper Decision,Reject,"The paper studies non-spiking Hudgkin-Huxley models and shows that under few simplifying assumptions the model can be trained using conventional backpropagation to yield accuracies almost comparable to state-of-the-art neural networks. Overall, the reviewers found the paper well-written, and the idea somewhat interesting, but criticized the experimental evaluation and potential low impact and interest to the community. 
While the method itself is sound, the overall assessment of the paper is somewhat below what's expected from papers accepted to ICLR, and I’m thus recommending rejection.",ICLR2020, +QgEGG74f3NH,1610040000000.0,1610470000000.0,1,eo6U4CAwVmg,eo6U4CAwVmg,Final Decision,Accept (Poster),"This paper aims to improve the training of generative adversarial networks (GANs) by incorporating the principle of contrastive learning into the training of discriminators in GANs. Unlike in an ordinary GAN which seeks to minimize the GAN loss directly, the proposed GAN variant with a contrastive discriminator (ContraD) uses the discriminator network to first learn a contrastive representation from a given set of data augmentations and real/generated examples and then train a discriminator based on the learned contrastive representation. It is noticed that a side effect of such blending is the improvement in contrastive learning as a result of GAN training. The resulting GAN model with a contrastive discriminator is shown to outperform other techniques using data augmentation. + +**Strengths:** + * It proposes a new way of training the discriminators of GANs based on the principle of contrastive learning. + * The paper is generally well written to articulate the main points that the authors want to convey. + * The experimental evaluation is well designed and comprehensive. + +**Weaknesses:** + * Even though the proposed learning scheme is novel, the building blocks are based on existing techniques in GAN and contrastive learning. + * The claim that GAN helps contrastive learning is not fully substantiated. + * It is claimed in the paper that the proposed contrastive discriminator can lead to much stronger augmentations *without catastrophic forgetting*. However, this “catastrophic forgetting” aspect is not really empirically validated in the experiments. + * The writing has room for improvement. + +Despite its weaknesses, this paper explores a novel direction of training GANs that would be of interest to the research community. +",ICLR2021, +YDkJBsaiDrA,1610040000000.0,1610470000000.0,1,T4gXBOXoIUr,T4gXBOXoIUr,Final Decision,Reject,"The proposed ConVIRT learns representations of medical data from paired image and text data. +While the paper addresses a relevant problem, the reviewers agree that the method has limited novelty. Two reviewers find and that the experiments are not convincing. One reviewer notes that the paper does not compare to the state-of-the-art methods for the tasks. +",ICLR2021, +Bklaqej0JV,1544630000000.0,1545350000000.0,1,rkgpCoRctm,rkgpCoRctm,"Interesting simple idea, but limited novelty and unfair experimental setups",Reject,"The paper proposes a simple method for detecting out-of-distribution samples. The authors' major finding is that mean and standard deviation within feature maps can be used as an input for classifying out-of-distribution (OOD) samples. The proposed method is simple and practical. + +The reviewers and AC note the following potential weaknesses: (1) limited novelty and somewhat ad-hoc approach, i.e., it is not too surprising to expect that such statistics can be useful for the purpose. Some theoretical justification might help. (2) arguable experimental settings, i.e., the performance highly varies depending on validation (even in the revised draft), and sometimes irrationally good. It also depends on the choice of classifier. + +For (2), I think the whole evaluation should be done assuming that we don't know how it looks the OOD set. 
Under the setting, the authors should compare the proposed method and existing ones for fair comparisons. AC understands the authors follows the same experimental settings of some previous work addressing this problem, but it's time that this is changed. Indeed, a recent paper by Lee at al. 2018 considers such a setting for detecting more general types of abnormal samples including OOD. + +In overall, the proposed idea is simple and easy to use. However, AC decided that the authors need more significant works to publish the work. +",ICLR2019,5: The area chair is absolutely certain +ryglXPwZgV,1544810000000.0,1545350000000.0,1,HyxSBh09t7,HyxSBh09t7,Some merit.,Reject,"AR1 is concerned about the novelty and what are exact novel elements of the proposed approach. AR2 is worried about the novelty (combination of existing blocks) and lack of insights. AR3 is also concerned about the novelty, complexity and poor evaluations/lack of thorough comparisons with other baselines. After rebuttal, the reviewers remained unconvinced e.g. AR3 still would like to see why the proposed method would be any better than GAN-based approaches. + +With regret, at this point, the AC cannot accept this paper but AC encourages the authors to take all reviews into consideration and improve their manuscript accordingly. Matters such as complexity (perhaps scattering networks aren't the most friendly here), clear insights and strong comparisons to generative approaches are needed.",ICLR2019,4: The area chair is confident but not absolutely certain +XN4qjQS-dv,1610040000000.0,1610470000000.0,1,akgiLNAkC7P,akgiLNAkC7P,Final Decision,Reject,"This paper introduces ICRL, where the RL agent is supposed to maximize the reward under unknown constraints, which should be inferred from the expert demonstration. Reviewers generally agreed that this is an interesting work, and potentially make RL to be applied to more general settings. However, they also would like to see more experimental results with baselines (e.g. agents based on IRL and also related constrained learning approaches) to make the motivation behind the approach more convincing. I hope these concerns are addressed in the future work.",ICLR2021, +zhI9uwpHsyO,1610040000000.0,1610470000000.0,1,O3Y56aqpChA,O3Y56aqpChA,Final Decision,Accept (Oral),"The paper introduces an approach to self-train a source domain classifier on unlabeled data from the target domain, considering the few-shot learning setting when there is significant discrepancy between the source and target domains. While the reviewers pointed out a few weaknesses, such as somewhat limited methodological novelty and lack of comparisons with other methods, they all recommend acceptance as final decision. The paper is beautifully written. The proposed method is very simple, but yields excellent results in a very practical problem, which should be of wide interest to the ICLR community. The experimental evaluation is rigorous and the ablation studies are convincing. The AC agrees with the decision made by the reviewers and recommends acceptance.",ICLR2021, +sJLWCl0YyC,1576800000000.0,1576800000000.0,1,Skgaia4tDH,Skgaia4tDH,Paper Decision,Reject,"The paper presents a structured VAE, where the model parameters depend on a local structure (such as distance in feature or local space), and it uses the meta-learning framework to adjust the dependency of the model parameters to the local neighborhood. + +The idea is natural, as pointed by Rev#1. 
It incurs an extra learning cost, as noted by Rev#1 and #2, asking for details about the extra-cost. The authors' reply is (last alinea in first reply to Rev#1): we did not comment (...) because in essence, using neighborhoods in a naive way is not affordable. +The area chair would like to know the actual computational time of Local VAE compared to that of the baselines. + +More details (for instance visualization) about the results on Cars3D and NORB would also be needed to better appreciate the impact of the locality structure. The fact that the optimal value (wrt Disentanglement) is rather low ($10^{-2}$) would need be discussed, and assessed w.r.t. the standard deviation. + +In summary, the paper presents a good idea. More details about its impacts on the VAE quality, and its computation costs, are needed to fully appreciate its merits. ",ICLR2020, +0GOvsrFwQh,1576800000000.0,1576800000000.0,1,BJeTCAEtDB,BJeTCAEtDB,Paper Decision,Reject,"The paper proposed the use of a lossy transform coding approach to to reduce the memory bandwidth brought by the storage of intermediate activations. It has shown the proposed method can bring good memory usage while maintaining the the accuracy. +The main concern on this paper is the limited novelty. The lossy transform coding is borrowed from other domains and only the use of it on CNN intermediate activation is new, which seems insufficient. ",ICLR2020, +yQCUg_xaq2,1576800000000.0,1576800000000.0,1,rklhqkHFDB,rklhqkHFDB,Paper Decision,Reject,"The authors demonstrate how neural networks can be used to learn vectorial representations of a set of items given only triplet comparisons among those items. The reviewers had some concerns regarding the scale of the experiments and strength of the conclusions: empirically, it seemed like there should be more truly large-scale experiments considering that this is a selling point; there should have been more analysis and/or discussion of why/how the neural networks help; and the claim that deep networks are approximately solving an NP-hard problem seemed unimportant as they are routinely used for this purpose in ML problems. With a combination of improved experiments and revised discussion/analysis, I believe a revised version of this paper could make a good submission to a future conference.",ICLR2020, +esuH1AOHziN,1642700000000.0,1642700000000.0,1,UxBH9j8IE_H,UxBH9j8IE_H,Paper Decision,Reject,"Dear Authors, + +The paper was received nicely and discussed during the rebuttal period. However, the current consensus suggests the paper requires another round of revisions before it gets accepted. + +In particular: + +- There were still some gray areas regarding comparison to simple techniques. E.g., one reviewer raised the question how it compares to simply stopping based on validation accuracy for example. The reviewer was missing the justification why stopping at the loss of Ramanujan graph property is preferable in comparison to other criteria. +- Several reviewers found the general idea interesting, but all felt that more reasonings about the impact/insights/relationship of Ramanujan graph property with pruning need to be found to get accepted. +- Reviewers appreciate that the authors corrected many parts of the submission (see increased scores). However, reviewers felt that the paper requires more data/evidence to get accepted at this level, based on the discussions made during the rebuttal period. 
+ +Best AC",ICLR2022, +vG1v-kJnMCo,1610040000000.0,1610470000000.0,1,EbIDjBynYJ8,EbIDjBynYJ8,Final Decision,Accept (Oral),"This paper proposes a model for learning disentangled representations by assuming the slowness prior over transitions between two frames. The model is well justified theoretically, and evaluated extensively experimentally. The results are good, and all reviewers agree that this paper is among the top papers they have reviewed. For this reason, I am pleased to recommend this paper for an Oral.",ICLR2021, +3gkfkPbyvOr,1610040000000.0,1610470000000.0,1,vXj_ucZQ4hA,vXj_ucZQ4hA,Final Decision,Accept (Poster),"The paper proposes a sensitivity-based pruning method at initialization. For fully connection and and convolutional neural networks, it shows that the model is trainable only when the initialization satisfies Edge of Chaos (EOC). The paper also provided a rescaling method so that the pruned network is initialized on the EOC. For Resnet, the paper shows that the proposed pruning satisfies the EOC condition by default and further provides re-parameterization method to tackle exploding gradients. The experiments show the performance of the proposed method on fully connected and convolution neural network, as well as ResNet. There were some concerns about the contribution of the paper compared to that of [1]. I read the two papers carefully and while both papers aim at addressing a similar problem, i.e., pruning at initialization while avoiding layer collapse, the paper provides a different perspective on the problem, and provides enough theoretical contribution and insights to be found helpful and interesting by the community. +",ICLR2021, +HkR52zUul,1486400000000.0,1486400000000.0,1,HJ0UKP9ge,HJ0UKP9ge,ICLR committee final decision,Accept (Poster),The program committee appreciates the authors' response to concerns raised in the reviews. All reviewers agree that this is a good piece of work that should be accepted to ICLR. Authors are encouraged to incorporate reviewer feedback to further strengthen the paper.,ICLR2017, +4JoSsJVNfb1,1610040000000.0,1610470000000.0,1,g4E6SAAvACo,g4E6SAAvACo,Final Decision,Reject,"This paper proposes an interesting new direction for low-cost NAS. However, the paper is not quite ready for acceptance in its current form. The main area of improvement is around the generalizability of the score presented, both empirically and (ideally) theoretically. The two main directions of generalizability that would be worth investigating are 1) different image datasets (see comments around Imagenet-16) 2) different/larger search spaces. Even simple search spaces consisting of a few architectural modification starting from standard architectures (e.g. resnets) would go a long way in convincing the community that the proposed method generalizes past NasBench.",ICLR2021, +0jsYW1HCcK,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"Solid work on extending AntisymmetricRNN and expanding its expressivity while controlling the global stability of the recurrent dynamics. It contributes to the growing interest in continuous-time RNN formulations that can deal with exploding gradient problem, and worthy of ICLR poster presentation. Three reviewers were positive and one was slightly negative. 
Authors added additional experiments and strengthened the manuscript significantly during the review process.",ICLR2021, +BJlDc3nZeE,1544830000000.0,1545350000000.0,1,rJEyrjRqYX,rJEyrjRqYX,metareview,Reject,"The submission suggests reducing the parameters in a conv-lSTM by replacing the 3 gates in the standard LSTM with one gate. The idea is to get a more efficient convolutional LSTM and use it for video prediction. Two of the reviewers found the manuscript and description of the work difficult to follow and the justification for the proposed method lacking. Additionally, the contribution of this submission feels rather thin, and the experimental results are not very convincing: the absolute training time is too coarse of a measurement (and convergence may depend on many factors), and the improvements over PredNet seem somewhat marginal. + +Finally, I agree with the reviewer that mentioned that a proper comparison with baselines should be done in such a way that the number of parameters is comparable (if #params is a main claim of the paper!). It is entirely plausible that if you reduce the number of parameters in PredNet by 40% (in some other way), its performance would also benefit. + +With all this in mind, I do not recommend this paper be accepted at this time.",ICLR2019,5: The area chair is absolutely certain +qsS3ikF8e1R,1642700000000.0,1642700000000.0,1,3AkuJOgL_X,3AkuJOgL_X,Paper Decision,Reject,"This paper considered the computational budgets of adversarial training in the context of Federated Learning and studied the propagation of adversarial robustness from affordable parties to low-resource parties. Although the authors conducted the extensive experiments to show the effectivenss of FedRBN, there are still important concerns from the reviewers, + +(1) The novelty is marginal compared to FedBN, DBN and previous insights, which moves the similar framework to adversarial robustness and changed the rules, especially given the competitive ICLR. More theorectical novelty will be preferred. + +(2) Many technical details are not well explained and some parts need to be improved, which make the reviewers not well convinced about FedRBN. + +Given above points, I will recommend rejection and encourage the authors to improve the paper in the future.",ICLR2022, +yDWiL734L9K,1642700000000.0,1642700000000.0,1,aBO5SvgSt1,aBO5SvgSt1,Paper Decision,Accept (Poster),"This paper proposes and studies a variant of policy optimization---mirror descent policy optimization (MDPO)---which was inspired by the mirror descent algorithm in the optimization literature. The proposed algorithm attempts to find a policy parameter that maximizes the expected regularized advantage function, where the regularization term is based on the KL divergence between the new policy iterate and the current policy iterate. The main contributions are algorithmic and empirical, with detailed discussions provided to illuminate the connection between MDPO and other existing policy optimization paradigms like TRPO, PPO, etc. The paper provides an interesting and useful contribution to the growing literature of policy optimization.",ICLR2022, +WeUfNBvXfsE,1610040000000.0,1610470000000.0,1,sHSzfA4J7p,sHSzfA4J7p,Final Decision,Reject,"Three reviewers have reviewed this paper and they maintain their findings after the rebuttal. The reviewers are mainly concerned about the novelty (several highly-related papers exist) and well as the technical contribution (more theoretical developments are needed). 
Therefore, this paper in its current form cannot be accepted.",ICLR2021, +COW1Mz61YWO,1642700000000.0,1642700000000.0,1,JVWB8QRUOi-,JVWB8QRUOi-,Paper Decision,Reject,"This paper propose a novel framework to increase cooperation in second-order social dilemmas. This is based on encouraging homophilic incentives. Reviewers agree that the paper does not meet the standards of publication yet. In particular, they worry that the assumptions made are so restrictive as to make model inapplicable to interesting problems. There is also a concern that the work is simply not novel enough.",ICLR2022, +NMEreOFQEY,1576800000000.0,1576800000000.0,1,BJe_z1HFPr,BJe_z1HFPr,Paper Decision,Reject,"This paper offers likely novel schemes for image resizing. The performance improvement is clear. Unfortunately two reviewers find substantial clarity issues in the manuscript after revision, and the AC concurs that this is still an issue. The paper is borderline but given the number of higher ranked papers in the pool is unable to be accepted unfortunately. ",ICLR2020, +mmQwMQyQhIQ,1610040000000.0,1610470000000.0,1,7FNqrcPtieT,7FNqrcPtieT,Final Decision,Accept (Poster),"This paper provides some theoretical perspective on the use of data augmentation in consistency regularization-based semi-supervised learning. The framework used in the paper argues that high-quality data augmentation should move along the data manifold. This generic view allows the paper's ideas to be applied across datasets (as opposed to image-specific data augmentation used in state-of-the-art semi-supervised learning algorithms). I am not aware of any other work raising these points, and indeed this paper is significant in that it provides a new and potentially useful perspective on the most performative semi-supervised learning approach. Reviewers agreed that the paper was clear and useful. The main concern was that the paper only included experiments in toy settings. Indeed, it would have been much more impactful to apply these ideas to state-of-the-art semi-supervised learning methods, but I think it can be excused given the theoretical focus of the work.",ICLR2021, +rJlfgRJlgV,1544710000000.0,1545350000000.0,1,HkxaFoC9KQ,HkxaFoC9KQ,A significant study of relational inductive biases in DRL,Accept (Poster),"The paper presents a family of models for relational reasoning over structured representations. The experiments show good results in learning efficiency and generalization, in Box-World (grid world) and StarCraft 2 mini-games, trained through reinforcement (IMPALA/off-policy A2C). + +The final version would benefit from more qualitative and/or quantitative details in the experimental section, as noted by all reviewers. + +The reviewers all agreed that this is worthy of publication at ICLR 2019. E.g. ""The paper clearly demonstrates the utility of relational inductive biases in reinforcement learning."" (R3)",ICLR2019,4: The area chair is confident but not absolutely certain +dwO2519SOIc,1642700000000.0,1642700000000.0,1,9TdCcMlmsLm,9TdCcMlmsLm,Paper Decision,Reject,"This work proposes an approach to improve non-ML based methods of text generation. It reformulates the problem with the soft Q-learning approach from RL instead of standard hard RL formulations from previous text generation work. By doing this, the work allows application of path consistency learning. This is an elegant formulation. 
However, this reformulation into soft Q-learning appears quite straightforward and so the application of path consistency learning does not require much change to be used for text generation. This limits the novelty of the work. The experiments are also relatively small-scale and consists of some non-standard tasks such as prompt generation (which is typically evaluated indirectly, the response to the prompts rather than the prompt itself). As the reviewers mention, evaluating on more large-scale standard tasks such as summarisation or dialog would be more convincing. Finally the work lacks references to recent works in the field, such as LeakGAN.",ICLR2022, +CZ6E5FMIEVAy,1642700000000.0,1642700000000.0,1,Ivku4TZgEly,Ivku4TZgEly,Paper Decision,Reject,"This paper analyzes analyze the fairness of Integrated Gradient based attribution methods. The authors exploit SHAP and BShap, two approaches based on the theory of Shapley Values, as the reference of ""fair"" methods. Specifically, they present an ""attribution transfer"" phenomenon in which the Integrated Gradients are affected by some sharply fluctuated area across the integration path, thereby deviating from the ''fair'' attribution methods. To avoid the attribution transfer issue, they further propose Integrated Certainty Gradients (ICG) method, where the integration path does not pass through the original fluctuated input space. Experiments are performed to demonstrate the advantages of ICG in avoiding attribution transfer. While the basic premise of the work is interesting, many conceptual details remain unclear and experimental evaluation can also be improved (please see detailed reviewer comments below). Given this, we are unable to recommend acceptance at this time. We hope the authors find the reviews helpful.",ICLR2022, +qyH1c29-3Vs,1610040000000.0,1610470000000.0,1,bB2drc7DPuB,bB2drc7DPuB,Final Decision,Accept (Poster),"This paper takes a step towards understanding the role of nonlinear function approximation--- more specifically, function approximation via (two-layer) neural nets---in some variants of the policy-gradient algorithms. The authors borrow the mean field analysis idea recently popularized in studying shallow neural nets, and investigate the mean-field limits of the training dynamics in the current RL settings. The results and analyses are interesting as they nicely complement another line of linearization-based analyses (i.e., the one based on neural tangent kernels) towards understanding non-linear function approximation. As suggested by a reviewer, it would be nice to add discussions in the revised paper regarding when the dynamics can be guaranteed to converge to a stationary point.  +",ICLR2021, +ATsOONzBBb,1610040000000.0,1610470000000.0,1,fSTD6NFIW_b,fSTD6NFIW_b,Final Decision,Accept (Poster),"This paper studies the reasons for failure of trained neural network models on out of distribution tasks. While the reviewers liked the theoretical aspects of the paper, one important concern is about the applicability of these insights to real datasets. The authors added an appendix to the paper showing results on a real dataset that mitigates this concern to an extent. Further, there are interesting insights in the paper to merit acceptance.",ICLR2021, +r1lh6DVweV,1545190000000.0,1545360000000.0,1,rkeX-3Rqtm,rkeX-3Rqtm,"Good idea, but research not yet ripe. 
Missing extensive comparison with alternative approaches.",Reject,"The paper proposes a novel local combinatorial search algorithm for the discrete target propagation framework of Friesen & Domingos 2018, and shows a few promising empirical results. + +Reviewers found the paper well written and clear, and two of them were enthusiastic about the direction of this research. +But all reviewers agreed that the paper is too preliminary, particularly in its empirical coverage. More extensive experiments are needed to compare with competitive approaches form the literature, for the task of training hard-threshold networks. Experiments would need to evaluate the algorithms on larger models and data more representative of the field, to measure how the approach can scale, and to convince of the superiority or advantage of the proposed method.",ICLR2019,5: The area chair is absolutely certain +rylbPkTSf,1517250000000.0,1517260000000.0,932,Skx5txzb0W,Skx5txzb0W,ICLR 2018 Conference Acceptance Decision,Reject,"The subject of model evaluation will always be a contentious one, and the reviewers were not yet fully-convinced by the discussion. The points you bring up at the end of your rresponse already point to directions for improvement as well as a greater degree of precision and control.",ICLR2018, +ZkmMdRkxBf5,1642700000000.0,1642700000000.0,1,7MLeqJrHNa,7MLeqJrHNa,Paper Decision,Reject,"This submission receives four negative reviews. The raised issues include paper organizations, presentation clarity, more experimental evaluations, the trade-off between technical contribution and application configuration, and the potential impact on more general visual recognition scenarios. In the rebuttal and discussion phases, the authors do not make any response to these reviews. Overall, the AC agrees with four reviewers that the current submission does not reach the publication bar. The authors are suggested to improve the current submission based on the reviews to make further improvements.",ICLR2022, +0mLpjLV8t,1576800000000.0,1576800000000.0,1,BygIjTNtPr,BygIjTNtPr,Paper Decision,Reject,"Motivated by GANs, the authors study the convergence of stochastic subgradient +descent on convex-concave minimax games. +They introduced an improved ""anchored"" SGD variant, that provably converges +under milder assumptions that the base algorithm. +It is applied to training GANs on MNIST and CIFAR-10, partially showing +improvements over alternative training methods. + +A main point of criticism that the reviewers identify is the strength of the +assumptions needed for the analysis. +Furthermore, the experimental results were deemed weak as the reported scores +are far away from the SOTA, and only simple baselines were compared against. ",ICLR2020, +BJbQ71TSG,1517250000000.0,1517260000000.0,99,ryiAv2xAZ,ryiAv2xAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Meta score: 6 + +The paper approaches the problem of identifying out-of-distribution data by modifying the objective function to include a generative term. Experiments on a number of image datasets. 
+ +Pros: + - clearly expressed idea, well-supported by experimentation + - good experimental results + - well-written + +Cons: + - slightly limited novelty + - could be improved by linking to work on semi-supervised learning approaches using GANs + +The authors note that ICLR submission 267 (https://openreview.net/forum?id=H1VGkIxRZ) covers similar ground to theirs.",ICLR2018, +0LswIVsn5Lr,1610040000000.0,1610470000000.0,1,Kao09W-oe8,Kao09W-oe8,Final Decision,Reject,"The paper begins with an observation in standard trained CNNs that the correlations in the output channels are high. Building upon this the paper proposes a new ""optimizer"" which modifies the gradients to encourage corelations among output channels. They provide a theoretical foundation for the method, by deriving the gradient through placing a riemannian metric on the manifold of parameter tensors which encourages smoothness along the output channel dimension. Two variants (one based on a Sobolev metric) are proposed and are experiments are provided. The underlying idea and the derivation of the gradients were generally appreciated by the reviewers. However some reviewers maintained their concern regarding the effectiveness of the performed experimentation. The gains demonstrated are relatively small over the baselines and more importantly the baselines are quite far off the state of the art baselines for the particular problems. This is the primary reason for my recommendation as experiments are the only source of understanding whether the method is effective (there is little theory - mostly at an intuitive level to justify the form of the optimizer). Overall, I strongly encourage the authors to explore the idea further and strengthen the paper with stronger baselines (perhaps on larger datasets) and resubmit. ",ICLR2021, +NO7fh8U9_Y,1576800000000.0,1576800000000.0,1,HkeuD34KPH,HkeuD34KPH,Paper Decision,Reject,"The paper proposes to improve sequential recommendation by extending SASRec (from prior work) by adding user embedding with SSE regularization. The authors show that the proposed method outperforms several baselines on five datasets. + +The paper received two weak accepts and one reject. Reviewers expressed concerns about the limited/scattered technical contribution. Reviewers were also concerned about the quality of the experiment results and need to compare against more baselines. After examining some related work, the AC agrees with the reviewers that there is also many recent relevant work such as BERT4Rec that should be cited and discussed. It would make the paper stronger if the authors can demonstrate that adding the user embedding to another method such as BERT4Rec can improve the performance of that model. Regarding R3's concerns about the comparison against HGN, the authors indicates there are differences in the length of sequences considered and that some method may work better for shorter sequences while their method works better for longer sequences. These details seems important to include in the paper. + +In the AC's opinion, the paper quality is borderline and the work is of limited interest to the ICLR community. Such would would be more appreciated in the recommender systems community. 
The authors are encouraged to improve the paper with improved discussion of more recent work such as BERT4Rec, add comparisons against these more recent work, incorporate various suggestions from the reviewers, and resubmit to an appropriate venue.",ICLR2020, +Twvf8XCqaF,1576800000000.0,1576800000000.0,1,BklSwn4tDH,BklSwn4tDH,Paper Decision,Reject,"This paper focuses on avoiding overfitting in the presence of noisy labels. The authors develop a two phase method called pre-stopping based on a combination of early stopping and a maximal safe set. The reviewers raised some concern about illustrating maximal safe set for all data sets and suggest comparisons with more baselines. The reviewers also indicated that the paper is missing key relevant publications. In the response the authors have done a rather through job of addressing the reviewers comments. I thank them for this. However, given the limited time some of the reviewers comments regarding adding new baselines could not be addressed. As a result I can not recommend acceptance because I think this is key to making a proper assessment. That said, I think this is an interesting with good potential if it can outperform other baselines and would recommend that the authors revise and resubmit in a future venue.",ICLR2020, +15rFYiQSUk,1576800000000.0,1576800000000.0,1,SyxjVRVKDB,SyxjVRVKDB,Paper Decision,Reject," This paper proposes a method to capture patterns of the so called “off” neurons using a newly proposed metric. The idea is interesting and worth pursuing. However, the paper needs another round of modification to improve both writing and experiments. ",ICLR2020, +M0L0mTe44au,1610040000000.0,1610470000000.0,1,IqVB8e0DlUd,IqVB8e0DlUd,Final Decision,Reject,"This paper proposes an algorithm to address the disparate effect that DP has on the accuracy of minority/low-frequency sub-populations. Unfortunately the work does not actually guarantee or analyze the resulting privacy guarantees. In particular it may provide much worse privacy (or no privacy at all) to the minority subpopulation. +The paper also calls their algorithm ""fair"" without using an accepted term or a careful discussion of what an algorithm needs to satisfy to be considered ""fair"". Using a more technical term such ""reducing the accuracy disparity"" would make much more sense. + + ",ICLR2021, +YU96RojFTTn,1610040000000.0,1610470000000.0,1,bzVsk7bnGdh,bzVsk7bnGdh,Final Decision,Reject,"The paper focuses on NeuralODE and shows that for the implementation popular among ML community, one of the equations is not an ODE and can be replaced by an integral. This is implemented using ""seminorm"" (just assigning zero weight to the last equation). + +Pros: +- Well written +- Useful to replace the ""standard"" implementation +- Consistent benchmarking + +Cons: +- Contribution is too limited +- Used in several ""prior"" codes without explicit ICLR submission. +- (My personal) The title is not good: more on the ""hype side"" of the story, rather than progressing the field. I don't think we need to put every single minor fact into a ICLR submission. For example, one can just compute the integral as an alternative by any suitable quadrature rule. That would add 10-15 function evaluations at most, since most of the functions in NeuralODEs are quite smooth.",ICLR2021, +KNP7IwkVoE,1642700000000.0,1642700000000.0,1,e0jtGTfPihs,e0jtGTfPihs,Paper Decision,Accept (Poster),"This paper builds on previous work on supermasks. It proposes to replace binary masks by a signed supermask, i.e. 
a trainable, trashold-based mask that can take values from {-1,0,1}. This change (in combination with the use of ELUS activation functions and an ELUS specific initialization strategy) leads to a significantly higher pruning rate while keeping competitive performance in comparison to baseline models. + +Most reviewers agreed that the paper is well written and that the proposed approach and the experimental findings are interesting. The motivation to improve interpretability was commonly perceived as misleading. Another downside that was mentioned is the training time/efficiency. This however, should not be taken too much into account since the work focusses on finding the smallest possible subnetwork that still performs well (without changing the weight values) and- in line with work on the lottery hypothesis- aims at understanding more about the structure of the „winning tickets“ which is interesting for itself. The paper therefore should be accepted.",ICLR2022, +S9fnCUaUEV,1576800000000.0,1576800000000.0,1,HkeQ6ANYDB,HkeQ6ANYDB,Paper Decision,Reject,"This paper constitutes interesting progress on an important topic; the reviewers identify certain improvements and directions for future work, and I urge the authors to continue to develop refinements and extensions.",ICLR2020, +LNWVOGpVhJ,1610040000000.0,1610470000000.0,1,Ki5Mv0iY8C,Ki5Mv0iY8C,Final Decision,Reject,"All reviewers explain in detail, why they think the paper should not be accepted. Besides fixing an initially criticized format violation, the authors did not respond to any of the concerns raised the reviewers, and in fact, they partially agree that more work in another direction needs to be done. ",ICLR2021, +qR8g32awiF,1576800000000.0,1576800000000.0,1,BJgyn1BFwS,BJgyn1BFwS,Paper Decision,Reject,"The authors propose a framework for estimating ""global robustness"" of a neural network, defined as the expected value of ""local robustness"" (robustness to small perturbations) over the data distribution. The authors prove the the local robustness metric is measurable and that under this condition, derive a statistically efficient estimator. The authors use gradient based attacks to approximate local robustness in practice and report extensive experimental results across several datasets. + +While the paper does make some interesting contributions, the reviewers were concerned about the following issues: +1) The measurability result, while technically important, is not surprising and does not add much insight algorithmically or statistically into the problem at hand. Outside of this, the paper does not make any significant technical contributions. +2) The paper is poorly organized and does not clearly articulate the main contributions and significance of these relative to prior work. +3) The fact that the local robustness metric is approximated via gradient based attacks makes the final results void of any guarantees, since there are no guarantees that gradient based attacks compute the worst case adversarial perturbation. This calls into question the main contribution claim of the paper on computing global robustness guarantees. + +While some of the technical aspects of the reveiwers' concerns were clarified during the discussion phase, this was not sufficient to address the fundamental issues raised above. 
+ +Hence, I recommend rejection.",ICLR2020, +B1zJ4JarG,1517250000000.0,1517260000000.0,260,BJk7Gf-CZ,BJk7Gf-CZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Understanding global optimality conditions for deep nets even in the restricted case of linear layers is a valuable contribution. Please add clarifications to ways in which the paper goes beyond the results of Kawaguchi'16, which was the main concern expressed by the reviewers.",ICLR2018, +Hk1xafIdx,1486400000000.0,1486400000000.0,1,rJTKKKqeg,rJTKKKqeg,ICLR committee final decision,Accept (Poster),"Reviewers found this work to be a ""well-motivated"", ""good contribution"", and ""clever"". The idea was clearly conveyed and reviewers were convinced that the approach was simpler than others like NTMs. Experiments are sufficient, and the work will likely be used in the future. + + Pros: + - Well-explained and expected to be widely implemented + - Experimental results convincing on the tasks. + + Cons: + - Several questions about ""generalization to some unseen entities"". + - Reliance on synthetic tasks unclear if ""scalable to complex real tasks"" + - Training process seems quite complicated (although again simpler that NTMs)",ICLR2017, +W4MObQvX4QP,1642700000000.0,1642700000000.0,1,j97zf-nLhC,j97zf-nLhC,Paper Decision,Reject,"The author response addressed some reviewer concerns, and generally reviewers increased their scores. However, there are important, and unanswered concerns about the generalization of the model. The discussion raised the concerns that despite the paper claim of ""a specific class of higher order reasoning"" emerging, the result suggests relatively simple strategies. This might not be a limitation of the approach, but of the evaluation scenario. So, this either requires a more nuanced view of the findings, and further empirical evidence to support the claim.",ICLR2022, +RP5jcRxbI,1576800000000.0,1576800000000.0,1,BklxN0NtvB,BklxN0NtvB,Paper Decision,Reject,"This paper argues that NNs deployed to hardware needs to robust to additive noise and introduces two methods to achieve this. + +The reviewers liked aspects of the paper and the paper is borderline. However, all in all sufficient reservations were raised to put the paper below the threshold. The criticism was constructive and can be used in an updated version submitted to next conference. + +Rejection is recommended.",ICLR2020, +jdsalY1drH2,1642700000000.0,1642700000000.0,1,LGTmlJ10Kes,LGTmlJ10Kes,Paper Decision,Reject,"The paper proposes a new curriculum learning framework by parameterizing data partitioning and weighting schemes. Extensive experiments are performed on three different datasets to demonstrate the effectiveness of the proposed framework. The reviewers acknowledged that the proposed framework is interesting as it encompasses several existing curriculum learning methods. However, the reviewers pointed out several weaknesses in the paper and shared concerns, including the scalability of the framework to larger datasets and the significance of the improvements over baselines. I want to thank the authors for their detailed responses. Based on the reviewers’ concerns and follow-up discussions, there was a consensus that the work is not ready for publication. The reviewers have provided detailed feedback to the authors. 
We hope that the authors can incorporate this feedback when preparing future revisions of the paper.",ICLR2022, +KKPWxNnVBQ,1576800000000.0,1576800000000.0,1,S1gmvyHFDS,S1gmvyHFDS,Paper Decision,Reject,"This paper offers an interesting and potentially useful approach to robust watermarking. The reviewers are divided on the significance of the method. The most senior and experienced reviewer was the most negative. On balance, my assessment of this paper is borderline; given the number of more highly ranked papers in my pile, that means I have to assign ""reject"".",ICLR2020, +BkNkQJprz,1517250000000.0,1517260000000.0,45,HktRlUlAZ,HktRlUlAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper proposes a new deep architecture based on polar transformation for improving rotational invariance. The proposed method is interesting and the experimental results strong classification performance on small/medium-scale datasets (e.g., rotated MNIST and its variants with added translations and clutters, ModelNet40, etc.). It will be more impressive and impactful if the proposed method can bring performance improvement on large-scale, real datasets with potentially cluttered scenes (e.g., Imagenet, Pascal VOC, MS-COCO, etc.).",ICLR2018, +P1F6nmXSJHb,1642700000000.0,1642700000000.0,1,FpKgG31Z_i9,FpKgG31Z_i9,Paper Decision,Reject,"The paper proposed Trained ML oracles to find the decent direction and step size in optimization. The process they call grafting. Reviewers raised several concerns about the reliability of ML oracles in general settings which is valid. The rebuttal could not convince the reviewers to change their opinion. Ideally for an empirical only paper with heavy reliability on ML for critical decisions, to meet the high bar of ICLR there must be several experiments (5-10 datasets or more) on diverse datasets and settings. Also, there should be discussions on when and how the method fails and related discussions. In that sense the paper does not meet the bar for publication.",ICLR2022, +HkZ9VkpHf,1517250000000.0,1517260000000.0,407,Skk3Jm96W,Skk3Jm96W,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Overall, the paper is missing a couple of ingredients that would put it over the bar for acceptance: + +- I am mystified by statements such as ""RL2 no longer gets the best final performance."" from one revision to another, as I have lower confidence in the results now. + +- More importantly, the paper is missing comparisons of the proposed methods on *already existing* benchmarks. I agree with Reviewer 1 that a paper that only compares on benchmarks introduced in the very same submission is not as strong as it could be. + +In general, the idea seems interesting and compelling enough (at least on the Krazy World & maze environments) that I can recommend inviting to the workshop track.",ICLR2018, +vWmu6zw_JI,1642700000000.0,1642700000000.0,1,OM_lYiHXiCL,OM_lYiHXiCL,Paper Decision,Accept (Poster),"This work proposed to detect backdoor in a black-box manner, where only the model output is accessible. + +Most reviewers think it is a valuable task, and this work provides a novel perspective of using adversarial perturbation to diagnosis the backdoor. Some theoretical analysis for linear models and kernel models are provided. There is still huge gap to analyze the DNN model. But on the other side, it provides some insight to understand the proposed method and could inspire further studies. 
+ +Besides, since there have been many advanced backdoor attack methods, and many more are coming out, I am not sure that the proposed detection criteria is well generalizable, considering only some typical attack methods are tested. However, I think the studied problem is valuable, and the presented analysis is inspired for future works. Thus, I recommend for accept.",ICLR2022, +qOoNpaLjerI,1610040000000.0,1610470000000.0,1,_OGAW_hznmG,_OGAW_hznmG,Final Decision,Reject,"As one of the reviewers concisely summarized: This paper investigates maximum entropy (MaxEnt) inference and compares it to a Bayesian estimator and regularized maximum likelihood for finite models. + +Two reviewers specifically question whether they have learned anything new after reading. This combined with various other drawbacks described during the review phase led to strong agreement among the reviewers about a variety of deficiencies in this paper. One reviewer initially gave a relatively high score but has since revised his/her opinion in light of the other reviews and discussion. I find that the significance of this work is not high enough to warrant acceptance at this time, but the authors would do well to incorporate the reviewers suggestions to improve the paper. ",ICLR2021, +6l2YiXfJP,1576800000000.0,1576800000000.0,1,r1x3unVKPS,r1x3unVKPS,Paper Decision,Reject,"The submission proposes a method for adversarial imitation learning that combines two previous approaches - GAIL and RED - by simply multiplying their reward functions. The claim is that this adaptation allows for better learning - both handling reward bias and improving training stability. + +The reviewers were divided in their assessment of the paper, criticizing the empirical results and the claims made by the authors. In particular, the primary claims of handling reward bias and reducing variance seem to be not well justified, including results which show that training stability only substantially improves when SAIL-b, which uses reward clipping, is used. + +Although the paper is promising, the recommendation is for a reject at this time. The authors are encouraged to clarify their claims and supporting experiments and to validate their method on more challenging domains.",ICLR2020, +6yMAsGAygH,1576800000000.0,1576800000000.0,1,HJxhUpVKDr,HJxhUpVKDr,Paper Decision,Reject,"The authors present an approach to multi-task learning. Reviews are mixed. The main worries seem to be computational feasibility and lack of comparison with existing work. Clearly, one advantage to Cross-stitch networks over the proposed approach is that their approach learns sharing parameters in an end-to-end fashion and scales more efficiently to more tasks. Note: The authors mention SluiceNets in their discussion, but I think it would be appropriate to directly compare against this architecture - or DARTS [https://arxiv.org/abs/1806.09055], maybe - since the offline RSA computations only seem worth it if better than *anything* you can do end-to-end. I would encourage the authors to map out this space and situate their proposed method properly in the landscape of existing work. I also think it would be interesting to think of their approach as an ensemble learning approach and look at work in this space on using correlations between representations to learn what and how to combine. 
Finally, some work has suggested that benefits from MTL are a result of easier optimization, e.g., [3]; if that is true, will you not potentially miss out on good task combinations with your approach?
+
+Other related work:
+[0] https://www.aclweb.org/anthology/C18-1175/
+[1] https://www.aclweb.org/anthology/P19-1299/
+[2] https://www.aclweb.org/anthology/N19-1355.pdf - a somewhat similar two-stage approach
+[3] https://www.aclweb.org/anthology/E17-2026/",ICLR2020,
xQr0tIp3cpX,1610040000000.0,1610470000000.0,1,ggNgn8Fhr5Q,ggNgn8Fhr5Q,Final Decision,Reject,"The paper analyses the behaviour of Neural Processes in the frequency domain and, in particular, how they suppress high-frequency components of the input functions. While this is entirely intuitive, the paper adds some theoretical analysis via the Nyquist-Shannon theorem. But the analysis remains too generic and it is not clear it will be of broad interest to the community. ",ICLR2021,
qgOkBtgqlM,1642700000000.0,1642700000000.0,1,PlFtf_pnkZu,PlFtf_pnkZu,Paper Decision,Reject,"This paper has conducted extensive experiments to examine the scaling and transfer laws of LMs for machine translation and has reached several interesting findings which could inspire future work.
+
+The main concern from reviewers is that the novelty of this paper is insufficient. In addition, the experiments are not well designed and the clarity of this paper can be further improved. We hope the reviews can help the authors improve their paper.",ICLR2022,
iRAtlD0FTH_,1610040000000.0,1610470000000.0,1,083vV3utxpC,083vV3utxpC,Final Decision,Reject,"The paper proposes an approach to selectively update the weights of neural networks in federated learning. This is an interesting and important problem. As several reviewers pointed out, this is highly related to pruning, although with a different objective. It is an interesting paper but is a marginal case in the end due to weaknesses in presentation and evaluation.
+
+",ICLR2021,
SXIN6VrCtY,1576800000000.0,1576800000000.0,1,HJgLZR4KvH,HJgLZR4KvH,Paper Decision,Accept (Talk),"This is a very interesting paper on unsupervised skill learning based on the predictability of skill effects, with the incorporation of these ideas into model-based RL.
+
+This is a clear accept, based on the clarity of the ideas presented and the writing, as well as the thorough and convincing experiments.",ICLR2020,
gi60eiXqGp,1642700000000.0,1642700000000.0,1,p98WJxUC3Ca,p98WJxUC3Ca,Paper Decision,Accept (Poster),"This paper deals with the important topic of active transfer learning. All reviewers agree that,
+while the paper presents some shortcomings, it is a worthwhile contribution.",ICLR2022,
zQ4v0W1JRPc,1642700000000.0,1642700000000.0,1,OBwsUF4nFye,OBwsUF4nFye,Paper Decision,Reject,"Many problems in machine learning rely on multi-task learning (MTL), in which the goal is to solve multiple related machine learning tasks simultaneously. In this work, the authors formalize notions of task-level privacy for MTL via joint differential privacy (JDP). They propose an algorithm for mean-regularized MTL, an objective commonly used for applications in personalized federated learning, subject to JDP. They then analyze the objective and solver, providing certifiable guarantees on both privacy and utility. The main results, namely the convergence rate results, are hard to parse and hard to interpret. For example, as one reviewer pointed out, the rate is bounded below by a constant which is not properly explained.
Further, comparisons to the literature on user-level privacy (which is equivalent to task-level privacy) are not sufficiently provided. Significant improvement in the presentation of the main results, along with an interpretable explanation of the contribution, is necessary for this manuscript.",ICLR2022,
tKSkXvMGkyY,1610040000000.0,1610470000000.0,1,oBmpWzJTCa4,oBmpWzJTCa4,Final Decision,Reject,"This paper was quite contentious. While reviewers appreciated the detailed response by the authors, and there is consensus that the paper addresses a relevant problem and contains interesting ideas, in the end there remain several concerns. The paper provides a complex combination of techniques from active learning, meta learning and symbolic reasoning (via MILPs), and there are concerns about the clarity of the exposition. For a paper claiming safety properties, there is also a lack of either formal theoretical analysis of well-specified safety properties, or a compelling demonstration of its effectiveness on a real system (all experiments are carried out in simulation).",ICLR2021,
S1gOJ3wWxV,1544810000000.0,1545350000000.0,1,BkedznAqKQ,BkedznAqKQ,"Good paper, recommend for acceptance.",Accept (Poster),"The reviewers unanimously agreed that the paper was a significant advance in the field of machine learning on graph-structured inputs. They commented particularly on the quality of the research idea, and its depth of development. The results shared by the researchers are compelling, and they also report optimal hyperparameters, a welcome practice when describing experiments and results.
+
+A small drawback the reviewers highlighted is the breadth of the content in the paper, which gave the impression of a slight lack of focus. Overall, the paper is a clear advance, and I recommend it for acceptance. ",ICLR2019,
NOUOCaiiIc_,1642700000000.0,1642700000000.0,1,T8vZHIRTrY,T8vZHIRTrY,Paper Decision,Accept (Spotlight),"This manuscript introduces a theoretical framework to analyze the sim2real transfer gap of policies learned via domain randomization algorithms. This work focuses on understanding the success of existing domain randomization algorithms by providing a theoretical analysis. The theoretical sim2real gap analysis requires two critical components: *uniform sampling* and *use of memory*.
+
+**Strengths**
+All reviewers agree that this manuscript provides a strong theoretical analysis for an important problem (understanding the sim2real gap)
+well-written manuscript, and well motivated
+Intuitive understanding for the theoretical analysis is provided
+
+
+**Weaknesses**
+analysis is limited to sim2real transfer without fine-tuning in the real world
+the manuscript doesn't provide a novel experimental evaluation
+lack of take-aways
+
+**Rebuttal**
+The authors acknowledge the limitation of not addressing fine-tuning, but also point out that several papers have performed sim2real transfer without fine-tuning.
+The authors address the lack of novel experimental evaluation by arguing that the theoretical analysis can be directly linked to existing algorithms for which empirical evaluations have already been performed. I agree with the authors that in that context it seems of little value to redo those experiments. However, I also believe that those links could be made even clearer in the manuscript and I would encourage the authors to do so.
Furthermore, while the authors do provide intuitive take-aways for domain randomization algorithms, it would be helpful if those take-aways were more clearly linked to existing algorithms as well (given that there is no experimental evaluation of this).
+
+**Summary**
+This manuscript provides a theoretical framework for analyzing the sim2real gap and, using that framework, provides bounds on the sim2real gap. All reviewers agree this is a strong theoretical analysis. Some take-aways on what makes domain randomization algorithms successful are provided by the sim2real-gap analysis (memory use, uniform sampling). Thus I recommend acceptance.",ICLR2022,
Hy21UkaSz,1517250000000.0,1517260000000.0,701,SyL9u-WA-,SyL9u-WA-,ICLR 2018 Conference Acceptance Decision,Reject,"Pros:
++ Clearly written paper.
++ Good theoretical analysis of the expressivity of the proposed model.
++ Efficient model update is appealing.
++ Reviewers appreciated the addition of results on the copy and adding tasks in Appendix C.
+
+Cons:
+- Evaluation was on less-standard RNN tasks. A language modeling task should have been included in the empirical evaluation because language modeling is such an important application of RNNs.
+
+This paper is close to the decision boundary, but the reviewers strongly felt that demonstration of the method on a language modeling task was necessary for acceptance.
+",ICLR2018,
eKytb1tm1x,1576800000000.0,1576800000000.0,1,HJxp9kBFDS,HJxp9kBFDS,Paper Decision,Reject,"This paper examines the interplay between the related ideas of invariance and robustness in deep neural network models. Invariance is the notion that small perturbations to an input image (such as rotations or translations) should not change the classification of that image. Robustness is usually taken to be the idea that small perturbations to input images (e.g. noise, whether white or adversarial) should not significantly affect the model's performance. In the context of this paper, robustness is mostly considered in terms of adversarial perturbations that are imperceptible to humans and created to intentionally disrupt a model's accuracy. The results of this investigation suggest that these ideas are mostly unrelated: equivariant models (with architectures designed to encourage the learning of invariances) that are trained with data augmentation whereby input images are given random rotations do not seem to offer any additional adversarial robustness, and similarly using adversarial training to combat adversarial noise does not seem to confer any additional help for learning rotational invariance. (In some cases, each type of training even seems to make invariance to the other type of perturbation worse.)
+
+Unfortunately, the reviewers do not believe the technical results are of sufficient interest to warrant publication at this time. ",ICLR2020,
S1_6Bkprf,1517250000000.0,1517260000000.0,670,SJiHOSeR-,SJiHOSeR-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper is lacking in terms of clarity and experimentation, and would require a lot of additional work to bring it to the standards of any high-quality venue.",ICLR2018,
HJCgQkpHz,1517250000000.0,1517260000000.0,68,HkUR_y-RZ,HkUR_y-RZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper generally presents a nice idea, and some of the modifications to searn/lols that the authors had to make to work with neural networks are possibly useful to others.
Some weaknesses exist in the evaluation that everyone seems to agree on but disagrees about the importance of (in particular, comparison to things like BLS and Mixer on problems other than MT).
+
+A few side-comments (not really part of meta-review, but included here anyway):
+- Treating rollin/out as a hyperparameter is not unique to this paper; this was also done by Chang et al., NIPS 2016, ""A credit assignment compiler...""
+- One big question that goes unanswered in this paper is ""why does learned rollin (or mixed rollin) not work in the MT setting."" If the authors could add anything to explain this, it would be very helpful!
+- Goldberg & Nivre didn't really introduce the _idea_ of dynamic oracles, they simply gave it that name (e.g., in the original Searn paper, and in most of the imitation learning literature, what G&N call a ""dynamic oracle"" everyone else just calls an ""oracle"" or ""expert"")",ICLR2018,
SJCeTGU_g,1486400000000.0,1486400000000.0,1,Sk2Im59ex,Sk2Im59ex,ICLR committee final decision,Accept (Poster),"The authors propose an application of GANs to map images to new domains with no labels. E.g., an MNIST 3 is used to generate an SVHN 3. Ablation analysis is given to help understand the model. The results are (subjectively) impressive and the approach could be used for cross-domain transfer, an important problem. All in all, a strong paper.",ICLR2017,
kq561PtLpfT,1642700000000.0,1642700000000.0,1,6p8D4V_Wmyp,6p8D4V_Wmyp,Paper Decision,Reject,"This paper proposes a new dataset, called RainNet, obtained from gridded precipitation data, for training precipitation downscaling methods, as well as a new neural network-based architecture for that task, which estimates the underlying dynamics of the local weather system, and new metrics for evaluating precipitation downscaling methods.
+
+Reviewers praised the large, novel and useful dataset (D3tQ, szBD, ggKX) and novel metrics for evaluating statistical downscaling methods (D3tQ), along with evaluation on 14 baselines (szBD, ggKX).
+
+There were however many issues highlighted by the reviewers. First, reviewer D3tQ raised concerns about the paper being resubmitted after rejection from NeurIPS (/pdf?id=VVZZJiQB51l), with minimal changes (/pdf?id=6p8D4V_Wmyp), and noticed that the authors did not follow up on most reviewer recommendations. D3tQ noticed however that in the ICLR resubmission, the cross-validation results were presented to provide a more robust comparison between models, and that the discussion of metrics in section 4 was much more thorough than in the previous version.
+
+Other themes in the negative reviews included concerns about missing standard errors in the cross-validation results (D3tQ, 5pVg) or measures of uncertainty in the upscaling (ggKX), lack of information about hyperparameter tuning (D3tQ), inadequate literature review about statistical downscaling (D3tQ), lack of information about the dataset (5pVg), missing discussion about applications (ggKX) and insufficient proofreading (D3tQ, 5pVg).
+
+I will not take into consideration the criticism from szBD, who ""don't feel that ICLR is the right venue for this work"", as I do not find such opinions to be very helpful.
+
+The authors did not provide a rebuttal to the initial reviews and there was no discussion about this paper among the reviewers. Given the issues raised by the reviewers and the scores of 3, 3, 5 and 6, I believe that this paper does not meet the acceptance bar in its current form.
+
+Sincerely,
+AC",ICLR2022,
-0ePms0SIF,1576800000000.0,1576800000000.0,1,H1lKd6NYPS,H1lKd6NYPS,Paper Decision,Reject,"There was extensive discussion of the paper between the reviewers. It's clear that the reviewers appreciated the main idea in the paper, and the notion of an ""online"" meta-critic that accelerates the RL process is definitely very appealing. However, there were unanswered questions about what the method is actually doing that make me reticent to recommend acceptance at this point. I would refer the authors to R3 and R1 for an in-depth discussion of the issues, but the short summary is that it's not clear whether, if, and how the meta-loss in this case actually converges, and what the meta-critic is actually doing. In the absence of a theoretical understanding of what the modification does to accelerate RL, we are left with the empirical experiments, and there it is necessary to consider alternative hypotheses and perform detailed ablation analyses to understand that the method really works for the reasons stated by the authors (and not some of the alternative explanations, see e.g. R3). While there is nothing wrong with a result that is primarily empirical, it is important to analyze that the empirical gains really are happening for the reasons claimed, and to carefully study convergence and asymptotic properties of the algorithm. The comparatively diminished gains with the stronger algorithms (TD3 and especially SAC) make me more skeptical. Therefore, I would recommend that the paper not be accepted at this time, though I encourage the authors to resubmit with a more in-depth experimental evaluation.",ICLR2020,
PjxGfV5Bb1,1610040000000.0,1610470000000.0,1,zrT3HcsWSAt,zrT3HcsWSAt,Final Decision,Accept (Spotlight),"This paper focuses on the problem of performing imitation learning from trajectory-level data that includes optimal as well as suboptimal demonstrations. The authors wish to avoid the requirement of a separate filtering process that would throw away the bad trajectories. The authors propose a clever innovation that allows for leveraging the policy that is itself being learned to reweight the samples for a next round of weighted behavioral cloning. The paper is also somewhat theoretically rigorous and provides insight into the problem.
+
+The reviewers pointed out some initial issues related to clarity and the authors did a good job of addressing reviewer concerns. Ultimately all reviewers agreed that the core innovation of the paper was interesting and empirically worked reasonably well.
+
+One older line of work that I think is quite relevant, but which is not discussed, is the empirically observed ""clean-up effect"", described by Michie and colleagues in the 90s (e.g. ""Learning to fly"" Sammut et al 1992). This clean-up effect is intuitive and reportedly achieved for free in settings where the learning objective is mode-seeking and the dataset is large, insofar as the mean value of the resulting policy *should* produce actions that correspond to the average action produced by demonstrators in the same situation. I think it would be worth discussing how the analysis of this paper relates to this empirical phenomenon.
In particular, it would be worth clarifying in what regimes the suboptimality of training from a dataset with noisy examples arises and how likely this is to affect the mean value of the learned policy (for context, it is fairly common in practice to evaluate the student policy in BC settings by only using the mean action value; perhaps this point was present in the paper, and I missed it). From a certain perspective, the innovation of this paper is to accentuate the clean-up effect.
+
+As noted by a reviewer, and subsequently incorporated into the paper, the actual algorithm has some similarities to versions of recent ""offline RL"" algorithms (though of course it does not leverage rewards). In particular, the motif of performing a weighted regression could perhaps be a bit more thoroughly contextualized by connecting it to other weighting factors (e.g. see Critic Regularized Regression). That said, I leave this entirely to the discretion of the authors.
+
+The final scores were 8, 7, & 6. I see this as a strong paper and will endorse it for a spotlight.",ICLR2021,
0EpVC3wT3F0,1610040000000.0,1610470000000.0,1,WDVD4lUCTzU,WDVD4lUCTzU,Final Decision,Reject,"This paper proposes a Conditional Masked Language Modeling (CMLM) method to enhance MLM by conditioning on contextual information.
+
+All of the reviewers think the results are good. However, the reviewers also think the intuition and experiments are not so convincing. The responses and revisions still do not satisfy the reviewers' major concerns.",ICLR2021,
GQh3dPxJ7eU,1610040000000.0,1610470000000.0,1,tw60PTRSda2,tw60PTRSda2,Final Decision,Reject,"This paper is a computational linguistic study of the semantics that can be inferred from text corpora when parsers (which are trained on human data) are used to infer the verbs and their objects in text. The reviewers agreed that the work was well executed, and that the experiments comparing the resulting representations to human data were solid. The method employed has little or no technical novelty (in my opinion, not necessarily a flaw), and it's not clear what tasks (beyond capturing human data) the representations could be applied to (again, not a problem if the goal is to develop theories of cognition).
+
+The first draft of the work missed important connections to the computational linguistics literature, where learning about 'affordances for verbs' (referred to as 'selectional preferences') has long been an important goal. The authors did a good job of setting out these connections in the revised manuscript, which the reviewers appreciated.
+
+The work is well executed, and should be commended for relating ideas from different sub-fields in its motivation and framing. But my sincere view is that it does not meet the same standards of machine-learning or technical novelty met by other papers at this conference. It is unclear to me what the framing in terms of 'affordance' adds to a large body of literature studying the semantics of word embeddings, given various syntactically and semantically-informed innovations.
It feels to me like this work would have been an important contribution to the literature in 2013, but given the current state of the art in representation learning from text and jointly learning from text and other modalities, I would like to have seen some attempt to incorporate these techniques and bridge the gap between the notion of affordance in text/verbs (selectional preference) and Gibson's notion of object affordance (what you can do physically with an object) in experiments and modelling, not just in the discussion. Such a programme of research could yield fascinating insights into the nature of grounding, and the continuum from the concrete, which can be perceived and directly experienced, to the abstract, which must be learned from text. I encourage the authors to continue in this direction. An alternative is to consider submitting the current manuscript to a venue where the primary focus is cognitive modelling, and accounting for human, behavioural data, and where there is less emphasis on the development of novel methods or models.
+
+For these reasons, and considering the technical scope of related papers in the programme, I cannot fairly recommend acceptance in this case. ",ICLR2021,
Byg9LP8XlV,1544940000000.0,1545350000000.0,1,BJg4Z3RqF7,BJg4Z3RqF7,"Good work, but a few issues should be addressed in the camera-ready version.",Accept (Poster),"This paper proposes a GAN-based method to recover an image from a noisy version of it. The paper builds upon existing works on AmbientGAN and CS-GAN. By combining the two approaches, the work finds a new method that performs better than existing approaches.
+
+The paper clearly has new interesting ideas which have been executed well. Two of the reviewers have voted in favour of acceptance, with one of the reviewers providing an extensive and detailed review. The third reviewer however has some doubts which were not resolved completely after the rebuttal.
+
+Upon reading the work myself, I am convinced that this will be interesting to the community. However, I recommend the authors take the comments of Reviewer 2 into account and do whatever it takes to resolve the issues pointed out by the reviewer.
+
+During the review process, another related work was found to be very similar to the approach discussed in this work. This work should be cited in the paper, as a prior work that the authors were unaware of.
+https://arxiv.org/abs/1812.04744
+Please also discuss any new insights this work offers on top of this existing work.
+
+Given that the above suggestions are taken into account, I recommend accepting this paper.
+",ICLR2019,
ByBNQ1prG,1517250000000.0,1517260000000.0,117,SyZI0GWCZ,SyZI0GWCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers all agree this is a well-written and interesting paper describing a novel black-box adversarial attack. There were missing relevant references in the original submission, but these have been added. I would suggest the authors follow the reviewer suggestions on claims of generality beyond CNNs; although there may not be anything obvious stopping this method from working more generally, it hasn't been tested in this work.
Even if you keep the title you might be more careful to frame the body in the context of CNNs.",ICLR2018,
dYm2Eso-70n,1610040000000.0,1610470000000.0,1,IjIzIOkK2D6,IjIzIOkK2D6,Final Decision,Reject,"This paper presents a differentiable neural architecture search method for GNNs using Gumbel-softmax-based gating for fast search. It also introduces a transfer technique to search architectures on smaller graphs with similar properties to the target graph dataset. The paper further introduces a search space based on GNN message aggregators, skip connections, and layer aggregators. Results are presented on several undirected graph datasets without edge features on both node and graph classification.
+
+The reviewers mention that the results are promising, but they unanimously agree that the paper does not meet the bar for acceptance in its current form. I tend to agree with the reviewers in that the effect of the individual contributions (search space vs. method vs. transfer) needs to be better disentangled and studied independently, and that it is unclear why selecting a single aggregation function out of many is important vs. choosing multiple ones at the same time such as in PNA [1], as pointed out by R1. This should be carefully studied going forward. Lastly, all reviewers agreed that the proposed transfer method requires more detailed experimental validation and motivation.
+
+[1] Corso et al.: Principal Neighbourhood Aggregation for Graph Nets (NeurIPS 2020)",ICLR2021,
2aPZHzwv7c,1576800000000.0,1576800000000.0,1,rylrI1HtPr,rylrI1HtPr,Paper Decision,Reject,"This paper proposes to use the grey-level co-occurrence matrix method (GLCM) as both the performance evaluation metric and an auxiliary loss function for single-image super-resolution. Experiments are conducted on X-ray images of rock samples. Three reviewers provided comments. Two reviewers rated reject while one rated weak reject. The major concerns include the lack of a clear and detailed description, low novelty, limited experiments on only one database, unconvincing improvement over the prior work, etc. The authors agree that the limited experiments on one database do not demonstrate the generalization capability of the proposed method. The AC agrees with the reviewers' comments, and recommends rejection.",ICLR2020,
_MRU5WumTCH,1642700000000.0,1642700000000.0,1,Y2eS8eWCsyG,Y2eS8eWCsyG,Paper Decision,Reject,"Summary: The given work studies one-shot object detection, and demonstrates an array of experiments that show that by increasing the number of categories in training data, the model can get better at one-shot detection.
+
+Pros:
+- Well written
+- Presents many experiments that are interesting
+- Improves SOTA one-shot performance on COCO by using more data.
+
+Cons:
+- Authors did not test their claims on a variety of model architectures
+- Similar conclusions have been made in prior art.
+- Conclusions are intuitive. As more categories are added, the likelihood of reducing semantic dissimilarity to a novel category is quite high.
+- Overall contribution is currently limited.
+
+Reviewers are unanimous in their decision. The authors did not alleviate the reviewers' concerns. The AC recommends the authors take all feedback into consideration and submit to another venue or workshop.",ICLR2022,
fopvJWHJYPSJ,1642700000000.0,1642700000000.0,1,givsRXsOt9r,givsRXsOt9r,Paper Decision,Accept (Poster),"The paper considers representation learning of 3D molecular graphs.
+The authors propose a message passing scheme using spherical coordinates. It is
+tested on three datasets of 3D molecular graphs. The authors offer an in-depth
+analysis of different aspects, with extensive experimentation on the method.
+
+Strengths:
+
+- The SMP introduces an interesting method to alleviate the computation cost issue in SCS from O(nk^3) to O(nk^2). This method is important and can be generalized to broader types of tasks.
+- This is an empirical work, and the experimental results support the effectiveness of SMP.
+- The proposed MP approach can better distinguish certain structures than some existing models.
+- Incorporating torsion information when representing 3D molecules is novel and helpful
+- While message passing methods on graphs exploit only the connectivity, this work shows an interesting method to include the embedding information in the case of geometrical graphs.
+
+Weaknesses:
+
+- The proposed SMP scheme in Eq. (1) lacks novelty since it basically enriches the GN framework in [1] with geometry features
+- the architecture of the proposed SphereNet is similar to DimeNet
+- Why SMP is better than the Cartesian coordinate system (CCS) is not well explained.
+
+Overall, a majority of reviewers are in favor of acceptance and a third reviewer is happy with either acceptance or rejection and does not give strong reasons for rejecting the paper. My recommendation is, therefore, acceptance. I recommend the authors use the reviewers' comments to improve the paper for its camera-ready version.",ICLR2022,
rkerMoV1x4,1544670000000.0,1545350000000.0,1,B1x-LjAcKX,B1x-LjAcKX,Meta-review,Reject,"This paper proposes a new training approach for deep neural interfaces. The idea is to bootstrap from critics of other layers instead of using the final loss as target. The method is evaluated on CIFAR-10 and CIFAR-100 and found to improve performance slightly upon Sobolev training while being simpler. The reviewers found the idea interesting but were concerned about the strength of the experimental results. The datasets are similar and the significance of the results is not clear. The revision submitted by the authors was only able to address some of these issues, such as the evaluation protocol.",ICLR2019,
19Wp_Zt0_9d,1610040000000.0,1610470000000.0,1,JBAa9we1AL,JBAa9we1AL,Final Decision,Accept (Spotlight),"The paper provides a method to train boosted decision trees to satisfy individual fairness. All of the reviews suggest that this paper is well-written and gives novel techniques for solving an interesting problem. The authors have addressed most of the concerns raised by the reviewers during their response. However, the authors should follow a suggestion in the reviews and include the running time in the empirical evaluation.",ICLR2021,
aWRV91ekAo,1610040000000.0,1610470000000.0,1,PhV-qfEi3Mr,PhV-qfEi3Mr,Final Decision,Reject,"This work develops a weight-quantization method for deep neural networks that is suitable for a type of analog hardware system known as crossbar-enabled analog computing-in-memory (CACIM).
The goal of this work is to train models on GPUs in such a way that they retain their predictive accuracy during inference when deployed on the analog hardware system.
+
+Pros:
+* Good adaptation of quantization methods to the CACIM system
+* Simple method
+* Validation of the proposed method on multiple datasets and models
+
+Cons:
+* Lack of novelty: the proposed method is a simple combination of two popular methods, Lloyd's quantization and noise-aware training
+
+All reviewers appreciate the simplicity of the method and the good fit to the hardware. The authors responded to all reviews and two reviewers acknowledged the authors' response. The authors acknowledge some reviewer observations (motivation of quantization as reducing analog noise, lack of experiments on the actual CACIM system), and the authors added an experimental evaluation on the actual physical CACIM system showing that their method performs well.
+
+Overall the work is well executed and the proposed method is a good fit to the CACIM system. However, the proposed quantization method is a straightforward adaptation of popular quantization methods.",ICLR2021,
0mLDQR5ywLe,1642700000000.0,1642700000000.0,1,8rCMq0yJMG,8rCMq0yJMG,Paper Decision,Reject,"This paper considers a domain adaptation setting where a source domain model trained on a server is adapted on a client using a target domain dataset. The paper considers the setting where the client only has a modest memory footprint (e.g., an edge device) and uses a recently proposed technique ""TinyTL"" (NeurIPS 2020) which is based on freezing the network weights but only updating the biases and adding a lightweight residual module. The basic idea of the paper is also based on SHOT (ICML 2020).
+
+While the reviewers appreciate the problem setting and the basic idea, there were several concerns, some of which included:
+
+- Limited novelty: The paper's key ideas are largely based on SHOT and TinyTL, and are a simple combination of these, without significant new challenges addressed or insights offered.
+- Federated setting not considered adequately: Although the paper title and the abstract/introduction talk about the federated setting, the paper largely focuses on a single source and single client setting.
+- Inadequate baselines and experiments: The federated learning baselines used in the paper are fairly basic ones (e.g., FedAvg). Some of the experimental results are not convincing enough.
+
+The paper received mixed scores and the reviewers engaged in discussions with the authors. However, the concerns still linger. Based on the reviews, discussion, and my own reading and assessment of the paper, I think the paper falls short of the acceptance threshold. The authors are advised to consider the reviewers' concerns to improve the manuscript for a future submission.",ICLR2022,
uI8PQjd_cM,1610040000000.0,1610470000000.0,1,Ti87Pv5Oc8,Ti87Pv5Oc8,Final Decision,Accept (Poster),"This paper considers meta-learning based on MAML. The authors use Neural Tangent Kernels (NTKs) to develop two meta-learning algorithms that avoid the inner-loop adaptation, which makes MAML computationally intensive. Experimental results demonstrate favorable empirical performance over existing methods.
+
+The paper is generally well written and readable. The proposed methods are well motivated and based on solid theoretical ground. The empirical performance shows advantages in efficiency and quality. This work is worth acceptance at ICLR 2021.
",ICLR2021, +ry23B1aHf,1517250000000.0,1517260000000.0,659,S1GUgxgCW,S1GUgxgCW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper combines existing models to detect topics and generate responses, and the resulting model is shown to be slightly preferred by human evaluators over baselines. This is quite incremental and the results are not impressive enough to stand on their own merit.",ICLR2018, +33bGeQzuxdx,1642700000000.0,1642700000000.0,1,0oSM3TC9Z5a,0oSM3TC9Z5a,Paper Decision,Reject,"The paper studies the Bayesian persuasion model in a more realistic setting where the sender does not know the receiver’s utility but can interact with the receiver repeatedly to learn the utility. The paper proposes a learning-based framework to optimize the sender’s strategy, then analyze the theoretical properties of the proposed framework, and perform extensive experiments. The reviewers acknowledged that the paper investigates an important problem of relaxing the practical shortcomings of the Bayesian persuasion model. However, the reviewers pointed out several weaknesses in the paper, and there was a clear consensus that the work is not ready for publication. The reviewers have provided very detailed and constructive feedback to the authors. We hope that the authors can incorporate this feedback when preparing future revisions of the paper.",ICLR2022, +rJlU4O2WeV,1544830000000.0,1545350000000.0,1,S1eBzhRqK7,S1eBzhRqK7,Paper decision,Reject,"Reviewers are in a consensus and recommended to reject after engaging with the authors. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit. +",ICLR2019,4: The area chair is confident but not absolutely certain +61KUonbIR0,1576800000000.0,1576800000000.0,1,ryevtyHtPr,ryevtyHtPr,Paper Decision,Reject,"This paper investigates a notion of recognizing insideness (i.e., whether a pixel is inside a closed curve/shape in the image) with deep networks. It's an interesting problem, and the authors provide analysis on the limitations of existing architectures (e.g., feedforward and recurrent networks) and present a trick to handle the long-range relationships. While the topic is interesting, the constructed datasets are quite artificial and it's unclear how this study can lead to practically useful results (e.g., improvement in semantic segmentation, etc.). ",ICLR2020, +BAJiHidS-3,1642700000000.0,1642700000000.0,1,UeRmyymo3kb,UeRmyymo3kb,Paper Decision,Reject,"The paper proposes a method to change the graph structure for better robustness against adversarial attacks. The reviewers commend the authors for a clearly written paper and promising results. Several reviewers expressed concerns about experimental validation (specifically, comparison to truncated SVD and choice of baselines), complexity, and novelty. The rebuttal and follow-up discussion alleviated some of the concerns, but the reviewers still have outstanding issues, therefore the AC does not recommend accepting the paper.",ICLR2022, +9WJxoyJ3gt,1576800000000.0,1576800000000.0,1,HJgepaNtDS,HJgepaNtDS,Paper Decision,Reject,"This paper received two weak rejects (3) and one accept (8). In the discussion phase, the paper received significant discussion between the authors and reviewers and internally between the reviewers (which is tremendously appreciated). 
In particular, there was a discussion about the novelty of the contribution and ideas (AnonReviewer3 felt that the ideas presented provided an interesting new thought-provoking perspective) and the strength of the empirical results. None of the reviewers felt really strongly about rejecting and would not argue strongly against acceptance. However, AnonReviewer3 was not prepared to really champion the paper for acceptance due to a lack of confidence. Unfortunately, the paper falls just below the bar for acceptance. Taking the reviewer feedback into account and adding careful new experiments with strong results would make this a much stronger paper for a future submission.",ICLR2020, +nkq8-0WTyJq,1610040000000.0,1610470000000.0,1,0h9cYBqucS6,0h9cYBqucS6,Final Decision,Reject,"This paper presents an efficient secure aggregation algorithm in federated learning scenarios, which employs sparse random secure-sharing clients. Four experienced reviewers left valuable comments on this paper, and three of them are unfortunately negative to this work (4, 4, 3) while one reviewer is slightly on the positive side. + +The reviewers are generally positive about the main idea and the direction for this work, but they are not convinced of its mathematical soundness and practical benefits; the theoretical analysis and mathematical proof has been conducted only for simplified models while their practical advantage is not clear enough. Also, even the most positive reviewer (R3) is concerned about the novelty of the proposed approach. +Although the concerns raised in the original reviews have been partially clarified during the discussion phase, there still remain several critical limitations, which makes this paper require (probably) multiple rounds of revision before publication and this AC has a reservation for accepting this paper.",ICLR2021, +CYZy6SxD_ID,1642700000000.0,1642700000000.0,1,Oy9WeuZD51,Oy9WeuZD51,Paper Decision,Accept (Poster),"The paper considers the empirical distribution of layer/channel in CNN ,and proposes to use global null tests with Simes and Fisher statistics to aggregate the p-values. This method is competitive while computationally efficient. The underlying theoretical insights are discussed in detail. + +The paper received mixed ratings, and the discussions weren't active. So, AC carefully read the paper and inspected all reviews. Reviewer a8KZ comments were factually inaccurate in listing references, and lack substantial feedback on the actual content of the paper. Hence, the review was down-weighted. + +The other negative reviewer Ni17, as an OoD expert, unfortunately did not offer more feedback to author rebuttals. From what AC comprehends, the authors should have clarified their The theoretical guarantee and compared properly with Liu et. al. 2020 energy-score (ES). + +Considering the above, AC feels that the study deserves to be published.",ICLR2022, +R42CkBrpNI,1610040000000.0,1610470000000.0,1,Kzg0XmE6mxu,Kzg0XmE6mxu,Final Decision,Reject,"This paper proposed a novel Adversarial Deep Metric Learning approaches. The reviews pointed out the paper proposes an interesting idea and it is among the rare works that address directly robust metric learning which an important topic for efficient metric learning. +Some concerns were raised about the analysis and the lack of comparisons notably with other types of adversarial attacks. 
+The authors provide a rebuttal where they addressed some concerns raised by reviewers with some precisions on the work, its positioning with respect to other related papers and additional comparisons notably with other types of attacks. +A minor remark: there is a typo in Eq(13), where the $z$ in the loss function is actually not defined and should be included in the max function. +That being said, the contribution is still limited in considering only the infinite norm, analysis and comparisons to prior work remain weak. The paper does not meet the requirements for acceptance to ICLR in its current form. +I have then to propose rejection. +",ICLR2021, +MmO7_9Qk0F,1610040000000.0,1610470000000.0,1,9OHFhefeB86,9OHFhefeB86,Final Decision,Accept (Spotlight),All reviewers expressed consistent enthusiasm on this submission during the review process. No reviewers expressed concerns and objections to accept this submission during discussion. It is quite clear this is a strong submission and deserves accept.,ICLR2021, +B1xyzFyIg4,1545100000000.0,1545350000000.0,1,HkghV209tm,HkghV209tm,Borderline paper,Reject,"The reviewers expressed some interest in this paper, but overall were lukewarm about its contributions. R4 raises a fundamental issue with the presentation of the analysis (see the D_infty assumption). The AC thus goes for a ""revise and resubmit"".",ICLR2019,4: The area chair is confident but not absolutely certain +T0tQBM-_ksd,1642700000000.0,1642700000000.0,1,fKv__asZk47,fKv__asZk47,Paper Decision,Reject,"This paper studies a data-driven similarity metric for physical simulation data, based on entropy rate of a physical system. The authors consider a one-parameter family of spatial fields obtained by varying certain parameter, and use those in a self-supervised setup. +Reviewers were split in this submission. While some reviewers highlighted the novelty in the problem setup and the idea of considering one-parameter families, they also expressed concern about the lack of proper justification of the entropy analogy, as well as doubts on the empirical evaluation. Ultimately, and taking all these considerations into account, the AC believes this work would greatly benefit from another review cycle, by addressing the concerns expressed here. Therefore, the AC recommends rejection at this time.",ICLR2022, +3vFfxhvrS,1576800000000.0,1576800000000.0,1,SygcSlHFvS,SygcSlHFvS,Paper Decision,Reject,"The paper proposes a set of conditions that enable a mapping from word embeddings to relation embeddings in knowledge graphs. Then, using recent results about pointwise mutual information word embeddings, the paper provides insights to the latent space of relations, enabling a categorization of relations of entities in a knowledge graph. Empirical experiments on recent knowledge graph models (TransE, DistMult, TuckER and MuRE) are interpreted in light of the predictions coming from the proposed set of conditions. + +The authors responded to reviewer comments well, providing significant updates during the discussion period. Unfortunately, the reviewers did not engage further after their original reviews, and so it is hard to tell whether they agreed that the changes resolved all their questions. + +Overall, the paper provides much needed analysis for understanding of the latent space of relations on knowledge graphs. Unfortunately, the original submission did not clearly present the ideas, and it is unclear whether the updated version addresses all the concerns. 
The paper in its current state is therefore not yet suitable for publication at ICLR.",ICLR2020, +QNTtA61hWH,1610040000000.0,1610470000000.0,1,_qJXkf347k,_qJXkf347k,Final Decision,Reject,"The paper proposed a novel RL-based solution to the optimal partial of DNNs which is interesting to readers. +However, the paper is not well presented and hard to follow. The lack of comparisons agains existing solutions and inconsistencies in the writing as pointed out by the reviewers largely weakens the submission. +There's also no updates to the paper based on reviewers' comments. + +The main reason for the decision is lack of clarity and significance justifications.",ICLR2021, +TWbErN6GT1,1610040000000.0,1610470000000.0,1,WMUSP41HQWS,WMUSP41HQWS,Final Decision,Reject,"This paper proposes two methods to speed up the evaluation of neural ODEs: regularizing the ODE to be easier to integrate, and adaptively choosing which integrator to use. + +These two ideas are fundamentally sensible, but the execution of the current paper is lacking. In addition to writing and clarity issues, the main problem is not comparing to Finlay et al. The Kelly et al paper could potentially be considered concurrent work. + +I also suggest broadening the scope of the DISE method to ODE / SDE /PDE solvers in general, in situations where many similar differential equations need to be solved, amortizing the solver selection will be worthwhile even if there are no neural nets in the differential equation. + +I also encourage the authors to do experiments that explore the tradeoffs of different approaches, rather than aiming just for bold lines in tables.",ICLR2021, +QUmQEK9py,1576800000000.0,1576800000000.0,1,BJgcwh4FwS,BJgcwh4FwS,Paper Decision,Reject,"This paper proposed graph neural networks based approach for subgraph detection. The reviewers find that the overall the paper is interesting, however further improvements are needed to meet ICLR standard: +1. Experiments on larger graph. Slight speedup in small graphs are less exciting. +2. It seems there's a mismatch between training and inference. +3. The stopping criterion is quite heuristic. +",ICLR2020, +ByxJXDflx4,1544720000000.0,1545350000000.0,1,BJgLg3R9KQ,BJgLg3R9KQ,Meta-Review ,Accept (Poster),"This paper presents a large-scale annotation of human-derived attention maps for ImageNet dataset. This annotation can be used for training more accurate and more interpretable attention models (deep neural networks) for object recognition. All reviewers and AC agree that this work is clearly of interest to ICLR and that extensive empirical evaluations show clear advantages of the proposed approach in terms of improved classification accuracy. In the initial review, R3 put this paper below the acceptance bar requesting major revision of the manuscript and addressing three important weaknesses: (1) no analysis on interpretability; (2) no details about statistical analysis; (3) design choices of the experiments are not motivated. Pleased to report that based on the author respond, the reviewer was convinced that the most crucial concerns have been addressed in the revision. R3 subsequently increased assigned score to 6. As a result, the paper is not in the borderline bucket anymore. +The specific recommendation for the authors is therefore to further revise the paper taking into account a better split of the material in the main paper and its appendix. 
The additional experiments conducted during rebuttal (on interpretability) would be better to include in the main text, as well as explanation regarding statistical analysis. +",ICLR2019,5: The area chair is absolutely certain +2DOOu5eeNIe,1642700000000.0,1642700000000.0,1,CyKHoKyvgnp,CyKHoKyvgnp,Paper Decision,Accept (Spotlight),"The authors provide in this manuscript a theoretical analysis to explain why deep neural networks become linear in the neighbourhood of the initial optimisation point as their width tends to infinity. They approach this question by viewing the network as a multi-level assembly model. + +All reviewers agree that this is an interesting, novel, and relevant study. The paper is very well-written. + +Initially, a weak point raised by a reviewer was that an empirical evaluation of the theory was missing. The authors addressed this issue in a satisfactory manner in their response. + +In conclusion, this is a strong contribution worth publication.",ICLR2022, +rJg4Lm5lxV,1544750000000.0,1545350000000.0,1,SJfFTjA5KQ,SJfFTjA5KQ,rejection,Reject,"although the way in which the authors characterize existing rnn variants and how they derive a new type of rnn are interesting, the submission lacks justification (either empirical or theoretical) that supports whether and how the proposed rnn's behave in a ""learning"" setting different from the existing rnn variants.",ICLR2019,4: The area chair is confident but not absolutely certain +c9WLAoHnlHT,1642700000000.0,1642700000000.0,1,N9W24a4zU,N9W24a4zU,Paper Decision,Accept (Poster),The paper develops steerable partial differential operator and show how it can be used to build equivariant network. Experimentation on rotated MNIST and STL10 show the merits of the proposed method. Reviewers agreed on the significance of the work and that it brings new perspective on equivariance that would be interesting to the ICLR community. Accept,ICLR2022, +pugtAxBFOh2,1642700000000.0,1642700000000.0,1,xLfAgCroImw,xLfAgCroImw,Paper Decision,Accept (Poster),"This paper considers the valuation problem for a cooperative game, and shows that some classical metrics (e.g. Shapley value), can be considered as approximations to the maximum entropy. + +Reviewers were generally very positive. They especially praised the novelty and writing quality, while having some concerns about the quality of the empirical results. The authors did an excellent job responding to the reviewers, and resolved their main concerns. A few quibbles remain, however, and while the manuscript is very good as-is, please consider the reviewer criticisms in creating an updated version.",ICLR2022, +9aN2LcFvLA,1610040000000.0,1610470000000.0,1,Y45i-hDynr,Y45i-hDynr,Final Decision,Reject,"All three reviewers expressed consistent concerns on this submission in their reviews. In addition, none of them enthusiastically supported this work during discussion. It is clear this submission does not make the bar of ICLR. Thus a reject is recommended.",ICLR2021, +qw_VOwOwT--,1610040000000.0,1610470000000.0,1,45uOPa46Kh,45uOPa46Kh,Final Decision,Accept (Poster),"The authors propose to take a token-level generative approach to the task of vision-language navigation (R2R/R4R). The reviewers raise a number of concerns which should be noted in the final version of this work. The primary concern revolves around generality. How will this approach generalize to more sophisticated generative and discriminative models? 
To what extent does the model rely on the short instruction/action sequences to succeed, and would it perform poorly on longer instructions, longer trajectories, or more abstract language? Finally, the discussion of the uninformed prior is interesting because while ""clean"", reviewers note there is no realistic grounded language scenario in which an uninformative prior makes sense.",ICLR2021,
pugtAxBFOh2,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This paper proposes an MLP-based neural network specifically designed for speech processing. The proposed Split & Glue layer is used to capture multi-resolution speech characteristics. The method achieved better performance in both command recognition and speech enhancement tasks.
+
+Two major concerns raised by the reviewers:
+The proposed Split & Glue layer is similar to convolution.
Although the authors revised the paper with more clarification on the differences, the op is equivalent to frame-wise convolution, which has been explored in the speech literature. This limits the novelty of the paper.
+The experimental justifications are relatively simple and limited. On the voice command and speech enhancement tasks presented in the paper, stronger and better baselines would be more convincing to justify the benefit of the proposed method. Moreover, testing on large-scale ASR tasks instead of the relatively simple voice command task would be more convincing.
+
+The decision is mainly based on the limited novelty and experimental justification.",ICLR2022,
wsZd_VtOAJE,1642700000000.0,1642700000000.0,1,xm6YD62D1Ub,xm6YD62D1Ub,Paper Decision,Accept (Poster),"This paper presents a self-supervised learning method for the multi-modal setting where each modality has its own feature extraction mapping, and i) the extracted features shall be close for paired data, ii) in the feature space each view has close to diagonal covariance, while iii) the scale for each feature dimension is constrained away from zero to avoid trivial features. The presentation is clear and the reviewers do not have major confusion about the methodology. There have been some discussions between the authors and reviewers, and most questions on the empirical study have been addressed by the authors with additional experiments. The remaining concern is on the novelty (difference from prior SSL methods, especially Barlow-Twins) and significance.
+",ICLR2021, +AlBAgatu-w,1576800000000.0,1576800000000.0,1,ByeGzlrKwH,ByeGzlrKwH,Paper Decision,Accept (Spotlight),"This paper has a few interesting contributions: (a) a bound for un-compressed networks in terms of the compressed network (this is in contrast to some prior work, which only gives bounds on the compressed network); (b) the use of local Rademacher complexity to try to squeeze as much as possible out of the connection; (c) an application of the bound to a specific interesting favorable condition, namely low-rank structure. + +As a minor suggestion, I'd like to recommend that the authors go ahead and use their allowed 10th body page!",ICLR2020, +wsZd_VtOAJE,1642700000000.0,1642700000000.0,1,xm6YD62D1Ub,xm6YD62D1Ub,Paper Decision,Accept (Poster),"This paper presents a self-supervised learning method for the multi-modal setting where each modality has its own feature extraction mapping, and i) the extracted features shall be close for paired data, ii) in the feature space each view has close to diagonal covariance, while iii) the scale for each feature dimension is constrained away from zero to avoid trivial features. The presentation is clear and the reviewers do not have major confusion on the methodology. There have been some discussions between the authors and reviewers, and most questions on the empirical study have been addressed by the authors with additional experiments. The remaining concern is on the novelty (difference from prior SSL methods especially Barlow-Twins) and significance. I think that while it is relatively straightforward to extend methods like Barlow-twins to the multi-modal setting, I do see the value of empirically demonstrating the effectiveness of an alternative loss to the currently pervasive contrastive learning paradigm, and hence the paper is worth discussion in my opinion. In the end, the method resembles classical multi-modal methods like canonical correlation analysis, in terms of the objective (matching paired data in latent space) and constraints (un-correlated feature in each view, and unit-scale constraint for each feature dimension); such connections shall be discussed.",ICLR2022, +HJPWXJ6rf,1517250000000.0,1517260000000.0,75,rJGZq6g0-,rJGZq6g0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"An interesting paper, generally well-written. Though it would be nice to see that the methods and observations generalize to other datasets, it is probably too much to ask as datasets with required properties do not seem to exist. There is a clear consensus to accept the paper. + ++ an interesting extension of previous work on emergent communications (e.g., referential games) ++ well written paper + ",ICLR2018, +B1wa2fIdg,1486400000000.0,1486400000000.0,1,BybtVK9lg,BybtVK9lg,ICLR committee final decision,Accept (Poster),"The reviewers agree that the approach is interesting and the paper presents useful findings. They also raise enough questions and suggestions for improvements that I believe the paper will be much stronger after further revision, though these seem straightforward to address.",ICLR2017, +_RQub-SX5Z,1610040000000.0,1610470000000.0,1,refmbBH_ysO,refmbBH_ysO,Final Decision,Reject,"This paper clearly has great ideas and reviewers appreciated that. However, the lack of experiments that can be validated by the community (only 1 experiment on the proprietary dataset) is an issue. We don't know if the reported accuracy is a respectable one (in the public domain). 
Having a proprietary dataset is a plus, but no public benchmark raises concerns about reproducibility. +We recommend that the authors add some tasks and benchmarks for the community to check for themselves that the numbers reported are non-trivial. ",ICLR2021, +SkF6SJarM,1517250000000.0,1517260000000.0,671,BkpXqwUTZ,BkpXqwUTZ,ICLR 2018 Conference Acceptance Decision,Reject,This paper is nowhere near the standards for publication anywhere.,ICLR2018, +bADJ3QCzum,1642700000000.0,1642700000000.0,1,G9JXCpShpni,G9JXCpShpni,Paper Decision,Reject,"The paper studies dyna-style MBRL in a resource-limited setting. It is evaluated on an acrobot task where it shows very promising results. + +The reviewers appreciated the extensive replies, but they did not fundamentally change their opinion. In particular: +- Lack of a formal problem statement and definitions +- The experiment on a single task (and that being a non-standard version) isn't sufficient to demonstrate the general merits of the method + +While the ideas are very promising, the paper cannot be published in its current form. We hence strongly encourage the authors to revise the paper and to resubmit it to a different venue.",ICLR2022, +RKQzOhYsYd5,1642700000000.0,1642700000000.0,1,JLbXkHkLCG6,JLbXkHkLCG6,Paper Decision,Reject,"Learning policies from video demonstrations alone, without paired action data, is a promising paradigm for scaling up Imitation Learning. As such, the paper is well-motivated. Two approaches, P-SIL and P-DAC, learn rewards for RL training, based on Sinkhorn distances between trajectory embeddings and on an adversarial approach, respectively. The reviews brought up a lack of clarity in the presentation, as well as experimental results and ablation studies that fall short of convincingly demonstrating the value of the distance functions used and of other design tradeoffs. As such, the paper does not meet the bar for acceptance at ICLR.",ICLR2022, +aC4uAM2KZ,1576800000000.0,1576800000000.0,1,BJlA6eBtvH,BJlA6eBtvH,Paper Decision,Reject,"The reviewers agreed that this paper tackles an important problem, continual learning, with a method that is well motivated and interesting. The rebuttal was very helpful in terms of relating to other work. However, the empirical evaluation, while good, could be improved. In particular, it is not clear based on the evaluation to what extent more interesting continual learning problems can be tackled. We encourage the authors to continue pursuing this work.",ICLR2020, +jPJl6sobK_U,1610040000000.0,1610470000000.0,1,HWX5j6Bv_ih,HWX5j6Bv_ih,Final Decision,Reject,"This paper received mixed reviews: two positive (6, 6) and two negative (5, 3). However, the positive reviewers have very low confidence and do not show strong support for this paper. The reviewers raised various concerns about this paper, and critical issues remain even though the authors made substantial efforts to answer the questions. + +After reading the paper and all the comments by the reviewers, I decided to recommend rejecting this paper, mainly due to its weak technical contribution and its neglect of privacy issues. Note that this opinion is shared with the two negative reviewers. The proposed model and alternative training scheme are straightforward, and the novelty is not distinct. Also, the authors seem to assume that ""the extracted feature vectors and corresponding gradients are not sensitive"". This comment was given by R2 but has not been clarified. The proposed method is lacking in this aspect, and it is hard to say that it is an FL approach. + +",ICLR2021, +SJlp-MGzeN,1544850000000.0,1545350000000.0,1,Hygp1nR9FQ,Hygp1nR9FQ, Interesting proposal but the claims made are not well-supported,Reject,"The paper proposes a technique for defending against adversarial examples that relies on averaging pixels that are close to each other both in position and value.
Even with the rebuttal, all reviewers (including positive reviewer) have concerns on unconvincing experimental results (due to missing baselines for instance). I basically agree on negative reviews that this submission fails to have enough quality considering the high standard of ICLR.",ICLR2021, +nCH_vM3GHyX,1610040000000.0,1610470000000.0,1,Gj9aQfQEHRS,Gj9aQfQEHRS,Final Decision,Reject,"This paper presents a new graph neural network (GNN) architecture with attention and with applications to Boolean satisfiability. + +The reviewers expressed concerns over various aspects of the paper such as a need for better ablations and an analysis of the difficulty level of the SAT problems used in evaluation. No rebuttal was provided.",ICLR2021, +xJss0QWXzZm,1610040000000.0,1610470000000.0,1,6BWY3yDdDi,6BWY3yDdDi,Final Decision,Reject,"All reviewers agree that this paper is interesting, but needs improvement in order to be suitable for a highly competitive venue such as ICLR. Reviewer 3 is especially incisive and detailed, but other reviewers make similar points.",ICLR2021, +NE2IPdiNuKX,1642700000000.0,1642700000000.0,1,AmUhwTOHgm,AmUhwTOHgm,Paper Decision,Accept (Poster),"For pairs of pieces of text, the central idea of this paper is to combine the approaches of using bi-encoders (where a vector is formed from each text then compared), which are easily trained in an unsupervised manner, with cross-encoders (where the two texts are related at the token level), which are normally trained in a supervised manner. The chief contribution of this paper is to train a combined model (as a ""trans-encoder"") by doing co-training of both model types in an alternating cyclic self-distillation framework. The paper suggests that this allows unsupervised training of a cross-encoder. This claim met some pushback from the reviewers, since the method does require good quality aligned text pairs (much like a traditional MT system does), and so the result is a task-specific sentence-pair modeling approach rather than a generic unsupervised learning approach. + +In the discussion, downsides included the claims of ""unsupervised"" being overstated, the genuine remaining need for related sentence pairs, the lack of a more theoretical understanding of why this works, and the feeling that the paper is not yet fully mature. Upsides include solid work building from existing models, big performance improvements over SimCSE, novelty in combining previous ideas in a new way for a new problem, and good experiments. To my mind, while the requirement of related sentence pairs does mean the model is task-specific and less than fully unsupervised, this is still a common and useful scenario, the performance of the model is strong, and, while the proposed model is built from existing components and ideas, they are combined in an interesting new way to achieve an intriguing and strong new way of training models, and the discussion here (and now in Appendix A.2) of what the authors had to do to get the model to work in terms of choosing different losses, etc., convincingly demonstrated that the authors had thought significantly and deeply about the nature of their proposal and how to get it to work well. Moreover the authors were able to work expeditiously during the reviewing period to address other weaknesses, such as now providing results with other methods than SimCSE (DeCLUTR and Contrastive Tension) and on other language models (RoBERTa). 
+ +As such, although this paper is clearly somewhat borderline rather than an unambiguous accept, I find myself quite convinced by the novelty, thoroughness, and intriguing nature of this work, and so my vote is to accept it.",ICLR2022, +fhrOmNoTyjO,1610040000000.0,1610470000000.0,1,hKps4HGGGx,hKps4HGGGx,Final Decision,Reject,"This paper presents an inference-softmax cross entropy (I-SCE) loss, a modification to the widely adopted ""Softmax Cross Entropy"" (SCE) loss, to achieve better robustness against adversarial attacks. The original submission had critical issues on motivation, theoretical analysis and experiments. Although the authors provided a revised version, it needs another round of thorough examination before publishing. ",ICLR2021, +3o5m1JLR97,1610040000000.0,1610470000000.0,1,_HsKf3YaWpG,_HsKf3YaWpG,Final Decision,Reject,"The authors argue that uniform priors for the high-level latent representations improve transferability, which is beneficial in a number of tasks involving transference. The approach is evaluated on deep metric learning, zero-shot domain adaptation and few-shot meta-learning. + +Pro: +- A simple yet effective method +- Signifiant gains in experimental study + +Cons: +- Close variants of this approach were proposed in previous works, and so the novelty of the current work is limited. +- There is no accompanying analysis which may shed new light on the advantages of the approach.",ICLR2021, +HkY1Ny6HG,1517250000000.0,1517260000000.0,267,HyydRMZC-,HyydRMZC-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All reviewers gave ""accept"" ratings. +it seems that everyone thinks this is interesting work. + +The paper generated a large number of anonymous comments and these were addressed by the authors. ",ICLR2018, +ya3qXxHwApx,1642700000000.0,1642700000000.0,1,kSwqMH0zn1F,kSwqMH0zn1F,Paper Decision,Accept (Poster),"The paper proposes PipeGCN, a system that uses pipeline parallelism to accelerate distributed training of large-scale graph convolutional neural networks. Like some pipeline-parallel methods (but unlike others), PipeGCN involves asynchrony in the sense that its features and feature-gradients can be stale. The paper provides theoretical guarantees on the convergence of PipeGCN in the presence of this staleness, which is a nice contribution in itself. In discussion, the reviewers found the work to be well-executed and sound. All reviewers recommended acceptance, and I concur with this consensus.",ICLR2022, +E05yh5qCfD,1610130000000.0,1610470000000.0,1,KYPz4YsCPj,KYPz4YsCPj,Final Decision,Accept (Poster),"The paper introduces a new method for encoding dynamics of temporal networks. The approach, while not ground-breaking, is interesting and the results are fairly convincing. + +The submission raised a number of concerns from the reviewers. They questioned the complexity of the proposed approach (R3 and R4), the clarity/readability (R2 and R1), and appropriateness of the link sampling strategy (R2), as well as raised several more minor (from my perspective) issues. I believe that the authors adequately addressed most of these concerns in their rebuttal and the revision. R2 has confirmed that they read the rebuttal and raised their score to strong accept. Unfortunately, the other reviewers have not engaged during the discussion period, and it is unclear if they are satisfied with the clarifications and changes. 
Nevertheless, after reading the authors' responses and skimming through the manuscript, I believe that most concerns have been addressed, and this is a good paper that deserves to be accepted. That being said, the issue of readability has been raised by the reviewers, and, while I do not think the paper is unreadable, I do agree that there is much room for improvement. I would encourage the authors to polish the manuscript for the camera-ready version, as well as try to address the remaining concerns raised by the reviewers. + +",ICLR2021, +7wDRgiQP48,1642700000000.0,1642700000000.0,1,dFbKQaRk15w,dFbKQaRk15w,Paper Decision,Accept (Spotlight),"Improving the expressiveness of GNN is an important problem in the current graph learning community. Its key idea is to generate subgraphs from the original graph, then encode the subgraphs into the message passing process of GNN. The proposed method is proven to be strictly more powerful than 1-WL. The authors also quantize how design choices such as the subgraph selection policy and equivariant neural architecture can affect the architecture’s expressive power. + +After the rebuttal, all reviewers are glad to accept this submission. + +During the discussion, while reviewer B3oK has shown some concerns on the concurrent works in NeurIPS 2021, it should not affect the decision of the submission once the authors have discussed them in the main text. The authors have done this in their revision, thus an acceptance (spotlight) is suggested.",ICLR2022, +gTvGswwVMH,1610040000000.0,1610470000000.0,1,sMEpviTLi1h,sMEpviTLi1h,Final Decision,Reject,"The authors propose two algorithms and their theoretical analysis for solving bilevel optimization problems where the inner objective is assumed to be strongly convex. The authors have greatly improved the paper to answer reviewer comments and three out of four reviewers have increased their scores. That said, given the large amount of new material added to this paper during the discussion phase, the program committee believes the paper requires a new round of reviews for a confident assessment. We encourage the authors to resubmit their work to a top conference such as ICML.",ICLR2021, +mBf6NtdnJB,1576800000000.0,1576800000000.0,1,r1lUdpVtwB,r1lUdpVtwB,Paper Decision,Reject,"The authors propose a model which combines a neural machine translation system and a context-based machine translation model, which combines some aspects of rule and example based MT. This paper presents work based on obsolete techniques, has relatively low novelty, has problematic experimental design and lacks compelling performance improvements. The authors rebutted some of the reviewers claims, but did not convince them to change their scores. ",ICLR2020, +Hyl5JGcll4,1544750000000.0,1545350000000.0,1,B1xU4nAqK7,B1xU4nAqK7,"incremental, limited evaluation",Reject,"Strengths + +The paper proposes to include exploration for the PETS (probabilistic ensembles with trajectory sampling) +approach to learning the state transition function. The paper is clearly written. + +Weaknesses + +All reviewers are in agreement regarding a number of key weaknesses: limited novelty, limited evaluation, +and aspects of the paper are difficult to follow or are sparse on details. +No revisions have been posted. + +Summary + +All reviewers are in agreement that the paper requires significant work and that it is not ready for ICLR publication. 
+",ICLR2019,5: The area chair is absolutely certain +PHNMB1UiAYZ,1610040000000.0,1610470000000.0,1,SQ7EHTDyn9Y,SQ7EHTDyn9Y,Final Decision,Reject,"This paper investigates the topic of nondeterminism and instability in neural network optimization. The reviewers found the results on different sources of nondeterminism particularly interesting and relevant. The experiments are carried on both language and also vision, which strengthens the findings. Concerns were raised about the use of smaller non-standard models, which were somewhat mitigated by the addition of Resnet-18 experiments on CIFAR. The reviewers also noted that the measures used in the experimental protocol were already present in the literature, and that the proposed mitigation strategy is from another work. Furthermore, R2 also found that the optimization instability section should be more developed. The paper should be resubmitted with an improved discussion of related works and more developed section on instability as suggested by the reviewers.",ICLR2021, +r1q9HJpBG,1517250000000.0,1517260000000.0,630,r111KtCp-,r111KtCp-,ICLR 2018 Conference Acceptance Decision,Reject," + interesting approach for a detailed analysis of the limitations of autoencoders in solving a simple toy problem + - resulting insights somewhat trivial, not really novel, nor practically useful => lacks demonstration of a gain on non-toy task + - regularization study too limited in scope: lacking theoretical grounding, and more exhaustive comparison of regularization schemes.",ICLR2018, +Oo1IvMbxmcE,1610040000000.0,1610470000000.0,1,YtgKRmhAojv,YtgKRmhAojv,Final Decision,Reject,"This paper proposes to use a single parametric Householder reflection to represent Orthogonal weight matrices. +It demonstrates that this is sufficient provided that we make the reflection direction a function of the input vector. It is also demonstrated under which conditions this modified transformation is invertible. The derivations are sound. +This insight allows for cheaper forwarding of the model but it also comes with extra costs: It has an increased computational cost for inversion (e.g. requires optimisation) and, importantly, it does not allow to cache the $O(d)$ matrix so it is not clear there is an advantage of the method over exp maps when we have parameter sharing (e.g. as in RNNs), since the action of the matrix has to be recomputed every-time. The presented experiments are OK, but comparisons to other (potentially more efficient) methods are lacking as pointed out by the reviewers. As it stands it is not clear that this is an idea of broad interest, perhaps more suited to a specialised venue such as a workshop.",ICLR2021, +BJlUq8zleN,1544720000000.0,1545350000000.0,1,BkMn9jAcYQ,BkMn9jAcYQ,"Reasonably complete experiments, but motivation of method is still not clear.",Reject,"This paper proposes a method to resolve ""language drift,"" where a pre-trained X->language model trained in an X->language->Y pipeline drifts away from being natural language. In particular, it proposes to add an auxiliary training objective that performs grounding with multimodal input to fix this problem. Results are good on a task where translation is done between two languages. + +The main concern that was raised with this paper by most of the reviewers is the validity of the proposed task itself. 
Even after extensive discussion with the authors, it is not clear that there is a very convincing scenario where we both have a pre-trained X->language, care about the intermediate results, and have some sort of grounded input to fix this drift. While I do understand the MT task is supposed to be a testbed for the true objective, it feel it is necessary to additionally have one convincing use case where this is a real problem and not just the artificially contrived. This use case could either be of practical use (e.g. potentially useful in an application), or of interest from the point of view of cognitive plausibility (e.g. similar to how children actually learn, and inspired by cognitive science literature). + +A concern that offshoots from this is that because the underlying idea is compelling (some sort of grounding to inform language learning), a paper at a high-profile conference such as ICLR may help re-popularize this line of research, which has been a niche for a while. Normally I would say this is definitely a good thing; I think considering grounding in language learning is definitely an important research direction, and have been a fan of this line of work since reading Roy's seminal work on it from 15 years ago. However, if the task used in this paper, which is of questionable value and reality, becomes the benchmark for this line of work I think this might lead other follow-up work in the wrong direction. I feel that this is a critical issue, and the paper will be much stronger after a more realistic task setting is added. + +Thus, I am not recommending acceptance at this time, but would definitely like the authors to think hard and carefully about a good and realistic benchmark for the task, and follow up with a revised version of the paper in the future.",ICLR2019,4: The area chair is confident but not absolutely certain +1_EQHVdD6p2,1642700000000.0,1642700000000.0,1,ARyEf6Z77Y,ARyEf6Z77Y,Paper Decision,Reject,"This paper trains an expert style DNN that routes input examples to appropriate expert modules resulting in high accuracy on ImageNet with less compute. Reviewers have been positive about the strong empirical results. However the paper itself is not written well and reviewers had hard time figuring out actual architecture and training methodology. For example reviewers couldn't easily figure out the differences between LGM, WGM and SRM. + +The paper itself is sparse on why some of the choices have been made, their relation to existing methods and how do they affect the final performance. For example - In eq2, TCP objective has been normalized for each expert separately with a vague No Superiority Assumption. What motivates this assumption? Why is it reasonable? Eq 4 is quite similar to the load balancing loss in Switch Transformer paper. However there has been no discussion about the similarities and differences. + +I think the paper needs to rewritten with clear explanation of the actual architecture, in what aspects it is similar/differs to existing expert models. What key components are the reason for the superior performance? + +While I appreciate the authors for the ablations studies they presented during response phase, I think the paper requires major rewriting and cannot recommend acceptance at this stage.",ICLR2022, +s0g71vQdK0L,1642700000000.0,1642700000000.0,1,7_JR7WpwKV1,7_JR7WpwKV1,Paper Decision,Accept (Poster),"All three reviewers viewed this paper as marginally above the acceptance threshold (6). 
+ +Most of the initial concerns of reviewers were around (a) the applicability of the theory to actual practical use cases and networks, and (b) the presentation and framing of the work, and scope of its results. There were fairly detailed responses from the authors: two of the three reviewers increased their scores after the author response. There's still some lingering questions as to how ""real-world"" relevant the theory is, but the consensus at this point is to accept the paper. + +My primary concern for acceptance would be that the proofs techniques are based on Boolean circuits, and none of the reviewers (nor the AC) are particularly familiar with this, and thus the proofs in the appendix have been only lightly reviewed. The ""impression"" of all reviewers is of correctness.",ICLR2022, +RKQzOhYsYd5,1642700000000.0,1642700000000.0,1,JLbXkHkLCG6,JLbXkHkLCG6,Paper Decision,Reject,"Learning policies from video demonstrations alone without paired action data is a promising paradigm for scaling up Imitation Learning. As such the paper is well-motivated. Two approaches P-SIL and P-DAC train rewards for RL training, based on learning Sinkhorn distances between trajectory embeddings and an adversarial approach. The reviews brought up lack of clarity in presentation and experimental results and ablation studies falling short of convincingly demonstrating value of distance functions used and other design tradeoffs. As such the paper does not meet the bar for acceptance at ICLR.",ICLR2022, +aC4uAM2KZ,1576800000000.0,1576800000000.0,1,BJlA6eBtvH,BJlA6eBtvH,Paper Decision,Reject,"The reviewers agreed that this paper tackles an important problem, continual learning, with a method that is well motivated and interesting. The rebuttal was very helpful in terms of relating to other work. However, the empirical evaluation, while good, could be improved. In particular, it is not clear based on the evaluation to what extent more interesting continual learning problems can be tackled. We encourage the authors to continue pursuing this work.",ICLR2020, +jPJl6sobK_U,1610040000000.0,1610470000000.0,1,HWX5j6Bv_ih,HWX5j6Bv_ih,Final Decision,Reject,"This paper received mixed reviews: two positives (6, 6) and two negatives (5, 3). However, the positive reviewers have very low confidence, do not show strong supports for this paper. The reviewers raised various concerns about this paper, and there still exist remaining critical issues although the authors made substantial efforts to answer the questions. + +After reading the paper and all the comments by the reviewers, I decided to recommend rejecting this paper mainly due to its weak technical contribution and ignorance of privacy issues. Note that this opinion is shared with two negative reviewers. The proposed model and alternative training scheme are straightforward, and the novelty is not distinct. Also, the authors seem to assume that ""the extracted feature vectors and corresponding gradients are not sensitive"". This comment is given by R2 but has not been clarified. The proposed method is lacking in this aspect and it is hard to say that it is an FL approach. + +",ICLR2021, +SJlp-MGzeN,1544850000000.0,1545350000000.0,1,Hygp1nR9FQ,Hygp1nR9FQ, Interesting proposal but the claims made are not well-supported,Reject,"The paper proposes a technique for defending against adversarial examples that relies on averaging pixels that are close to each other both in position and value. 
This approach seems to be an interesting preprocessing technique in the robust training pipeline. However, the actual claims made are not well-supported and, in fact, seem somewhat implausible. ",ICLR2019,5: The area chair is absolutely certain +HjUZRSKUJ,1576800000000.0,1576800000000.0,1,r1egIyBFPS,r1egIyBFPS,Paper Decision,Accept (Poster),"This work introduces a neural architecture and corresponding method for simplifying symbolic equations, which can be trained without requiring human input. This is an area somewhat outside most of our expertise, but the general consensus is that the paper is interesting and is an advance. The reviewer's concerns have been mostly resolved by the rebuttal, so I am recommending an accept. ",ICLR2020, +r1lTyZCSxN,1545100000000.0,1545350000000.0,1,rJgMlhRctm,rJgMlhRctm,Strong paper in an interesting new direction,Accept (Oral),"Strong paper in an interesting new direction. +More work should be done in this area.",ICLR2019,4: The area chair is confident but not absolutely certain +dXXMlZYieV,1576800000000.0,1576800000000.0,1,S1xJFREKvB,S1xJFREKvB,Paper Decision,Reject,"This paper introduces a variant of Nesterov momentum which saves computation by only periodically recomputing certain quantities, and which is claimed to be more robust in the stochastic setting. The method seems easy to use, so there's probably no harm in trying it. However, the reviewers and I don't find the benefits persuasive. While there is theoretical analysis, its role is to show that the algorithm maintains the convergence properties while having other benefits. However, the computations saved by amortization seem like a small fraction of the total cost, and I'm having trouble seeing how the increased ""robustness"" is justified. (It's possible I missed something, but clarity of exposition is another area the paper could use some improvement in.) Overall, this submission seems promising, but probably needs to be cleaned up before publication at ICLR. +",ICLR2020, +c-9o6iql0iB,1610040000000.0,1610470000000.0,1,QjINdYOfq0b,QjINdYOfq0b,Final Decision,Reject,"The paper proposes to integrate multiple bit configurations (including pruning) into a single architecture, and then automatically select bit resolution through binary gates. The overall approach can be differentiable and optimized with parameters. However, as pointed out by the reviewers, the novelty of this paper can be the big question. Also, it seems to be hard or unpractical to employ different number of bits for layers, given the standard GPU and CPU hardware. ",ICLR2021, +7pz9M3rCw0,1576800000000.0,1576800000000.0,1,ByedzkrKvH,ByedzkrKvH,Paper Decision,Accept (Poster),"Double coúnterfactual regret minimization is an extension of neural counterfactual regret minimization that uses separate policy and regret networks (reminiscent of similar extensions of the basic RL formula in reinforcement learning). Several new algorithmic modifications are added to improve the performance. + +The reviewers agree that this paper is novel, sound, and interesting. One of the reviewers had a set of questions that the authors responded to, seemingly satisfactorily. Given that this seems to be a high-quality paper with no obvious issues, it should be accepted.",ICLR2020, +Sy_fIyTrf,1517250000000.0,1517260000000.0,738,rye7IMbAZ,rye7IMbAZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper addresses the question of how to regularize when starting from a pre-trained convolutional network in the context of transfer learning. 
The authors propose to regularize toward the parameters of the pre-trained model and study multiple regularizers of this type. The experiments are thorough and convincing enough. This regularizer has been used quite a bit for shallow models (e.g. SVMs as the authors mention, but also e.g. more general MaxEnt models). There is at least some work on regularization toward a pre-trained model also in the context of domain adaptation with deep neural networks (e.g. for speaker adaptation in speech recognition). The only remaining novelty is the transfer learning context. This is not a sufficiently different setting to merit a new paper on the topic.",ICLR2018, +a-UAiwS-ojM,1610040000000.0,1610470000000.0,1,zsKWh2pRSBK,zsKWh2pRSBK,Final Decision,Reject,"The paper argues that a successful backdoor attack on classifiers is connected with further fundamental security issues. In particular they demonstrate and not only an original backdoor trigger but also other triggers can be inserted by anyone with access to the classifiers. Furthermore, the alternative triggers may appear very different from the original triggers, which confirms the claim in the paper's title that such classifiers are ""fundamentally broken"". + +The paper offers an interesting insight into the features of poisoned classifiers. However, such insight is diminished by the fact that the proposed attack requires a substantial manual interaction. The user must manually analyze the adversarial examples generated for robustified classifiers in order to determine the key parameters of alternative triggers. While manual intervention as such does not undermine the main observation of the paper, this makes an automatic exploitation of this idea hardly feasible and hence decreases the significance of the paper's main result. ",ICLR2021, +at1byXbGVgg,1610040000000.0,1610470000000.0,1,KvyxFqZS_D,KvyxFqZS_D,Final Decision,Accept (Oral),"This paper provides a global convergence guarantee for feedforward three-layer networks trained with SGD in the MF regime. By introducing the novel concept of neuronal embedding of a random initialization procedure, SGD trajectories of large-width networks are shown to be well approximated by the MF limit, a continuous-time infinite-width limit (Theorem 3). Furthermore, under some additional assumptions the MF limit is shown to converge to the global optimum when the loss is convex (Theorem 8, case 1) and for a generic loss when $y=y(x)$ is a deterministic function of input $x$ (Theorem 8, case 2). The global convergence guarantee presented in this paper is based on less restrictive assumptions compared with existing studies. All the reviewers rated this paper quite positively, with less confidence however, seemingly because of mathematical thickness of the proofs. Although the reviewers did not manage to check every detail of the proofs, they agreed that the reasoning seems mathematically sound as far as they can tell. The authors response adequately addressed minor concerns raised by the reviewers. I am thus glad to recommend acceptance of this paper. + +Pros: +- Introduces the idea of a neuronal embedding, which allows establishing relation between SGD on large-width three-layer networks and its MF limit in a quantitative way with a less restrictive setting. +- Provides a global convergence guarantee under the iid initialization, in the sense that if the MF limit converges it attains the global optimum. 
+- Shows that the global convergence guarantee does not require convexity of the loss when a deterministic function is to be learned. + +In particular, the uniform approximation property, rather than the convexity of the loss, plays a crucial role in proving the global convergence guarantee (it allows translation of the vanishing gradient in expectation at convergence into the almost-sure vanishing gradient), which is a quite original contribution of this paper.",ICLR2021, +YPYO9DUiwz,1576800000000.0,1576800000000.0,1,HJgCcCNtwH,HJgCcCNtwH,Paper Decision,Reject,"This work proposes new initialization and layer topologies for training a priori sparse networks. Reviewers agreed that the direction is interesting and that the paper is well written. Additionally the theory presented on the toy matrix reconstruction task helped motivate the proposed approach. However, it is also necessary to validate the new approach by comparing with existing sparsity literature on standard benchmarks. I recommend resubmitting with the additional experiments suggested by the reviewers.",ICLR2020, +DCcY4ZUKAE,1576800000000.0,1576800000000.0,1,Bkl8YR4YDB,Bkl8YR4YDB,Paper Decision,Reject,"The authors address the problem of training an NMT model on a really massive parallel data set of 40 billion Chinese-English sentence pairs, an order of magnitude bigger than other cz-en experiments. To address noise and training time problems they propose pretraining + a couple of different ways of creating a fine-tuning data set. Two of the reviewers assert that the technical contribution is thin, and the results are SOTA but not really as good as you might hope with this amount of data. This combined with the fact that the data set is not released, makes me think that this paper is not a good fit with ICLR and would more appropriate for an application focussed conference. The authors engaged strongly with the reviewers, adding more backtranslation results. The reviewers took their responses into account but did not change their scores. ",ICLR2020, +xtRKCUKF3kT,1610040000000.0,1610470000000.0,1,4NNQ3l2hbN0,4NNQ3l2hbN0,Final Decision,Reject,This paper addresses an interesting problem and all reviewers agree. Most reviewers found the paper difficult to understand and it was hard to see the novel contributions. The paper will need a significant revision before publication.,ICLR2021, +8FUmZBuESV,1642700000000.0,1642700000000.0,1,3wU2UX0voE,3wU2UX0voE,Paper Decision,Accept (Oral),"Strong submission that analyses the unsupervised skill discovery setting from the perspective of information geometry, which leads to some interesting conclusions. In particular, it is shown that this does not lead to skills that are optimal for all reward functions, but does provide a good initialization for methods that aim to find optimal policies. + +Across the board, the reviewers believe the analysis provided by this work is both important and novel. And while there were some initial concerns raised, such as lack of empirical confirmation of some of the claims and some questions about the analysis, the authors have addressed all of these concerns convincingly. + +Hence, I strongly recommend acceptance of this submission.",ICLR2022, +jPSiYwnp07,1642700000000.0,1642700000000.0,1,dZPgfwaTaXv,dZPgfwaTaXv,Paper Decision,Accept (Poster),"The paper presents an approach to learn the surrogate loss for complex prediction tasks where the task loss is non-differentiable and non-decomposable. 
The novelty of the approach is to rely on differentiable sorting, optimizing the spearman correlation between the true loss and the surrogate. This leads to a pipeline that is simpler to integrate to existing works than approaches that try to learn a differentiable approximation to the task loss, and to better experimental results. + +The paper is well written and the approach clearly presented. The reviewers liked the simplicity of the approach and the promising experimental results on a variety of challenging tasks (human pose estimation and machine reading).",ICLR2022, +PHJ5-wYH2Zh,1642700000000.0,1642700000000.0,1,kDF4Owotj5j,kDF4Owotj5j,Paper Decision,Reject,"This is an interesting work, and I urge the authors to keep pushing this direction of research. Unfortunately, I feel like the manuscript, in its current format is not ready for acceptance. + +The research direction is definitely under-explored, which makes the evaluation of the work a bit tricky. Still I think that some of the points raised by the reviewers hold, for e.g. the need of additional baselines (to provide a bit of context for what is going on)I understand that the authors view their work as an improvement of the previously proposed DT network, however that is a recent architecture, not sufficiently established not to require additional baseline for comparisons. This combined with the novely of the dataset makes it really hard to judge the work. + +The write-up might also require a bit of attention. In particular it seems a lot of important details of the work (or clarifications regarding the method) ended up in the appendix. A lot of the smaller things reviewer pointed out the authors rightfully so acknowledged in the rebuttal and propose to fix, however I feel this might end up requiring a bit of re-organization of the manuscript rather that adding things at the end of the appendix. I also highlight (and agree) with the word ""thinking"" being overloaded in this scenario. + +Ablation studies (some done as part of the rebuttal) might be also a key component to get this work over the finish line. E.g. the discussion around the progressive loss. I acknowledge that the authors did run some of those experiments, though I feel a more in depth look at the results and interpretation of them (e.g. not looking just at final performance, but at the behaviour of the system), and integrating them in the main manuscript could also provide considerable additional insight in the proposed architecture. + +My main worry is that in its current format, the paper might not end up having the impact it deserves and any of the changes above will greatly improve the quality and the attention the work will get in the community.",ICLR2022, +rkeI0XTa14,1544570000000.0,1545350000000.0,1,HJx54i05tX,HJx54i05tX,ICLR 2019 decision,Accept (Oral),"This paper analyzes random auto encoders in the infinite dimension limit with an assumption that the weights are tied in the encoder and decoder. In the limit the paper is able to show the random auto encoder transformation as doing an approximate inference on data. The paper is able to obtain principled initialization strategies for training deep autoencoders using this analysis, showing the usefulness of their analysis. Even though there are limitations of paper such as studying only random models, and characterizing them only in the limit, all the reviewers agree that the analysis is novel and gives insights on an interesting problem. 
",ICLR2019,4: The area chair is confident but not absolutely certain +Tq_C6TJmSs,1610040000000.0,1610470000000.0,1,48goXfYCVFX,48goXfYCVFX,Final Decision,Reject,"There is a broad consensus that this paper explores an interesting and novel problem space. Nonetheless, in their initial assessment, the reviewers pointed to a few limitations of the paper including lack of strong baselines, lack of an ablation study, and weaker results according to the HIT@10 metric. + +The authors provided an improved version of their paper as a response. The new paper added two baselines, is better written, and justifies some of the HIT@10 results (basically, the metric is biased for this task). + +After discussion, the reviewers find that the contribution of the current manuscript falls short of the acceptance threshold. In particular, the reviewers find that: 1) this contribution is for a specific domain of recommender systems, an area of interest, but perhaps only relevant to a subset of the ICLR community; 2) while more recent baselines helped, there has been lots of more recent work on collaborative filtering models for recommender systems over the last few years (the Wide&Deep baseline is from 2016); 3) since some of the usual recommender systems' metric does not seem appropriate here, why not suggest new ones (or propose a slightly different evaluation protocol); 4) the proposed model is useful, but somewhat incremental given prior work. All in all, while any of these limitations on their own might not have been sufficient to warrant rejection, I find that their combination does. + +Given the interest in this new task, I do strongly encourage the authors to pursue their work. I also find that the qualitative study propsoed by Reviewer 4 could add another interesting angle to this paper (I also imagine that it might not be that easy to carry out). ",ICLR2021, +B1PkwyTSG,1517250000000.0,1517260000000.0,911,r1ISxGZRb,r1ISxGZRb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers were uniformly unimpressed with the contributions of this paper. The method is somewhat derivative and the paper is quite long and lacks clarity. Moreover, the tactic of storing autoencoder variables rather than full samples is clearly an improvement, but it still does not allow the method to scale to a truly lifelong learning setting. ",ICLR2018, +r1CvVk6SM,1517250000000.0,1517260000000.0,378,SkRsFSRpb,SkRsFSRpb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,The reviewers found the paper meaningful but noted that they were not convinced by the experiments as they stand and the presentation was dense for them.,ICLR2018, +rkxV9_pqkN,1544370000000.0,1545350000000.0,1,HJlY0jA5F7,HJlY0jA5F7,Intersting ideas that need some further investigations ,Reject,"The paper proposes a novel sample based evaluation metric which extends the idea of FID by replacing the latent features of the inception network by those of a data-set specific (V)AE and the FID by the mean FID of the class-conditional distributions. Furthermore, the paper presents interesting examples for which FID fails to match the human judgment while the new metric does not. All reviewers agree, that while these ideas are interesting, they are not convinced about the originality and significance of the contribution and believe that the work could be improved by a deeper analysis and experimental investigation. 
+",ICLR2019,4: The area chair is confident but not absolutely certain +b3LFvGfBb5,1576800000000.0,1576800000000.0,1,ByljMaNKwB,ByljMaNKwB,Paper Decision,Reject,"Thanks for the detailed replies to the reviewers. +Their score was slightly improved, this paper is still below the bar given high competition of ICLR2020. +For this reason, we decided not to accept this paper.",ICLR2020, +cGZZM9KkDG2,1642700000000.0,1642700000000.0,1,T4-65DNlDij,T4-65DNlDij,Paper Decision,Accept (Poster),"This paper adds an attention mechanism to deep variational autoencoders. The authors develop a global + local attention method and achieve better log likelihoods than a variety of recent methods on MNIST and OMNIGLOT. Overall the reviewers found this paper strong (8, 8, 8, 6), particularly after the author rebuttal. They found the paper to be clear, the contribution sensible and novel and the experiments thorough and compelling. In particular, the authors added additional experimental results on a larger dataset which addressed a common concern among the reviewers. Thus the recommendation is to accept the paper.",ICLR2022, +drXOt9d0hdN,1642700000000.0,1642700000000.0,1,zxEfpcmTDnF,zxEfpcmTDnF,Paper Decision,Reject,"This work shows that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE). It is interesting that the fundamental frequency and formant frequencies are encoded in orthogonal subspaces of the VAE latent space -- this opens up a possible way of easily controlling these. + +The key motivation/goal of the paper has caused some confusion. The abstract highlights an observation about VAE’s learned representation. In retrospection, some reviewers have not found the findings very surprising. On the other hand, the authors also do not attempt at developing and evaluating a speech generation method. As is, the paper seems to be much more suitable to a specialized workshop on speech. Alternatively, the paper could be extended to other modalities to show steerability of a representation using a synthetic dataset. However, the current scope seems to be somewhat limited hence I am not able to recommend the current manuscript for acceptance.",ICLR2022, +GYF--OXJ-M,1576800000000.0,1576800000000.0,1,Bkgk624KDB,Bkgk624KDB,Paper Decision,Reject,"This paper introduces MELEE, a meta-learning procedure for contextual bandits. In particular, MELEE learns how to explore by training on datasets with full-information about what every reward each action would obtain (e.g., using classification datasets). The idea is strongly related to imitation learning, and a regret bound is demonstrated for the procedure that comes from that literature. Experiments are performed. + +Perhaps due to the generality in which the algorithm was presented, reviewers found some parts of the work unintuitive and difficult to follow. The work may greatly benefit from having an explicit running example for F and pi and how it evolves during training. Some reviewers were not impressed by the experimental results relative to epsilon-greedy. Yes, epsilon-greedy is a strong baseline, but MELEE introduces significant technical debt and data infrastructure so it seems fair to expect a sizable bump over epsilon-greedy or else why is it worth it? + +Perhaps with revisions and experiments within a domain that justify its complexity, this paper may be suitable at another venue. But it is not deemed acceptable at this time, Reject. 
",ICLR2020, +HkgEmh0Wg4,1544840000000.0,1545350000000.0,1,S1xLN3C9YX,S1xLN3C9YX,Architecture search through Bayesian Optimization,Accept (Poster),"The authors propose a method to learn a neural network architecture which achieves the same accuracy as a reference network, with fewer parameters through Bayesian Optimization. The search is carried out on embeddings of the neural network architecture using a train bi-directional LSTM. The reviewers generally found the work to be clearly written, and well motivated, with thorough experimentation, particularly in the revised version. Given the generally positive reviews from the authors, the AC recommends that the paper be accepted. +",ICLR2019,3: The area chair is somewhat confident +VlxKilPd5g,1576800000000.0,1576800000000.0,1,B1gF56VYPH,B1gF56VYPH,Paper Decision,Accept (Poster),"Two reviewers recommend acceptance while one is negative. The authors propose t-shaped kernels for view synthesis, focusing on stereo images. AC finds the problem and method interesting and the results to be sufficiently convincing to warrant acceptance.",ICLR2020, +HJnZIJpBf,1517250000000.0,1517260000000.0,728,HJ39YKiTb,HJ39YKiTb,ICLR 2018 Conference Acceptance Decision,Reject,"None of the reviewers are enthusiastic about the paper, primarily due to lack of proper evaluation. The response of the authors towards this criticism is also not sufficient. The final results are mixed which does not show very clearly that the presented associative model performs better than the sole seq2seq baseline that the authors use for comparison. We think that addressing these immediate concerns would improve the quality of this paper.",ICLR2018, +Bz6Mnwrnvqy,1642700000000.0,1642700000000.0,1,0Kj5mhn6sw,0Kj5mhn6sw,Paper Decision,Reject,"PAPER: This paper describes a method to generate visual gestures by learning an intermediate representation based on gesture sequences. This proposed method builds from previous work on VAE and vector quantized VAE. +DISCUSSION: The reviewers wrote some detailed reviews about the paper, bringing some valid concerns and asking some questions to the authors. Unfortunately, no responses were posted from the authors. +SUMMARY: After looking at all reviews, there was a general consensus among the reviewers that this paper was not ready for publication. We hope that the reviews will be helpful for the authors in revising their work for future submission.",ICLR2022, +SJzhNy6Hz,1517250000000.0,1517260000000.0,435,rJiaRbk0-,rJiaRbk0-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes training binary-values LSTMs for NLP using the Gumbel-softmax reparameterization. The motivation is that this will generalize better, and this is demonstrated in a couple of instances. + +However, it's not clear how cherry-picked the examples are, since the training loss wasn't reported for most experiments. And, if the motivation is better generalization, it's not clear why we would use this particular setup.",ICLR2018, +BJsOUyaSM,1517250000000.0,1517260000000.0,820,H1u8fMW0b,H1u8fMW0b,ICLR 2018 Conference Acceptance Decision,Reject,"All 3 reviewers consider the paper insufficiently good, including a post-rebuttal updated score. +All reviewers + anonymous comment find that the paper isn't well-enough situated with the appropriate literature. +Two reviewers cite poor presentation - spelling /grammar errors making hte paper hard to read. +Authors have revised the paper and promise further revisions for final version. 
+",ICLR2018, +f8pugDDYnXC,1642700000000.0,1642700000000.0,1,97WDkHzofx,97WDkHzofx,Paper Decision,Reject,The reviewers are in consensus. I recommend that the authors take their recommendations into consideration in revising their manuscript.,ICLR2022, +BkMuN1pHM,1517250000000.0,1517260000000.0,382,SkmiegW0b,SkmiegW0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper proposes a method to disentangle style from content (two factor disentanglement) using weak labels (information about the common factor for a pair of images). It is similar to an earlier work by Mathieu et al (2016) with main novelty being in the use of the discriminator which operates with pairs of images in the proposed method. Authors also have some theoretical statements about two challenges in disentangling the factors but reviewers have complained about missing connection b/w theory and experiments, and about exposition in general. + +The idea has novelty, although somewhat limited in the light of earlier work by Mathieu et al (2016)), and theoretical statements are also of interest but reviewers still feel the paper needs improvement in writing and presentation of results. I would recommend an invitation to the workshop track. ",ICLR2018, +zcpzuTAS_s,1642700000000.0,1642700000000.0,1,q23I9kJE3gA,q23I9kJE3gA,Paper Decision,Reject,"In this paper, the authors propose a method to generate sets, which are order invariant, with a sequence-to-sequence model. The main idea is to order the elements of the sets, and then treat them as regular sequences. The authors propose to use PMI and conditional probability to obtain a partial order on the elements of sets. Overall, while the reviewers note that the proposed method is simple and intuitive, they also raised concerns about the paper: one of the main concerns is about missing baselines, such as non seq2seq models for set generation, such as binary classification (to predict whether an element should be included or not). For this reason, I recommend to reject the paper.",ICLR2022, +HJYx3zLOg,1486400000000.0,1486400000000.0,1,SJUdkecgx,SJUdkecgx,ICLR committee final decision,Reject,"The paper proposes two extensions of the KISSME metric learning method: (i) learning dimensionality reduction together with the metric; (ii) incorporating it into deep neural networks. The contribution is rather incremental, and the results are at the level of the prior art in deep metric learning, so in its current form the paper is not ready to be accepted.",ICLR2017, +kc7UvE4ROE,1576800000000.0,1576800000000.0,1,HJlY_6VKDr,HJlY_6VKDr,Paper Decision,Reject,"This paper formalizes the concept of buffer zones, and proposes a defense method based on a combination of deep neural networks and simple image transformations. The authors argue that the proposed method based on buffer zones is robust against state-of-the-art black box attacks methods.This paper, however, falls short of (1) unjustified claims (e.g., buffer zones are widened when the models are diverse); (2) incomplete literature survey and related work; (3) similar ideas are well-known in the literature, (4) unfair experimental evaluations and many others. Even after author response, it still does not gather support from the reviewers. Thus I recommend reject.",ICLR2020, +rJla2RJ-eN,1544780000000.0,1545350000000.0,1,BylBfnRqFm,BylBfnRqFm,metareview,Reject,"This paper proposes a meta-learning algorithm that performs gradient-based adaptation (similar to MAML) on a lower dimensional embedding. 
The paper is generally well-written, and the reviewers generally agree that it has nice conceptual properties. The method also draws similarities to LEO. The main weakness of the paper is with regard to the strength of the experimental results. In a future version of the paper, we encourage the authors to improve the paper by introducing more complex domains or adding experiments that explicitly take advantage of the accessibility of the task embedding. +Without such experiments that are more convincing, I do not think the paper meets the bar for acceptance at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +fveJA_qIgZm,1642700000000.0,1642700000000.0,1,UORhn0DGIT,UORhn0DGIT,Paper Decision,Reject,"The aim of this paper is to propose a novel ""GW""-like discrepancy function between probability measures living in different spaces (here restricted to be Euclidean, with a squared euclidean distance as the base metric). While interesting (notably the idea of learning distinct maps mapping a random direction in a latent space onto two spaces) there are a few issues with presentation, incremental nature of work and importantly a few shortcomings in the empirical evaluation as detailed by reviewers. Hopefully these can be used to improve the draft for a future version.",ICLR2022, +BJleml9NyN,1543970000000.0,1545350000000.0,1,SkgKzh0cY7,SkgKzh0cY7,Reasonable extension of prior work to additional dimensions. ,Reject,"In this work, a central idea introduced by CycleGAN is extended from 2D convolutions to 3D convolutions to ensure better consistency of style transfer across time. Authors demonstrate improvements on a variety of datasets in comparison to frame-by-frame style transfer. + +Reviewer Pros: ++ Seems to be effective at enforcing improved consistency over time ++ Proposed medical dataset may be good contribution to community. ++ Good quality evaluation + +Reviewer Cons: +- All reviewers felt the technical novelty was low. +- Some questions arose around quantitative results, left unanswered by authors. +- Experiments missing some baseline approaches +- Architecture limited to fixed length video segments + +Reviewer consensus is to reject. Authors are encouraged to continue their work and take into account suggestions made by reviewers, including adding additional comparison baselines ",ICLR2019,4: The area chair is confident but not absolutely certain +muKC6p_x1U6,1642700000000.0,1642700000000.0,1,L01Nn_VJ9i,L01Nn_VJ9i,Paper Decision,Accept (Poster),"This paper introduces Back2Future, a deep learning approach for refining predictions when +backfill dynamics are present. + +All reviewers agree on that the authors successfully motivate their work and +introduce a topic of great interest, i.e. that of dealing with the effect of revising previously recorded data and its effect +timeseries predictions. The reviewers also underline the strong and thorough experimental section. +Among the reviews is also underlined the potential impact of the work for the research domain. + +Many thanks to the authors for replying to the minor concerns raised. + +I concur with the reviews and find this submission very interesting, convincing and thus +recommend for accept. + +Thank you for submitting the paper to ICLR.",ICLR2022, +-18ripprlTi,1610040000000.0,1610470000000.0,1,BEs-Q1ggdwT,BEs-Q1ggdwT,Final Decision,Reject," +In this paper, the authors proposed the expected quadratic utility maximization (EQUM) to implement the risk-aware decision-making for mean-variance RL. 
The EQUM framework directly optimizes the weighted sum of the first- and second-order moments while ignoring the square of the first moments, and thus largely reduces the difficulty of the optimization. The authors tested the proposed policy-gradient-based algorithm empirically. 

However, the theoretical connection to the classic mean-variance objective is not clear when the square of the first moments is simply ignored (R2, R3, R4). Meanwhile, the effect of the tunable weights (\psi) is neither clear nor consistent empirically (R2, R3, R4). 

As all reviewers agree this paper is interesting and promising, I encourage the authors to address these issues and consider submitting to the next venue. ",ICLR2021, 
5Ctu3QFSb,1576800000000.0,1576800000000.0,1,r1gx60NKPS,r1gx60NKPS,Paper Decision,Reject,"The authors tackle the question of automatic metrics for assessing document similarity and propose the use of Transformer-based language models as a critic providing scores to samples. As a note, ideas like these have also been adopted in Computer Vision with the use of the Inception score as a proxy for the quality of generated images. The authors ask great questions in the paper and they clearly tackle a very important problem, that of automatic measures for assessing text quality. While their first indications are not negative, this paper lacks the rigor and depth of experiments of a conference paper that would convince the research community to abandon BLEU and ROUGE in lieu of some other metric. It's perhaps a good workshop paper or a short paper at a *CL conference. Specifically, we would need more tasks where BLEU/ROUGE is the standard measure, and to show how the proposed measure correlates better with humans, i.e., cases where word overlap is in theory a good proxy for similarity given a reference sentence (e.g., logical entailment is not such a prototypical task). MT is a first step towards that, but summarization is also necessary, I would say. Other questions of interest relate to the type of LM (does it only need to be RoBERTa?) and the quality of the LM (what if I badly tune my LM?). On a more personal note: We all know that BLEU is not a good metric (especially for document-level judgements) and every now and then there have been proposals to replace BLEU that do correlate better (e.g., http://ccc.inaoep.mx/~villasen/bib/Regression%20for%20machine%20translation%20evaluation.pdf). However, BLEU is still here due to its simplicity. Please keep pushing this research and I'm looking forward to seeing more experimental evidence.",ICLR2020, 
Syl7oInbl4,1544830000000.0,1545350000000.0,1,HyNbtiR9YX,HyNbtiR9YX,borderline paper,Reject,"This paper proposes a document classification algorithm based on partitioned word vector averaging. 
+I agree with even the most positive reviewer: more experiments would be good. This is a very well-developed old area.",ICLR2019,4: The area chair is confident but not absolutely certain
rtfzk8xzD1,1576800000000.0,1576800000000.0,1,B1lsXREYvr,B1lsXREYvr,Paper Decision,Reject,"This paper proposed to use a compressive sensing approach for neural architecture search, similar to Harmonica for hyperparameter optimization. 

In the discussion, the reviewers noted that the empirical evaluation is not comparing apples to apples; the authors could not provide a fair evaluation. Code availability is not mentioned. The proof of Theorem 3.2 was missing in the original submission and was only provided during the rebuttal. All reviewers gave rejecting scores, and I also recommend rejection. 
",ICLR2020, +nC_Gq-WDZl,1576800000000.0,1576800000000.0,1,BygkQeHKwB,BygkQeHKwB,Paper Decision,Reject,"In this paper the authors highlight the role of time in adversarial training and study various speed-distortion trade-offs. They introduce an attack called boundary projection BP which relies on utilizing the classification boundary. The reviewers agree that searching on the class boundary manifold, is interesting and promising but raise important concerns about evaluations on state of the art data sets. Some of the reviewers also express concern about the quality of presentation and lack of detail. While the authors have addressed some of these issues in the response, the reviewers continue to have some concerns. Overall I agree with the assessment of the reviewers and do not recommend acceptance at this time.",ICLR2020, +l1iIDscmWPr_,1642700000000.0,1642700000000.0,1,Z8FzvVU6_Kj,Z8FzvVU6_Kj,Paper Decision,Accept (Poster),"The paper proposes a supernet learning strategy for NAS based on meta-learning to tackle the knowledge forgetting issue. Forgetting happens when training a sampled sub-model to optimize the shared parameters overrides the previous knowledge learned by the other sub-models. The main idea of the paper is to consider training of each subnetwork as a task, and then apply MAML to ensure efficient cross-task adaptivity. While the reviewers found the proposed method mainly an application of the existing meta-learning strategies to one-shot NAS, additional experimental results provided by the authors mostly convinced them about the effectiveness of the proposed method.",ICLR2022, +HygvEnoEeN,1545020000000.0,1545350000000.0,1,SJzwvoCqF7,SJzwvoCqF7,Concerning forum,Reject,"I'm quite concerned by the conversation with Anonymous, entitled ""Why is the dependence..."". My issues concern the empirical Rademacher complexity (ERC) and in particular the choice of the loss class for which the ERC is being computed. This class is obviously data dependent, but the Reviewers concerns centers on the nature of its data dependence. It is not valid to define the classes by the Jacobian's norm on the input data, as this _structure_ over the space of classes is data dependent, which is not kosher. The reviewer was gently pushing the authors towards a very strong assumption... i'm guessing that the jacobian norm over all data sets was bounded by a particular constant. This seems like a whopping assumption. The fact that I can so easily read this concern off of the reviewer's comments and the authors seem to not be able to understand what the reviewer is getting at, concerns me. + +Besides this concern, it seems that this paper has undergone a rather significant revision. I'm not convinced the new version has been properly reviewed. For a theory paper, I'm concerned about letting work through that's not properly vetted, and I'm really not certain this has been. I suggest the authors consider sending it to COLT.",ICLR2019,3: The area chair is somewhat confident +jszzD6HgCYb,1642700000000.0,1642700000000.0,1,7DI6op61AY,7DI6op61AY,Paper Decision,Accept (Poster),"The authors propose to combine ideas from SDEs and time series modeling with stochastic optimal control to present a framework for modeling continuous-time stochastic dynamics. The reviewers are in agreement that there are several good ideas presented here and that the interface of the perspectives the authors combine toward their proposed framework is an interesting one to explore. 
One referee raises valid concerns about confusing points in the details of the presentation, and the positive reviewers echoed these concerns. In particular, more details and clearer exposition are needed for the decomposition into the subproblems and for the problem of many hyperparameters. Nonetheless, my overall impression after a careful read of the paper and discussion is that these concerns are addressable and are to a degree ameliorated by the author response, and that they may be viewed as limitations outweighed by the merits of the novel ideas presented here. I emphasize that all reviewers were surprisingly consistent in their assessment of the shortcomings, and I encourage the authors to take these constructive criticisms seriously in preparing a final version of this paper.",ICLR2022, 
cgBnzQFUi4A,1642700000000.0,1642700000000.0,1,BrfHcL-99sy,BrfHcL-99sy,Paper Decision,Reject,"This paper has been reviewed by four reviewers, with three borderline scores leaning towards an accept and one clear reject. Reviewers have raised a number of issues. They feel that *the paper is borderline* as *the paper may not have great novelty* due to the use of low-rankness, even though it is used here for low-rank tensor approximation, and that *larger datasets* should be used to demonstrate the effectiveness of the proposed approach (even though, to be fair, there are no papers doing this on large-scale graphs). 

Also, reviewers note that they would like to see more theoretical justification rather than just seeing the authors *propose a method for the adversary scenario* without full theoretical analysis. For instance, reviewers xxhm and WHUo were seeking novel theoretical analysis in the context of adversarial robustness rather than a statement that *the problem of recovering the data under gross error has gained much attention* followed by a list of prior papers and an outline of their findings. 

While all reviewers agree that the empirical results look very promising, they also agree that the theoretical analysis needs improvement. For the above reasons, however tempting, even if overlooking the reject score from reviewer xxhm, it is difficult for the AC to advocate for a clean accept.",ICLR2022, 
_qpeoyIkd5s,1642700000000.0,1642700000000.0,1,Ybx635VOYoM,Ybx635VOYoM,Paper Decision,Reject,"This paper tackles a really interesting and realistic problem: how does contradictory (potentially) fake information affect QA systems? The authors try to approach this problem by building a new dataset, starting with the widely used SQuAD and adding contradictory information. This is quite interesting, but the rest of the paper does not follow through. Reviewers ask a critical question: how would you distinguish the information that is fake from valid, truthful information? Without this distinction, how would you train a language model to detect the fakeness and answer the question using the valid information? Unfortunately, the authors did not reply to this critical question, so it is difficult to judge the validity and contributions of this paper. There are also serious ethical implications, which are discussed in the ethics review.",ICLR2022, 
h-l-htBocDJ,1610040000000.0,1610470000000.0,1,rryJiPXifr,rryJiPXifr,Final Decision,Reject,"The reviewers appreciate the idea of hyperparameter planning and the thorough experimentation. Some concerns remain regarding the comparison between this method and SlowFast that need to be addressed. 
Also, the scope of the paper that targets hyperparameter optimization networks for action recognition specifically, may be too narrow for an ICLR audience. ",ICLR2021, +YE_GOYHW4em,1610040000000.0,1610470000000.0,1,b_7OR0Fo_iN,b_7OR0Fo_iN,Final Decision,Reject," +This paper analyzes several neighbor embedding methods-- t-SNE, UMAP, and ForceAtlas2-- by considering their objectives as consisting of attractive and repulsive terms. The main hypothesis is that stronger repulsive terms contribute towards learning discrete structures, while stronger attractive terms contribute towards learning continuous/manifold structures. The paper empirically explored the space parameterized by the relative weighting of the attractive and repulsive terms for the t-SNE and UMAP algorithms, using several data sets, and qualitatively confirmed their conclusions about the impact of the attractive and repulsion terms as the relative weights vary. + +The experimental validation of the paper's main hypothesis is thorough and the use of diverse data sets and neighbor embedding methods is appreciated-- as the authors point out, several reviewers missed this contribution. However, several reviewers point out that the insight presented in the paper is already largely present in the literature, and that beyond its analysis the paper does not present new algorithms based on this insight. The authors rebut this claim by arguing that the novelty of the paper lies in it: (1) showing the contrary to the established opinion, UMAP works despite, instead of because, it uses cross-entropy loss, and (2) the paper offers for the first time a theoretical understanding of why ForceAtlas2 highlights continuous developmental trajectories, and (3) prior work has not made the connection between UMAP, ForceAtlas2, and t-SNE or suggested using exaggeration throughout the optimization process for t-SNE rather than simply as a warm-up. The paper does indeed present intuitions for (1)-(3) based on the attraction-repulsion ideas, and makes the connection between these neighbor embedding algorithms by viewing them as variations on the theme of attraction-repulsion, but these intuitions are not significant steps forward with respect to what is already known about how neighbor embeddings balance attraction and repulsion. The mathematical analyses consist of stating the gradient for the algorithms and explaining how weighing the attraction and repulsion terms differently lead to different qualitative observations. The use of exaggeration throughout the optimization process is straightforward, and no strong mathematical characterization of the properties of the resulting algorithm is given. + +It is recommended that this paper be rejected, as it consists of a thorough empirical validation of an understanding of the trade-off between attractive and repulsive forces in neighbor embedding methods that was already present in the literature, along with some straightforward arguments connecting several popular neighbor embedding methods, but does not introduce any significantly new actionable insights or novel algorithms. +",ICLR2021, +rfn6jOGgleg,1642700000000.0,1642700000000.0,1,wzJnpBhRILm,wzJnpBhRILm,Paper Decision,Reject,"The paper studies the introduction of a variant of batch normalization (BN) to train deep neural network. The underlying idea is a two-step approach for per-sample based normalization, relying on augmenting the computational graph to handle ""several samples"" nodes. 
+
The reviewers have mentioned that the idea of altering the computational graph is interesting and potentially novel. 
Yet, the numerical experiments were not precise or solid enough to back up the authors' claim that their proposed BN alternative is of practical interest. 
It was also raised that the paper lacks theoretical support: no formal analysis, most explanations are ad hoc, etc.",ICLR2022, 
beqc2bOBH,1576800000000.0,1576800000000.0,1,rkg0_eHtDr,rkg0_eHtDr,Paper Decision,Reject,"This paper studies over-parameterization for unsupervised learning. The paper does a series of empirical studies on this topic. Among other things, the authors observe that larger models can increase the number of latent variables recovered when fitting larger variational inference models. The reviewers raised some concern about the simplicity of the models studied and also the lack of some theoretical justification. One reviewer also suggests that more experiments and ablation studies on more general models will further help clarify the role of over-parameterized models for latent generative models. I agree with the reviewers that this paper is a ""compelling reason for theoretical research on the interplay between overparameterization and parameter recovery in latent variable neural networks trained with gradient descent methods"". I disagree with the reviewers that theoretical study is required, as I think a good empirical paper with clear conjectures is as important. I do agree with the reviewers, however, that for an empirical paper the empirical studies would have to be a bit more thorough, with clearer conjectures. In summary, I think the paper is nice and raises a lot of interesting questions but can be improved with more thorough studies and conjectures. I would have liked to have the paper accepted, but based on the reviewer scores and other papers in my batch I cannot recommend acceptance at this time. I strongly recommend that the authors revise and resubmit. I really think this is a nice paper that has a lot of potential and can have impact with appropriate revision.",ICLR2020, 
HytXBk6HG,1517250000000.0,1517260000000.0,535,S1EwLkW0W,S1EwLkW0W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a theoretical justification for the Adam optimizer in terms of decoupling the signs and magnitudes of the gradients. The overall analysis seems reasonable, though there's been much back-and-forth with the reviewers about particular claims and assumptions. Overall, the contributions don't feel quite substantial enough for an ICLR publication. The interpretation in terms of signs is interesting, but it's very similar to the motivation for RMSprop, of which Adam is an extension. The performance result on diagonally dominant noisy quadratics is interesting, but it feels unsurprising that a diagonal curvature approximation would work well in this setting. I don't recommend acceptance at this point, though these ideas could potentially be developed further into a strong submission. 
",ICLR2018, 
3VgR8BaGuy,1642700000000.0,1642700000000.0,1,St6eyiTEHnG,St6eyiTEHnG,Paper Decision,Accept (Poster),"Most of the discussion centered around whether the underlying question in the literature is set up correctly in terms of its relationship to causality, as the question being asked is one of an intervention. The underlying literature makes an attempt at not including things that can't be intervened on, like age, but the setup of a ""counterfactual"" could benefit from a causal take. 
+
Holding that aside, the paper makes progress on an established question through analysis that reveals that Lipschitz continuity and confidence are important for causality, and through Stable Neighbor Search for generating counterfactuals. 

The most negative reviewer writes in the discussion that they're okay with the paper being accepted if the rest of the reviewers are positive. The rest of the reviewers are positive, with one mentioning in the discussion that the paper is well written and interesting, and that the author replies cleared up the issues about counterfactuals.",ICLR2022, 
BktLH1aBz,1517250000000.0,1517260000000.0,574,HkMCybx0-,HkMCybx0-,ICLR 2018 Conference Acceptance Decision,Reject,"The authors introduce a new activation function which is similar in shape to ELU, but is faster to compute. The reviewers consider this not to be a significant innovation because the amount of time spent computing the activation function is small compared to other neural network operations.",ICLR2018, 
SyeJoLVexN,1544730000000.0,1545350000000.0,1,rJed6j0cKX,rJed6j0cKX,Good paper,Accept (Poster),"This paper proposes a framework for using invertible neural networks to study inverse problems, e.g., recovering hidden states or parameters of a system from measurements. This is an important and well-motivated topic, and the solution proposed is novel although somewhat incremental. The paper is generally well written. Some theoretical analysis is provided, giving conditions under which the proposed approach recovers the true posterior. Empirically, the approach is tested on synthetic data and real-world problems from medicine and astronomy, where it is shown to compare favorably to ABC and conditional VAEs. Adding additional baselines (Bayesian MCMC and Stein methods) would be good. There are some potential issues regarding MMD scalability to high-dimensional spaces, but overall the paper makes a solid contribution and all the reviewers agree it should be accepted for publication.",ICLR2019,4: The area chair is confident but not absolutely certain
T4BCYXfaM84C,1642700000000.0,1642700000000.0,1,CIaQKbTBwtU,CIaQKbTBwtU,Paper Decision,Accept (Poster),"This paper proposes a new method for domain generalization by adopting a single test example. 
Authors formulate the problem using a variational bayesian framework which ends up in an adaptation technique requiring a single feed-forward computation. The provided empirical results indicate that the proposed method has comparable performance to techniques which require more data. + +Reviewers all acknowledge the novelty and significance of this work. The paper is well-written and the related work is adequately discussed. Moreover, the proposed method is computationally efficient and empirical results provide strong evidence in its favor. While I am recommending acceptance, I tend to agree with reviewer xA1m about the main weaknesses of this work and I recommend authors to improve them for the final version: + +- Lack of proper discussion or intuition about under what conditions the proposed method works well. This may be using theoretical analysis, using toy examples, trying to break the method, motivate using prior work or just simply providing intuitive arguments. Also, as reviewers pointed, Figure 1 is currently very confusing. +- Lack of analysis or ablation study allows a better understanding of the proposed method",ICLR2022, +W8a02AliI7,1576800000000.0,1576800000000.0,1,rylDzTEKwr,rylDzTEKwr,Paper Decision,Reject,There was a clear consensus amongst reviewers that the paper should not be accepted. This view was not changed by the rebuttal. Thus the paper is rejected. ,ICLR2020, +iQBL8-1AXUu,1642700000000.0,1642700000000.0,1,jNB6vfl_680,jNB6vfl_680,Paper Decision,Reject,"The paper claims that one of the most common (and obvious) pruning methods in the literature today (global magnitude pruning) is ""overlooked"" and ""seen as a mediocre baseline by the community."" As an active member of the pruning research community myself, I can attest that this is simply not true. I am in strong agreement with reviewer MHY2 and - after reading the discussion around that review and the paper itself in detail - I confidently recommend rejection. + +Magnitude pruning itself dates back decades, at least to the work of Janowski (Pruning vs. Clipping in Neural Networks, 1988). The paper is correct that *global* magnitude pruning (in which all weights are compared in a layer-agnostic manner) was largely ignored in favor of layer-wise magnitude pruning (i.e., pruning all layers by the same amount) in much of the work that popularized magnitude pruning (e.g., Han et al., 2015). However, global magnitude pruning has become much more popular since that time. In work establishing the lottery ticket hypothesis, Frankle and Carbin (The Lottery Ticket Hypothesis) use it in certain cases and - later - in all cases (Frankle et al., Linear Mode Connectivity and the Lottery Ticket Hypothesis). In the past several years, global pruning in general has become the de facto way to use all new pruning heuristics (e.g., SNIP: Single-Shot Network Pruning based on Connection Sensitivity; Picking Winning Tickets Before Training by Preserving Gradient Flow; Pruning Neural Networks without Any Data by Iteratively Conserving Synaptic Flow). Moreover, other papers have specifically advocated that global magnitude pruning is state-of-the-art within recent years at this very conference: Comparing Rewinding and Fine-Tuning in Neural Network Pruning (Renda et al., ICLR 2020 oral): ""We propose a pruning algorithm...that matches state-of-the-art tradeoffs between Accuracy and Parameter-Efficiency across networks and datasets:...globally prune the 20% of weights with the lowest magnitudes."" (This paper does not cite Renda et al. 
despite the fact that it is a prominent paper that directly contradicts the purported problem that the paper relies on to support the significance of the findings.) + +In short, in the pruning literature, the idea that global pruning, magnitude pruning, or global magnitude pruning is overlooked or is not recognized as a strong baseline is simply preposterous. The reason that global magnitude pruning has ""largely been ignored in recent years, generally being relegated to the position of a baseline for comparison"" is because it is a simple technique whose efficacy has long been known and established - exactly what a good ""baseline for comparison"" should be. + +The paper has narrowed its claims somewhat during the discussion and revision period, advocating for a one-shot global magnitude pruning strategy that ""does not require any complex pruning frameworks like RL or sparsification schedules [or]...iterative procedure."" To do so, however, the proposed method replaces each of these ""complex"" hyperparameters with another set: whether or not to use a minimum threshold (MT) and where to set it. Even if the approach isn't iterative, the hyperparameter search necessary to set it almost certainly is, and it is unclear whether searching for the MT value is any more efficient than the other approaches. The costs of this hyperparameter search need to be measured. And iterative pruning's costs can often be mitigated by making pruning gradual, something the paper considers superficially in the revisions. + +Finally, as reviewer MHY2 observes, one of the primary reason papers *don't* use global magnitude pruning is that, although it leads to higher sparsities than layerwise magnitude pruning, it also often leads to higher FLOP counts. Although FLOP counts are a terrible indicator of real-world speedup, they are a much higher-fidelity indicator than parameter-count, which neglects the fact that - in convolutional networks - a small number of parameters can lead to vastly more FLOPs if they operate on larger activation maps (i.e., before the activation maps have been downsampled). In the revisions, the paper gives a token nod (and a superficial dismissal) to this fact in Sections 4.3 and 6, but the paper needs to fully acknowledge this point by measuring and discussing its consequences. ""Look[ing] at this in future work"" is not enough. + +Due to these many concerns, I strongly recommend rejection.",ICLR2022, +18hKIdaSuF7,1642700000000.0,1642700000000.0,1,fJIrkNKGBNI,fJIrkNKGBNI,Paper Decision,Reject,"This paper proposes to apply a piece-wise polynomial filter on the spectral corresponding to the graph convolution to enhance the model expressivity of graph neural networks. The effectiveness of the proposed model is investigated through numerical experiments and it was shown that the method achieves fairly nice performances. + +This paper gives a natural extension to the usual adaptive Generalized PageRank approaches to more expressive piece-wise polynomial filters. However, the reviewers are not enthusiastic on this paper. This is mainly because of the following concerns: (1) Since it requires diagonalization of the aggregation operator, it requires much more computational burden than the usual polynomial filters, which prevents the method from being applied to data with much more large size. 
(2) The choice of the filter could be investigated further; in particular, the complexity-expressivity trade-off (in other words, the bias-variance trade-off) could be discussed more, for example, through theoretical work. 

In summary, the paper does not seem mature enough for publication at the ICLR conference.",ICLR2022, 
IiEQ3CFEyZ,1642700000000.0,1642700000000.0,1,5-2mX9_U5i,5-2mX9_U5i,Paper Decision,Accept (Poster),This paper provides a near-optimal analysis of the unadjusted Langevin Monte Carlo (LMC) algorithm with respect to the W2 distance. The main statement is that the mixing time is ~ d^{1/2}/eps under standard assumptions. The authors also give a nearly matching lower bound under these assumptions. The reviewers agreed that this is an interesting contribution obtained via non-trivial techniques. The consensus recommendation is to accept the paper.,ICLR2022, 
H1xLpi20J4,1544630000000.0,1545350000000.0,1,B1GIQhCcYm,B1GIQhCcYm,Lack of novelty,Reject,"The paper formulates the problem of unsupervised one-to-many image translation and addresses the problem by minimizing the mutual information. 

The reviewers and AC note that the paper's limited novelty and comparisons fall critically short of the high standard of ICLR. 

The AC decided that more work is needed before the paper can be published.",ICLR2019,4: The area chair is confident but not absolutely certain
HJg7LJeIxE,1545110000000.0,1545350000000.0,1,BJeY6sR9KX,BJeY6sR9KX,"Interesting take on quantifying similarity of networks to brain visual processing, unclear significance of that result for ICLR audiences",Reject,"This work provides two contributions: 1) Brain-Score, which quantifies how a given network's responses compare to responses from natural systems; 2) CORnet-S, an architecture trained to optimize Brain-Score, that performs well on Imagenet. 
As noted by all reviewers, this work is interesting and shows a promising approach to quantifying how brain-like an architecture is, with the limitations inherent to the fact that there is a lot about natural visual processing that we don't fully understand. However, the work here starts from the premise that being more similar to current metrics of brain processes is by itself a good thing -- without a better understanding of which features of brain processing are responsible for good performance and which are mere by-products, this premise is not one that would appeal to most of the ICLR audience. In fact, the best performing architectures on Imagenet are not the best scoring on Brain-Score. Overall, this work is quite intriguing and well presented but, as pointed out by some reviewers, requires a ""leap of faith"" in matching signatures of brain processes that most of the ICLR audience is unlikely to be willing to take.",ICLR2019,4: The area chair is confident but not absolutely certain
FiEH1JD6Bx6,1610040000000.0,1610470000000.0,1,3-a23gHXQmr,3-a23gHXQmr,Final Decision,Reject,"The reviewers are in consensus that the manuscript is not ready for publication in its current form: more comprehensive evaluation, and careful analysis (either theoretical or empirical) of the simple-but-effective methodology, would improve the quality further. The discussion was constructive and helped the authors to reason about their work better. 

The AC recommends Reject and encourages the authors to take the constructive feedback into consideration. 
",ICLR2021, +JQDOodtjrq,1576800000000.0,1576800000000.0,1,HygcdeBFvr,HygcdeBFvr,Paper Decision,Reject,"Main content: + +Blind review #1 summarizes it well: + +his paper claims to be the first to tackle unconditional singing voice generation. It is noted that previous singing voice generation approaches leverage explicit pitch information (either of an accompaniment via a score or for the voice itself), and/or specified lyrics the voice should sing. The authors first create their own dataset of singing voice data with accompaniments, then use a GAN to generate singing voice waveforms in three different settings: +1) Free singer - only noise as input, completely unconditional singing sampling +2) Accompanied singer - Providing the accompaniment *waveform* (not symbolic data like a score - the model needs to learn how to transcribe to use this information) as a condition for the singing voice +3) Solo singer - The same setting as 1 but the model first generates an accompaniment then, from that, generates singing voice + +-- + +Discussion: + +The reviews generally point out that while a lot of new work has been done, this paper bites off too much at once: it tackles many different open problems, in a generative art domain where evaluation is subjective. + +-- + +Recommendation and justification: + +This paper is a weak reject, not because it is uninteresting or bad work, but because the ambitious scope is really too large for a single conference paper. In a more specialized conference like ISMIR, it would still have a good chance. The authors should break it down into conference sized chunks, and address more of the reviewer comments in each chunk.",ICLR2020, +hr1SrWPCOPK,1642700000000.0,1642700000000.0,1,xp2D-1PtLc5,xp2D-1PtLc5,Paper Decision,Reject,"This paper proposes a voice conversion framework, ClsVC, which is based on disentanglement of speaker and content information in some latent space. The authors introduce two classification constraints (a common speaker classifier and an adversarial classifier) to improve the separation of the two embeddings. Experimental results are reported on a few voice conversion tasks with objective and subjective scores. Reviewers have reservation about the novelty of the work which is not considered overwhelmingly significant given existing techniques. The theory and arguments on the claimed effectiveness of the disentanglement of speaker and content also raise concerns, which need to be further verified. The experimental results need to be more convincing. Lastly, the exposition needs significant improvements. The authors' rebuttal answers some of the comments but a few major concerns still stand. This paper can not be accepted given its current form.",ICLR2022, +TGgn_whIHu,1610040000000.0,1610470000000.0,1,qRdED5QjM9e,qRdED5QjM9e,Final Decision,Reject,"The paper proposes a rather complex algorithm for unsupervised doamin adaptation. +While the paper provides detailed explanation, some motivation and some experimental resulst, +it does not provide any theoretical guarantees for its performance. More concerning, since domain adaptation +can only succeed when there is a close relationship between the source and target tasks, and only with algorithms +that take that relationship into account, any scientific proposal for domain adaptation should include a clear +discussion of the assumptions driving the proposed algorithms and of the circumstances under which the proposed approach +may or may not work. This is missing in the current submission. 
More specifically, a similar concern was voiced by Reviewer 3, namely: ""The generalization error (both theoretically and empirically) of the gradient approximation is unclear. It is necessary to analyze how effective and under what conditions the proposed approximation can work for the expected target loss optimization."" This point was not addressed in the authors' rebuttal. 

Another key concern that was also brought up by Reviewer 3 read: 
""It needs elaboration why the density ratios can be directly replaced as discriminator predictions, which seems not straightforward and is the main difference to the conventional DRL."" In response the authors cite the paper by Bickel et al 2007, but it falls short of addressing the well-known fact that density ratios cannot be reliably estimated from samples of bounded size. The authors should have explained the specific assumptions that would make this step of their algorithm go through.",ICLR2021, 
B1PMhG8_x,1486400000000.0,1486400000000.0,1,B1GOWV5eg,B1GOWV5eg,ICLR committee final decision,Accept (Poster),"The basic idea of this paper is simple: run RL over an action space that models both the actions and the number of times they are repeated. It's a simple idea, but it seems to work really well on a pretty substantial variety of domains, and it can be easily adapted to many different settings. In several settings, the improvements from using this approach are dramatic. I think this is an obvious accept: a simple addition to existing RL algorithms that can often perform much better. 

 Pros: 
 + Simple and intuitive approach, easy to implement 
 + Extensive evaluation, showing very good performance 

 Cons: 
 - Sometimes unclear _why_ certain domains benefit so much from this",ICLR2017, 
wsfqj9c-lg,1642700000000.0,1642700000000.0,1,-3yxxvDis3L,-3yxxvDis3L,Paper Decision,Reject,"Dear Authors, 

This paper eventually received mostly negative reviews (scores 5, 3, 5), with one mildly positive review (score 6). All reviews were particularly informative, offering detailed and expert feedback. I was hoping for author engagement, but unfortunately, no rebuttal was submitted. 

In general, the reviewers and I found the paper well written and on a timely topic, but of very limited theoretical novelty. Well-articulated details of this can be found in the reviews, and I would recommend that the authors consider them carefully in their revision. I have no option but to reject this work. 

The main reason for rejection in this case is therefore limited theoretical novelty. However, this is a solid paper that is of publishable quality, albeit perhaps in a somewhat lesser venue, at least in its current form. 

Kind regards, 

Area Chair",ICLR2022, 
BkxvHyVBgN,1545060000000.0,1545350000000.0,1,B1e9csRcFm,B1e9csRcFm,Technically correct but lacking in sufficiently interesting insights,Reject,"This paper provides a generalization analysis for graph embedding methods, concluding with the observation that the norm of the embedding vectors provides an effective regularization, more so than dimensionality alone. The main theoretical result is backed up by several experiments. While the result appears to be correct, norm control, dimensionality reduction and early stopping during optimization are all very well studied in machine learning as effective regularizers, either operating alone or in conjunction. The regularization parameters, iteration count, and embedding dimensionality are typically tuned for an application. 
The AC agrees with Reviewer 2 that the paper does not provide sufficiently interesting insights beyond this observation and is unlikely to influence practical applications of these methods. Both reviewers 2 and 3 have also raised points on the need for stronger empirical analysis.",ICLR2019,5: The area chair is absolutely certain
S-OUcc0F8f,1610040000000.0,1610470000000.0,1,c7rtqjVaWiE,c7rtqjVaWiE,Final Decision,Reject,"The consensus among the reviewers is that this is a borderline paper: its main idea is sensible and natural. Unfortunately, while the reviewers appreciated the authors' responses to their comments, they felt that the paper failed to demonstrate the usefulness of the idea beyond toy datasets. The latter would considerably strengthen this paper.",ICLR2021, 
QvhM-H5CCq2,1610040000000.0,1610470000000.0,1,w2mYg3d0eot,w2mYg3d0eot,Final Decision,Accept (Poster),"The paper proves new rates of convergence for stochastic subgradient under an interpolation condition. The analysis is rather simple but it produces better rates than previously known, which all reviewers agree is interesting. As pointed out by the reviewers, this work has the potential to help the community better understand optimization with over-parametrized neural networks (where convexity or other related assumptions play a role). 

To the authors, please add a citation to Pegasos as requested by the reviewers. 

",ICLR2021, 
C4KIlfcXPD,1576800000000.0,1576800000000.0,1,rJeXS04FPH,rJeXS04FPH,Paper Decision,Accept (Poster),"The authors design a deep model architecture for learning word embeddings with better performance and/or more efficient use of parameters. Results on language modeling and machine translation are promising. Pros: Interesting idea and nice results. New model may have some independent value beyond NLP. Cons: Empirical comparisons could be more thorough. For example, it is not clear (to me at least) what would be the benefits of this approach applied to whole words versus a competitor using subword units.",ICLR2020, 
cWejFnkkIxM,1642700000000.0,1642700000000.0,1,Le8fg2ppDSv,Le8fg2ppDSv,Paper Decision,Reject,"The paper is well motivated and tackles a hard and long-standing problem with seq2seq models: diversity and controllability. 
+The authors propose a simple architecture for controllable text summarization. They use multiple decoders controlled by a gating mechanism which can be learnt or controlled manually. They control mostly the abstractiveness and specificity properties of the model. 

Pros 
+ the proposed approach is somewhat novel (several earlier works have proposed multiple decoder models to control the generation -- as pointed out by the reviewer team) 
+ the proposed modifications are motivated well, and the approach is simple and easy to understand. 
+ the paper is well written and easy to read. 
+ the authors made an effort to address most of the reviewers' comments and even added human evaluation scores (which was requested by reviewers) 
+ It seems a highly flexible way of enriching existing models in a simple way for additional control behavior in output summary generation of documents. 

Cons 
+ During discussions, reviewers circled back to the novelty and continued to raise concerns about the weaknesses of the benchmarks and the comparison to related work, as well as the fact that the proposed model has more parameters, which is a potential advantage over other models that might contribute to the performance gains. 
Thus, the paper could be made stronger with further evaluations that could possibly make it stand out.",ICLR2022, +1d584rJlCgL,1642700000000.0,1642700000000.0,1,S874XAIpkR-,S874XAIpkR-,Paper Decision,Accept (Poster),"The paper studies the behavior cloning based strategies of offline RL algorithms in different type of environments and reports that performance primarily depends on model size and regularization. The results contradict some of the earlier claims, and the authors conjecture that model size and regularization characteristics can explain past results. + +During the review period, the reviewers agreed that the paper has certain merits, and on the other hand, they also raised some concerns, regarding some missing technical details, whether the empirical finding could be trusted, the generalization of the findings to more scenarios, and the comparison with some highly related papers. The authors did a good job in their rebuttal, which removed many of the above concerns (although not all) and convinced the reviewers to raise their scores. As a result, we believe it is fine to accept the paper (although somehow like a weak accept).",ICLR2022, +SyYnU16rz,1517250000000.0,1517260000000.0,872,rJoXrxZAZ,rJoXrxZAZ,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a hybrid architecture which combines WaveNet and LSTM for speeding-up raw audio generation. The novelty of the method is limited, as it’s a simple combination of existing techniques. The practical impact of the approach is rather questionable since the generated audio has significantly lower MOS scores than the state-of-the-art WaveNet model.",ICLR2018, +zqjZAhdaZA,1610040000000.0,1610470000000.0,1,arNvQ7QRyVb,arNvQ7QRyVb,Final Decision,Reject,"The reviewers enjoyed reading about an interesting take on lifelong learning, encapsulating an EM methodology for selecting a transfer configuration and then optimizing the parameters. R3 made valid concerns regarding comparison with previous, recent work. R2 also would prefer to see more thorough experiments (ideally in settings where multiple tasks exist, as also commented by R4). During the rebuttal phase the authors made a good effort to run additional experiments which cover the related work aspect better. These experiments and the overall paper were discussed extensively among reviewers after the rebuttal phase. + +In the discussions, the reviewers agreed that an interesting idea can be publishable even if it does not achieve SOTA results in all scenarios, as long as it brings new perspectives and shows at least comparable results. However, in the particular case of this paper, there exist remaining concerns regarding the usefulness and applicability of the method. Specifically, the paper could benefit from a more convincing demonstration about how the method can scale (e.g. R3 and R4’s comments), especially since training time and model capacity are important factors to consider for practical continual learning scenarios. Furthermore, it is not clear how the proposed method can be used in combination with other machine learning tools within a continual learning application, for example by leveraging modern deep architectures or by complementing existing adaptive knowledge approaches (as discussed by R3). + +Although the opinions of the reviewers are not fully aligned, this borderline paper seemed to lack an enthusiastic endorsement by a reviewer to compensate for the concerns discussed above and the relatively weak experimental results. 
Therefore, I recommend rejection. 
",ICLR2021, 
fWE3bwu_qg,1610040000000.0,1610470000000.0,1,zYmnBGOZtH,zYmnBGOZtH,Final Decision,Reject,"This paper studies the following model: the input to our classifier is the instance X, which determines the label Z, and we observe a noisy version of this label, Y. The key assumption is that the label noise is independent of the instance, and the goal is to learn the channel from Z to Y. The main motivation is that, generally, algorithms that can handle instance-independent noise need to know the noise model. Thus the main contribution of this paper is to decouple the problem of learning the noise channel from the problem of learning a high-accuracy classifier. In particular, they inject their own label noise and design a discriminator to test if the noise on the labels has maximum entropy. They show that their method is statistically consistent. Finally, they complement this with synthetic experiments on CIFAR to show that their algorithm works. 

While the reviewers all found the ideas promising, they brought up a few deficiencies in this work which they hope could be improved in later versions. First, the writing is at times unclear and imprecise. For example, there are many places that could benefit from further discussion, particularly in terms of justifying why the assumptions are ""mild"" or not. Second, the experiments would be more compelling if there were an application where learning the noise model actually led to improved downstream performance. Third, the approach crucially relies on having a separable map, which seems like a rather strong assumption.",ICLR2021, 
BkwIhMIOe,1486400000000.0,1486400000000.0,1,H1hoFU9xe,H1hoFU9xe,ICLR committee final decision,Reject,"This paper examines an application that deviates from the usual applications presented at ICLR. The idea seems very interesting to the reviewers, but a number of reviewers had trouble really understanding why the proposed SGAN would be attractive for this problem, and this problem setup with the SGAN in general. Clearer, concrete 'use case scenarios' and experimentation that helps clarify the precise application setting and the advantages of the SGAN formulation would help make this work more impactful on the community. Given the quality of other papers submitted to ICLR this year, the reviewer scores are just short of the threshold for acceptance.",ICLR2017, 
Bkx5NhJbg4,1544780000000.0,1545350000000.0,1,ByMHvs0cFQ,ByMHvs0cFQ,Interesting work applying quaternion representations to neural networks,Accept (Poster),"The authors derive and experiment with quaternion-based recurrent neural networks and demonstrate their effectiveness on speech recognition tasks (TIMIT and WSJ), showing that the proposed models can achieve the same accuracy with fewer parameters than conventional models. 
The reviewers were unanimous in recommending that the paper be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain
LN2Ogebpqn,1576800000000.0,1576800000000.0,1,rkeNfp4tPr,rkeNfp4tPr,Paper Decision,Accept (Poster),"This paper studies the impact of using momentum to escape saddle points. The authors show that a heavy use of momentum improves the convergence rate to second-order stationary points. The reviewers agreed that this type of analysis is interesting and helps understand the benefits of this standard method in deep learning. The authors were able to address most of the reviewers' concerns during the rebuttal, but the paper remains borderline due to lingering concerns about the presentation of the results. We encourage the authors to give more thought to the presentation before publication.",ICLR2020, 
WIWfhIG8Ywo,1642700000000.0,1642700000000.0,1,wv6g8fWLX2q,wv6g8fWLX2q,Paper Decision,Accept (Spotlight),"The authors introduce the Time-Aware Multiperistence Spatio-Supra Graph CN that uses multiparameter persistence to capture the latent time dependencies in spatio-temporal data. 

This is a novel and experimentally well-supported work. 
The novelty is achieved by combining research in topological analysis (multipersistence) and neural networks. The work is technically sound, with a clear presentation and an extensive experimental section. 
Reviewers were uniformly positive, agreeing that the approach was interesting and well-motivated, and the experiments convincing. Some concerns were raised, but they were successfully addressed by the authors and the manuscript was revised accordingly. 

Happy to recommend acceptance. A very nice paper!",ICLR2022, 
bFEZNM83-nR,1642700000000.0,1642700000000.0,1,nHpzE7DqAnG,nHpzE7DqAnG,Paper Decision,Accept (Spotlight),"The paper addresses a problem encountered in many real-world applications, i.e. the treatment of tabular data, composed of heterogeneous feature types, where samples are not i.i.d. In this case, learning is more effective if the typically successful approach for i.i.d. data (boosted decision trees + committee techniques) is combined with GNNs to take into account the dependencies between samples. The main contribution of the paper with respect to previous work in the field is the introduction of a principled approach to pursue such integration. An important role in the proposed approach is played by the definition of a specific bi-level loss (efficient bilevel boosted smoothing) that allows for convergence guarantees under mild assumptions. Both the theoretical and experimental contributions are sound and convincing, justifying the claimed merits of the proposed approach. Another strong point is the fact that the proposed approach is general and amenable to supporting a broad family of propagation rules. One weakness of the original submission was presentation, mainly because some key information was confined to the supplementary material. The revised version addressed this problem and added some more empirical results that confirmed the superiority of the proposed approach. 
Finally, since learning over tabular graph data is very important in industry, the proposed approach may be of interest to a wide audience.",ICLR2022, 
XMVAzS7OB4E,1610040000000.0,1610470000000.0,1,9az9VKjOx00,9az9VKjOx00,Final Decision,Reject,"The paper is concerned with learning transformation-equivariant node representations of graph data in an unsupervised setting. The paper extends prior work on this topic by focusing on equivariance under topology transformations (adding/removing edges) and by considering an information-theoretic perspective. Reviewers highlighted the promising ideas of the approach, its relevance for the ICLR community, and the promising experimental results (although improvements over prior work are not necessarily significant on all benchmarks). 

However, reviewers raised concerns regarding the novelty of the method and the clarity of presentation with respect to key parts of the method. These aspects also connect to further concerns raised, e.g., related to mathematical correctness as well as the significance of the proposed loss function, the benefits of motivating it from MI, and the improvements over GraphTER. The rebuttal didn't fully clarify these points. While the paper is mostly solid, I agree with the reviewers' concerns and -- currently -- the paper doesn't clear the bar for acceptance; it would require another revision to improve upon these points. 
However, I'd encourage the authors to revise and resubmit their work with considering this feedback.",ICLR2021, +PDWi43fcVoh,1642700000000.0,1642700000000.0,1,POTMtpYI1xH,POTMtpYI1xH,Paper Decision,Accept (Poster),"This paper analyzes the latent concepts learned in BERT. In contrast to previous work which tries to map embeddings to predefined +linguistic concepts this paper sets out to discover what is inherently learned by BERT. This is however easier said that done, since +there is no easy way to inspect the embeddings and draw conclusions on what is being learned. The authors adopt a methodology which could be used to inspect the inner workings of other pretrained models. They employ hierarchical clustering to discover latent concepts and then inspect these clusters by manually labeling them. The reviewers raised various issues regarding the number of clusters, and the amount of effort required which de facto renders the approach not very portable. The authors addressed the comments and flagged several difficulties with undertaking such an analysis. I will vote for the paper to be presented as an oral for two reasons a) it is difficult to analyze pretrained models, and although I am not convinced what the authors propose is feasible, it will at least get the discussion going, b) the manually annotated dataset is useful and will go towards allowing us to perform comparisons between models c) the annotation tool will be useful to others if the authors are considering releasing it.",ICLR2022, +Sb86bpVR-C,1576800000000.0,1576800000000.0,1,r1g87C4KwB,r1g87C4KwB,Paper Decision,Accept (Spotlight),"This is an interesting study analyzing learning trajectories and their dependence on hyperparameters, important for better understanding of learning in deep neural networks. All reviewers agree that the paper has a useful message to the ICLR community, and appreciate changes made by the authors in response to the initial reviews.",ICLR2020, +VXxPphxJN5_,1642700000000.0,1642700000000.0,1,T_uSMSAlgoy,T_uSMSAlgoy,Paper Decision,Reject,"This paper studies discontinuities (i.e., holes) in the latent space of text VAE. Analysis of previous hole detection methods are conducted, and a new efficient hole detection algorithm is proposed. It is an interesting work, but the paper in its current form has a few weaknesses/flaws regarding the proposed algorithm, experiment designs and the resulting conclusions. Reviewers have made various constructive suggestions, which the authors acknowledged.",ICLR2022, +rJN8EyaBz,1517250000000.0,1517260000000.0,354,ByxLBMZCb,ByxLBMZCb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper nicely unifies previous results and develops the property of local openness. While interesting, I find the application to multi-layer linear networks extremely limiting. There appears to be a sub-field in theory now focusing on solely multi-layer linear networks which is meaningless in practice. I can appreciate that this could give rise to useful proof techniques and hence, I am recommending it to the workshop track with the hope that it can foster more discussions and help researchers move away from studying multi-layer linear networks.",ICLR2018, +HJx6AfAZgV,1544840000000.0,1545350000000.0,1,HkzSQhCcK7,HkzSQhCcK7,Meta-Review,Accept (Poster),"The paper presents a generative model of sequences based on the VAE framework, where the generative model is given by CNN with causal and dilated connections. 
+ +Novelty of the method is limited; it mainly consists of bringing together the idea of causal and dilated convolutions and the VAE framework. However, knowing how well this performs is valuable the community. + +The proposed method appears to have significant benefits, as shown in experiments. The result on MNIST is, however, so strong that it seems incorrect; more digging into this result, or sourcecode, would have been better.",ICLR2019,4: The area chair is confident but not absolutely certain +6MynmiUkxQ,1576800000000.0,1576800000000.0,1,Sye_OgHFwH,Sye_OgHFwH,Paper Decision,Accept (Poster),"In this paper, the authors present adversarial attacks by semantic manipulations, i.e., manipulating specific detectors that result in imperceptible changes in the picture, such as changing texture and color, but without affecting their naturalness. Moreover, these tasks are done on two large scale datasets (ImageNet and MSCOCO) and two visual tasks (classification and captioning). Finally, they also test their adversarial examples against a couple of defense mechanisms and how their transferability. Overall, all reviewers agreed this is an interesting work and well executed, complete with experiments and analyses. I agree with the reviewers in the assessment. I think this is an interesting study that moves us beyond restricted pixel perturbations and overall would be interesting to see what other detectors could be used to generate these type of semantic manipulations. I recommend acceptance of this paper. +",ICLR2020, +SJlu8D3ElE,1545030000000.0,1545350000000.0,1,HJerDj05tQ,HJerDj05tQ,Paper requires further refinement,Reject,"The paper describes a constrained optimization strategy for optimizing on an intersection of two manifolds. Unfortunately, the paper suffers from generally weak presentation quality, with the technical exposition seriously criticized by two out of the three reviewers. (The single positive review is too short and devoid of content to be taken seriously. Even there, concerns are expressed.) This paper requires substantial improvement before it could be considered for publication.",ICLR2019,5: The area chair is absolutely certain +ju25JPc_-a,1576800000000.0,1576800000000.0,1,S1xRnxSYwS,S1xRnxSYwS,Paper Decision,Reject,"This paper proposes a framework for privacy-preserving training of neural networks within a Trusted Execution Environment (TEE) such as Intel SGX. The reviewers found that this is a valuable research directions, but found that there were significant flaws in the experimental setup that need to be addressed. In particular, the paper does not run all the experiments in the same setup, which leads to the use of scaling factor in some cases. The reviewers found that this made it difficult to make sense of the results. The writing of this paper should be streamlined, along with the experiments before resubmission.",ICLR2020, +r1eINIUbe4,1544800000000.0,1545350000000.0,1,rkevMnRqYQ,rkevMnRqYQ,"Interesting idea and setup, although technical contribution is somewhat limited",Accept (Poster),"The paper proposes to take advantage of implicit preferential information in a single state, to design auxiliary reward functions that can be combined with the standard RL reward function. The motivation is to use the implicit information to infer signals that might not have been included in the reward function. The paper has some nice ideas and is quite novel. A new algorithm is developed, and is supported by proof-of-concept experiments. 
+ +Overall, the paper is a nice and novel contribution. However, reviewers point out several limitations. The biggest one seems to be related to the problem setup: how to combine the inferred reward and the given reward, especially when they are in conflict with each other. A discussion of multi-objective RL would be in order.",ICLR2019,3: The area chair is somewhat confident +Pp2Ck9G53Y,1642700000000.0,1642700000000.0,1,MMAeCXIa89,MMAeCXIa89,Paper Decision,Accept (Poster),"This paper investigates Bayesian optimization where a prior distribution over the optimum is available. The authors conducted a systematic study on a very intuitive prior-augmented acquisition function that multiplies the prior probability with the EI heuristic --- including an asymptotic analysis on the regret, comprehensive (controlled) synthetic experiments, and moderate empirical support on several real-world case studies. + +All reviewers find the paper well-written and appreciate the rigor of the empirical evaluation. The theoretical analysis is also helpful to provide additional justification for the proposed approaches. I would like to add that the paper also included a brief but comprehensive survey on prior work related to leveraging priors in BO, which I find useful for the general audience. + +Reviewers noted that such a bound could become trivial with a bad prior in practice, and further suggested that one may leverage these theoretical insights as general guidelines to practitioners in designing the prior. I think this is a valuable message to convey and suggest the authors take it into account in the revision. + +There was initial confusion pertaining to the experimental details, mainly concerning the effect of the quality of the prior on the performance of the proposed algorithm. The authors provided an effective rebuttal with much concrete empirical support, and after a few rounds of interaction during the discussion phase, the reviewers are convinced about the empirical significance of the proposed work. Overall, this makes for a solid piece of work.",ICLR2022, +trSQzB8wJXZH,1642700000000.0,1642700000000.0,1,srtIXtySfT4,srtIXtySfT4,Paper Decision,Accept (Poster),"The reviewers were mostly concerned about the practical impact/implications of the proposed methods. There was a long discussion across multiple threads about the benefits of the proposed approach in CNNs vs. larger language models, dissecting the benefits in terms of training time (as opposed to memory or FLOPs, which may have a non-linear impact on running time). Overall, the authors did a good job of putting their contribution into context and addressing the reviewer concerns.",ICLR2022, +SJJpjfLOe,1486400000000.0,1486400000000.0,1,HJeqWztlg,HJeqWztlg,ICLR committee final decision,Reject, The reviewers' opinions were clear for this paper. Mainly it seems that the fact that this work focuses on binary image patterns limited the ability of reviewers to assess the significance of this work based on the instantiation of the model explored in this work. It was also noted that the writing could have been clearer when describing the intuitions for the approach and that the derivations could have been explained in more detail.,ICLR2017, +D2dBz03-x1,1576800000000.0,1576800000000.0,1,HklsthVYDH,HklsthVYDH,Paper Decision,Reject,"This paper considers solving the minimax formulation of adversarial training, where it proposes a new method based on a generic learning-to-learn (L2L) framework.
In particular, instead of applying existing hand-designed algorithms for the inner problem, it learns an optimizer parametrized as a convolutional neural network. A robust classifier is learned to defend against the adversarial attacks generated by the learned optimizer. The idea of using L2L is sensible. However, the main concerns about the empirical studies remain after the rebuttal. ",ICLR2020, +g11IReIFSkj,1610040000000.0,1610470000000.0,1,ucuia1JiY9,ucuia1JiY9,Final Decision,Reject,"We thank the authors for their detailed responses to reviewers, and for engaging in a constructive discussion. + +As explained by the reviewers, the paper is clearly written and the method is novel. However, the novelty lies in combining existing ideas and techniques to define an objective function that allows incorporating cluster assignment constraints, which was considered incremental. Regarding quality, the discussion highlighted some possible improvements that the authors propose to make in a future version of the paper, and we encourage them to follow that direction. Regarding significance, although the experimental results are promising, there were some concerns that the improvement over existing techniques is marginal, and that more experiments leading to a clearer message would be useful. + +In summary, this is not a bad paper, but it is below the standards of ICLR in its current form.",ICLR2021, +HJVPNypHf,1517250000000.0,1517260000000.0,369,Byd-EfWCb,Byd-EfWCb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This submission has two results: (1) it defines what it means for a representation to be optimal, although this is rather uninteresting, as it simply says that if the representation from a model is going to be used based on some given metric, the cost function should directly reflect it, and (2) it shows that different choices of encoding and decoding have different implications. As with most of the reviewers, I found these to be a rather weak contribution.",ICLR2018, +rcXTG5A1P,1576800000000.0,1576800000000.0,1,ryxQ6T4YwB,ryxQ6T4YwB,Paper Decision,Reject,"The authors propose an invertible flow-based model for molecular graph generation. The reviewers like the idea but have several concerns: in particular, overfitting in the model, the need for more experiments, and missing related work. It is important for the authors to address them in a future submission.",ICLR2020, +SJllrJaHz,1517250000000.0,1517260000000.0,487,B1kIr-WRb,B1kIr-WRb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers are concerned that the evaluation quality is not sufficient to convince readers that the proposed embedding method is indeed superior to alternatives. Though the authors attempted to address these comments in a subsequent revision, issues remain, e.g., the evaluation is only intrinsic or on contrived problems. Given the limited novelty of the approach (it is a fairly straightforward generalization of Levy and Goldberg's factorization of the PPMI matrix; the factorization is not new per se as well), the quality of experiments and analysis should be improved. + ++ the paper is well written +- novelty is moderate +- better evaluation and analysis are necessary +",ICLR2018, +FNEX1moHf-t,1610040000000.0,1610470000000.0,1,7hMenh--8g,7hMenh--8g,Final Decision,Reject,"Being able to give confidence intervals or have a robust measure of uncertainty is very important for offline RL methods. In this work, the authors propose a dropout-based method to obtain a measure of uncertainty.
The authors provide significant empirical improvements over other baselines. Nevertheless, as it stands right now and as AnonReviewer5 has pointed out, this paper has some important shortcomings. I have noticed that the authors have updated the paper, but some of the important points made by AnonReviewer5 remain unaddressed. Thus, I am suggesting rejecting this paper, hoping that the authors will address those issues and resubmit to a different venue. + +Firstly, I agree with AnonReviewer5 that it is not clear if the dropout and the variance trick used in this paper actually represent the epistemic uncertainty that we would like to have for an offline RL algorithm, because the variance does not necessarily shrink as you train with more data, and as opposed to the supervised learning setting, it is not clear what type of uncertainty the proposed dropout method will induce in the offline RL setting. It would have been nice to have some results showing how calibrated the uncertainty estimates coming from the dropout are. I would recommend that the authors not include any claims regarding epistemic uncertainty in the camera-ready version of the paper. + +Also, as AnonReviewer5 pointed out, having distributional baselines and/or ensemble methods like REM or bootstrapped DQN would be a fairer comparison. So, it would be nice to see some of those baselines in a future version of this paper.",ICLR2021, +I4ujghR3yhgH,1642700000000.0,1642700000000.0,1,lTqGXfn9Tv,lTqGXfn9Tv,Paper Decision,Accept (Poster),"This paper studies the important statistical phenomenon of double descent, a very timely topic, using influence functions, and thereby derives lower bounds for the population loss. The reviewers generally appreciated the conceptual as well as the technical contributions in the work, but argued that the set of assumptions made by the authors can potentially diminish the significance of the analysis. This, as well as additional issues regarding the empirical and the analytical support for the modeling assumptions (and the implied scope of applicability, i.e. lazy/kernel vs. rich regime), has generated considerable discussion between the reviewers and the authors. Along the process, major and minor concerns were addressed to the satisfaction of the reviewers, resulting in a substantial improvement of the overall evaluation. Thus, this AC recommends acceptance.",ICLR2022, +BJhTI1prG,1517250000000.0,1517260000000.0,888,SyF7Erp6W,SyF7Erp6W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper does not seem completely appropriate for ICLR.",ICLR2018, +TFnd7cthZgt,1610040000000.0,1610470000000.0,1,0IOX0YcCdTn,0IOX0YcCdTn,Final Decision,Accept (Poster)," +The paper proposes ALFWorld, which combines TextWorld and ALFRED to create aligned scenarios (one that is text-only, and the other in an embodied visual simulator) so that high-level policies in language can be learned in a simpler world, and then transferred to the embodied one (using the proposed BUTLER architecture). The proposed BUTLER model consists of three components: 1) a perceptual module (which converts environment observations to textual specifications of objects and relations), 2) a goal-planning module for generating textual specifications of subgoals (from the observed environment state) and 3) a controller module which takes the outputs from 1) and 2) and generates a sequence of actions.
Experiments show that, using the textual specification, models pretrained in the text world can generalize better to embodied settings. + +Review Summary: The submission received slightly divergent reviews, with R2 and R3 recommending acceptance (score 7) and R1 recommending rejection (score 4). All reviewers recognized the novelty of the work, and the potential for follow-up work based on the submission. After considering the author response and discussion between reviewers, both R2 and R3 agreed that there are indeed flaws with the work as pointed out by R1 (R3 lowered their rating to 6). Despite the concerns, both R2 and R3 remained on the positive side. + +Pros: +- The work and proposed framework can stimulate further research on transferring policies from simple text environments to more realistic visual environments. (R2) +- The decomposition of high-level goals into low-level action sequences is a good direction for future research (R3) +- Good set of experiments and comparisons (R3) +- The paper is clearly written and easy to understand (R2) + +Cons: +- The main claim of the work (high-level policies learned in a text-based environment can be transferred to a physically simulated environment) is not properly substantiated by the experimental results. (R1) +- The proposed method is a complex system and simpler baselines should be considered (R2) +- Some assumptions made in ALFWorld need to be hand-designed and may miss important aspects of perception (R3) +- More experiments and ablations are needed to properly evaluate the framework + +Despite the issues pointed out by R1, the AC believes that the work can inspire future research in this area, and thus recommends acceptance. The paper is also well-written and easy to understand. ",ICLR2021, +TckC4nJNNrO,1642700000000.0,1642700000000.0,1,YpPiNigTzMT,YpPiNigTzMT,Paper Decision,Accept (Poster),"The paper proposes a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. + +Over the course of the rebuttal, the authors have made a substantial overhaul of the writing and experimentation. The universality claims are now better supported by bounds, and experiments cover comparisons to Snorkel, majority vote and supervised learning, on multiple applications. The authors are encouraged to move the related work section to the main body of the paper. The authors should also clarify to what extent the contributions they make pertain to Snorkel as opposed to weak supervision more generally. This may require revisiting both the introduction as well as perhaps the title.",ICLR2022, +_VPelBLUb,1576800000000.0,1576800000000.0,1,B1x3EgHtwB,B1x3EgHtwB,Paper Decision,Reject,"The paper develops linear over-parameterization methods to improve the training of small neural network models. This is compared to training from scratch and other knowledge distillation methods. + +Reviewer 1 found the paper to be clear with good analysis, and raised concerns about the generality and extensiveness of the experimental work. Reviewer 2 raised concerns about the correctness of the approach and laid out several other possibilities. The authors conducted several other experiments and responded to all the feedback from the reviewers, although there was no final consensus on the scores. + +The review process has made this a better paper and it is of interest to the community.
The paper demonstrates all the features of a good paper, but due to the large number of strong submissions, it was not accepted at this time.",ICLR2020, +S1abEkTHf,1517250000000.0,1517260000000.0,299,ry1arUgCW,ry1arUgCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This is a very interesting paper that also seems a little underdeveloped. As noted by the reviewers, it would have been nice to see the idea applied to domains requiring function approximation to confirm that it can scale -- the late addition of Freeway results is nice, but Freeway is also by far the simplest exploration problem in the Atari suite. There also seems to be a confusion between methods such as UCB, which explore/exploit, and purely exploitative methods. The case gamma_E > 0 is also less than obvious. Given the theoretical leanings of the paper, I would strongly encourage the authors to focus on deriving an RMax-style bound for their approach. +",ICLR2018, +r1FGr16HG,1517250000000.0,1517260000000.0,522,SySisz-CW,SySisz-CW,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting your paper to ICLR. The paper presents an interesting analysis, but the utility of this analysis is questionable, e.g. it is not clear how this might lead to improved VAEs/GANs. The authors did add an additional experimental result in their revised paper, but questions still remain. In light of this, the significance of the paper is on the low side and it is therefore not ready for publication in ICLR without more work. +",ICLR2018, +B1NRhM8_x,1486400000000.0,1486400000000.0,1,rJ0JwFcex,rJ0JwFcex,ICLR committee final decision,Accept (Poster),"There is a bit of a spread in the reviewer scores and unfortunately it wasn't possible to entice the reviewers to participate in a discussion.
The area chair therefore discounts the late review of reviewer3, who seems to have had a misunderstanding that was successfully rebutted by the authors. The other reviewers are supportive of the paper.",ICLR2017, +HJOjN1TBM,1517250000000.0,1517260000000.0,426,S1LXVnxRb,S1LXVnxRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"We encourage the authors to improve the aspects of their work mentioned in the reviews. +",ICLR2018, +CgcRRDdKTd,1576800000000.0,1576800000000.0,1,r1gelyrtwH,r1gelyrtwH,Paper Decision,Accept (Poster),"All reviewers agree that this research is novel and well carried out, so this is a clear accept. Please ensure that the final version reflects the reviewer comments and the new information provided during the rebuttal.",ICLR2020, +_z-mhtJGDh5,1610040000000.0,1610470000000.0,1,MwxaStJXK6v,MwxaStJXK6v,Final Decision,Reject,"The paper provides an improved analysis of the finite-time convergence rate of double Q-learning under more reasonable step size rules, compared to previous work by Xiong et al., 2020. Understanding the convergence behavior of double Q-learning is an obviously interesting theoretical topic and all reviewers appreciate the authors' improved analysis. + + Several reviewers questioned the sample complexity in terms of the dependence on L (thus |S||A|); in the latest revision, the authors claimed they now refined the dependence from O(L^6) to O(L). This major change is yet to be further reviewed since the authors did not leave any clue on why/how such an improvement was attained. +Another outstanding concern relates to the theoretical comparison of the rates between double Q-learning and Q-learning, which remains unresolved. It's unclear whether the bound in this paper is sharp enough and whether/when double Q-learning is provably inferior to Q-learning. + +Therefore, I am not recommending acceptance at this time, though I encourage the authors to resubmit with a more conclusive theoretical analysis. +",ICLR2021, +Lml9dqy7Ul,1576800000000.0,1576800000000.0,1,Skl-fyHKPH,Skl-fyHKPH,Paper Decision,Reject,"This paper was assessed by three reviewers who scored it as 6/1/6. The main criticisms included somewhat weak experiments due to the manual tuning of the bandwidth, the use of old (and perhaps mostly solved/not challenging) datasets such as MNIST and CIFAR-10, and a lack of ablation studies. The other issue voiced in the reviews is that the proposed method is very close to an MMD-GAN with a kernel plus random features.
Taking into account all positives and negatives, we regret to conclude that this submission falls short of the quality required by ICLR2020, and thus it cannot be accepted at this time. + +",ICLR2020, +4wnt509XLIP,1642700000000.0,1642700000000.0,1,TySnJ-0RdKI,TySnJ-0RdKI,Paper Decision,Accept (Poster),"Inspired by the observation that the poisoned samples tend to cluster together in the feature space of the attacked DNN model, which is mostly due to the end-to-end supervised training paradigm, the authors propose a novel defense method based on contrastive learning and decoupled end-to-end training to defend against backdoor attacks. +The issues, including training time, differences from certain previous studies, the ablation study, and so on, raised by the reviewers have been properly addressed and the reviewers are satisfied with the responses from the authors. +According to the consistently positive opinions from the reviewers, this manuscript is recommended for acceptance.",ICLR2022, +1Ou9gqEC61n,1642700000000.0,1642700000000.0,1,H6mR1eaBP1l,H6mR1eaBP1l,Paper Decision,Reject,"This paper presents an approach for using prior knowledge to constrain transitions between consecutive time steps and aims to replace conditional random fields for sequence tagging tasks in sequence labeling. However, the paper seems incomplete, with no experimental results and analysis to validate the proposed ideas.",ICLR2022, +N8FXAmWlT0,1610040000000.0,1610470000000.0,1,avHr-H-1kEa,avHr-H-1kEa,Final Decision,Reject,"Three reviewers are mildly positive, while one is negative. The substantive comments of the reviewers are consistent with each other; it is merely their evaluations that differ. + +One contribution of the paper is that it shows how using temperature tuning can yield similar accuracy to using batch normalization; this is useful because batch normalization is not always possible. The revised paper shows improvements, and we appreciate the engagement of the authors with the reviewer comments. However, there are remaining weaknesses such as a weak argument based on the empirical results. + +This paper can be improved based on the comments made by the reviewers. We encourage the authors to resubmit to a future venue.",ICLR2021, +rJlPeBuEeV,1545010000000.0,1545350000000.0,1,ryeaZhRqFm,ryeaZhRqFm,"Interesting and important problem, somewhat limited novelty of the approach",Reject,"The paper describes a method for the link prediction problem in both directed and undirected hypergraphs.
While the problem discussed in the paper is clearly important and interesting, all reviewers agree that the novelty of the proposed approach is somewhat limited given the prior art.",ICLR2019,4: The area chair is confident but not absolutely certain +OeDzNeCFm54,1610040000000.0,1610470000000.0,1,sr68jSUakP,sr68jSUakP,Final Decision,Reject,"During the discussion phase, although the reviewers acknowledged the effectiveness of the proposed approach, they raised concerns about the novelty of the paper. + +In my opinion, I also agree that the novelty is not well justified in this paper. In the related work section, although the authors put effort into reviewing the existing studies of subspace learning and feature selection, their relationship (similarity and/or difference) to the proposed method is not discussed. Since the idea of using subspace learning and feature selection in clustering is standard, the novelty of this work should be the introduction of the integration step into neural networks, which is not significant enough in its current state. The paper would become more significant if, for example, it theoretically discussed the unique characteristics of the integration into NNs that do not appear in the usual setting. + +In addition, the motivation of face clustering is not convincing. I recommend either (1) using and discussing the domain-specific properties of the face clustering problem in the proposed method, or (2) constructing a general clustering method. Since the authors present additional experiments in the author response, I guess (2) fits. Then, however, the paper should be re-organized. + +The readability can be improved. For example, Algorithm 1 receives training data {X, A}, but I cannot find the definition of A. Also, please italicize mathematical symbols. + +Overall, the paper is still not ready for publication; I will therefore reject the paper. +",ICLR2021, +BJzN0S-PQ,1576800000000.0,1576800000000.0,1,SkglVlSFPS,SkglVlSFPS,Paper Decision,Reject,"The authors study planning problems with sparse rewards. +They propose a tree search algorithm together with an ensemble of value +functions to guide exploration in this setting. +The value predictions from the ensemble are combined in a risk-sensitive way, +therefore biasing the search towards states with high uncertainty in value +prediction. +The approach is applied to several grid-world environments. + +The reviewers mostly criticized the presentation of the material, in particular +that the paper provided insufficient details on the proposed +method. Furthermore, the comparison to model-free RL methods was deemed somewhat +lacking, as the proposed algorithm has access to the ground-truth model. +The authors improved the manuscript in the rebuttal. + +Based on the reviews and my own reading, I think that the paper in its current +form is below the acceptance threshold. However, with further improved presentation +and baselines for the experiments, this has the potential to be an important contribution.",ICLR2020, +_DaNnE7JU,1576800000000.0,1576800000000.0,1,HJezF3VYPB,HJezF3VYPB,Paper Decision,Accept (Poster),"This paper studies an interesting new problem, federated domain adaptation, and proposes an approach based on dynamic attention, federated adversarial alignment, and representation disentanglement. + +Reviewers generally agree that the paper contributes a novel approach to an interesting problem with theoretical guarantees and empirical justification.
While many professional concerns were raised by the reviewers, the authors managed to perform an effective rebuttal with a major revision, which addressed the concerns convincingly. The AC believes that the updated version is acceptable. + +Hence, I recommend acceptance.",ICLR2020, +hdj3aOSrtvR,1642700000000.0,1642700000000.0,1,4XtpgPsvxE8,4XtpgPsvxE8,Paper Decision,Reject,"The premise of this paper is that the development of time series forecasting methods has traditionally focused on accuracy rather than other criteria such as training time or latency. This work presents a new benchmark, evaluating classical and deep learning-based methods on a number of public datasets. They also propose a technique, ParetoSelect, that can efficiently select models from the Pareto front in a multi-objective setting. + +Reviewer XSBL liked the observation that classical methods do not always beat deep learning methods even for very small datasets. They thought that the empirical contribution was valuable and myth-breaking. They also commented that the evaluation was robust. Their main concerns were: inadequate description of hyperparameters, lack of evaluations on *really* small datasets, and missing confidence measures for latency results. They also made some suggestions for improving clarity. The authors responded, pointing to a description of the hyperparameters in Table 2 of the appendix. They also responded to the reviewer's comment about very small datasets and explained the advantage of the ranking loss. They made some small adjustments to the paper based on the clarity comments. + +Reviewer PZ2f also thought that the large-scale comparison was valuable for the community. Overall, they thought it was well written, though it could be improved w.r.t. notation and writing style. They even inspected the code. Their primary concern was that the paper lacked focus and “tries to do too much and too little”. Is this a benchmarking effort of previous methods, or is the main contribution the ParetoSelect algorithm? This reviewer thought that, due to its superficial coverage of too many things, it wasn't ready yet for publication at ICLR. The authors provided quite a comprehensive response to reviewer PZ2f and pointed to some minor improvements in the manuscript. + +Reviewer rQb3, like the others, viewed the benchmark analysis as valuable.
They thought that the ParetoSelect approach was “natural” and that it was shown to be effective over baselines. Like PZ2f, they had some structural criticisms and pressed for more insights. + +Reviewers XSBL and rQb3 continued to engage in discussion through the AC-reviewer discussion phase. XSBL said that the authors' response addressed some concerns yet raised others w.r.t. hyper-parameter selection. rQb3 updated their review after considering the authors' response, feeling that minor concerns were addressed but the paper could still use further development. Overall, after considering the discussion, I think that it's been difficult for the authors to provide any patterns regarding which model performs best for which datasets. To me, a benchmark paper should provide some deeper insight, and the paper appears to be struggling to do that. On the other hand, the study is comprehensive. The authors have argued in their response to all reviews that their evaluation is at quite a different scale compared to other published time series model evaluations. I think that this benchmark paper can provide value to the community, yet it could use further work: specifically, the authors need to focus the paper and communicate clearer insights from the study.",ICLR2022, +xlU-k6aiZ4,1576800000000.0,1576800000000.0,1,r1x_DaVKwH,r1x_DaVKwH,Paper Decision,Reject,"This paper proposes a new benchmark that compares the performance of deep reinforcement learning algorithms on the Atari Learning Environment to the best human players. The paper identifies limitations of past evaluations of deep RL agents on Atari. The human baseline scores commonly used in deep RL are not the highest known human scores. To enable learning agents to reach these high scores, the paper recommends allowing the learning agents to play without a time limit. The time limit in Atari is not always consistent across papers, and removing the time limit requires additional software fixes due to some bugs in the game software. These ideas form the core of the paper's proposed new benchmark (SABER). The paper also proposes a new deep RL algorithm that combines earlier ideas. + +The reviews and the discussion with the authors brought out several strengths and weaknesses of the proposal. One strength was identifying the best known human performance in these Atari games. +However, the reviewers were not convinced that this new benchmark is useful. The reviewers raised concerns about using clipped rewards, using games that received substantially different amounts of human effort, comparing learning algorithms to human baselines instead of other learning algorithms, and also the continued use of the Atari environment. Given these many concerns about the new benchmark, the newly proposed algorithm was not viewed as a distraction. + +This paper is not ready for publication. The new benchmark proposed for deep reinforcement learning on Atari was not convincing to the reviewers. The paper requires further refinement of the benchmark or further justification for the new benchmark.",ICLR2020, +9abhtGXJdR,1576800000000.0,1576800000000.0,1,H1gDNyrKDS,H1gDNyrKDS,Paper Decision,Accept (Talk),"This paper studies the properties of Differentiable Architecture Search, and in particular when it fails, and then proposes modifications that improve its performance for several tasks. The reviews were all very supportive, with three Accept opinions, and the authors have addressed their comments and suggestions. Given the unanimous reviews, this appears to be a clear Accept.
",ICLR2020, +mYoyXMCdpHx,1610040000000.0,1610470000000.0,1,T1EMbxGNEJC,T1EMbxGNEJC,Final Decision,Reject,"Despite the performance gains of RankingMatch over the benchmarks used in the paper, the reviewers remained concerned about how the paper compares to state of the art in several respects.",ICLR2021, +BklBdMH1lV,1544670000000.0,1545350000000.0,1,rkMW1hRqKX,rkMW1hRqKX,Exciting approach to training sequence-to-sequence models from scratch,Accept (Poster),"This paper proposes an algorithm for training sequence-to-sequence models from scratch to optimize edit distance. The algorithm, called optimal completion distillation (OCD), avoids the exposure bias problem inherent in maximum likelihood estimation training, is efficient and easily implemented, and does not have any tunable hyperparameters. Experiments on Librispeech and Wall Street Journal show that OCD improves test performance over both maximum likelihood and scheduled sampling, yielding state-of-the-art results. The primary concerns expressed by the reviewers pertained to the relationship of OCD to methods such as SEARN, DAgger, AggreVaTe, LOLS, and several other papers. The revision addresses the problem with a substantially larger number of references and discussion relating OCD to the previous work. Some issues of clarity were also well addressed by the revision.",ICLR2019,5: The area chair is absolutely certain +HJOjN1TBM,1517250000000.0,1517260000000.0,426,S1LXVnxRb,S1LXVnxRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"We encourage the authors to improve the mentioned aspects of their work in the reviews. +",ICLR2018, +CgcRRDdKTd,1576800000000.0,1576800000000.0,1,r1gelyrtwH,r1gelyrtwH,Paper Decision,Accept (Poster),"All reviewers agree that this research is novel and well carried out, so this is a clear accept. Please ensure that the final version reflect the reviewer comments and the new information provided during the rebuttal",ICLR2020, +_z-mhtJGDh5,1610040000000.0,1610470000000.0,1,MwxaStJXK6v,MwxaStJXK6v,Final Decision,Reject,"The paper provides an improved analysis of the finite time convergence rate of double Q-learning under more reasonable step size rules, comparing to previous work by Xiong et al., 2020. Understanding the convergence behavior of double Q-learning is an obviously interesting theoretical topic and all reviewers appreciate the authors’ improved analysis. + + Several reviewers questioned the sample complexity in terms of the dependence on L (thus |S||A|); In the latest revision, the authors claimed they now refined the dependence from O(L^6) to O(L). This major change is yet to be further reviewed since the authors did not leave any clue on why/how such an improvement was attained. +Another outstanding concern relates to the theoretical comparison of the rates between double Q-learning and Q-learning, which remains clueless. It’s unclear whether the bound in this paper is sharp enough and whether/when double Q-learning is provably inferior than Q-learning. + +Therefore, I am not recommending acceptance at this time, though I encourage the authors to resubmit with a more conclusive theoretical analysis. +",ICLR2021, +OyHf0auxT,1576800000000.0,1576800000000.0,1,rJePwgSYwB,rJePwgSYwB,Paper Decision,Reject,"This article studies convergence of WGAN training using SGD and generators of the form $\phi(Ax)$, with results on convergence with polynomial time and sample complexity under the assumption that the target distribution can be expressed by this type of generator. 
This expands previous work that considered linear generators. An important point of discussion was the choice of the discriminator as a linear or quadratic function. The authors' responses clarified some of the initial criticism, and the scores improved slightly. Following the discussion, the reviewers agreed that the problem being studied is a difficult one and that the paper makes some important contributions. However, they still found that the considered settings are very restrictive, maintaining that quadratic discriminators would work only for the very simple type of generators and targets under consideration. Although the article makes important advances towards understanding convergence of WGAN training with nonlinear models, the relevance of the contribution could be greatly enhanced by addressing/discussing the plausibility or implications of the analysis in a practical setting, in the best case addressing a more practical type of neural network.",ICLR2020, +AM-aL3q6D-D,1642700000000.0,1642700000000.0,1,JXSZuWSPH85,JXSZuWSPH85,Paper Decision,Reject,"This paper studies the problem of inverse reinforcement learning by relying on only demonstrations and no interaction (like imitation learning). The reviewers liked the premise but had major concerns with the evaluation and baselines. The paper initially received reviews tending to reject. One of the questions was about a missing behavior cloning baseline, which the authors added in the rebuttal. However, the BC baseline seems to be highly competitive (in fact, better in 3 out of 4 environments) compared to the proposed approach. In conclusion, all reviewers believed that their concerns regarding insufficient evidence justifying the approach and missing comparisons to other prior work still stand. The AC agrees with the reviewers' consensus that the paper is not yet ready for acceptance.",ICLR2022, +KrBNNY48fQs,1610040000000.0,1610470000000.0,1,hecuSLbL_vC,hecuSLbL_vC,Final Decision,Reject,"The reviewers were excited by the paper's theoretical contribution to continual learning, since that aspect of continual learning is underdeveloped. However, all reviewers (including the most positive reviewer during discussions) expressed that the paper would benefit from revisions to improve the clarity and the thoroughness of comparisons in the paper. The paper's focus on OGD is not necessarily an issue for it to be of use to the community, as mentioned as a negative point in one review that other reviewers disagreed with. The authors are encouraged to revise this paper incorporating the reviewers' suggestions.",ICLR2021, +Zd3wDqbwYmB,1642700000000.0,1642700000000.0,1,iaqgio-pOv,iaqgio-pOv,Paper Decision,Reject,"This paper presents two novel approaches to provide explanations for the similarity between two samples, based on 1) the importance measure of individual features and 2) some of the other pairs of examples used as analogies. Explaining similarity predictions is a relatively less explored area, which makes the problem addressed and the proposed method unique. However, reviewers expressed concerns about the evaluation methods, and there were some concerns about design choices that were not well motivated. The major issue is, as pointed out by the majority of the reviewers, the evaluation methods.
Given the paper, reviews, and responses of the authors and the reviewers, it appears that there is certainly room for improvement: more convincing evaluation methodologies are needed to convince a cross-section of machine learning researchers that the proposed approach advances the field. Overall, this is a good paper, but it appears to be borderline to marginally below the threshold for acceptance.",ICLR2022, +hzVFXt3zkzD,1642700000000.0,1642700000000.0,1,5y35LXrRMMz,5y35LXrRMMz,Paper Decision,Reject,"This paper received a split of scores, from 3 to 10. Among the reviewers, there are both strong advocates and strong rejects. All reviewers agree that finding a policy that is not only improving value but also has lowered variance is an interesting idea. However, many reviewers point out that there are major clarity issues that might hide fundamental problems. The proved guarantees seem to require strong assumptions that are unlikely to hold in practice, and the experimental comparisons also have some subtleties. Taken together, although this could be a very interesting work, it will require a major revision and another round of review+discussions before it can be shaped into an acceptable paper.",ICLR2022, +X21lfgpaPs8,1642700000000.0,1642700000000.0,1,TQ75Md-FqQp,TQ75Md-FqQp,Paper Decision,Reject,"This paper was extensively discussed between the reviewers, the AC and the SAC. A last-minute reviewer was also called in to clarify some of the issues raised, as one of the reviews never got into the system. + +The paper was overall perceived as well written and well presented, and the software contribution of implicit differentiation techniques was seen as a nice asset for the community, especially its modularity. +The stability guarantee constitutes a nice (though straightforward) result providing a theoretical ground for the proposed approach. +Yet, the paper is often loose on mathematical justifications, in particular on minimal validity assumptions. Details on when the proposed framework could fail would be of interest, on both the theoretical and practical sides. A discussion of the minimal assumptions required for the validity of the approach should be highlighted more in the text. + +Furthermore, the paper lacks discussions and comparisons with concurrent works, +for instance, how the framework would compare with existing estimates for implicit differentiation or for unrolling. This could be improved along with providing more analysis of the implementation efficiency. +On the practical side, a high-level description of the software details would also be very beneficial. +A core discussion focused on what should be expected of this type of paper (i.e., ""implementation issues, parallelization, software platforms, hardware"" papers, as suggested by Q3Lr).
+ A point of concern in the discussion phase was the novelty of the proposed framework: even if the contribution is the framework introduced, this is not new per se (the literature on implicit differentiation now contains a considerable amount of results and implementation examples). +The relevance of the work, both on theoretical and computational aspects, beyond the development of a computational library, was found difficult to assess by several reviewers. +Overall, the reviewers judged the novelty and the paper's contribution more on the software side. Hence, a core discussion could focus on aspects expected for code-oriented papers (i.e., implementation issues, parallelization, hardware, etc.). + +Following the long discussion phase (more than 30 posts on OpenReview) and the aforementioned comments, the paper was rejected. + +We encourage the authors to submit a revised version to a future conference or possibly to a software-oriented journal, such as JMLR MLOSS or JOSS, for instance.",ICLR2022, +b13_Rj40-P,1642700000000.0,1642700000000.0,1,H-iABMvzIc,H-iABMvzIc,Paper Decision,Accept (Poster),"This paper proposes an approach to improve cross-domain generalization in few-shot learning, using an objective that attempts to fight overfitting on the observed domain at any given iteration while maintaining the general information learned so far from all domains. The approach uses a domain-cycling procedure, where each iteration sees a single domain and pseudo-labels coming from predictions of a previous iterate of the model and from a parameter-averaged general model are used in a combined training objective. + +Three of the reviewers support acceptance (one strongly), while the fourth leans weakly towards rejection, despite an extensive response from the authors that includes new results. One concern was a lack of comparison on Meta-Dataset, which the authors went some way towards addressing during the rebuttal, though they also argued Meta-Dataset couldn't really support the kind of cross-domain evaluation they were targeting. The reviewer was not convinced by the authors' argument, and I too am not, in particular when you consider that Meta-Dataset evaluations now often include evaluations on MNIST and CIFAR-10/100, in addition to MS-COCO and TrafficSigns (all not included in the training split of Meta-Dataset). That said, the experimental protocol favored in the authors' experiments certainly is sound and challenging for cross-domain generalization, so I'd hesitate to penalize them for that alternative choice. + +Overall, I find the ideas behind this work neat, interesting and well motivated. Even if the basic ideas aren't completely novel, I found their combination thought-provoking and creative. + +Therefore, in the end, I feel this work will be beneficial to the body of literature on few-shot learning and would merit appearing at ICLR.",ICLR2022, +D5j0__OBk,1576800000000.0,1576800000000.0,1,S1ghzlHFPS,S1ghzlHFPS,Paper Decision,Reject,"While reviewers find this paper interesting, they raised a number of concerns including the novelty, writing, experiments, references and a clear statement of the benefit.
Unfortunately, excellent questions and insightful comments left by the reviewers went unanswered by the authors.",ICLR2020, +AheRSFOxic,1576800000000.0,1576800000000.0,1,BJeS62EtwH,BJeS62EtwH,Paper Decision,Accept (Poster),"This paper presents a method for extracting ""knowledge consistency"" between neural networks and understanding their representations. + +Reviewers and the AC are positive about the paper, in terms of insightful findings and practical usages, and also gave constructive suggestions to improve the paper. In particular, I think the paper can gain much attention from the ICLR audience. + +Hence, I recommend acceptance.",ICLR2020, +fflH5pKQq,1576800000000.0,1576800000000.0,1,S1xO4xHFvB,S1xO4xHFvB,Paper Decision,Reject,"This paper proposes a very general idea called Atomic Compression Networks (ACNs) to construct neural networks.
The idea looks simple and effective. However, the reason why it works is not well explained. The experiments are not sufficient to convince the reviewers.",ICLR2020, +Mrj7ruFqbhX,1642700000000.0,1642700000000.0,1,NoB8YgRuoFU,NoB8YgRuoFU,Paper Decision,Accept (Poster),"The paper proposes a novel method, PI3NN, for estimating prediction intervals (PIs) for quantifying the uncertainty of neural network predictions. The method is based on independently training three neural networks with different loss functions, which are then combined via a linear combination whose coefficients for a given confidence level can be found by a root-finding algorithm. A specific initialization scheme allows the method to be employed for OOD detection. + +Reviewers agreed on the importance of the problem of producing reliable confidence estimates. The proposed method addresses some of the limitations of the existing approaches, and reviewers valued that a theoretical as well as an empirical analysis is provided. + +One of the main criticisms was that the theoretical derivation of the method is based on the assumption of the noise being homoscedastic. This, however, is a common issue with other methods in this area, which are nevertheless all applied (and seem to work) on heteroscedastic data as well and are outperformed by the proposed method. Another main point that was criticized was that the empirical analysis was limited. In turn, the authors added further experiments on another dataset and with another network architecture (an LSTM) to their analysis. Moreover, the authors adequately addressed a lot of the concerns and questions of the reviewers in their answers and the revised manuscript. The final mean scores are exactly borderline (5.5) but with a higher confidence of the reviewers voting for acceptance. Based on the listed points, the paper should be accepted. + +I would encourage the authors to improve the discussion around the dependence on x in Section 3.2, which could still be made clearer, in the final version of the manuscript, and to add the discussion about the limitations of the theoretical analysis (i.e. the applicability only to homoscedastic settings) to the conclusion.",ICLR2022, +WFHqV89_eIf,1642700000000.0,1642700000000.0,1,-8sBpe7rDiV,-8sBpe7rDiV,Paper Decision,Accept (Poster),"The authors propose an adversarial training method to increase network robustness to parameter variations.
The proposed approach performs adversarial attacks on network parameters during training. They demonstrate that their method flattens the loss landscape of the network. Experiments were performed on F-MNIST, ECG data, and speech command detection datasets using a conventional CNN and recurrent spiking neural networks (SNNs). + +The manuscript is well-written and the method is interesting. + +One reviewer was somewhat concerned about the novelty of the work, but acknowledged that the application to recurrent SNNs was new. +The main initial criticism was the question of the scalability of the method, as it was tested only on networks with a relatively small number of parameters. + +In the revision, the authors addressed these issues. Their method was compared to related approaches, and experiments on CIFAR-10 with a ResNet32 were performed. +The reviewers acknowledged these larger-size experiments, but were not fully convinced, as much larger models are typically used today. + +Nevertheless, the reviewers acknowledged the improvements and ratings were increased, so all are voting for acceptance.",ICLR2022, +SJkHEypBf,1517250000000.0,1517260000000.0,337,rk6H0ZbRb,rk6H0ZbRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"I am somewhat of two minds about the paper. The authors show empirically that adversarial perturbation error follows a power law and look for a possible explanation. The tie-in with generalization is not clear to me and makes me wonder how to evaluate the significance of the finding of the power-law distribution. On the other hand, the authors present an interesting analysis, show that the finding holds in all the cases they explored, and also found that architecture search can be used to find neural networks that are more resilient to adversarial search (the last shouldn't be surprising if that was indeed the training criterion). + +All in all, I think that while the paper needs a further iteration prior to publication, it already contains interesting bits that could spur very interesting discussion at the Workshop. + +(Side note: There's a reference missing on page 4, first paragraph)",ICLR2018, +HJxxv_iWxE,1544820000000.0,1545350000000.0,1,HyxKIiAqYQ,HyxKIiAqYQ,Strong contribution to image compression,Accept (Poster),"This paper proposes an algorithm for end-to-end image compression outperforming previously proposed ANN-based techniques and typical image compression standards like JPEG. + +Strengths +- All reviewers agreed that this is a well written paper, with careful analysis and results. + +Weaknesses +- One of the points raised during the review process was that 2 very recent publications propose very similar algorithms. Since these works appeared very close to the ICLR paper submission deadline (within 30 days), the program committee decided to treat this as concurrent work. + +The authors also clarified the differences and similarities with prior work, and included additional experiments to clarify some of the concerns raised during the review process. Overall the paper is a solid contribution towards improving image compression, and is therefore recommended to be accepted. +",ICLR2019,5: The area chair is absolutely certain +CnP0gwDJmvE,1642700000000.0,1642700000000.0,1,WIJVRV7jnTX,WIJVRV7jnTX,Paper Decision,Reject,"This is a borderline case and it's quite difficult to decide the recommendation. 
The paper works on a critically important problem, namely removing or reducing the in-distribution accuracy drop when we need to also take the out-of-distribution accuracy into account. The proposed method is simple and it works, which is great. However, as the reviewers discussed, the demonstrated applications are not very representative, and the authors should consider more popular setups of few-shot learning and even other forms of domain generalization. Furthermore, adversarial examples are also OOD (in most cases, since the ID manifolds are thin films and the attacks can easily go out of the ID manifolds), so it would be great if adversarial accuracy could be incorporated as a case of OOD accuracy. Since there is still room for improvement, we hope the paper would benefit from a cycle of revisions for a re-submission.",ICLR2022, +b13_Rj40-P,1642700000000.0,1642700000000.0,1,H-iABMvzIc,H-iABMvzIc,Paper Decision,Accept (Poster),"This paper proposes an approach to improve cross-domain generalization in few-shot learning, using an objective that attempts to fight overfitting on the observed domain at any given iteration while maintaining the general information learned so far from all domains. The approach uses a domain-cycling procedure, where each iteration sees a single domain, and pseudo-labels coming from predictions of a previous iterate of the model and from a parameter-averaged general model are used in a combined training objective. + +Three of the reviewers support acceptance (one strongly), while the fourth leans weakly towards rejection, despite an extensive response from the authors that includes new results. One concern was a lack of comparison on Meta-Dataset, which the authors went some way towards addressing during the rebuttal, though they also argued Meta-Dataset couldn't really support the kind of cross-domain evaluation they were targeting. The reviewer was not convinced by the authors' argument, and I too am not, in particular when you consider that Meta-Dataset evaluations now often include evaluations on MNIST and CIFAR-10/100, in addition to MS-COCO and TrafficSigns (all not included in the training split of Meta-Dataset). That said, the experimental protocol favored in the authors' experiments certainly is sound and challenging for cross-domain generalization, so I'd hesitate to penalize them for that alternative choice. + +Overall, I find the ideas behind this work neat, interesting and well motivated. Even if the basic ideas aren't completely novel, I found their combination thought provoking and creative. + +Therefore, in the end, I feel this work will be beneficial to the body of literature on few-shot learning and would merit appearing at ICLR.",ICLR2022, +FL3upWkjnJD,1642700000000.0,1642700000000.0,1,YqHW0o9wXae,YqHW0o9wXae,Paper Decision,Reject,"The paper proposed a novel assisted learning scenario which would likely be useful for organizational-level learners (i.e. learners with sufficient computational resources but limited and imbalanced data). The paper is generally well presented, but there are shared concerns amongst the reviewers about the significance of the technical contributions: (1) Due to the asymptotic nature of the consistency results, the technical strength is not strongly supported by the existing theoretical analysis. (2) Although the problem setup is novel and seems interesting, the practical significance of the results is not well supported without a concrete real-world application. 
(3) There are a few clarity issues raised in the reviews, which suggest that the paper could benefit from a major revision to address the above concerns.",ICLR2022, +rY85litDgyS,1610040000000.0,1610470000000.0,1,8IbZUle6ieH,8IbZUle6ieH,Final Decision,Reject,"While the author response clarified some concerns, it could not convince the reviewers that the current version of the paper should be accepted for publication at ICLR. +",ICLR2021, +ryJ3iG8_g,1486400000000.0,1486400000000.0,1,r1kGbydxg,r1kGbydxg,ICLR committee final decision,Reject,"After reading the paper and the reviews, I believe that the paper presents a solid contribution and a detailed empirical exploration of the choice of action space representations for continuous control trajectory tracking tasks, but has limited relevance to the ICLR audience in its present form. + + The conclusion that PD controllers with learned gains provide improved learning speed and sometimes better final results than the ""default"" joint torque representation is intriguing and has some implications for future work on trajectory tracking for simulated articulated rigid body characters, but it's unclear from the current results what conclusion there is to take away for general algorithm or model design. Since the only evaluation is on trajectory tracking, it's also not even clear (as pointed out by other reviewers) to what degree these results will generalize to other tasks. In fact, a contrarian view might be that PD tracking is specifically a good fit for trajectory tracking with a known reference trajectory, but might be a poor fit for accomplishing a particular task, where more open-loop behaviors might be preferable. The passive compliance of MTUs is also shown to often not be beneficial, but it's again not clear whether this might in fact be an artifact of the trajectory tracking task. + + But perhaps more problematically, the primary conclusions of the paper are of limited relevance when it comes to the design of algorithms or models (or at least, the authors don't discuss this), but are more relevant for practitioners interested in applying deep RL for learning controllers for articulated rigid body systems such as robots or virtual characters. As such, it's not clear what fraction of the ICLR audience will find the paper relevant to their interests. + + I would strongly encourage the authors to continue their research: there is a lot to like about the present paper, including an elegant reinforcement learning methodology, rigorous empirical results for the specific case of trajectory following, and some very nice videos and examples.",ICLR2017, +HkgGIlnleE,1544760000000.0,1545350000000.0,1,Bkx8OiRcYX,Bkx8OiRcYX,Meta-Review for Countdown Regression paper,Reject,"All reviewers agree to reject. While there were many positive points to this work, reviewers believed that it was not yet ready for acceptance.",ICLR2019,5: The area chair is absolutely certain +sXBtLuT-g6,1576800000000.0,1576800000000.0,1,H1lefTEKDS,H1lefTEKDS,Paper Decision,Reject,"This paper provides an extensive set of benchmarks for Deep Model-based RL algorithms. 
The paper cites Henderson et al, and yet does not follow the advice laid out therein. (2) The results were fairly inconclusive---perhaps to be expected---we didn't learn much (more on this below). (3) The paper has communication issues. + +The overall approach taken was a bit perplexing. Some algorithms we given access to the dynamics. The reward functions were converted to diff. forms, and early stopping in a domain specific way was employed. This all seems like simplifying the problem in different ways so that some methods can be competitive, but it is not at all clear why. If we take the typical full rl problem and limit domain knowledge, many of these approaches cannot be applied and others will fail. Those are the results we want. One could actually view these choices are unfair to more general algorithms---algorithms that need diff rewards pay no price for this assumption. This also leads to funny things, for example, like using position as the reward in mountain car (totally non-standard, and invalid without discounting). The paper claims a method can solve MC, but that is unclear from the graph. The paper motivates the entire enterprise based on the claimed lack of standardization in the literature, but then proceeds to redefine classic control tasks with little discussion or explanation. + +The paper has communication issues. For example, all the domains are use continuous actions (and the others in the response highlight that is their main focus), but this is never stated in the paper. The paper refers to and varies ""environment length"", but this was not defined in the paper and has no obvious meaning. The tasks are presumably discounted but the the value of gamma is not specified anywhere in the paper (could be there, but I searched for a while). Pages of parameter settings in the appendix with many not discussed or their ranges justified. + +This paper is ambitious, but I urge the authors to perhaps limit the scope and do less, and consider a slightly broader audience in both the writing and experiment design. +",ICLR2020, +S11m2f8_e,1486400000000.0,1486400000000.0,1,Hk6a8N5xe,Hk6a8N5xe,ICLR committee final decision,Reject,"Reviewers found this paper clear to read, but leaned negative on in terms of impact and originality of the work. Main complaint is that the paper is neither significantly novel in terms of modeling (pointing to Cheng & Lapata), nor significantly more performative on this task (""only slightly better""). One reviewer also has a side complaint that the task itself is also somewhat simplistic and simplified, and suggests other tasks. This comment is perhaps harsh, but reflects a mandate for revisiting ""old"" problems to provide significant improvements in accuracy or novel modeling.",ICLR2017, +HkeLkrczeE,1544890000000.0,1545350000000.0,1,r1erRoCqtX,r1erRoCqtX,Not ready for publication ICLR,Reject,"Following the unanimous vote of the four submitted reviews, this paper is not ready for publication at ICLR. Among other concerns raised, the experiments need significant work.",ICLR2019,5: The area chair is absolutely certain +#NAME?,1610040000000.0,1610470000000.0,1,bIQF55zCpWf,bIQF55zCpWf,Final Decision,Reject,"This paper proposes a novel way (Pani) that constructs image patch-level graphs and then linearly interpolates the patch-level features. The authors show how this can be used in Virtual Adversarial Training (PaniVAT) and Mixup/MixMatch (Pani Mixup). 
The method is shown to improve classification compared to standard VAT and related techniques on CIFAR-10 (low data setting), as well as to outperform Mixup on CIFAR-10/CIFAR-100/TinyImageNet (standard setting, multiple different architectures) with and without data augmentation. + +Reviewer 4 liked that the method was simple, but was not convinced of its effectiveness because of the baselines that were chosen. Specifically, they thought that FixMatch was a stronger baseline than MixMatch. The authors said that Pani is complementary to FixMatch and that similar improvement could be expected when applying Pani to FixMatch instead of MixMatch. + +Reviewer 2 appreciated that the work was “important and interesting” and noted that the experiments showed that Pani improved existing algorithms. They were concerned with the lack of motivation and the lack of theoretical guarantees. The authors clarified the motivation in their response to the reviewer but, understandably, were unable to provide any theoretical analysis. + +Reviewer 1 expressed disappointment with the writing and understandability of the paper. I read the paper myself and I agree. They posed several clarifying questions to the authors, to which the authors responded. I note that the reviewer could not find the appendix, but it was attached separately as supplementary material. + +Reviewer 3 wrote a very short review and stated that they are not familiar with the topic of the paper. With three other full reviews, I have discounted R3’s review because of their extremely low confidence. They also asked a couple of clarifying questions, to which the authors responded. I found the authors’ response satisfying. + +Overall, two reviewers are not extremely excited about the paper and one reviewer thinks the work is interesting but has concerns about clarity. I think that overall it is a neat idea, but the paper could use more polishing and clarification. Compared to other borderline papers in my stack, it is not over the bar. It could get there with further work. I hope the authors continue to improve the paper and re-submit it in the near future.",ICLR2021, +Yo03K0Usn0Y,1610040000000.0,1610470000000.0,1,PxTIG12RRHS,PxTIG12RRHS,Final Decision,Accept (Oral),"All reviewers agree that this is a well-written and interesting paper that will be of interest to the ICLR and broader ML community.",ICLR2021, +ofwUwWkWV9T,1610040000000.0,1610470000000.0,1,G67PtYbCImX,G67PtYbCImX,Final Decision,Reject,"The paper proposed an active search algorithm for efficiently identifying rare concepts among heavily imbalanced datasets. Reviewers find the paper very well-motivated and addressing an important real-world challenge in active learning. All reviewers appreciate the extensive demonstration of the effectiveness of the proposed algorithm on real-world tasks, in particular in industrial settings where the scale of problems goes far beyond common academic datasets. + +In the meantime, there are shared concerns among several reviewers about the technical depth of the proposed algorithm. Although the authors provided intuitive explanations of the nearest-neighbor based approach, the results reported are restricted mostly to several final metrics of search performance. As a purely empirical work, the paper would benefit from more fine-grained experimental analyses and ablation studies (e.g., by breaking down to analyses at intermediate levels). 
+",ICLR2021, +Z8YcUVf3ZA8,1610040000000.0,1610470000000.0,1,gJYlaqL8i8,gJYlaqL8i8,Final Decision,Accept (Poster),"All reviewers agree that this paper is worth publishing. It investigates a novel idea on how to adaptively prioritise experiences from replay based on relative (within-batch) importance. The empirical investigation is thorough, and while the performance improvements are not stunning, the benefit is surprisingly consistent across many environments.",ICLR2021, +HuuG-XJWZPH,1610040000000.0,1610470000000.0,1,MDX3F0qAfm3,MDX3F0qAfm3,Final Decision,Reject,"Dear authors, + +Thank you for your submission. The reviewers all appreciated the direction of research and the message that GN can be a bad measure of generalization. That said, they all shared concerns regarding the strength of the conclusions that can be drawn from your work. + +I encourage you to address their comments and submit a revised version to a later conference. + +--------------------------------- +Reviewer 1 wanted to update their review but couldn't so here is the update: + + +Some more details on my original concerns + +Thank you for your detailed responses. I wanted to add more details to the ones not discussed by other reviewers. + +- Regarding the speed of computing the gradient norm, I still don't agree that the computation cost is high. Figure 6 in the Backpack paper shows the cost of computing individual gradients at most 4x the cost of a single backprop not 100-1000x. In reference [2] that I gave, there is also a cheaper approximation discussed with computational costs detailed in Appendix B. As long as the computation of gradient norm is comparable with the cost of a single back-prop it should be cheap enough to run all your experiments. + +- Regarding the conclusions in the paper. Thank you for giving more details. Adding those explanations to the paper would help. I personally missed some of those takeaway messages. + +Overall, I strongly recommend either strengthening the link between GN and AGN or using better approximations. As well as better discussing the conclusions. Of course in addition to the suggestions by other reviewers.",ICLR2021, +ryxTXJielN,1544760000000.0,1545350000000.0,1,H1gTEj09FX,H1gTEj09FX,Provably stable rotation equivariant networks,Accept (Poster),"This paper builds on the recent DCFNet (Decomposed Convolutional Filters) architecture to incorporate rotation equivariance while preserving stability. The core idea is to decompose the trainable filters into a steerable representation and learn over a subset of the coefficients of that representation. +Reviewers all agreed that this is a solid contribution that advances research into group equivariant CNNs, bringing efficiency gains and stability guarantees, albeit these appear to be incremental with respect to the techniques developed in the DCFNet work. In summary, the AC believes this to be a valuable contribution and therefore recommends acceptance. ",ICLR2019,5: The area chair is absolutely certain +SJQnmJprz,1517250000000.0,1517260000000.0,218,SyELrEeAb,SyELrEeAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers agree that the work is high quality, clear, original, and could be significant. + +Despite this, the scores are borderline. The reason is due to rough agreement that the empirical evaluations are not quite there yet. 
In particular, two reviewers agree that, in the synthetic experiments, the method is evaluated on data that is an order of magnitude too easy and quite far from the nature of real data, which has a much lower signal-to-noise ratio. + +However, the authors have addressed the majority of the concerns, and there is little doubt that the authors are capable of carrying out this new experiment and reporting its results. Even if the results are surprising, they should shed light on what seems to be an interesting new approach.",ICLR2018, +01LDhFLNEBsd,1642700000000.0,1642700000000.0,1,aM7l2S2s5pk,aM7l2S2s5pk,Paper Decision,Reject,"It seems that the reviewers reached a consensus that the paper is not ready for publication at ICLR. The reviewers raised concerns including “The empirical observations are not supported by theoretical analysis”, “The proposed algorithm is a simple modification to an existing algorithm”, concerns with “the novelty of the paper”, and “The message of the paper is not new.” Please see the reviews for more detailed discussions about the paper.",ICLR2022, +58EL1gkylom,1610040000000.0,1610470000000.0,1,dgtpE6gKjHn,dgtpE6gKjHn,Final Decision,Accept (Poster),"This is a well written paper with good experimentation. The paper builds on the work of FedDF and does ablation studies to demonstrate its improvements. The key original idea is the use of a common pool of unlabeled data which is used in transmitting partial results between local and global servers. The results seem pretty good. + +From a practical viewpoint, the unlabelled common data will, in most cases, need to be generated/artificial data since it will need to be public (to the other servers at least). This option should be tested to demonstrate feasibility. + +AnonReviewer2 was concerned about whether it was fair to provide additional unlabelled data. The authors tested this out and showed it was OK. Regardless, the different servers could easily generate artificial data for this purpose. AnonReviewer1 had a number of issues which the authors largely addressed. The other two reviewers appreciated the paper. All reviewers gave constructive suggestions.",ICLR2021, +BJg8w-elxV,1544710000000.0,1545350000000.0,1,BJlSHsAcK7,BJlSHsAcK7,meta-review,Reject,"The authors propose an approach for continual learning of a sequence of tasks which augments the network with task-specific neurons which encode 'adversarial subspaces' and prevent interference and forgetting when new tasks are being learnt. The approach is novel and seems to work relatively well on a simple sequence of MNIST or CIFAR10 classes, and has certain advantages, such as not requiring any stored data. However, the reviewers agreed that the presentation of the method is quite confusing and that the paper does not provide adequate intuition, visualisation, or explanation of the claim that they are preventing forgetting through the intersection of adversarial subspaces. Moreover, there was a concern that the baselines were not strong enough to validate the approach.",ICLR2019,5: The area chair is absolutely certain +dQwIjMh1wH,1576800000000.0,1576800000000.0,1,SJeQi1HKDH,SJeQi1HKDH,Paper Decision,Reject,"The paper proposes a mechanism for obtaining diverse policies for solving a task by posing it as a multi-agent problem, and incentivizing the agents to be different from each other via maximizing total variation. 
The reviewers agreed that this is an interesting idea, but had issues with the placement and exact motivations -- precisely what kind of diversity the work is after, why, and which related approaches it accordingly needs to be compared to. +Some reviewers also found the technical and expository clarity to be lacking. +Given the consensus, I recommend rejection at this time, but encourage the authors to take the reviewers' feedback into account and resubmit to another venue.",ICLR2020, +16M6aR1OiRD,1610040000000.0,1610470000000.0,1,LucJxySuJcE,LucJxySuJcE,Final Decision,Accept (Poster),"This paper proposes an ensemble of diverse models as a mechanism to protect models from theft. +The idea is quite novel. There are some concerns regarding the robustness of the hashing function (which I share); however, not every paper has to be perfect, especially when it introduces a novel setup. + +AC",ICLR2021, +#NAME?,1576800000000.0,1576800000000.0,1,r1xjgxBFPB,r1xjgxBFPB,Paper Decision,Reject,"This work tackles the problem of catastrophic forgetting by using Gaussian processes to identify ""memory samples"" to regularize learning. + +Although the approach seems promising and well-motivated, the reviewers ultimately felt that some claims, such as scalability, need stronger justifications. These justifications could come, for example, from further experiments, including ablation studies to gain insights. Making the paper more convincing in this way is particularly desirable since the directions taken by this paper largely overlap with recent literature (as argued by reviewers). +",ICLR2020, +bcvLySjXUeH,1610040000000.0,1610470000000.0,1,cQzf26aA3vM,cQzf26aA3vM,Final Decision,Reject,"This paper proposes a benchmark suite of offline model-based optimization problems. This benchmark includes diverse and realistic tasks derived from real-world problems in biology, material science, and robotics; it contains a wide variety of domains, and it covers both continuous and discrete, low and high dimensional design spaces. The authors provide a comprehensive evaluation of existing methods under identical assumptions and get several interesting takeaways from the results. They found a surprising efficacy of simple baselines such as naive gradient ascent, which suggests the need for careful tuning and standardization of methods in this area, and provides a test bed for algorithms that try to solve this challenge. However, most reviewers agreed that a more in-depth analysis and insightful explorations of the RL experiment results would help readers understand why the method is superior even without trajectory data, and that the paper needs another revision before being accepted. Therefore, I recommend rejection, although all reviewers agreed that the tasks are very interesting and a good start.",ICLR2021, +fwBV_Ae_Pwg,1642700000000.0,1642700000000.0,1,QvTH9nN2Io,QvTH9nN2Io,Paper Decision,Reject,"The paper proposes a sampling technique for unnormalized distributions. The main idea is to gradually transform particles by following the gradient flow of the relative entropy in the Wasserstein space of probability distributions. The paper tackles an important problem and provides an interesting new perspective. 
However, even putting aside the concerns on the theoretical analysis raised by the reviewers, the experimental evaluations do not seem sufficient to demonstrate the benefits of the proposed approach.",ICLR2022, +Qd-quJXoyxo,1610040000000.0,1610470000000.0,1,UFJOP5w0kV,UFJOP5w0kV,Final Decision,Reject,"On the positive side, the problem addressed by the paper could be of potential interest in the case where there is noise in the features associated with each node of the graph. The paper is mostly well written and clear. The proposed approach is based on solid mathematical grounds. + +On the other hand, there are concerns about: + +i) motivation: it is not clear how significant the proposed approach is, since the authors were not able to clearly highlight the advantages with respect to the standard approach, where the weight matrix (via learning) can already play the role of a low-pass filter for the nodes' features. Maybe the main advantage is given by the fact that the network does not have to learn a low-pass filter; however, this needs better clarification; + +ii) suggested approach: the authors are using an approach that seems to be more complex with respect to simpler ones already proposed in the literature and not mentioned in the paper. In addition to that, the simpler approaches have convergence guarantees that have not been proved for the proposed approach; + +iii) significance of the experimental results: the experimental results are obtained by using a model with more parameters with respect to the baselines. Comparisons versus baselines with a similar number of parameters are necessary to have a fair assessment of the merits of the proposed approach. +",ICLR2021, +SkNTMJ6rf,1517250000000.0,1517260000000.0,29,SJaP_-xAb,SJaP_-xAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"In this paper the authors show how to allow deep neural network training on logged contextual bandit feedback. The newly introduced framework comprises a new kind of output layer and an associated training procedure. This is a solid piece of work and a significant contribution to the literature, opening up the way for applications of deep neural networks when losses based on manual feedback and labels are not possible. ",ICLR2018, +u6Z6Nc1c5nS,1610040000000.0,1610470000000.0,1,V69LGwJ0lIN,V69LGwJ0lIN,Final Decision,Accept (Poster),"This paper presents an interesting mix of new theoretical and empirical results showing how learning temporally extended primitive behaviors can help improve offline (batch) RL. + +Although 2/3 reviewers initially raised concerns regarding the motivation of the approach and some of the choices that were made, the authors did an excellent job at addressing these concerns in detail, and there is now a consensus towards acceptance. + +I consider that this work is a meaningful contribution towards better offline RL, which is definitely a very important use case in practice. The authors have given convincing explanations to motivate their approach, and made several improvements to the paper. As a result, I am recommending it for acceptance, as a poster.",ICLR2021, +6VtbHGrT5yS,1642700000000.0,1642700000000.0,1,dS3AxHZkrZT,dS3AxHZkrZT,Paper Decision,Reject,"This paper proposes a method that combines Bad-GAN and Good-GAN, in which Good-GAN learns to generate the anomalies while Bad-GAN regularizes the pseudo anomalies at the boundary of the inlier distribution. 
In addition, a new orthogonal loss is proposed to regularize the generation of anomaly samples to be distributed evenly at the periphery of the training data. The proposed method is new and shows some improvement over existing methods. + +However, there are some detailed technical concerns raised by reviewers. Some of the concerns still remain unresolved after the discussion. 1) The proposed method lacks a principled way to select hyperparameters. 2) The experimental setting is a bit too simple to verify the effectiveness of the proposed method in challenging real-world applications. In particular, since there is no theoretical guarantee for the proposed method, empirical evaluation is the only way to show its effectiveness. 3) The overall performance improvement is not very significant compared to existing methods. For example, the performance is very close to that of F-AnoGAN, a method published in 2019. Addressing the concerns needs a significant amount of work. Thus, I do not recommend acceptance of this paper.",ICLR2022, +HJlLoGdEyN,1543960000000.0,1545350000000.0,1,SJgiNo0cKX,SJgiNo0cKX,decision,Reject,"As the reviewers point out, the paper is below the acceptance standard of ICLR due to low novelty, unclear presentation, and lack of experimental comparison against the state-of-the-art baselines.",ICLR2019,5: The area chair is absolutely certain +H1YiXyTSf,1517250000000.0,1517260000000.0,210,rkLyJl-0-,rkLyJl-0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Pros: ++ Clearly written paper. ++ Easily implemented algorithm that appears to have excellent scaling properties and can even improve on validation error in some cases. ++ Thorough evaluation against the state of the art. + +Cons: +- No theoretical guarantees for the algorithm. + +This paper belongs in ICLR if there is enough space. +",ICLR2018, +KA65jj-wX,1576800000000.0,1576800000000.0,1,Bkg75aVKDH,Bkg75aVKDH,Paper Decision,Reject,"The authors develop a new technique for training neural networks to be provably robust to adversarial attacks. The technique relies on constructing a polyhedral envelope on the feasible set of activations and using this to derive a lower bound on the maximum certified radius. By training with this as a regularizer, the authors are able to train neural networks that achieve strong provable robustness to adversarial attacks. + +The paper makes a number of interesting contributions that the reviewers appreciated. However, two of the reviewers had some concerns with the significance of the contributions made: +1) The contributions of the paper are not clearly defined relative to prior work on bound propagation (Fast-Lin/KW/CROWN). In particular, the authors simply use the linear approximation derived in these prior works to obtain a bound on the radius to be certified. The authors claim faster convergence based on this, but this does not seem like a very significant contribution. + +2) The improvements on the state of the art are marginal. + +These were discussed in detail during the rebuttal phase, and the two reviewers with concerns about the paper decided to maintain their score after reading the rebuttals, as the fundamental issues above were not addressed. + +Given these concerns, I believe this paper is borderline - it has some interesting contributions, but the overall novelty on the technical side and strength of empirical results is not very high.",ICLR2020, +bmpHRbyvFpQ,1610040000000.0,1610470000000.0,1,O9bnihsFfXU,O9bnihsFfXU,Final Decision,Accept (Poster),"Knowledgeable R3 found the paper very good (8). He/she found the authors' responses very informative and felt that the edits made the paper much stronger. R2 expressed reservations about rank collapse being the cause of the degradation of performance, but also indicated his/her willingness to increase the score if the authors could convincingly respond to his/her concerns. This concern was shared by other reviewers, and there was an extensive discussion during the discussion period. R3 and R1 found the authors' responses very convincing. Fairly confident R1 found the paper good, appreciated the discussion, and recommends that the paper be accepted. R4 found the paper marginally above the acceptance threshold, however expressing a lower confidence in his/her assessment. In summary, the article contains extensive experiments, theory, and a well motivated idea, elucidating an intriguing phenomenon and useful for designing better bootstrapping-based deep RL methods. Although the reviewers expressed some reservations in their initial reviews, there was a lively discussion with quite positive final feedback. Weighing the ratings by confidence and participation in the discussion, I am recommending the paper for acceptance. I would like to encourage the authors to make an effort to make the presentation as clear as possible, having in mind the discussion and comments from the reviewers. 
The authors should consider to improve the paper by addressing the reviewers' comments and implementing their suggestions and resubmit this paper in the future venues.",ICLR2022, +2KwPslB6Wrm-,1642700000000.0,1642700000000.0,1,yztpblfGkZ-,yztpblfGkZ-,Paper Decision,Reject,"The paper proposes a graph convolution operator (BankGCN) to be used in graph neural networks. The reviewers mainly raised concerns about the limited of novelty in the light of numerous previous works that are similar or address similar problems as well as lacking evaluation. While the rebuttal addressed some of the concerns, the overall impression is that the paper is not of sufficient methodological or experimental significance for the conference.",ICLR2022, +uMlmY7KROvy0,1642700000000.0,1642700000000.0,1,hm2tNDdgaFK,hm2tNDdgaFK,Paper Decision,Accept (Poster),"This work presents ChIRo, a method that incorporates 3D torsion angles of a molecular conformer to specifically handle chirality. Specifically ChIRo uses trigonometric functions to encode the torsion angles, which are invariant to bond torsion but sensitive to chirality, thus capable of distinguishing between enantiomers. Overall, although not groundbreaking, we found the idea presented in the paper to be sufficiently novel and its ability to handle chirality to be of significant practical values.",ICLR2022, +YidG6-w3-Y,1610040000000.0,1610470000000.0,1,trPMYEn1FCX,trPMYEn1FCX,Final Decision,Reject,"This paper proposes a method for out-of-distribution modeling and evaluation in the human motion prediction task. Paper was reviewed by four expert reviewers who identified the following pros and cons. + +> Pros: +- New benchmark for testing out of distribution performance [R1] +- Compelling performance with respect to the baselines [R1,R4] +- Paper is well written and easy to follow [R2] +- Generative model in the context of out-of-distribution modeling of human motion is novel [R1,R2,R4] + +> Cons: +- Lack of support for interpretability claim [R1] +- Validity and usefulness of the metric [R1] +- Lack of ""effectiveness"" of the proposed approach [R2,R4] +- Technical contributions are not significant [R3,R4] +- Experimental validation lacks comparisons to other state-of-the-art in motion prediction methods [R3] +- Lack of evaluation on additional datasets and for the main task [R4] + +Authors tried to address the comments in the rebuttal, but largely unconvincingly to the reviewers. On balance, reviewers felt that negatives outweighed the positives and unanimously suggest rejection. AC concurs and sees no reason to overturn this consensus. +",ICLR2021, +#NAME?,1576800000000.0,1576800000000.0,1,BklEF3VFPB,BklEF3VFPB,Paper Decision,Reject,"This paper proposes max-margin domain adversarial training with an adversarial reconstruction network that stabilizes the gradient by replacing the domain classifier. + +Reviewers and AC think that the method is interesting and motivation is reasonable. Concerns were raised regarding weak experimental results in the diversity of datasets and the comparison to state-of-the-art methods. The paper needs to show how the method works with respect to stability and interpretability. The paper should also clearly relate the contrastive loss for reconstruction to previous work, given that both the loss and the reconstruction idea have been extensively explored for DA. Finally, the theoretical analysis is shallow and the gap between the theory and the algorithm needs to be closed. + +Overall this is a borderline paper. 
Considering the bar of ICLR and limited quota, I recommend rejection.",ICLR2020, +MTV8kP0PDZY,1610040000000.0,1610470000000.0,1,EKV158tSfwv,EKV158tSfwv,Final Decision,Accept (Poster),"This paper presents a new benchmark for evaluating continual learning(CL) algorithms on transferability and scalability. It also introduces a data-driven prior to reduce the architecture searching space. Experiments show the new benchmark helps to analyze the properties of CL algorithms and the proposed algorithm performs better than baselines. + +The reviewers raised concerns about evaluation metric, weak baselines, and limited experimental cases for evaluating transferability. The authors added more experiments with stronger baselines and revised the paper based on the reviewers' suggestions. However, the authors also admit that how to evaluate transferability is still an open question. + +Despite the concerns, the reviewers generally agreed that the paper is well written, +and the new benchmark is an important contribution for evaluating continual learning algorithms on transferability and scalability. Hence it makes a worthwhile contribution to ICLR and I'm recommending acceptance of the paper.",ICLR2021, +5Buy44OZ_2,1576800000000.0,1576800000000.0,1,Hye00pVtPS,Hye00pVtPS,Paper Decision,Reject,"This manuscript proposes a strategy for fitting predictive models on data separated across nodes, with respect to both samples and features. + +The reviewers and AC agree that the problem studied is timely and interesting, and were impressed by the size and scope of the evaluation dataset (particularly for a medical application). However, reviewers were unconvinced about the novelty and clarity of the conceptual and empirical results. On the conceptual end, the AC also suggests that the authors look into closely related work on split learning (https://splitlearning.github.io/) which has also been applied to medical data settings.",ICLR2020, +jPaOX8yIXoZ,1642700000000.0,1642700000000.0,1,LQnyIk5dUA,LQnyIk5dUA,Paper Decision,Reject,"There are many discussions among the reviewers for this paper and eventually none of the reviewers (including the one who gave most positive score) would like to support the publication of this paper. + +Some concerns from the reviewers are as follows: +1. Missing the discussion on storage cost. +2. The improvement is limited. $G_0$ must be small and independent of $n$, hence it is not clear if it is possible to give a fair comparison between the current complexity and previous best complexity. +3. Missing the discussion on the case when $n \leq \mathcal{O}(\varepsilon^{-4})$ of the state-the-art results. +4. For the complexity results in terms of $\varepsilon$, it requires $\varepsilon$ to be arbitrarily small. The authors should also discuss this point for comparing with their result. +5. Some other statements in the papers are overclaimed. + +Please take the comments and suggestions from the reviewers carefully to revise the paper for the future venues since they raised valid points.",ICLR2022, +5O7SXjEiRsK,1610040000000.0,1610470000000.0,1,YTWGvpFOQD-,YTWGvpFOQD-,Final Decision,Accept (Spotlight),"This paper presents a very interesting investigation. While deep neural networks are typically best in non-private settings, the authors show that linear models with handcrafted features (ScatterNets) perform better in certain settings of the privacy parameter. 
The reviewers all found this to be important and insightful, with a thorough investigation, and I tend to agree, recommending acceptance.",ICLR2021, +8ZZnloLfoSv,1610040000000.0,1610470000000.0,1,Uh0T_Q0pg7r,Uh0T_Q0pg7r,Final Decision,Reject,"This paper proposes an approach for active learning in CNNs. The method computes the expected reduction in the predictive variance across a representative set of points and selects the next data point to be queried from the same set. + +Pros: +- The method is rather simple and seems practical. +- The paper is generally well-written. + +Cons: +- The novelty of the paper is limited, as it essentially applies a known approach to CNNs. +- The performance gains presented in experiments seem rather mild, and may not justify using this method.",ICLR2021, +rkpzrkpSM,1517250000000.0,1517260000000.0,525,r1drp-WCZ,r1drp-WCZ,ICLR 2018 Conference Acceptance Decision,Reject,Thank you for submitting you paper to ICLR. The consensus from the reviewers is that this is not quite ready for publication. The work is related to (although different from) Gu et al Neural Sequential Monte Carlo NIPS2015 and it would be useful to point this out in the related work section.,ICLR2018, +86O8hhWCiqc,1642700000000.0,1642700000000.0,1,Q5uh1Nvv5dm,Q5uh1Nvv5dm,Paper Decision,Accept (Poster),"Thanks for your submission to ICLR! + +This paper presents a novel way to combine domain adaptation with semi-supervised learning. The reviewers were, on the whole, quite happy with the paper. On the positive side, the results are very extensive and impressive, it's a clever way to combine domain adaptation and semi-supervised learning, and it's a fairly general approach in that it works in several settings (e.g., unsupervised vs semi-supervised domain adaptation). On the negative side, the approach itself is somewhat limited technically. + +After discussion, the one somewhat negative reviewer agreed that the paper has sufficient merit and should be accepted; thus, everyone was ultimately in agreement. I also read this paper carefully and personally find it very interesting and promising, so I am happy to recommend acceptance. It seems to give state of the art performance in several cases, and could possibly lead to more research down the road on methods to combine adaptation techniques with SSL.",ICLR2022, +2Gn_k1idq4,1576800000000.0,1576800000000.0,1,BkgGJlBFPS,BkgGJlBFPS,Paper Decision,Reject,"The paper presents an unsupervised method for graph representation, building upon Loukas' method for generating a sequence of gradually coarsened graphs. The contribution is an ""encoder-decoder"" architecture trained by variational inference, where the encoder produces the embedding of the nodes in the next graph of the sequence, and the decoder produces the structure of the next graph. + +One important merit of the approach is that this unsupervised representation can be used effectively for supervised learning, with results quite competitive to the state of the art. + +However the reviewers were unconvinced by the novelty and positioning of the approach. The point of whether the approach should be viewed as variational Bayesian, or simply variational approximation was much debated between the reviewers and the authors. + +The area chair encourages the authors to pursue this very promising research, and to clarify the paper; perhaps the use of ""encoder-decoder"" generated too much misunderstanding. 
+Another graph NN paper you might be interested in is ""Edge Contraction Pooling for Graph NNs"", by Frederik Diehl. +",ICLR2020, +WifaP0KeYP,1610040000000.0,1610470000000.0,1,67q9f8gChCF,67q9f8gChCF,Final Decision,Reject,"The authors introduce vPERL, a model that generates an intrinsic reward for imitation learning. vPERL is trained on demonstrations to minimise a variational objective that matches a posterior formed by ""action backtracking"" and a forward model, with the intrinsic reward coming from the reward map. The authors might be interested in related work on few shot imitation learning: e.g., ""One shot imitation learning"", Duan et al, 2017, ""Watch, try learn: meta-learning from demonstrations and rewards"", Zhou et al 2019. As all reviewers pointed out, and I can confirm, the paper is quite tricky to understand in its present form, and would very much benefit the writing being re-visited to more clearly express the ideas within (in particular, section 3, which is the core of the contributions). +",ICLR2021, +DybSK5aelg,1576800000000.0,1576800000000.0,1,rJxbJeHFPS,rJxbJeHFPS,Paper Decision,Accept (Spotlight),"This paper proposes a framework which qualifies how well given neural architectures can perform on reasoning tasks. From this, they show a number of interesting empirical results, including the ability of graph neural network architectures for learn dynamic programming. + +This substantial theoretical and empirical study impressed the reviewers, who strongly lean towards acceptance. My view is that this is exactly the sort of work we should be show-casing at the conference, both in terms of focus, and of quality. I am happy to recommend this for acceptance.",ICLR2020, +rE8VRHGt_c,1576800000000.0,1576800000000.0,1,HJg_ECEKDr,HJg_ECEKDr,Paper Decision,Reject,"Overview: +This paper introduces a method to distill a large dataset into a smaller one that allows for faster training. The main application of this technique being studied is neural architecture search, which can be sped up by quickly evaluating architectures on the generated data rather than slowly evaluating them on the original data. + +Summary of discussion: +During the discussion period, the authors appear to have updated the paper quite a bit, leading to the reviewers feeling more positive about it now than in the beginning. In particular, in the beginning, it appears to have been unclear that the distillation is merely used as a speedup trick, not to generate additional information out of thin air. The reviewers' scores left the paper below the decision boundary, but closely enough so that I read it myself. + +My own judgement: +I like the idea, which I find very novel. However, I have to push back on the authors' claims about their good performance in NAS. This has several reasons: + +1. In contrast to what is claimed by the authors, the comparison to graph hypernetworks (Zhang et al) is not fair, since the authors used a different protocol: Zhang et al sampled 800 networks and reported the performance (mean +/- std) of the 10 judged to be best by the hypernetwork. In contrast, the authors of the current paper sampled 1000 networks and reported the performance of the single one judged to be best. They repeated this procedure 5 times to get mean +/- std. The best architecture of 1000 is of course more likely to be strong than the average of the top 10 of 800. + +2. The comparison to random search with weight sharing (here: 3.92% error) does not appear fair. 
The cited paper in Table 1 is *not* the paper introducing random search + weight sharing, but the neural architecture optimization paper. The original one reported an error of 2.85% +/- 0.08% with 4.3M params. That paper also has the full source code available, so the authors could have performed a true apples-to-apples comparison. + +3. The authors' method requires an additional (one-time) cost for actually creating the 'fake' training data, so their runtimes should be increased by the 8h required for that. + +4. The fact that the authors achieve 2.42% error doesn't mean much; that result is just based on scaling the network up to 100M params. (The network obtained by random search also achieves 2.51%.) + +As it stands, I cannot judge whether the authors' approach yields strong performance for NAS. In order to allow that conclusion, the authors would have to compare to another method based on the same underlying code base and experimental protocol. Also, the authors do not make code available at this time. Their method has a lot of bells and whistles, and I do not expect that I could reproduce it. They promise code, but it is unclear what this would include: the generated training data, code for training the networks, code for the meta-approach, etc? This would have been much easier to judge had the authors made the code available in anonymized fashion during the review. + +Because of these reasons, in terms of making progress on NAS, the paper does not quite clear the bar for me. The authors also evaluated their method in several other scenarios, including reinforcement learning. These results appear to be very promising, but largely preliminary due to lack of time in the rebuttal phase. + +Recommendation: +The paper is very novel and the results appear very promising, but they are also somewhat preliminary. The reviewers' scores leave the paper just below the acceptance threshold and my own borderline judgement is not positive enough to overrule this. I believe that some more time, and one more iteration of reorganization and review, would allow this paper to ripen into a very strong paper. For a resubmission to the next venue, I would recommend to either perform an apples-to-apples comparison for NAS or reorganize and just use NAS as one of several equally-weighted possible applications. In the current form, I believe the paper is not using its full potential.",ICLR2020, +8DykXTTd0rG,1642700000000.0,1642700000000.0,1,AAJLBoGt0XM,AAJLBoGt0XM,Paper Decision,Accept (Poster),"The reviewers all acknowledge the importance of the paper as it addressed the challenge of the insufficient data problem in conditional contrastive learning, feeling that the idea was novel, the experiments verified the effectiveness of the model well, and the paper is well written. Reviewers also raised some good questions, such as the computational complexity, comparison with Fair_InfoNCE in the experiments, and kernel ablations. These questions are well addressed in the rebuttal and the revised version. One reviewer raised the issue of similarity to [1]. After taking a close look at this paper and [1], the AC felt that the motivation and focus of this paper are quite different from [1]. The authors should incorporate all the rebuttal info into the final version. + +[1] Jean-Francois Ton et al. 
2021.",ICLR2022, +HJ2erJpHf,1517250000000.0,1517260000000.0,498,ByuP8yZRb,ByuP8yZRb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers tend to agree that the empirical results in this paper are good compared to the baselines. However, the paper in its current form is considered a bit too incremental. Some reviewers also suggested additional theory could help strengthen the paper.",ICLR2018, +KgDlu8OMy,1576800000000.0,1576800000000.0,1,Bkg75aVKDH,Bkg75aVKDH,Paper Decision,Reject,"The authors develop a new technique for training neural networks to be provably robust to adversarial attacks. The technique relies on constructing a polyhedral envelope on the feasible set of activations and using this to derive a lower bound on the maximum certified radius. By training with this as a regularizer, the authors are able to train neural networks that achieve strong provable robustness to adversarial attacks. + +The paper makes a number of interesting contributions that the reviewers appreciated. However, two of the reviewers had some concerns with the significance of the contributions made: +1) The contributions of the paper are not clearly defined relative to prior work on bound propagation (Fast-Lin/KW/CROWN). In particular, the authors simply use the linear approximation derived in these prior works to obtain a bound on the radius to be certified. The authors claim faster convergence based on this, but this does not seem like a very significant contribution. + +2) The improvements on the state of the art are marginal. + +These were discussed in detail during the rebuttal phase and the two reviewers with concerns about the paper decided to maintain their score after reading the rebuttals, as the fundamental issues above were not + +Given these concerns, I believe this paper is borderline - it has some interesting contributions, but the overall novelty on the technical side and strength of empirical results is not very high.",ICLR2020, +bmpHRbyvFpQ,1610040000000.0,1610470000000.0,1,O9bnihsFfXU,O9bnihsFfXU,Final Decision,Accept (Poster),"Knowledgeable R3 found the paper very good (8). He/she found the authors' responses very informative and that edits made the paper much stronger. R2 expressed reservations about rank collapse being the cause of degradation of performance, but also indicated his/her willingness to increase the score if the authors can convincingly respond to his/her concerns. This concerned was shared by other reviewers, and there was an extensive discussion during the discussion period. R3 and R1 found the authors' responses very convincing. Fairly confident R1 found the paper good, appreciated the discussion, and recommends the paper to be accepted. R4 found the paper marginally above the acceptance threshold, however expressing a lower confidence in his/her assessment. In summary, the article contains extensive experiments, theory, and a well motivated idea, elucidating an intriguing phenomenon and useful for designing better bootstrapping-based deep RL methods. Although the reviewers expressed some reservations in their initial reviews, there was a lively discussion with quite positive final feedback. Weighing the ratings by confidence and participation in the discussion, I am recommending the paper for acceptance. I would like to encourage the authors to make efforts in making the presentation as clear as possible, having in mind the discussion and comments from the reviewers. 
",ICLR2021, +rOcr1VHdj,1576800000000.0,1576800000000.0,1,SkgbmyHFDS,SkgbmyHFDS,Paper Decision,Reject,"The authors present a metalearning-based approach to learning intrinsic rewards that improve RL performance across distributions of problems. This is essentially a more computationally efficient approach to approaches suggested by Singh (2009/10). The reviewers agreed that the core idea was good, if a bit incremental, but were also concerned about the similarity to the Singh et al. work, the simplicity of the toy domains tested, and comparison to relevant methods. The reviewers felt that the authors addressed their main concerns and significantly improved the paper; however the similarity to Singh et al. remains, and thus the concerns about incrementalism. Thus, I recommend this paper for rejection at this time.",ICLR2020, +B1H2LyprM,1517250000000.0,1517260000000.0,869,SJmAXkgCb,SJmAXkgCb,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a technique for feature map compression at inference time. As noted by reviewers, the main concern is that the method is applied to one NN architecture (SqueezeNet), which severely limits its impact and applicability to better performing state-of-the-art models.",ICLR2018, +0PfrdSfG1O9,1642700000000.0,1642700000000.0,1,4V4TZG7i7L_,4V4TZG7i7L_,Paper Decision,Reject,"PAPER: This paper presents a multimodal auto-encoder architecture built on the premise that unimodal variations can be best generated when taking advantage of a shared latent space. This is operationalized by defining a hierarchical model with two primary levels: a shared structure space and unimodal variations (which could be multi-layer). +DISCUSSION: The reviewers and follow-up discussion brought many questions and issues. The authors submitted a significantly revised version of their paper which clarified many issues and added a few extra results. While many of the reviewers’ questions were addressed by the authors, it seems that reviewers ended up not changing significantly their review scores. One fundamental concern is if the basic assumption about the shared structure is effectively the proper way to approach such generative modeling task. The experimental for image generation did not seem to support this hypothesis. +SUMMARY: While the revised version was an improvement over the original submission, improving clarity and adding some experimental measures, the experimental results did not seem to always support the main hypothesis. Human evaluation results may help in this direction.",ICLR2022, +Hwjkubm3bQ,1610040000000.0,1610470000000.0,1,tlV90jvZbw,tlV90jvZbw,Final Decision,Accept (Poster),"This paper provides a novel theoretical analysis of epoch-wise double descent for a linear model and a two-layer non-linear model in the constant-NTK regime. Some reviewers noted that these models may be too simple to offer a full explanation for the phenomenon in state-of-the-art practical models, for which the NTK is known to change significantly. While this may be true, I believe that the detailed understanding derived in these simple settings provides an important first step and will surely be of interest to the community. I therefore recommend acceptance.",ICLR2021, +HJgshGIOg,1486400000000.0,1486400000000.0,1,HJcLcw9xg,HJcLcw9xg,ICLR committee final decision,Reject,"This paper studies the invertibility properties of deep rectified networks, and more generally the piecewise linear structure that they implicitly define. 
The authors introduce a 'pseudocode' to compute preimages of (generally non-invertible) half-rectified layers, and discuss potential implications of their method with manifold-type models for signal classes.

The reviewers agreed that, while this is an interesting and important question, the paper is currently poorly organized, and leaves the reader a bit disoriented, since the analysis is incomplete. The AC thus recommends rejection of the manuscript.

As an addendum, the AC thinks that the authors should make an effort to reorganize the paper and clearly state the contributions, and not expect the reader to find them out on their own. In this field of machine learning, I see contributions as being either (i) theoretical, in which case we expect to see theorems and proofs, (ii) algorithmic, in which case an algorithm is presented, studied, extensively tested, and laid out in such a way that the interested reader can use it, or (iii) experimental, in which case we expect to see improved numerical performance on some dataset. Currently the paper has none of these contributions.
Also, a very related reference seems to be missing: ""Signal Recovery from Pooling Representations"", Bruna, Szlam and LeCun, ICML'14, in which the invertibility of ReLU layers is precisely characterized (Proposition 2.2).",ICLR2017,
Syh6ByTSG,1517250000000.0,1517260000000.0,673,B1i7ezW0-,B1i7ezW0-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a novel approach for DNN inversion mainly targeted towards semi-supervised learning. However, the semi-supervised learning results are not competitive enough. Although the authors mention in the author response that semi-supervised learning is not the main goal of the paper, the experiments and claims of the paper are mainly targeted towards semi-supervised learning. As the approach for inversion is novel, the paper could be motivated from a different angle with appropriate supporting experiments. In its current form it is not suitable for publication.",ICLR2018,
q47jhxjNwBb,1610040000000.0,1610470000000.0,1,vVjIW3sEc1s,vVjIW3sEc1s,Final Decision,Accept (Poster),"The paper attempts to provide a theoretical explanation for the benefit of language model pretraining on downstream classification tasks. In this regard, the authors provide a mathematical framework which seems to indicate that the distribution of the next word, conditional on the context, can provide a strong discriminative signal for the downstream task. The reviewers found the formulation insightful, interesting, and novel. Reviewers also enjoyed reading the well written paper and appreciated its cautious tone. As correctly pointed out by reviewers, the proposed framework might not directly align with techniques used in practice. Applicability of the framework to other pre-training approaches is limited. Also, there are still some unresolved concerns about the $O(\sqrt{\epsilon})$ assumption. Nevertheless, reviewers reached a consensus that the framework would be beneficial for the community and attract follow-up works. Thus, I recommend acceptance to ICLR. Following a reviewer suggestion, it is strongly recommended that the extensions section be expanded in the revised version using the extra page.",ICLR2021,
2Bg0nLKC96,1610040000000.0,1610470000000.0,1,ZTFeSBIX9C,ZTFeSBIX9C,Final Decision,Accept (Poster),"This paper investigates knowledge distillation in the context of non-autoregressive machine translation. All reviewers are supportive of acceptance, especially after the thoughtful author responses. A well motivated and simple to implement approach that is giving good empirical results.",ICLR2021,
KcVk6kpKJ8s,1610040000000.0,1610470000000.0,1,wb3wxCObbRT,wb3wxCObbRT,Final Decision,Accept (Oral),The paper proposes a method to grow deep network architectures over the course of training. The work has been extremely well received and has clear novelty and solid experimental validation.,ICLR2021,
cqjsGeN5j5D,1642700000000.0,1642700000000.0,1,ajXWF7bVR8d,ajXWF7bVR8d,Paper Decision,Accept (Oral),"Current meta-learning algorithms suffer from the requirement of a large number of tasks in the meta-training phase, which may not be accessible in real-world environments. This paper addresses this bottleneck, introducing a cross-task interpolation in addition to the existing intra-task interpolation. The main idea is very simple and can be viewed as an incremental addition to existing augmentation methods. However, the method is well supported by nice theoretical results which highlight the relation between task interpolation and the Rademacher complexity. In fact, this is not a trivial extension of existing work. The authors did a good job in the rebuttal phase, resolving most of the concerns raised by reviewers and leading two of the reviewers to raise their scores. All reviewers agree to champion this paper. Congratulations on a nice piece of work.",ICLR2022,
nWL6IA0Oi6,1576800000000.0,1576800000000.0,1,rklp93EtwH,rklp93EtwH,Paper Decision,Accept (Poster),"This paper proposes to deal with task heterogeneity in meta-learning by extracting cross-task relations and constructing a meta-knowledge graph, which can then quickly adapt to new tasks. The authors present a comprehensive set of experiments, which show consistent performance gains over baseline methods, on a 2D regression task and a series of few-shot classification tasks. They further conducted some ablation studies and additional analyses/visualization to aid interpretation.

Two of the reviewers were very positive, indicating that they found the paper well-written, motivated, novel, and thorough, assessments that I also share. The authors were very responsive to reviewer comments and implemented all actionable revisions, as far as I can see. The paper looks to be in great shape. I'm therefore recommending acceptance.
There were serious concerns by some of the reviewers regarding how the paper positioned itself relative to the literature, how it designed baselines for experiments, and how it compared itself to existing methods. There was vigorous rebuttal phase. The authors submitted a slightly late revision based on a procedural misunderstanding, and I decided to incorporate their late revision. + +Based on the revision, the majority of reviewers felt that the paper was at least above the bar for acceptance and some of the more positive reviewers stood strongly by the paper. I believe that this paper is of value to the community, so I will recommend that it is accepted, but I want to be very clear about something: the authors **must** incorporate the late revision as the basis of their camera ready and I **strongly recommend** that they address the concerns of reviewer d7Mk, including but not limited to: + +- ""Our approach avoid long mixing time theoretically and is more efficient"": this claim is too strong. +- Figure 1 is not particularly informative and the authors should reconsider it. +- The presentation of Eq (2), Algorithm 1, and Algorithm 2 should be simplified. +- Section F.1 should incorporate the comments of Reviewer d7Mk. +- Citing standard references mentioned by Reviewer d7Mk. + +The reason I highlight these recommendations is that I believe they will greatly improve the quality, longevity, and impact of this paper. Slightly overselling ideas feels like a good strategy, but it is a bad long-term strategy. I believe addressing these points is in the interest of the authors.",ICLR2022, +C0cPgElgV7P,1642700000000.0,1642700000000.0,1,dEwfxt14bca,dEwfxt14bca,Paper Decision,Accept (Spotlight),"Exploration can happen at various levels of granularity and at different times during an episode, and this work performs a study of the problem of exploration (when to explore/when to switch between exploring and exploitation, at what time-scale to do so, and what signals would be good triggers to switch). The study is performed on atari games. + +Strenghts: +------------ +The study is well motivated and the manuscript is overall well written +Studies a new problem area, and proposes an initial novel method for this problem +extensive study on atari problems + +Weaknesses +-------------- +some clarity issues as pointed out by the reviewers +no illustrative task is given to give a more intuitive exposition of the ""when to explore"" problem +comparison to some extra baselines like GoExplore would have been insightful + +Rebuttal: +---------- +Most clarity issues have been addressed satisfactorily. It has been explained why some requests for extra baselines would be challenging/or not relevant enough. While the authors agree that GoExplore would be an interesting baseline, they seem to have not added it. An illustrative task was not provided. + +Summary: +------------ +All reviewers agree that this manuscript opens up and tackles a novel direction in exploration, and provides an extensive empirical study on atari games (a standard benchmark for such problem settings). While I agree with the reviewers that point out that this paper could have been made stronger by adding an illustrative task and additional baselines like GoExplore, there is a general consensus that the provided empirical study on this novel problem setting is a good contribution in itself. 
Because of this I recommend accept.",ICLR2022, +RISiI_bTQ6,1576800000000.0,1576800000000.0,1,HJxkvlBtwH,HJxkvlBtwH,Paper Decision,Reject,"The paper developed log abstract transformer, square abstract transformer and sigmoid-tanh abstract transformer to certifiy robustness of neural network models for audio. The work is interesting but the scope is limited. It presented a neural network certification methods for one particular type of audio classifiers that use MFCC as input features and LSTM as the neural network layers. This thus may have limited interest to the general readers. + +The paper targets to present an end-to-end solution to audio classifiers. Investigation on one particular type of audio classifier is far from sufficient. As the reviewers pointed out, there're large literature of work using raw waveform inputs systems. Also there're many state-of-the-art systems are HMM/DNN and attnetion based encoder-decoder models. In terms of neural network models, resent based models, transformer models etc are also important. A more thorough investigation/comparison would greatly enlarge the scope of this paper. ",ICLR2020, +B13rhf8de,1486400000000.0,1486400000000.0,1,Sy2fzU9gl,Sy2fzU9gl,ICLR committee final decision,Accept (Poster),"This paper proposes a modification of the variational ELBO in encourage 'disentangled' representations, and proposes a measure of disentanglement. The main idea and is presented clearly enough and explored through experiments. This whole area still seems a little bit conceptually confused, but by proposing concrete metrics and methods, this paper makes several original contributions.",ICLR2017, +HyNdBkarG,1517250000000.0,1517260000000.0,596,S1680_1Rb,S1680_1Rb,ICLR 2018 Conference Acceptance Decision,Reject,"This paper considers graph neural representations that use Cayley polynomials of the graph Laplacian as generators. These polynomials offer better frequency localization than Chebyshev polynomials. The authors illustrate the advantages of Cayleynets on several benchmarks, producing modest improvements. + +Reviewers were mixed in the assessment of this work, highlighting on the one hand the good quality of the presentation and the theoretical background, but on the other hand skeptical about the experimental section significance. In particular, some concerns were centered about the analysis of complexity of Cayley versus the existing alternatives. + +Overall, the AC believes this paper is perhaps more suited to an audience more savvy in signal processing than ICLR, which may fail to appreciate the contributions. ",ICLR2018, +B1Pq81THf,1517250000000.0,1517260000000.0,844,B1suU-bAW,B1suU-bAW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree that this paper provides a sensible mechanism for producing word embeddings that exploit correlating features in the data (e.g. texts written by the same author), but point to other work doing the same thing. The lack of direct comparison in the experimental section is troublesome, although it is entirely possible the authors' were not aware of related work. Unfortunately, the lack of an author response to the reviews makes it hard to see the argument in defense of this paper, and I must recommend rejection.",ICLR2018, +S1B-EJ6Sf,1517250000000.0,1517260000000.0,291,r1VVsebAZ,r1VVsebAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes a novel application of generative adversarial networks to model neural spiking activity. 
Their technical contribution, SpikeGAN, generates neural spikes that accurately match the statistics of real recorded spiking behavior from a small number of neurons. + +The paper is controversial among the reviewers with a 4, a 6 and an 8. The 6 is short and finds the idea exciting but questions the utility of the proposed approach in terms of actually studying neural spiking. The 4 and 8 are both quite thorough reviews. 4 seems to mostly question the motivation of using a GAN over a MaxEnt model and demands empirical comparison to other approaches. 8 applauds the paper as a well-executed pure application paper, applying recent innovations in machine learning to an important application with some technical innovation. Overall the reviewers found the paper clear and easy to follow and agree that the application of GANs to neural spiking activity is novel. In general, I find that such high variance in scores (with thorough reviews) indicate that the paper is exciting, innovative and might stir up some interesting discussion. As such, and under the belief that ICLR is made stronger with interesting application papers, I feel inclined to accept as a poster. + +Pros: +- A novel application of GANs to neural spiking data +- Addresses an important and highly studied application area (computational neuroscience) +- Clearly written and well presented +- The approach appears to model well real neural spiking activity from salamander retina + +Cons: +- Known pitfalls of GANs aren't really addressed in the paper (mode collapse, etc.) +- The authors don't compare to state of the art models of neural spiking activity (although they compare to an accepted standard approach - MaxEnt) +- Limited technical innovation over existing methods for GANs",ICLR2018, +59XzXb_Rd1f,1610040000000.0,1610470000000.0,1,snOgiCYZgJ7,snOgiCYZgJ7,Final Decision,Accept (Poster),"Four knowledgeable referees support acceptance for the contributions, and I also recommend acceptance. There is agreement among all reviewers that this paper is about a highly relevant topic, that the model presented is technically sound and has significantly novel aspects, and that the experimental results are convincing. There were several points of criticism raised by the reviewers, concerning, for instance, further comparison experiments, the heuristic nature of masking rules, or the treatment of homologous sequences. In my opinion, however, most of these points have been addressed in a rather convincing way during the rebuttal phase. ",ICLR2021, +wvgIlW4pGzX,1610040000000.0,1610470000000.0,1,XZDeL25T12l,XZDeL25T12l,Final Decision,Reject,"The paper studies Knowledge Distillation (KD) to better understand the reasons behind the performance gap between student and teacher models. The analysis is done by conducting exploratory experiments. The paper establishes that the distillation data used for training a student can play a critical role in the performance gap apart from the model capacity. Building on this idea, the authors propose a new approach to distillation, KD+, utilizing out-of-distribution data when training a student. Extensive experiments are performed to demonstrate the efficiency of KD+. Overall, the paper studies an interesting problem. The results provide a more in-depth explanation of how the distillation data and model capacity play a role in the performance gap between student and teacher models in KD. + +I want to thank the authors for providing the rebuttal and sharing their concerns about the quality of one of the reviews. 
The reviewers appreciated the paper's ideas; however, all the reviewers were on the fence with borderline scores. In summary, this is a borderline paper, and unfortunately, the final decision is a rejection. The reviewers have provided detailed and constructive feedback for improving the paper. In particular, the authors should incorporate the reviewers' feedback to better position the work w.r.t. the existing literature and provide clear reasoning behind the gains for KD+ in experiments. This is exciting and potentially impactful work, and we encourage the authors to incorporate the reviewers' feedback when preparing future revisions of the paper.",ICLR2021, +czI6e7XgJNL,1610040000000.0,1610470000000.0,1,bd66LuDPPFh,bd66LuDPPFh,Final Decision,Reject,"The paper considers ways to understand label smoothing methods, which are widely used in many applications. There is some theory on the performance of SGD with and without the methods of the paper, but there is s significant gap in terms of how the theory offers insight into label smoothing. There are some empirical results, but they are insufficient and there is not much description of the experimental setup. There was a diversity of reviews. But, after a discussion among reviewers, it was felt that, overall, another iteration on improving the coherence and presentation of the paper will make it much better for the community. ",ICLR2021, +AEkI9TmD7A,1610040000000.0,1610470000000.0,1,Tt1s9Oi1kCS,Tt1s9Oi1kCS,Final Decision,Reject,This paper presents an interesting idea for task-free incremental learning on the data stream. The reviewers have extensive discussions after reading all the reviews and the author's rebuttal. There are concerns raised about the presentation of the method and the justification for some parts of the model design choices. The reviewers believe that after addressing these weaknesses the work can be made stronger and may be accepted in a competitive venue. ,ICLR2021, +oVPgWgNkJc,1576800000000.0,1576800000000.0,1,H1gfFaEYDS,H1gfFaEYDS,Paper Decision,Accept (Poster),This paper proposes an novel way of expanding our VAE toolkit by tying it to adversarial robustness. It should be thus of interest to the respective communities.,ICLR2020, +ZbmJAKoT_f,1576800000000.0,1576800000000.0,1,BkePneStwH,BkePneStwH,Paper Decision,Reject,"This paper proposes a method for transferring an NLP model trained on one language a new language, without using labeled data in the new language. + +Reviewers were split on their recommendations, but the reviews collectively raised a number of concerns which, together, make me uncomfortable accepting the paper. Reviewers were not convinced by the value of the experimental setting described in the paper—at least in the experiments conducted here, the claim that the model is distinctively effective depend on ruling out a large class of models arbitrarily. it would likely be valuable to find a concrete task/dataset/language combination that more closely aligns with the motivations for this work, and to evaluate whether the proposed method is genuinely the most effective practical option in that setting. 
Further, the reviewers raise a number of points involving baseline implementations, language families, and other issues, that collectively make me doubt that the paper is fully sound in its current form.",ICLR2020, +BCuU1vyPTq5,1610040000000.0,1610470000000.0,1,Ec85b0tUwbA,Ec85b0tUwbA,Final Decision,Accept (Poster),"The paper introduces new methods and building blocks to improve hyperbolic neural networks, including a tighter parameterization of fully connected layers, convolution, and concatenate/split operations to define a version of hyperbolic multi-head attention. The paper is well written and relevant to the ICLR community. The proposed methods offer solid improvements over previous approaches in various aspects of constructing hyperbolic neural networks and also extends their applicability. As such, the paper provides valuable contributions to advance research in learning non-Euclidean representations and HNNs. All reviewers and the AC support acceptance for the paper's contributions. Please consider revising your paper to take feedback from reviewers after reubttal into account.",ICLR2021, +wl7TmjofHtU,1610040000000.0,1610470000000.0,1,bsRjn0RH620,bsRjn0RH620,Final Decision,Reject,"This paper proposes a contribution aiming at understanding the cause of errors in few-shot learning. The motivation is interesting but the reviewers pointed out many aspects that require more precisions and polishing in addition to the fact that the upper bound provided it rather loose. The rebuttal provided addresses some concerns, but there are still some remarks that require some clarifications en work. +Hence, I propose rejection.",ICLR2021, +_ZUDqplYWxp,1610040000000.0,1610470000000.0,1,pbUcKxmiM54,pbUcKxmiM54,Final Decision,Reject,"This paper is not suitable for publication at ICLR. The paper contains a useful message, that neural networks are not a silver bullet, and are especially not well suited to deductive problems. However, as several reviewers pointed out, the claims of the paper are undermined by the fact that it ignores a lot of relevant work on using neural networks in the context of logic reasoning. Reviewer 2 provides a particularly useful list of relevant works on the topic. ",ICLR2021, +ks7VupSbivz,1642700000000.0,1642700000000.0,1,4o1xPXaS4X,4o1xPXaS4X,Paper Decision,Reject,"The paper shows that adversarial training can be fooled to have robust test accuracy < 1% with a new type of poisoning attack ADVIN on the CIFAR-10 dataset, even though the robust training accuracy > 90%. This requires 100% poisoning rate, though the claim is that the poisoned data is 'semantically similar' to the original data. This is an interesting research direction. Questions were raised about novelty as well as whether these poisoned data could be detected. During the rebuttal phase the authors provide some evidence that with adaptive attacks detection could be evaded (as expected). The authors are encouraged to take all comments into account and update the paper as indicated in the rebuttal for any future revision.",ICLR2022, +HkluGyrKkN,1544270000000.0,1545350000000.0,1,Hkg1YiAcK7,Hkg1YiAcK7,Intersting paper that would profit from a better understanding of the underlying mechanism ,Reject,"The paper proposes a learning by teaching (LBT) framework to train an implicit generative model via an explicit one. It is shown experimentally, that the framework can help to avoid mode collapse. 
The reviewers commonly raised the question why this is the case, which was answered in the rebuttal by pointing to the differences between the KL- and the JS-divergence and by showing a toy problem for which the JS-divergence has local minima while the KL-divergence has not. However, it still remains unclear why this should be generally and for explicit models with insufficient capacity the case, and if the model will be scalable to larger settings, therefore the paper can not be accepted in the current form.",ICLR2019,3: The area chair is somewhat confident +r1lBColRJN,1544580000000.0,1545350000000.0,1,SJgsCjCqt7,SJgsCjCqt7,Slightly hacky but still good progress,Accept (Poster),"Strengths: +This paper develops a method for learning the structure of discrete latent variables in a VAE. The overall approach is well-explained and reasonable. + +Weaknesses: +Ultimately, this is done using the usual style of discrete relaxations, which come with tradeoffs and inconsistencies. + +Consensus: +The reviewers all agreed that the paper is above the bar.",ICLR2019,4: The area chair is confident but not absolutely certain +yBLnYArHEo,1610040000000.0,1610470000000.0,1,WW8VEE7gjx,WW8VEE7gjx,Final Decision,Reject,"Overall, there were significant concerns about the motivation and experiments in this paper, and these were thought not to merit acceptance on their own. Because of this, the reviewers started discussing the theory to see if that would justify acceptance. The reviewers were not able to find a clear advantage over existing approaches, nor sufficient motivation; also the presentation was found to be largely inaccessible. In the rebuttal there was a brief mentioning of background and possible implications, but they were hard to assess and the paper itself did not have such context nor was updated to have such context. For a future version, one recommendation could be to focus significantly more on context, motivation, and improvements over prior work. Also, making the paper more self-contained could help. ",ICLR2021, +CTe19BEjP6,1610040000000.0,1610470000000.0,1,a7gkBG1m6e,a7gkBG1m6e,Final Decision,Reject,"Thank you for your submission to ICLR. Overall the reviewers and I think that this paper presents some nice contributions to the adversarial attacks literature, demonstrating a low-sample-complexity, ""physically-realizable"" attack in a domain of clear importance and interest in machine learning. The move to considering more ""in the loop"" adversarial examples is particularly compelling, and the threat model and improvement over BO methods are both compelling here. + +The main downside of this paper, of course, is the fact that the ""physical adversarial examples"" are of course nothing of the sort: they are simulated. Rather, they are just simulated in a manner that may plausibly be slightly more amenable to real-world deployment. The authors claim that they don't carry out an evaluation on a real system because it is ""dangerous"" is a bit overly dramatic: the tests could easily be carried out in a controlled environment, and demonstration on an actual physical system (even, e.g., and RC car) would vastly improve the impact of this work. As it is, the paper is borderline, but ultimately slightly below the high bar set by ICLR publications. 
I would strongly encourage the authors to reconsider the inclusion of the word ""physical"" in the title, as it honestly sets expectations high for a promise that the paper cannot deliver on, or (even better) to run real experiments on even a small physical system, demonstrating the transferability there. The paper ultimately has the potential for a high impact in this field, if these issues are addressed.",ICLR2021, +SkGE71aHG,1517250000000.0,1517260000000.0,114,SyProzZAW,SyProzZAW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All the reviewers are agree on the significance of the topic of understanding expressivity of deep networks. This paper makes good progress in analyzing the ability of deep networks to fit multivariate polynomials. They show exponential depth advantage for general sparse polynomials. + + I am very surprised that the paper misses the original contribution of Andrew Barron. He analyzes the size of the shallow neural networks needed to fit a wide class of functions including polynomials. The deep learning community likes to think that everything has been invented in the current decade. + +@article{barron1994approximation, + title={Approximation and estimation bounds for artificial neural networks}, + author={Barron, Andrew R}, + journal={Machine Learning}, + volume={14}, + number={1}, + pages={115--133}, + year={1994}, + publisher={Springer} +}",ICLR2018, +Hke_ah3ZeV,1544830000000.0,1545350000000.0,1,BkeUasA5YQ,BkeUasA5YQ,Modifies knowledge distillation by training student to match teachers intermediate representation at multiple layers.,Reject,"The authors propose a method for distilling a student network from a teacher network and while additionally constraining the intermediate representations from the student to match those of the teacher, where the student has the same width, but less depth than the teacher. The main novelty of the work is to use the intermediate representation from the teacher as an input to the student network, and the experimental comparison of the approach against previous work. + + The reviewers noted that the method is simple to implement, and the paper is clearly written and easy to follow. The reviewers raised some concerns, most notably that the authors were using validation accuracy to measure performance, and were thus potentially overfitting to the test data, and regarding the novelty of the work. Some of the criticisms were subsequently amended in the revised version where results were reported on a test set (the conclusions are as before). Overall, the scores for this paper were close to the threshold for acceptance, and while it was a tough decision, the AC ultimately felt that the overall novelty of the work was slightly below the acceptance bar.",ICLR2019,4: The area chair is confident but not absolutely certain +fv-ArfhUgTh,1610040000000.0,1610470000000.0,1,sjGBjudWib,sjGBjudWib,Final Decision,Reject,"Two reviewers recommend rejection, whereas two reviewers slightly lean towards acceptance. All reviewers agree that the paper tackles an important problem, and the proposed direction holds promise and is worth exploring. However, the reviewers raised concerns about the novelty of the proposed approach [R3,R4], the applicability of sparsification to GCN-based models [R3,R4], baseline experiments [R1,R3,R4] and the gap between the theoretical aspect of the paper and the implementation of the proposed approach [R2]. 
The authors engaged with the reviewers during the discussion period and succeeded in motivating the speedup gains of their method, and clarifying some of the reviewer's concerns. However, after discussion, the reviewers still think this is a borderline paper, which could be significantly strengthened by validating the applicability of the proposed sparsification to other GNNs [R1,R2,R3], and in particular, by including the suggested FastGAT-sparsified GCN experiment [R1,R3,R4]. The paper could also benefit from improving the presentation of both the analyzed approach and the practical one [R2]. I agree with reviewers' assessment and therefore must reject. However, I acknowledge that the paper does raise notable interest and I encourage the authors to consider the reviewers' suggestions in future iterations of their work.",ICLR2021, +2Bg0nLKC96,1610040000000.0,1610470000000.0,1,ZTFeSBIX9C,ZTFeSBIX9C,Final Decision,Accept (Poster),"This paper investigates knowledge distillation in the context of non-autoregressive machine translations. All reviewers are supportive of acceptance, especially after the thoughtful author responses. A well motivated and simple to implement approach that is giving good empirical results.",ICLR2021, +KcVk6kpKJ8s,1610040000000.0,1610470000000.0,1,wb3wxCObbRT,wb3wxCObbRT,Final Decision,Accept (Oral),The paper proposes a method to grow deep network architectures over the course of training. The work has been extremely well received and has clear novelty and solid experiment validation.,ICLR2021, +cqjsGeN5j5D,1642700000000.0,1642700000000.0,1,ajXWF7bVR8d,ajXWF7bVR8d,Paper Decision,Accept (Oral),"Current meta-learning algorithms suffer from the requirement of a large number of tasks in the meta-training phase, which may not be accessible in real-world environment. This paper addresses this bottleneck, introducing a cross-task interpolation in addition to the existing intra-task interpolation. The main idea is very simple, which can be viewed as an incremental adding-up to existing augmentation methods. However, the method is well supported by nice theoretical results which highlight the relation between task interpolation and the Rademacher complexity. In fact, this is not a trivial extension of existing work. Authors did a good job in the rebuttal phase, resolving most of concerns raised by reviewers, leading that two of reviewers raised their score. All reviewers agree to champion this paper. Congratulations on a nice work.",ICLR2022, +nWL6IA0Oi6,1576800000000.0,1576800000000.0,1,rklp93EtwH,rklp93EtwH,Paper Decision,Accept (Poster),"This paper proposes to deal with task heterogeneity in meta-learning by extracting cross-task relations and constructing a meta-knowledge graph, which can then quickly adapt to new tasks. The authors present a comprehensive set of experiments, which show consistent performance gains over baseline methods, on a 2D regression task and a series of few-shot classification tasks. They further conducted some ablation studies and additional analyses/visualization to aid interpretation. + +Two of the reviewers were very positive, indicating that they found the paper well-written, motivated, novel, and thorough, assessments that I also share. The authors were very responsive to reviewer comments and implemented all actionable revisions, as far as I can see. The paper looks to be in great shape. I’m therefore recommending acceptance. 
",ICLR2020, +S1ekS70gxV,1544770000000.0,1545350000000.0,1,B1lxH20qtX,B1lxH20qtX,many missing details; strange physics; ,Reject,"Strengths: A co-evolution of body connectivity and its topology mimicing control policy is presented. + +Weaknesses: Reviewers found the paper to be lacking in detail. The importance of message passing in achieving the given results is clear on one example but not some others. Some reviewers had questions regarding the baseline comparisons. +The authors provided lengthy details in responses on the discussion board, but reviewers likely had limited time to fully reread the many changes that were listed. +AC: The physics in the motions shown in the video require signficant further explanation. It looks like the ball joints can directly attach themselves to the ground, and make that link stand up. Thus it seems that the robots are not underactuated and can effectively grab arbitrary points in the environment. Also it is strange to see the robot parts dynamically fly together as if attracted by a magnet. The physics needs significant further explanation. + +Points of Contention: The R2 review is positive on the paper (7), with a moderate confidence (3). +R1 contributed additional questions during the discussion, but R2 and R3 were silent. + +The AC further examined the submission (paper and video). +The reviewers and the AC are in consensus regarding +the many details that are behind the system that are still not understood. The AC is also skeptical +of the non-physical nature of the motion, or the unspecified behavior of fully-actuated contacts +with the ground. +",ICLR2019,5: The area chair is absolutely certain +SJ8ciMIug,1486400000000.0,1486400000000.0,1,r1rhWnZkg,r1rhWnZkg,ICLR committee final decision,Accept (Poster),"The program committee appreciates the authors' response to concerns raised in the reviews. While there are some concerns with the paper that the authors are strongly encouraged to address for the final version of the paper, overall, the work has contributions that are worth presenting at ICLR.",ICLR2017, +HJaMpf8dg,1486400000000.0,1486400000000.0,1,SJ8BZTjeg,SJ8BZTjeg,ICLR committee final decision,Reject,"This paper was reviewed by three experts. While they find interesting ideas in the manuscript, all three point to deficiencies (lack of significant novelty, potential problems with GAN training) and unanimously recommend rejection. I do not see a reason to overturn their recommendation.",ICLR2017, +QbFzAALiTy,1610040000000.0,1610470000000.0,1,uRuGNovS11,uRuGNovS11,Final Decision,Reject,"Reviewers have commented on the lack of novelty of the paper as it reads only as applying the variational inference framework of Blundell et al. (2015) to deep metric learning (R2 and R4). Furthermore, the paper has not properly positioned itself when compared to previous works on ""Deep variational metric learning"" and ""Deep adversarial metric learning"" (R1) and other previous literature that have studied robustness for metric learning. The argument on robustness to noisy labels needs to be expanded and better fleshed out in a future version of the paper. ",ICLR2021, +V2s3Kc0OLVe,1642700000000.0,1642700000000.0,1,KSugKcbNf9,KSugKcbNf9,Paper Decision,Accept (Poster),"This paper presents a method for using transformer models to perform approximate Bayesian inference, in the sense of approximating the posterior predictive distribution for a test example. This seems similar to doing amortized variational inference using a transformer model. 
The reviewers all found the paper to be clearly written, interesting, novel and compelling. Two of the reviewers found the results ""impressive"". There is some concern of over-claiming (is it really Bayesian?, are the authors making too broad statements based on very simple case studies?). The presented method is also not scalable O(n^2), so the setting is restricted to very small datasets and models. + However, the reviewers didn't seem especially concerned by this. The reviews were mixed but leaning positive (8, 6, 5) and the positive reviews are more substantial. Therefore the recommendation is to accept, but please incorporate the reviewer feedback and additional discussion about related methods (discussion below) into the camera ready.",ICLR2022, +r1492fLdx,1486400000000.0,1486400000000.0,1,SkCILwqex,SkCILwqex,ICLR committee final decision,Reject,"This paper studies the effects of modifying intermediate representations arising in deep convolutional networks, with the purpose of visualizing the role of specific neurons, and also to construct adversarial examples. The paper presents experiments on MNIST as well as faces. + + The reviewers agreed that, while this contribution presents an interesting framework, it lacks comparisons with existing methods, and the description of the method lacks sufficient rigor. In light of the discussions and the current state of the submission, the AC recommends rejection. + + Since the final scores of the reviewers might suggest otherwise, please let me explain my recommendation. + + The main contribution of this paper seems to be essentially a fast alternative to the method proposed in 'Adversarial Manipulation of Deep Representations', by Sabour et al, ICLR'16, although the lack of rigor and clarity in the presentation of section 3 makes this assessment uncertain. The most likely 'interpretation' of Eq (3) suggests that eta(x_o, x_t) = nabla_{x_o}( || f^(l)_w(x_t) - f^(l)_w(x_o) ||^2), which is simply one step of gradient descent of the method described in Sabour et al. One reviewer actually asked for clarification on this point on Dec. 26th, but the problem seems to be still present in the current manuscript. + + More generally, visualization and adversarial methods based on backpropagation of some form of distance measured in feature space towards the pixel space are not new; they can be traced back to Simoncelli & Portilla '99. + Fast approximations based on simply stopping the gradient descent after one iteration do not constitute enough novelty. + + Another instance of lack of clarity that has also been pointed out in this discussion but apparently not addressed in the final version is the so-called PASS measure. It is not defined anywhere in the text, and the authors should not expect the reader to know its definition beforehand. + + Besides these issues, the paper does not contribute to the state-of-the-art of adversarial training nor feature visualization, mostly because its experiments are limited to mnist and face datasets. Since the main contribution of the paper is empirical, more emphasis should be made to present experiments on larger, more numerous datasets.",ICLR2017, +Gi6bZIdc3Ie,1610040000000.0,1610470000000.0,1,EeeOTYhLlVm,EeeOTYhLlVm,Final Decision,Reject,"The reviewers agree that the contributions may not be relevant to the ML research community or perhaps are a poor fit for the venue, but otherwise find the work potentially useful and addressing a timely topic. 
Because the paper focuses on a simulation environment for existing epidemiological models, reviewers comment that the technical and methodological novelty is limited.",ICLR2021, +17_K3juMYSR,1610040000000.0,1610470000000.0,1,snaT4xewUfX,snaT4xewUfX,Final Decision,Reject,"The paper presents a stochastic variational inference method for posterior estimation in a Cox process with intensity given by the solution to a diffusion stochastic differential equation. The reviewers highlight the novelty of the approach. Some of the concerns with regards to clarity have been addressed by the authors satisfactorily. + +However, an important issue of the approach is that of estimating model parameters, which the authors do not address explicitly by simply referring to that as the task of the modeller. I believe this is an important issue and, although some of the parameters can be estimated along with the neural network parameters, this has not been shown empirically. Along a similar vein, the paper only presents results on a single real dataset (the bike-sharing dataset), which questions the applicability of the approach and no other baseline method is presented. At the very least, the authors should have provided an objective evaluation to other doubly stochastic point process models, e.g. based on Gaussian processes, where modern stochastic variational inference algorithms have been presented. +",ICLR2021, +ByfRoMI_l,1486400000000.0,1486400000000.0,1,ryhqQFKgl,ryhqQFKgl,ICLR committee final decision,Accept (Poster),"Given that all reviewers were positive aobut this paper and given the unusual application domain, we recommend to accept this paper for poster presentation at the main conference.",ICLR2017, +9MsGHGOkmP,1610040000000.0,1610470000000.0,1,mWnfMrd9JLr,mWnfMrd9JLr,Final Decision,Reject,"The paper proposes an algorithm for training flow models by minimizing the KL divergence in the latent space Z. The paper addresses an important problem in training flow models. However, some major concerns remain after the discussion among the reviewers. The scale of the experiments and the scalability of the approach appear limited in the current version of the paper. Moreover, the applicability of the current theoretical analysis to general distributions is quite limited. ",ICLR2021, +#NAME?,1576800000000.0,1576800000000.0,1,SJxHMaEtwB,SJxHMaEtwB,Paper Decision,Reject,"The paper proposes a domain-adaptive filter decomposition method via separating domain-specific and cross-domain features, towards learning invariant representations for unsupervised domain adaptation. + +Overall, this well-written paper is well motivated with a better technique for learning invariant representations using convolutional filters. Nonetheless, reviewers still have major concerns: 1) the novelty of the paper may be marginal given the significant line of recent work on learning domain-invariant representations; 2) when the label distributions differ, learning invariant representations can only lead to worse target generalizations; 3) the provided theory has an unclear connection to the presented filter decomposition method. The paper can be strengthened by further discussions on how to mitigate the aforementioned negative results. 
+ +Hence I recommend rejection.",ICLR2020, +HkdI416SG,1517250000000.0,1517260000000.0,358,H1YynweCb,H1YynweCb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"I tend to agree with the most positive reviewer who characterizes the work with the following statements: + +""Kronecker factorization was introduced for Convolutional networks (citation is in the paper). Soft unitary constraints also have been introduced in earlier work (citations are also in the paper). Nevertheless, showing that these two ideas work also for RNNs in combination (and seeing, e.g. the nice relationship between Kronecker factors and unitary) is a relevant contribution."" + +The most negative reviewer feels that the experimental work could have evaluated the different components explored here more clearly. For this reason the AC recommends an invitation to the workshop track.",ICLR2018, +a4RqqmjA_hU,1642700000000.0,1642700000000.0,1,Muwg-ncP_ec,Muwg-ncP_ec,Paper Decision,Reject,"This paper presents a second-order optimization algorithm for neural nets which extends LeCun's classic Lagrangian framework. The paper derives a method for computing the exact Newton step for a single training example for a multilayer perceptron. It then describes approximations that can be used to extend the method to more examples. + +The authors claim to have spotted factual errors in the reviews. However, I've looked into the issues, and I find myself agreeing with the reviewers on each of those points (or, if there are misunderstandings, they result from a lack of clarity in the paper rather than insufficient scientific computing background on the part of the reviewers). + +The authors claim to have solved a longstanding problem by giving an efficient method for calculating the stochastic Newton step (for a single training example). However, it's not clear this is very useful; as a reviewer points out, estimating the curvature with a single example can't give a very accurate estimate. Once the method is extended to batches, more approximations are required. I also agree with the reviewers that the later parts of the methods section appear a bit rushed. + +As the reviewers point out, in the experimental comparisons, the proposed method seems to underperform SGD with momentum even in terms of epochs, which is the setting where second-order methods usually shine. Other second-order optimizers (e.g. K-FAC) have been shown to outperform first-order methods in terms of both wall clock time and epochs, so epochwise improvement seems like the minimum bar for a second-order optimization paper.",ICLR2022, +BJq8hzLOl,1486400000000.0,1486400000000.0,1,S1jE5L5gl,S1jE5L5gl,ICLR committee final decision,Accept (Poster),"This paper proposes a neat general method for relaxing models with discrete softmax choices into closely-related models with continous random variables. The method is designed to work well with the reparameterization trick used in stochastic variational inference. This work is likely to have wide impact. + + Related submissions at ICLR: + ""Categorical Reparameterization with Gumbel-Softmax"" by Jang et al. contains the same core idea. 
""Discrete variational autoencoders"", by Rolfe, contains an alternative relaxation for autoencoders with discrete latents, which I personally find harder to follow.",ICLR2017, +XIkTx68BZZc,1610040000000.0,1610470000000.0,1,uUAuBTcIIwq,uUAuBTcIIwq,Final Decision,Reject,"This paper aims at learning disentangled representation at different level without the supervision signal of group information. To achieve this, the proposed UG-VAE model uses both global variable $\beta$ to represent common information shared across all data, as well as a mixture of Gaussian prior for the local latent variable $p(z) = \int p(z|d)p(d)d$ where $d$ represents the assignment of the group for a particular datapoint. Experiments considered evaluation on unsupervised global factor learning, domain alignment and a downstream application task on batch classification. + +Reviewers agreed that the proposed model seems interesting and novel, however some reviewers raised clarity concerns on how to interpret the learned representation by UG-VAE. Revision has addressed this clarity issue to some extent, although some doubts from some reviewers still exists. Also reviewers raised concerns on less competitive experimental results, and the authors have updated the manuscript with improved results. + +To me the main issues of the experimental section are (1) no quantitative result is provided regarding global factor learning and domain alignment, and (2) there is no other benchmark being studied in the experimental section. In my view, at least some other VAE representation learning baselines can be included in the batch classification section in order to demonstrate the real benefit of learning global factor based representations in downstream tasks. ",ICLR2021, +rVnf6FnjN7,1576800000000.0,1576800000000.0,1,B1x6w0EtwH,B1x6w0EtwH,Paper Decision,Accept (Poster),"This paper applies reinforcement learning to text adventure games by using knowledge graphs to constrain the action space. This is an exciting problem with relatively little work performed on it. Reviews agree that this is an interesting paper, well written, with good results. There are some concerns about novelty but general agreement that the paper should be accepted. I therefore recommend acceptance.",ICLR2020, +SR0_3OKqgDx,1642700000000.0,1642700000000.0,1,ROpoUxw23oP,ROpoUxw23oP,Paper Decision,Reject,"The paper presents a gradient-based hyperparameter optimization method, wherein a differentiable reparameterization is proposed for various popular CNN hyperparameters including kernel size, number of channels and hidden layer size. + +All reviewers have pointed out the lack of novelty (such reparameterizations are standard) and lack of convincing experiments. + +The authors didn't write any rebuttal. + +Overall, there is a large consensus among the reviewers that this paper is not ready for publication at ICLR.",ICLR2022, +sPbCy9BhR4,1642700000000.0,1642700000000.0,1,nZeVKeeFYf9,nZeVKeeFYf9,Paper Decision,Accept (Poster),"This paper introduces a new method for fine-tuning large language models, which is lightweight since it only adds a small amount of parameters, while keeping the original parameters frozen. The main idea is to add a low rank matrix which is learned during fine-tuning to the original weight matrices of the model, which are frozen. The reviewers agreed that the method is simple, original and well motivated. Moreover, it compares well compared to other fine-tuning baselines, such as adaptors or full fine-tuning. 
For these reasons, I recommend to accept this work to the ICLR conference.",ICLR2022, +BklVKUk_eN,1545230000000.0,1545350000000.0,1,r1g1LoAcFm,r1g1LoAcFm,"Nice approach, more convincing empirical evaluation may be needed",Reject,"The paper proposes a nice approach to massively multi-label problems with rare labels which may only have a limited number of positive examples; the approach uses Bayes nets to exploit the relationships among the labels in the output layer of a neural nets. The paper is clearly written and the approach seems promising, however, the reviewers would like to see even more convincing empirical results. +",ICLR2019,4: The area chair is confident but not absolutely certain +zA4TnLNyBvh,1610040000000.0,1610470000000.0,1,5xaInvrGWp,5xaInvrGWp,Final Decision,Reject,"This paper proposes a method called Federated Bias-variance attacks (FedBVA) to generate adversarial examples for federated learning, which can be used to make the model more robust to adversarial attacks. All the reviewers found the problem and the approach very interesting. Their concerns include the following main points (please see the reviews for more details): +* The decomposition of the bias and variance can be made more rigorous +* The necessity for a shared dataset of adversarial examples makes the application a little limited +* Need fairer experimental baselines +* Vanilla federated learning is not guaranteed to preserve privacy - the authors should edit this claim in the motivation +* Need compatibility with secure aggregation approaches where the central server cannot access local updates + +The authors did do a great job of responding to the reviewers' comments. Given the interest in the problem and the novelty of the idea, I think an improved version of the paper would be quite well-received.",ICLR2021, +rklHoZp-xN,1544830000000.0,1545350000000.0,1,ByeLBj0qFQ,ByeLBj0qFQ,meta-review,Reject,"This paper was reviewed by three experts. After the author response, R2 and R3 recommend rejecting this paper citing concerns of experimental evaluation and poor quality of the manuscript. All three reviewers continue to have questions for the authors, which the authors have not responded to. The AC finds no basis for accepting this paper in this state. ",ICLR2019,4: The area chair is confident but not absolutely certain +#NAME?,1576800000000.0,1576800000000.0,1,HyeSin4FPB,HyeSin4FPB,Paper Decision,Accept (Spotlight),"The paper proposes a method to control dynamical systems described by a partial differential equations (PDE). The method uses a hierarchical predictor-corrector scheme that divides the problem into smaller and simpler temporal subproblems. They illustrate the performance of their method on 1D Burger’s PDE and 2D incompressible flow. +The reviewers are all positive about this paper and find it well-written and potentially impactful. Hence, I recommend acceptance of this paper.",ICLR2020, +k8StH3x4hg,1642700000000.0,1642700000000.0,1,fM8VzFD_2-,fM8VzFD_2-,Paper Decision,Reject,"This paper proposes a novel variational autoencoder to utilize functional connectivity (FC) features from resting state fMRI (rs-fMRI) scans in order to uncover latent nosological relationships between diverse yet related neuropsychiatric disorders. The methodology and main technical contributions are clearly articulated and explained, and the experimental results seem convincing. 
On the other hand, the proposed framework is somewhat limited in scope and clinical applicability, and the writing in the paper needs improvement (as pointed out by two reviewers).",ICLR2022, +EE4Ml5hZtI,1576800000000.0,1576800000000.0,1,ryxyCeHtPB,ryxyCeHtPB,Paper Decision,Accept (Poster),"This paper presents an attention-based approach to transfer faster CNNs, which tackles the problem of jointly transferring source knowledge and pruning target CNNs. + +Reviewers are unanimously positive on the paper, in terms of a well-written paper with a reasonable approach that yields strong empirical performance under the resource constraint. + +AC feels that the paper studies an important problem of making transfer learning faster for CNNs, however, the proposed model is a relatively straightforward combination of fine-tuning and filter-pruning, each having very extensive prior works. Also, AC has very critical comments for improving this paper: + +- The Attentive Feature Distillation (AFD) module is very similar to DELTA (Li et al. ICLR 2019) and L2T (Jang et al. ICML 2019), significantly weakening the novelty. The empirical evaluation should consider DELTA as baselines, e.g. AFS+DELTA. + +I accept this paper, assuming that all comments will be well addressed in the revision.",ICLR2020, +eH9T7r29yPC,1642700000000.0,1642700000000.0,1,EDeVYpT42oS,EDeVYpT42oS,Paper Decision,Accept (Spotlight),"This paper examined physics-inspired inductive biases in neural networks, in particular Hamiltonian and Lagrangian dynamics. The work separated the benefits arising from incorporating energy conservation, the symplectic bias, the coordinate systems, and second-order dynamics. Through a set of experiments, the paper showed the most important factor for improved performance in the test domains was the second-order dynamics, and not the more common explanation of energy conservation or the other factors. The increased generality of this approach was demonstrated with better predictions on Mujoco tasks that did not conserve energy. + +All reviewers liked the insights provided by the paper. They agreed that the paper clearly laid out several hypotheses and systematically tested them. The reviewers found the experiments thoughtful and the results compelling. The reviewers also pointed out several aspects of the document that could be improved, including additional formalism clarifications (reviewer nLbj), baseline algorithms (reviewer wu5x), and domains (reviewers 7KKB,SW9u). The reviewers found the author's response satisfactory but were disappointed that a revised paper was not ready to read. The reviewers want the final paper to include the modifications that were promised in the author response. + +All four reviewers indicated to accept this paper which contributes novel insights that simplify and generalize physics-inspired neural networks. The paper is therefore accepted.",ICLR2022, +e3Je48rrMKD,1610040000000.0,1610470000000.0,1,qU-eouoIyAy,qU-eouoIyAy,Final Decision,Reject,The approach proposed here have raised major concerns from multiple reviewers especially concerning the novelty and the experimental validation procedure.,ICLR2021, +tg5FJk7k16P,1642700000000.0,1642700000000.0,1,ngjR4Gw9oAp,ngjR4Gw9oAp,Paper Decision,Reject,"The paper proposes a hierarchical policy architecture with two substituent policies, ""go"" and ""stop"", and a controller mechanism for switching between them on every step (either a rule or a learned network), taking inspiration from neuroscience concepts of inhibition. 
Both are trained via Soft Actor Critic on the subset of states assigned to them and comparisons are made against a baseline. The use case targeted is the repurposing of pre-trained agents to new or updated environments. + +Reviewers regarded the method as sound, technically correct, and involving illustrative experiments (although perhaps picking problems too carefully adapted to the solution being presented), and were positive on the general direction of taking inspiration from neuroscience. Reviewer y7rr found the details unclear, recommended more focus on a concrete realization of the general method, and questioned the differences with more traditional hierarchical RL; while many specific inquiries were addressed the reviewer's broad concerns about contextualization remained. Reviewer dDqD had similar concerns around confusing presentation and positioning within the broader literature on ""multi-task RL, non-stationary environments, online/continual learning, etc."", and the discussion unfolded similarly -- many specific concerns addressed but fundamental issues remaining. 6ibM, like y7rr, raised the question of why one should stop at 2 policies rather than N policies, noted the under-discussed relationship to options, and questioned the starting point of SAC, and while this was clarified to be about value functions rather than policies, the reviewer still thought this was an ill-justified choice that rendered the system ""brittle"", and remained unhappy with baseline choices not extending beyond SAC-based agents. tm8g had similar concerns about clarity and in particular that reward engineering seemed central; the authors clarified that this was not the case. + +There is wide consensus among qualified reviewers that the presentation (and in particular situating the method with respect to prior work) is inadequate for publication, and I am inclined to agree. As y7rr put it, ""evaluating its importance and correctness is hard"" without adequate context on the relationships to in particular existing work on hierarchical, multi-task and continual learning. While the direction appears interesting, unfortunately the hard work of contextualizing one's contribution is an utterly essential part of the scientific enterprise; without it we risk retreading well-explored terrain while merely wearing slightly different boots. I encourage the authors to further clarify their presentation incorporating the valuable feedback from the reviewers on this aspect.",ICLR2022, +Unr8l2WfVu,1610040000000.0,1610470000000.0,1,E6fb6ehhLh8,E6fb6ehhLh8,Final Decision,Reject," +The question the authors address is relevant and interesting mostly in the UDA setting. However, there exists several recent works that have +highlighted the importance of label distribution ratio in DA (Wu et al., Combes et al. etc.), hence the main contribution of the +paper is to propose a novel analysis and results in the multi-source setting. That said, the paper has mixed reviews and +after going through the paper, the reviews and the discussion, I tend to agree with some of the reviewers that while +the idea is interesting, the paper lacks in several points that makes it unsuitable to publication, for now. + +Here are the main points leading to the decision. + + +A) UDA is usually the most frequent situation that occurs in domain adaptation and the most difficult to handle. +The theoretical novelty of the bound comes only from the multi-source aspect that seems to be original + +B) there is a strong contradiction in the paper. 
In the intro, they state that the paper addresses situations where conditional distributions differ. However, in 4.2 they assume that they are ultimately equal.

In section 4.1, the authors show that for optimizing their problem, they need to have labels, mostly for estimating the class-conditional distributions. When these labels are available in the target domain, the problem is pretty simple and there exist many simple baselines that can handle it. However, in a UDA setting, they do not have labels, and they propose a method for estimating label proportions by assuming S_t(z|y) = T(z|y), which is in contradiction with their initial hypothesis S_t(z|y) != T(z|y). Hence, under their assumption, the left-hand side of Lemma 1 is zero and the equality is useless. I would therefore suggest that the authors avoid such a contradiction.

Under equality of S_t(z|y) = T(z|y), the approach proposed by the authors bears a strong similarity to the work of Redko et al. 2019 (cited in their paper). So I would highly recommend that they compare with that algorithm.

C) The authors use a lot of tricks related to filtering, moving averages, etc. I suspect those parts are important for making the approach work, and they are not properly analyzed.

D) The paper is confusing in its writing, and somehow this confusion makes the theoretical details hard to understand. For instance, in section 3 the loss function is defined as having two variables but is used one line later with only one. In the theorem, it is not clear whether the true labelling function intervenes or how the y in h(x,y) is related to the true +labels. I guess a clarification is needed here to establish the soundness of the theoretical results.",ICLR2021,
+AZNqiSAyins,1642700000000.0,1642700000000.0,1,nbC8iTTXIrk,nbC8iTTXIrk,Paper Decision,Accept (Poster),"The paper proposes a multi-scale network that uses DEQ models to incorporate samples at multiple resolutions. The authors also propose a training strategy to improve the performance of the model. The authors investigate the merits of the approach through ablation and explainability, weighing the value of hierarchical heritage, diversity modules, perturbation size, and regularization penalties.

The reviewers appreciated that the authors tackled the problem of incorporating multiple scales, and the “impressive results” on CIFAR-10 and CIFAR-100. The reviewers also expressed concerns regarding the computational assessment, in particular the additional computational/memory overhead of unrolling, and what the authors mean by “explainability” in their experimental evaluation. The reviewers also made suggestions to organize the paper better.

The authors submitted responses to the reviewers' comments. After reading the response, updating their reviews, and discussing, the reviewers who took part in the discussion were “satisfied by the response” and considered that the “major concerns have been addressed”. The feedback provided was fruitful, and the final version should already be improved. The ablative analysis and comparison to baselines are careful and thorough.

Accept. Poster.",ICLR2022,
+3ira_wN0u7x,1610040000000.0,1610470000000.0,1,Pz_dcqfcKW8,Pz_dcqfcKW8,Final Decision,Accept (Poster),"This paper proposes an approach to unifying both full-context and streaming ASR in a single end-to-end model. Techniques such as weight sharing, joint training, and teacher-student knowledge distillation are used to improve the training. 
The so-called dual-mode ASR is evaluated with the ContextNet and Conformer networks on the Librispeech and MultiDomain datasets. The performance is good. While the technical novelty is not overwhelmingly significant, all reviewers agree that it may have an impact on the speech machine learning community, as high-performance streaming ASR is of great importance in real-world deployment of ASR systems. The authors have meticulously addressed the reviewers' comments and, in particular, changed the title from ""universal ASR"" to ""dual-mode ASR"" as suggested by some of the reviewers. After the rebuttal, all reviewers are supportive of accepting the paper. ",ICLR2021,
+ZI3AKkpjwXz,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"This meta-review is written after considering the reviews, the authors’ responses, the discussion, and the paper itself.

The paper proposes a system for learning disentangled object-centric 3D-based representations of scenes and shows that the proposed model works well on several tasks, including few-shot classification and VQA.

The reviewers point out that the direction is important (R1, R3), the model is sensible (R2), and the reported results are good (R1, R4); on the downside, they note that the system is complicated (R1), the considered datasets are relatively simplistic (R1, R3, R4), some ablations are missing (R2, R3), and comparisons with baselines are not necessarily convincing (R2, R3). The authors did a good job of addressing the concerns in the rebuttal, by reporting additional ablation results, baselines, and experiments on the realistic Replica dataset.

All in all, I recommend acceptance. The direction of the work is important and complex, the experimental evaluation presented in the paper is extensive, and the results are good relative to relevant baselines. On the downside, the proposed system and the paper are somewhat complicated and overwhelming, which may limit the benefit for the readers. I hope the authors will take this into account in the future.
",ICLR2021,
+LpcjT01-bI6,1610040000000.0,1610470000000.0,1,uHNEe2aR4qJ,uHNEe2aR4qJ,Final Decision,Reject,"Taking all reviews and the work into consideration, unfortunately the work does not present the breadth it needs to sustain the claims it makes. In particular, the work needs to analyse more architectures/variations of datasets with different properties and to provide more careful ablation studies that show the efficiency of the 3 different proposed methods. Potentially, one of these methods could be removed in order to give more space to analysing the others, which seem more promising. ",ICLR2021,
+uf525O9HLb,1576800000000.0,1576800000000.0,1,BJl-5pNKDB,BJl-5pNKDB,Paper Decision,Accept (Poster),"The paper provides a theoretical analysis of the recent and popular Generative Adversarial Imitation Learning (GAIL) approach. Valuable new insights on generalization and convergence are developed, and put GAIL on a stronger theoretical foundation. Reviewer questions and suggestions were largely addressed during the rebuttal.",ICLR2020,
+wgyu66jUvlJ,1642700000000.0,1642700000000.0,1,LedObtLmCjS,LedObtLmCjS,Paper Decision,Accept (Poster),"This paper proposes a new bilinear decomposition for universal value functions. The bilinear network has one component dependent on state and goal and another component that depends on state and action. 
The experiments with the DDPG algorithm in robot simulations show that the proposed architecture improves performance, data efficiency, and task transfer over several baseline algorithms, including improvements on earlier bilinear decompositions.

The reviews noted that several aspects of the paper could be improved, and the author response addressed several of these concerns. Multiple reviewers appreciated the insights from the experiment added in section 4.5 on a simple grid environment, which enabled a direct interpretation of the vector fields used in the method. Several aspects of the presentation were clarified based on the reviewers' comments. Additional details were also provided on the problem specification and the solution methods. During the discussion, the reviewers agreed that the revised paper presented a useful addition to the literature.

Four knowledgeable reviewers recommend accepting the paper for its contribution of an effective network architecture for a goal-conditioned universal value function approximator. The paper is therefore accepted.",ICLR2022,
+tYUnVI7dEbQ,1610120000000.0,1610470000000.0,1,qOCdZn3lQIJ,qOCdZn3lQIJ,Final Decision,Reject,"The paper introduces a new scheme for compressing gradients in distributed learning which is argued to exploit temporal correlation.

The paper received very detailed reviews and generated a lot of discussion (thank you to the reviewers for the amazing job). Many reviewers acknowledge that this is interesting work and a simple, potentially useful algorithm, but they pointed out many problems with the discussion, theoretical analysis, and experiments (e.g., it was not clear to R4 and R3 that it is the temporal correlations which are beneficial rather than the 'lossiness'). Some of these issues were addressed, and the current version is much stronger than the initial submission (and stronger than the low average scores suggest). Still, the reviewers do not believe that the paper is ready for publication, and I share this sentiment. I would strongly encourage the authors to invest more effort in addressing the reviewers' comments and resubmit the work to one of the upcoming top conferences.",ICLR2021,
+r1gphYh4xV,1545030000000.0,1545350000000.0,1,B1lnzn0ctQ,B1lnzn0ctQ,Solid contribution to unrolled iterative optimization and soft thresholding,Accept (Poster),This is a well-executed paper that makes clear contributions to the understanding of unrolled iterative optimization and soft thresholding for sparse signal recovery with neural networks.,ICLR2019,5: The area chair is absolutely certain
+yFul0fXAVbj,1642700000000.0,1642700000000.0,1,i7h4M45tU8,i7h4M45tU8,Paper Decision,Reject,"The paper gives a framework for learning temporal logic rules from noisy unlabeled data. The key novelty is a formulation of combinatorial rule search as an end-to-end differentiable problem. The method is evaluated on a video dataset and a healthcare dataset.

The reviewers liked the high-level ideas behind the paper. However, the conclusion was that the experimental results, while interesting, are still somewhat preliminary (in particular, the baselines are weak). I agree with this point and am recommending rejection this time around. 
However, I urge the authors to develop the paper further and submit to the next deadline.",ICLR2022,
+rLCjont0nai,1642700000000.0,1642700000000.0,1,tiKNfYpH8le,tiKNfYpH8le,Paper Decision,Reject,"The authors propose the OPT-in-Pareto algorithm, which considers multi-objective optimization and includes an extra ""non-informative"" reference metric for choosing between different Pareto-optimal solutions.

The reviewers generally agreed that the work was compelling. However, one reviewer (6MZF) brought up the fact that the proposal is extremely similar to one proposed by a different arXiv paper, and convincingly argued that the authors of this paper were aware of the other before submission.

This is a difficult situation. On the one hand, for the purposes of establishing priority, an arXiv paper ""doesn't count"". On the other hand, I believe that authors are obligated to appropriately credit all relevant work of which they are aware, in *any* form: this includes journals, conference proceedings, preprints, emails, personal conversations, stackoverflow posts, tweets, etc. In this case, it seems that the authors did not adhere to this second condition, and while they have updated their manuscript, two reviewers said that they were unsatisfied by the changes on this point.

I want to emphasize that this isn't a question of priority: the first to publish ""wins"", and nobody has published this work yet. However, other researchers working on the same problem, and proposing similar solutions, *must* be appropriately credited, even by the eventual winners (if they are aware of them).",ICLR2022,
+K2Ul_Iq2kQ,1576800000000.0,1576800000000.0,1,SygSLlStwS,SygSLlStwS,Paper Decision,Reject,"The authors propose an algorithm for meta-RL which reduces the problem to one of model identification. The main idea is to meta-train a fast-adapting model of the environment and a shared policy, both conditioned on task-specific context variables. At meta-testing, only the model is adapted using environment data, while the policy simply requires simulated experience. Finally, the authors show experimentally that this procedure better generalizes to out-of-distribution tasks than similar methods.

The reviewers agree that the paper has a few significant shortcomings. It's unclear how hyper-parameters are selected in the experimental section; the algorithm does not allow for continual adaptation; all policy learning is done through data relabelled by the model.

Overall, the problem the paper addresses is very important, but we do not deem the paper publishable in its current form.",ICLR2020,
+Hkgjg5J7xV,1544910000000.0,1545350000000.0,1,r1ledo0ctX,r1ledo0ctX,"Interesting approach, but poorly written. Needs more work before acceptance.",Reject,"This paper proposes an anomaly-detection approach that augments the VAE encoder with a multiple-hypothesis network and then uses a discriminator in the decoder to select one of the hypotheses. The idea is interesting, although the reviewers found the paper to be poorly written and the approach to be a bit confusing and complicated.

Revisions and rebuttal have certainly helped to improve the quality of the work. However, the reviewers believe that the paper requires more work before it can be accepted at ICLR. For this reason, I recommend rejecting this paper in its current state. 
+
",ICLR2019,5: The area chair is absolutely certain
+ULX0rrnOZ-,1610040000000.0,1610470000000.0,1,ODKwX19UjOj,ODKwX19UjOj,Final Decision,Reject,"Although the paper studies a relevant and important topic, which is about learning a hierarchy of concepts in an unsupervised manner, the reviewers raised several critical concerns. In particular, although the hierarchical structure of concepts is the key idea in this paper, the concept of hierarchy itself is not well explained. How to define the hierarchical level of concepts should be carefully and mathematically discussed. In addition, the empirical evaluation is not thorough, as reviewers pointed out. Although we acknowledge that the authors addressed concerns in the author response, the newly added results are still confusing and more careful treatment is needed before publication. I will therefore reject the paper.

This work reminds me of the topic called ""formal concept analysis"" (e.g. see [1]), which mathematically defines concepts as closed sets and constructs a hierarchy of concepts in an unsupervised manner. This method can be viewed as co-clustering and also has a close relationship to closed itemset mining. This approach is used in machine learning (e.g. [2]). I think it is beneficial for the authors to refer to such existing and well-established approaches to elaborate this work further.

[1] Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order, Cambridge Univ. Press (2002)
[2] Yoneda, et al., Learning Graph Representation via Formal Concept Analysis, arXiv:1812.03395 ",ICLR2021,
+HJxFwTCPgV,1545230000000.0,1545350000000.0,1,B1lKtjA9FQ,B1lKtjA9FQ,meta-review,Reject,The reviewers reached a consensus that the paper is not fit for publication for the moment because a) the paper lacks thorough experiments and b) the criteria provided by the paper are relatively vague (see more details in reviewer 3's comments).,ICLR2019,5: The area chair is absolutely certain
+r1Io2zUOl,1486400000000.0,1486400000000.0,1,B1MRcPclx,B1MRcPclx,ICLR committee final decision,Accept (Poster),"The paper proposes a simple recurrent architecture for question-answering, achieving good performance on several reasoning benchmarks. All three reviewers agree on the merit of the contribution.",ICLR2017,
+fMq2VIvnpUA,1642700000000.0,1642700000000.0,1,rhDaUTtfsqs,rhDaUTtfsqs,Paper Decision,Reject,"This submission proposes a simple way to improve the stability of training GPT-2: increase the sequence length of examples over the course of training. It is shown that this simple heuristic can result in using larger learning rates, therefore significantly speeding up convergence. Reviewers agreed that this was a simple and effective approach, but shared various concerns about the paper:
- The paper focuses on GPT-2, while stability issues can arise in a much wider range of models. Additional experiments with other models (and ideally other codebases/training setups) would help verify that the proposed method is broadly applicable.
- A better analysis of why sequence length works as the difficulty metric would be helpful. What other criteria would be possible? Why is sequence length the best?

I would suggest that the authors significantly expand the submission based on the above suggestions and resubmit.",ICLR2022,
+Hkx5YGLZx4,1544800000000.0,1545350000000.0,1,SJMeTo09YQ,SJMeTo09YQ,"Interesting idea, but limited applicability",Reject,"The paper presents a simple and interesting idea to improve exploration efficiency, using the notion of action permissibility. Experiments in two problems (lane keeping, and flappy bird) show that exploration can be improved over baselines like DQN and DDPG. However, action permissibility appears to be very strong domain knowledge that limits the use in complex problems.

Rephrasing one of the reviewers: action permissibility essentially implies that some one-step information can be used to rule out suboptimal actions, while a defining challenge in RL is that the agent needs to learn/plan/reason over multiple steps to decide whether an action is suboptimal or not. Indeed, the two problems in the experiments have such a property that a myopic agent can solve the tasks pretty well. The paper would be stronger if the AP function could be defined for more common RL benchmarks, with similar benefits demonstrated.",ICLR2019,5: The area chair is absolutely certain
+XhCUvYymvMW,1610040000000.0,1610470000000.0,1,FmMKSO4e8JK,FmMKSO4e8JK,Final Decision,Accept (Poster),This work proposes a model-based optimization method using an approximated normalized maximum likelihood (NML). It is an interesting idea and has the advantage of scaling to large datasets. The reviewers are generally positive and are satisfied with the authors' response. ,ICLR2021,
+_ff90p8S7,1576800000000.0,1576800000000.0,1,rJxHcgStwr,rJxHcgStwr,Paper Decision,Reject,"The submission proposes to use a CNN for Amharic Character Recognition. 
+ +All three reviewers had concerns about the motivation for at least one of the proposed methods, and none of three reviewers found the primary experimental results convincing: The proposed methods yield a small improvement on average across target tasks, but one that is not consistent across tasks, and that may not be statistically significant. + +The authors clarified some points, but did not substantially rebut any of the reviewers concerns. Even though the reviewers express relatively low confidence, their concerns sound serious and uncontested, so I don't think we can accept this paper as is.",ICLR2020, +gXtM1dlwn1I,1642700000000.0,1642700000000.0,1,2DJn3E7lXu,2DJn3E7lXu,Paper Decision,Reject,"This paper comprehensively evaluated 18 different performance predictors on ten combinations of metrics, devices, network types, and training tasks for NAS. While evaluating and comparing different prediction models is not itself novel, the authors provided many insights that are potentially interesting to future NAS developments. + +Reviewer reactions to this paper are rather mediocre and lukewarm. It is in general consensus that this work gives a good empirical analysis on hardware metric predictors for NAS, but the novelty is low and it is perhaps a bit incremental (e.g., nothing ""shockingly new"" was revealed, and observations are mostly ""as expected""). Despite the authors improving the paper during rebuttal with new plots/tables, there remain to be unaddressed comments, e.g., adding experiments that run BO / evolution / etc with different hardware predictors and comparing the quality of the Pareto front. Those missed points were also raised in the private discussion. + +After personally reading this paper, AC sides with most reviewers that this paper lacks true novelty nor technical excitement. While the empirical study is valuable, it perhaps suits venues other than ICLR, e.g., the NeurIPS benchmark track.",ICLR2022, +xgbmpDmdp3_,1642700000000.0,1642700000000.0,1,GthNKCqdDg,GthNKCqdDg,Paper Decision,Reject,"This paper presents a reinforcement learning inspired algorithm to train task-specific adapters to adapt pretrained language models for downstream tasks. The paper attempts to tackle an important problem. All reviewers have concerns about whether the results are strong enough to justify claims made in the paper. I appreciate revisions that have been done by the authors during the rebuttal period. However, I believe that the paper is still below the bar for ICLR. I recommend rejecting this paper.",ICLR2022, +KaVtK9PSfN,1610040000000.0,1610470000000.0,1,tIjRAiFmU3y,tIjRAiFmU3y,Final Decision,Accept (Poster),"After reading the author’s response, all reviewers recommend accepting the paper. R2 and R3 strongly support the paper while R1 and R4 consider it borderline. + +There is agreement that the idea of the work is interesting and novel. The experimental results look solid. + +The authors provided an extensive response addressing most of the concerns of the reviewers. In light of this feedback, the reviewers provided some additional comments (which the authors could not address, as the discussion period was over). The AC considers that the authors should incorporate this feedback to the final version of the manuscript. Specifically, + +Responding to R1's first question regarding the noise distribution on the original image being significantly different from Gaussian. The authors provided detailed results, which is highly appreciated. 
As R1 points out, the authors had to introduce an additional VST for the method. These results should be added to the manuscript is important, to show the limitations of the approach. + +R3 asks about the importance of the initialization of the weights of the encode and decoder. This is a natural question as this is a non-convex problem. The authors clarify in the manuscript the initialization of x, but do not comment on the weights. It would be good to add a sentence in this regard (as done in the discussion). + +R4 mentioned, and the AC agrees, that the authors should try to improve the clarity of the exposition. + +The AC considers it important to add in the appendix more visual examples to quantitatively show the performance of the method.",ICLR2021, +v0ccgYOJZR,1576800000000.0,1576800000000.0,1,S1xHfxHtPr,S1xHfxHtPr,Paper Decision,Reject,"The paper proposes a new problem setup as ""online continual compression"". The proposed idea is a combination of existing techniques and very simple, though interesting. Parts of the algorithm are not clear, and the hierarchy is not well-motivated. Experimental results seem promising but not convincing enough, since it is on a very special setting, the LiDAR experiment is missing quantitative evaluation, and different tasks might introduce different difficulties in this online learning setting. The ablation study is well designed but not discussed enough.",ICLR2020, +YqfqyWENWa,1576800000000.0,1576800000000.0,1,BJg_2JHKvH,BJg_2JHKvH,Paper Decision,Reject,"This paper offers a novel method for semi-supervised learning using GMMs. Unfortunately the novelty of the contribution is unclear, and the majority of the reviewers find the paper is not acceptable in present form. The AC concurs.",ICLR2020, +I6QJgdj-p9s,1610040000000.0,1610470000000.0,1,yZkF6xqhfQ,yZkF6xqhfQ,Final Decision,Reject,"While the reviewers find the experiments in the paper somewhat interesting, they find that the paper does not sufficiently address whether the limitations shown for models in this paper translate to larger models and other, more realistic, tasks, or an artifact of the setup considered in the paper. Overall the takeaways seem unclear from the paper and I believe it is not ready for acceptance. Addressing the issues raised by reviewers and having a more clear discussion on connections to existing results will help the paper.",ICLR2021, +ryerSypBM,1517250000000.0,1517260000000.0,554,Syx6bz-Ab,Syx6bz-Ab,ICLR 2018 Conference Acceptance Decision,Reject,"This paper introduces a new dataset and method for a ""semantic parsing"" problem of generating logical sql queries from text. Reviews generally seemed to be very impressed by the dataset portion of the work saying ""the creation of a large scale semantic parsing dataset is fantastic,"" but were less compelled by the modeling aspects that were introduced and by the empirical justification for the work. In particular: + +- Several reviewers pointed out that the use of RL in particularly this style felt like it was ""unjustified"", and that the authors should have used simpler baselines as a way of assessing the performance of the system, e.g. ""There are far simpler solutions that would achieve the same result, such as optimizing the marginal likelihood or even simply including all orderings as training examples"" + +- The reviewers were not completely convinced that the authors' backed up their claims about the role of this dataset as a novel contribution. In particular there were questions about its structure, e.g. 
""dataset only covers simple queries in form of aggregate-where-select structure"" and about comparisons with other smaller but similar datasets, e.g. ""how well does the proposed model work when evaluated on an existing dataset containing full SQL queries, such as ATIS"" + +There was an additional anonymous discussion about the work not citing previous semantic parsing datasets. The authors noted that this discussion inappropriately brought in previous private reviews. However it seems like the main reviewers issues were orthogonal to this point, and so it was not a major aspect of this decision. ",ICLR2018, +oUmALgrneE2,1610040000000.0,1610470000000.0,1,Cn706AbJaKW,Cn706AbJaKW,Final Decision,Reject,"Reviewers appreciated the care and substantial effort that went into the paper, for instance: +AR3) I think it's of good value for the community to see and discuss the paper in the conference. +AR4) would be quite valuable for the senior members of the community to read and be familiar with. + +The main argument for rejection is the the analysis done in the paper is not typical of ICLR research. Arguably, the paper could fall under the topic ""societal considerations of representation learning including fairness, safety, privacy"", but this does not apply because the subject of analysis is the conference ICLR, not representation learning. I support this argument. + +The reviewers posed a good number of questions and issues with the paper, and largely these were addressed well by the authors. In some cases they addressed the issues properly, and others they argued their case. For instance AR2 says ""think the ACs decision process is too simplified"" and the response summed up as ""our ability to do multi-factor studies is limited by the size of our dataset"". + +An important one of these discussions is as follows: +AR4) But since the AC are not identified as biased, and the papers are anonymous, it is not clear what is the mechanism suggested by the authors of how these biases manifest themselves. +Authors) .... we find the idea that anonymity does not genuinely exist to be entirely plausible. +I would argue that neither party can claim to have won this argument, and I am not really sure how it can be resolved. Fortunately, though, no evidence for gender bias in ACs was found. + +In conclusion, the paper is not topical to ICLR material, and the reviewer consensus is Reject. However, the paper is both valuable and interesting to the community, and it has seen substantial improvement through the review process and a lot of the issues defended well. + +The paper should be brought to the attention of the various committees and made available somehow at the conference and acknowledged as a useful publication. ",ICLR2021, +0-0EpCol3u,1642700000000.0,1642700000000.0,1,0DLwqQLmqV,0DLwqQLmqV,Paper Decision,Accept (Poster),"This paper proposes a novel benchmark for neural architecture search methods, which consists of 25 different combinations of search spaces and datasets. The main motivation is that existing NAS benchmarks, such as NAS-Bench-201, consider very small search space and few datasets, such that conclusions drawn with them do not generalize to unseen settings with different search spaces and datasets. 
The authors first describe the 25 different combinations of the search space and tasks for the given benchmark, and then conduct an extensive empirical study of existing NAS methods and performance predictors with the proposed benchmark, to show that architectures and hyperparameters found with the popular benchmarks do not generalize to other settings, which is consistent with their assumption. + +— + +All reviewers were initially positive about the paper, and remained positive throughout the discussion period. The reviewers found the paper well-motivated, and the proposed benchmark useful, as they agree with the need of introducing a single, unified framework that can validate a NAS method under diverse settings, since existing benchmarks only consider specific datasets and search spaces. However, the reviewers were also concerned with the weak technical novelty (Reviewer 2xvD), and that the work lacks deeper insights that could guide the community towards better methods (Reviewer Gku7). + +I also agree with the authors and the reviewers on the necessity of having a unified benchmark that incorporates all different settings considered in the previous benchmarks, and find the extensive empirical study of existing NAS methods useful. + +However, I find the work as rather technically weak as mentioned by R2xvD, since the authors spent too much time describing and showing the limitations of existing benchmark methods, while what is more important for benchmarks, is to justify how the proposed benchmark can evaluate the performance of different methods in a fair manner, while being representative of the practical settings. In short, the authors need to justify their design choices. Yet, the 25 settings proposed in the paper seem to have been arbitrarily chosen, and it is not clear if having a good performance on this benchmark is indeed a fair evaluation, or well-reflects how the NAS method will perform in practice. The proposed benchmark also does not really consider a novel search space or setting that have been overlooked in the past either, and does not provide much insights on the problem, as mentioned by Reviewer Gku7. + +Thus, although I recommend an acceptance for its practical value acknowledged by the reviewers, the authors need to put a considerable amount of effort in revising the paper, and If this were a journal submission, the paper may need to undergo a major revision. Most importantly, as described, the authors should justify their design choices as well as whether evaluating a model on the benchmark yields “fair” and “representative” results, focusing more on describing the proposed benchmark itself.",ICLR2022, +bLxj7angZk6,1610040000000.0,1610470000000.0,1,xP37gkVKa_0,xP37gkVKa_0,Final Decision,Reject,"This paper has some interesting ideas and is an incremental improvement over previous work. However, it needs further revisions and polishing. The relation to prior work is a bit unclear. Since you mention POMDPs, what would be an equivalent version of your method in POMDPS? Why not compare your algorithm with a state-of-the-art method for small discrete problems? It is also a bit unclear why training a model to predict beliefs would be faster than just calculating them (after all the data must come from somewhere)..",ICLR2021, +myLM_Km5oA,1576800000000.0,1576800000000.0,1,rJxq3kHKPH,rJxq3kHKPH,Paper Decision,Reject,"This paper focuses on mitigating the effect of label noise. They provide a new class of loss functions along with a new stopping criteria for this problem. 
The authors claim that these new losses improves the test accuracy in the presence of label corruption and helps avoid memorization. The reviewers raised concerns about (1) lack of proper comparison with many baselines (2) subpar literature review and (3) state that parts of the paper is vague. The authors partially addressed these concerns and have significantly updated the paper including comparison with some of the baselines. However, the reviewers were not fully satisfied with the new updates. I mostly agree with the reviewers. I think the paper has potential but requires a bit more work to be ready for publication and can not recommend acceptance at this time. I have to say that the authors really put a lot of effort in their response and significantly improved their submission during the discussion period. I recommend the authors follow the reviewers' suggestions to further improve the paper (e.g. comparing with other baselines) for future submissions",ICLR2020, +zXhAPKDNtnr,1642700000000.0,1642700000000.0,1,vfsRB5MImo9,vfsRB5MImo9,Paper Decision,Accept (Poster),"The paper introduces the problem of continual knowledge (language) learning. The authors point out the interesting duality between continual learning and knowledge learning where: in knowledge learning one must avoid forgetting time-invariant knowledge (avoid forgetting in CL), be able to acquire new knowledge (learn new tasks in CL), and replace outdated knowledge (a form of forgetting and re-learning or adaptation). In their paper, the authors develop an initial benchmark for the task along with a set of baselines and provide empirical studies. + +The initial reviews were quite mixed. The reviewers seem to agree this work studies an interesting and fairly novel direction for continual learning of language. However, the reviewers did not agree on whether this initial stab at the problem was ""enough."" In particular, reviewer U9Hk argues that the formulation is ""oversimplified"" and the current experiments are limiting. + +After the discussion, the reviewers remained split with one high score (8), two borderline accepts (3), and one reject. So three reviewers believe that this manuscript is already a good contribution. The fourth reviewer disagrees, but the authors provided clear and convincing responses to many of their comments (and point to results already available in the appendix). + +Overall, this is a clear and reasonable first step considering this paper proposes a new CL problem. The reviewers and I believe that this is interesting and rigorous enough to be impactful and to warrant follow-up works. As a result, I'm happy to recommend acceptance. I imagine that if the community demonstrates interest in this line of work, there will be work both on methodologies to improve the proposed baselines, but also work proposing extensions to the problem in line with some of the comments of reviewer U9Hk. + +In preparing their camera-ready version I strongly encourage the authors to take into account the suggestions of the reviewers and your replies. In particular, your discussion regarding encoder-decoder and decoder-only LMs and the associated results would be good to discuss in the main text (even if the full results are in the appendix).",ICLR2022, +mw-VeufuXB,1576800000000.0,1576800000000.0,1,BkgqExrYvS,BkgqExrYvS,Paper Decision,Reject,"This manuscript studies scaling distributed stochastic gradient descent to a large number of nodes. 
Specifically, it proposes to use algorithms based on population analysis (relevant for large numbers of distributed nodes) to implement distributed training of deep neural networks. + +In reviews and discussions, the reviewers and AC note missing or inadequate comparisons to previous work on asynchronous SGD, and possible lack of novelty compared to previous work. The reviewers also mentioned the incomplete empirical comparison to closely related work. On the writing, reviewers mentioned that the conciseness of the manuscript could be improved. +",ICLR2020, +HJgg51PHxN,1545070000000.0,1545350000000.0,1,HJMHpjC9Ym,HJMHpjC9Ym,Simple and effective,Accept (Poster),This paper propose a novel CNN architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. reviewers generally arrived at a consensus on accept.,ICLR2019,4: The area chair is confident but not absolutely certain +Fz0A5YUrRf,1576800000000.0,1576800000000.0,1,H1lOUeSFvB,H1lOUeSFvB,Paper Decision,Reject,"The authors propose a novel approach to using surrogate gradient information in ES. Unlike previous approaches, their method always finds a descent direction that is better than the surrogate gradient. This allows them to use previous gradient estimates as the surrogate gradient. They prove results for the linear case and under simplifying assumptions that it extends beyond the linear case. Finally, they evaluate on MNIST and RL tasks and show improvements over ES. + +After the revisions, reviewers were concerned about: +* The strong (and potentially unrealistic) assumptions for the theorems. They felt that these assumptions trivialized the theorems. +* Limited experiments demonstrating advantages in situations where other more effective methods could be used. The performance on the RL tasks shows small gains compared to a vanilla ES approach. Thus, the usefulness of the approach is not clearly demonstrated. + +I think that the paper has the potential to be a strong submission if the authors can extend their experiments to more complex problems and demonstrate gains. At this time however, I recommend rejection.",ICLR2020, +HybM4k6Bf,1517250000000.0,1517260000000.0,302,S19dR9x0b,S19dR9x0b,ICLR 2018 Conference Acceptance Decision,Accept (Poster),The reviewers unanimously agree that this paper is worth publication at ICLR. Please address the feedback of the reviewers and discuss exactly how the potential speed up rates are computed in the appendix. I speed up rates to be different for different devices.,ICLR2018, +pZdtoxYrBTl,1642700000000.0,1642700000000.0,1,ZaVVVlcdaN,ZaVVVlcdaN,Paper Decision,Accept (Poster),"The paper analyzes a 2-stage method for federated learning, first using FL with local steps, followed by a final phase of 'always-communicate' centralized SGD. For the convex case, the paper studies the influence of the data heterogeneity, a key parameter in FL, on the convergence of related schemes. Surprisingly the results of the 2-stage method seem to be basically identical to pure local training followed by the final centralized phase, and almost match the lower bound for communication. + +Reviewers liked the interesting aspect of the heterogeneity-induced error floor when the phases are switched, and its impact on the convergence rates, which can be substantial. Downsides are that the analysis only works for strongly convex setting, and the combination of the two methods and proof being relatively straight-forward. 
Simplicity of the algorithm is a plus, while of the proof depends on novelty, about which reviewers are border-line but positive. + +Deep learning experiments should be expanded, as there the very opposite order of the two phases https://arxiv.org/abs/1808.07217 is more commonly used (i.e. more communication in early phase can help), which should be discussed. Also, in the experiments the tuning of hyperparameters in the single-stage baselines needs to be improved to be more fair, which the authors have started but not fully finished for the Cifar case. + +We hope the authors will incorporate the open points as mentioned by the reviewers.",ICLR2022, +BJxLRk7eeV,1544720000000.0,1545350000000.0,1,rJl-HsR9KX,rJl-HsR9KX,"Novel active learning approach, but work could still benefit from revisions and additional baselines",Reject,"This paper proposes a novel and interesting active learning approach, that trains a classifier to discriminate between the examples in the labeled and unlabeled data at each iteration. The top few samples that are most likely to be from the unlabeled set as per this classifier are selected to be labeled by an oracle, and are moved to the labeled training examples bin in the next iteration. The idea is simple and clear and is shown to have a principled basis and theoretical background, related to GANs and to previous results from the literature. Experiments performed on CIFAR-10 and MNIST benchmarks demonstrate good results in comparison to baselines. +During the review period, authors considered most of the suggestions by the reviewers and updated the paper. Although the proposed method is similar to density-based active learning methods, as also suggested by the reviewers, baselines do not include such approaches in the comparison experiments.",ICLR2019,3: The area chair is somewhat confident +tObSf3QKg2,1610040000000.0,1610470000000.0,1,OGg9XnKxFAH,OGg9XnKxFAH,Final Decision,Accept (Poster),"This paper proposes a simple but effective method to obtain ensembles of classifiers (almost) for free. +Essentially you train one network on multiple inputs to predict multiple outputs. The authors show that this leads to surprisingly diverse networks - without a significant increase in parameters - which can be used for ensembling during test time. +Because of its simplicity, I can imagine that this approach could become a standard trick in the ""deep learning tool chest"". + +-AC",ICLR2021, +H-35WH2vQK1,1610040000000.0,1610470000000.0,1,a5KvtsZ14ev,a5KvtsZ14ev,Final Decision,Reject,"The paper received 5 reviews, one of which had positive feedback. Although there are merits associated with the paper, several concerns raised in the reviews and the discussion period that prevents the paper to be accepted. It appears that experiments on noisy graphs are not properly done and competitive baselines are not used for validations. The quality of the learned graph structure is not adequately analyzed. and the experimental setup was not clearly explained. 
All these indicate that there is a need for a major revision before the paper can be considered for acceptance.",ICLR2021, +HJxFwTCPgV,1545230000000.0,1545350000000.0,1,B1lKtjA9FQ,B1lKtjA9FQ,meta-review,Reject,The reviewers reached a consensus that the paper is not fit for publication for the moment because a) the paper lacks thorough experiments and b) the criteria provided by the paper are relatively evague (see more details in reviewer 3's comments.),ICLR2019,5: The area chair is absolutely certain +r1Io2zUOl,1486400000000.0,1486400000000.0,1,B1MRcPclx,B1MRcPclx,ICLR committee final decision,Accept (Poster),"The paper proposes a simple recurrent architecture for question-answering, achieving good performance on several reasoning benchmarks. All three reviewers agree on the merit of the contribution.",ICLR2017, +fMq2VIvnpUA,1642700000000.0,1642700000000.0,1,rhDaUTtfsqs,rhDaUTtfsqs,Paper Decision,Reject,"This submission proposes a simple way to improve the stability of training GPT-2: Increase the sequence length of examples over the course of training. It is shown that this simple heuristic can result in using larger learning rates, therefore significantly speeding up convergence. Reviewers agreed that this was a simple and effective approach, but shared various concerns about the paper: +- The paper focuses on GPT-2, while stability issues can arise in a much wider range of models. Additional experiments with other models (and ideally other codebases/training setups) would help verify that the proposed method is broadly applicable. +- Better analysis of why using the sequence length as the difficulty metric would be helpful. What other criteria would be possible? Why is sequence length the best? + +I would suggest that the authors significantly expand the submission based on the above suggestions and resubmit.",ICLR2022, +Hkx5YGLZx4,1544800000000.0,1545350000000.0,1,SJMeTo09YQ,SJMeTo09YQ,"Interesting idea, but limited applicability",Reject,"The paper presents a simple and interesting idea to improve exploration efficiency, using the notion of action permissibility. Experiments in two problems (lane keeping, and flappy bird) show that exploration can be improved over baselines like DQN and DDPG. However, action permissibility appears to be very strong domain knowledge that limits the use in complex problems. + +Rephrasing one of reviewers, action permissibility essentially implies that some one-step information can be used to rule out suboptimal actions, while a defining challenge in RL is that the agent needs to learn/plan/reason over multiple steps to decide whether an action is suboptimal or not. Indeed, the two problems in the experiments have such a property that a myopic agent can solve the tasks pretty well. The paper would be stronger if the AP function can be defined for more common RL benchmarks, with similar benefits demonstrated.",ICLR2019,5: The area chair is absolutely certain +XhCUvYymvMW,1610040000000.0,1610470000000.0,1,FmMKSO4e8JK,FmMKSO4e8JK,Final Decision,Accept (Poster),This work proposes a model-based optimization using an approximated normalized maximum likelihood (NML). It is an interesting idea and has the advantage of scaling to large datasets. The reviewers are generally positive and are satisfied with authors' response. ,ICLR2021, +_ff90p8S7,1576800000000.0,1576800000000.0,1,rJxHcgStwr,rJxHcgStwr,Paper Decision,Reject,"The submission proposes to use CNN for Amharic Character Recognition. 
The authors used a straight forward application of CNNs to go from images of Amharic characters to the corresponding character. There was no innovation on the CNN side. The main contribution of the work is the Amharic handwriting dataset and the experiments that were performed. + +The reviewers indicated the following concerns: +1. There was no innovation to the method (a straight forward CNN is used) and is likely not of interest to the ICLR community +2. The dataset was divided into train/val split and does not contain a held-out test set. Thus it was impossible to determine the generalization of the model. +3. The paper is poorly written with the initial version having major formatting issues and missing references. The revised version has fixed some of the formatting issues. The paper still need to having more paragraph breaks to help with the readability of the paper (for instance, the introduction is still one big long paragraph). The terminology and writing can also be improved. For instance, in section 2.3, the authors write that ""500 dataset for each character were collected"". It would be clearer to say that ""500 images for each character were collected"". + +The submission received low reviews overall (3 rejects), which was unchanged after the rebuttal. Due to the general consensus, there was limited discussion. There were also major formatting issues with the initial submission. The revised version was improved to have proper inclusion of Amharic characters in the text, missing figures, and references. However, even after the revision, the paper still had the above issues with methodology (as noted by R4) and is likely of low interest for the ICLR community. + +The Amharic handwriting data and experiments using a CNN can be of interest to the different community and I would recommend the authors work on improving their paper based on reviewer comments and submit to different venue (such as a workshop focused on character recognition for different languages). +",ICLR2020, +pXIjHjTkJW-,1642700000000.0,1642700000000.0,1,cmt-6KtR4c4,cmt-6KtR4c4,Paper Decision,Accept (Spotlight),"This paper is about unsupervised translation between programming languages. The main positive is that it introduces the idea of using a form of unit test generation and execution behavior within a programming language back-translation setup, and it puts together together a number of pieces in an interesting way: text-to-text transformers, unit test generation, execution and code coverage. Results show a substantial improvement. The main weaknesses are that there are some caveats that need to be made, such as the (heuristic, not learned) way that test cases are translated across languages is not fully general, and that limits the applicability. There are also some cases where I find that the authors are stretching claims a bit beyond what experiments support, e.g., in the response to zd7L about applicability to COBOL. + +All-in-all, though, it's a good implementation of an idea that should have a lasting place in this line of work, so it's worth accepting.",ICLR2022, +H1t0szLOl,1486400000000.0,1486400000000.0,1,HkljfjFee,HkljfjFee,ICLR committee final decision,Accept (Poster),"Adding a manifold regularizer to a learning objective function is certainly not a new direction. 
The paper argues that using a support based regularizer is superior to using a standard graph Laplacian regularizer (which has been explored before), although this argument is not developed particularly rigorously and dominantly has to fall back on empirical evidence. The main contribution of the paper appears to be theoretical justification of an alternating optimization scheme for minimizing the resulting objective function (yet the optimization aspects of dealing with a sparse support regularizer are somewhat orthogonal to the current context). The empirical results are not very convincing since the dictionary size is relatively large compared to the dataset size; the gains with respect to l2 manifold regularizer are not consistent; and the gains using deep architectures to directly predict sparse codes are also modest and somewhat inconsistent. These points aside, the reviewers are overall enthusiastic about the paper and find it to be well written and complete.",ICLR2017, +H5G4pOD2vsL,1642700000000.0,1642700000000.0,1,LBvk4QWIUpm,LBvk4QWIUpm,Paper Decision,Accept (Spotlight),The authors extend the result of Ongie et al. (2019) and derive sparseneural network approximation bounds that refine previous results. The reuslts are quite ineteresting and relevant to ICLR. All the reviewers were positive about this paper.,ICLR2022, +JDC6NKEZjvf,1642700000000.0,1642700000000.0,1,mz7Bkl2Pz6,mz7Bkl2Pz6,Paper Decision,Reject,"The paper considers the global convergence and stability of SGD for non-convex setting. The main contribution of the work seems to be to remove uniform bounded assumption on the noise, and to relax the global Holder assumption typically made. Their discussions in Appendix A provide an example for which the uniform bounded assumption on the noise commonly assumed in the literature fails. The authors establish that SGD’s iterates will either globally converge to a stationary point or diverge and hence tehir result exclude limit cycle or oscillation. Under a more restrictive assumption on the joint behavior of the non-convexity and noise model they also show that the objective function cannot diverge, even if the iterates diverge. + +The reviewers are on the fence with this paper. While they agree that the paper is interesting, they only give it a score of weak accept (subsequent to rebuttal as well). One of the qualms is that while the authors claim the result helps show success of SGD in more natural non-convex problems, they don’t provide realistic examples supporting their claim. Further, while the extension to holder smoothness assumption while is indeed interesting, unless practical significance is shown via examples, the result is not that exciting. + +From my point of view and reading, while the reviews are not extensive, i do not disagree with reviewers sentiment. Technically the paper is strong but there is a unanimous lack of strong excitement for the paper amongst reviewers. While there is this lack of more enthusiasm, given the number of strong submissions this year, I am tending towards a reject.",ICLR2022, +wWg-QUvUW_,1576800000000.0,1576800000000.0,1,Bylkd0EFwr,Bylkd0EFwr,Paper Decision,Reject,"This paper introduces a biologically inspired locally sensitive hashing method, a variant of FlyHash. 
While the paper contains interesting ideas and its presentation has been substantially improved from its original form during the discussion period, the paper still does not meet the quality bar of ICLR due to its limitations in terms of experiments and applicability to real-world scenarios.",ICLR2020, +ByexeLZ-eE,1544780000000.0,1545350000000.0,1,HygYqs0qKX,HygYqs0qKX,decision,Reject,"The paper presents an interesting idea, but there are significant concerns about the presentation issues and experimental results (e.g., comparisons with baselines). Overall, it is not ready for publication. ",ICLR2019,4: The area chair is confident but not absolutely certain +B1lO6ZMexN,1544720000000.0,1545350000000.0,1,rJe4ShAcF7,rJe4ShAcF7,successful adaptation of transformer networks to generating long coherent music sequences,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +- improvements to a transformer model originally designed for machine translation +- application of this model to a different task: music generation +- compelling generated samples and user study. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- lack of clarity at times (much improved in the revised version) + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +The main contention was novelty. Some reviewers felt that adapting an existing transformer model to music generation and achieving SOTA results and minute-long music sequences was not sufficient novelty. The final decision aligns with the reviewers who felt that the novelty was sufficient. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +A consensus was not reached. The final decision is aligned with the positive reviews for the reason mentioned above. +",ICLR2019,4: The area chair is confident but not absolutely certain +yNHIuV5RC,1576800000000.0,1576800000000.0,1,rkgz2aEKDr,rkgz2aEKDr,Paper Decision,Accept (Poster),"The paper considers an important topic of the warmup in deep learning, and investigates the problem of the adaptive learning rate. While the paper is somewhat borderline, the reviewers agree that it might be useful to present it to the ICLR community.",ICLR2020, +Y6iRu81WyFZ,1642700000000.0,1642700000000.0,1,bsr02xd-utn,bsr02xd-utn,Paper Decision,Reject,"This paper aims to address the imbalanced class problem in unsupervised domain adaptation. The challenge lies in how to handle the difficulties introduced by imbalanced classes. To this end, this work proposes a new data augmentation strategy by taking the interpolation of two samples from the same class but from different domains as the augmented samples. The experiments demonstrate promising performance on the class-imbalanced domain adaptation datasets. + +However, there are several concerns raised by the reviewers. 
1) The interpolation between a source and target sample of the same class can potentially be unreliable as the pseudo label methods. 2) Some statements are based on intuition but not well supported by either theoretical analysis or experimental evaluations. 3) The proposed method is inferior to baseline methods on some datasets, it would be helpful to have further analysis of the advantages and limitations of the proposed method. + +Overall, the paper provides some new and interesting ideas. However, given the above concerns, the novelty and significance of the paper will degenerate. More discussions on the principles behind the proposed method and more experimental studies are needed. Addressing the concerns needs a significant amount of work. Although we think the paper is not ready for ICLR in this round, we believe that the paper would be a strong one if the concerns can be well addressed.",ICLR2022, +o2d0dklZkRl,1642700000000.0,1642700000000.0,1,AkJyAE46GA,AkJyAE46GA,Paper Decision,Reject,"The paper shows that active learning is an emergent property of pre-trained models. They show that simple uncertainty sampling improves sample efficiency by 6 times (up to 6x fewer samples for the same accuracy). This is an interesting and important observation that has practical implications. + +Initially, there were various concerns regarding the message of the paper, including the tile and use of uncertainty function in AL and lack of enough experiments that were addressed through rebuttal period. + +However, there are still remaining concerns that lead to the paper not being ready for publication. Namely, + (1) Clear discussion on how much of the gains are due to active learning vs pre-training with respect to different cases. it is also worth investigating additional causes for the failure cases. + (2) there are many observations here without a clear narrative or theory. +Moreover, making the story more cohesive will strengthen the paper.",ICLR2022, +ryyY3fIde,1486400000000.0,1486400000000.0,1,H1zJ-v5xl,H1zJ-v5xl,ICLR committee final decision,Accept (Poster),"The paper is well written and easy to follow. It has strong connections to other convolutional models such as pixel cnn and bytenet that use convolutional only models with little or no recurrence. The method is shown to be significantly faster than using RNNs, while not losing out on the accuracy. + + Pros: + - Fast model + - Good results + + Cons: + - Because of its strong relationship to other models, the novelty is incremental.",ICLR2017, +H1lIXkOel4,1544740000000.0,1545350000000.0,1,SJe3HiC5KX,SJe3HiC5KX,Attacking the domain adaptation problem from an interesting angle,Accept (Poster),"This paper proposes a new approach to domain adaptation based on sub-spacing, such that outliers are filtered out. While similar ideas have been used e.g. in multi-view learning, their application to domain adaptation makes it a novel and interesting approach. + +While the above is considered by the AC an adequate contribution to ICLR, the authors are encouraged to investigate further the implications of the assumptions made, in a way that the derived criteria seem less heuristic, as R1 pointed out. + +There had been some concerns regarding the experiments, but the authors have been very active in the rebuttal period and addressed these concerns satisfactorily. 
+",ICLR2019,4: The area chair is confident but not absolutely certain +R_wgDJKW9WJ,1642700000000.0,1642700000000.0,1,04pGUg0-pdZ,04pGUg0-pdZ,Paper Decision,Accept (Spotlight),This paper provides actor-critic method for fully decentralized MARL. The results remove some of the restrictions from existing results and have also obtained a sample bound that matches with the bound in single agent RL. The authors also give detailed responses to the reviewers' concerns. The overall opinions from the reviewers are positive.,ICLR2022, +S5pi5AvBiwk,1642700000000.0,1642700000000.0,1,6ET9SzlgNX,6ET9SzlgNX,Paper Decision,Accept (Poster),"The paper was praised for being clearly written, well-motivated, and for addressing an important problem: measuring intrinsic robustness. +It improves the previous results on intrinsic robustness based on concentration of data distribution, by incorporating the constraint on the label uncertainty of the models. +This requires information on label uncertainty for each data sample rarely available (here CIFAR-10H is considered), but could open new directions for future work on adversarial robustness, confidence calibration or label noises.",ICLR2022, +xMxfv-YJWPu,1610040000000.0,1610470000000.0,1,Ua6zuk0WRH,Ua6zuk0WRH,Final Decision,Accept (Oral),"This is a solid paper that proposes a new method for approximating softmax attention in transformer architectures that scales linearly with the size of the sequence. Even though linear architectures have been proposed before using a similar idea (Katharopoulos et al 2020), this paper provides a better solution along with theoretical analysis and makes a rigorous empirical comparison against other methods. All reviewers agree that this is a strong paper that should be accepted. I suggest citing the recent paper https://arxiv.org/abs/2011.04006 (Long Range Arena, mentioned in the discussion) which provides further comparisons on long-range benchmarks, including the method presented in this paper and Katharopoulos et al 2020, along with a detailed discussion of the differences between the two methods.",ICLR2021, +6HXWtX56QV,1576800000000.0,1576800000000.0,1,Hyx5qhEYvH,Hyx5qhEYvH,Paper Decision,Reject,"This work extends Leaky Integrate and Fire (LIF) by proposing a recurrent version. +All reviewers agree that the work as submitted is way too preliminary. Prior art is missing many results, presentation is difficult to follow and incomplete and contains errors. Even if these concerns were addressed, the benefit of the proposed method is unclear. Authors have not responded. +We thus recommend rejection.",ICLR2020, +7DQ36QD8ehw,1642700000000.0,1642700000000.0,1,6VpeS27viTq,6VpeS27viTq,Paper Decision,Accept (Poster),"The paper considers a relevant and interesting problem of protecting the intellectual property of data. The goal of the proposed method is to prevent unauthorized usage of the data, and the protection is attained when a model trained on the perturbed dataset will predict poorly and thus cannot be considered as a realistic inference model by the unauthorized attacker. + +Technically, the paper tackles the problem of ""unlearnable examples"": to perturb the images of a labeled dataset to obtain perturbed dataset such that models trained on perturbed dataset have significantly lower performance, the perturbations are small, and one can approximately recover the original labeled dataset with the correct ""secret key"" (learnable parameters). 
+ +The authors propose two invertible transformations to craft adversarial perturbations: linear pixel-wise transformation and convolutional functional transformation based on invertible ResNet. Numerous experiments demonstrate the effectiveness of the proposed transformations in both securing the data (making the data unlearnable when transformation is applied) and unlocking the transformation (making the data learnable when the transformation is inverted). + +The paper is well motivated and exhibits competitive results. Although there are some concerns about the similarity of the work compared with [1], we believe the additional constraint of this work, that one can approximately recover the original labeled dataset with the correct ""secret key"", justifies a significant contribution. + +[1] ""Unlearnable Examples: Making Personal Data Unexploitable"" Huang et al., ICLR '21",ICLR2022, +9cnKeoDXOVq,1642700000000.0,1642700000000.0,1,ZnUHvSyjstv,ZnUHvSyjstv,Paper Decision,Reject,"All reviewers agree that the paper is below the acceptance threshold and the authors did not respond to the reviews. +In summary, this is a clear reject",ICLR2022, +hHmd_VmLas,1642700000000.0,1642700000000.0,1,TqNsv1TuCX9,TqNsv1TuCX9,Paper Decision,Accept (Poster),"The initial reviews for this paper were 6,6,6, the authors have provided a rebuttal and after the rebuttal the recommendation stayed the same. The reviewers have reached the consensus that the paper is borderline but they have all recommended keeping it above the acceptance threshold. Following the recommendation of the reviewers, the meta reviewer recommends acceptance.",ICLR2022, +Hkl-cPJ-gE,1544780000000.0,1545350000000.0,1,ryM07h0cYX,ryM07h0cYX,No rebuttal submitted,Reject,"The work proposes a method for smoothing a non-differentiable machine learning pipeline (such as the Faster-RCNN detector) using policy gradient. Unfortunately, the reviewers identified a number of critical issues, including no significant improvement beyond existing works. The authors did not provide a rebuttal for these critical issues. ",ICLR2019,5: The area chair is absolutely certain +HJxBhpabe4,1544830000000.0,1545350000000.0,1,BJgbzhC5Ym,BJgbzhC5Ym,Meta-Review,Reject,"This paper proposes a principled solution to the problem of joint source-channel coding. The reviewers find the perspectives put forward in the paper refreshing and that the paper is well written. The background and motivation is explained really well. + +However, reviewers found the paper limited in terms of modeling choices and evaluation methodology. One major flaw is that the experiments are limited to unrealistic datasets, and does not evaluate the method on a realistic benchmarks. It is also questioned whether the error-correcting aspect is practically relevant. + + + ",ICLR2019,3: The area chair is somewhat confident +rLQerqzZ7fk,1610040000000.0,1610470000000.0,1,AICNpd8ke-m,AICNpd8ke-m,Final Decision,Accept (Poster),"The paper proposes to maximizing the mutual information to optimize the bin for multiclass calibration. The idea, technique, and presentation are good. The paper solves some multiclass calibration issues. The author should revise the paper according the reviewer's comments before publish.",ICLR2021, +GF3sXBcrTZk,1642700000000.0,1642700000000.0,1,3Li0OPkhQU,3Li0OPkhQU,Paper Decision,Reject,"This paper proposes a new distributional assumption and a new algorithm for learning convolutional neural networks. 
However, the reviewers reach a consensus that this paper's assumptions are not natural and may not be satisfied in real-world domains. The meta reviewer agrees and thus decides to reject the paper.",ICLR2022, +1YzansUeo-G,1610040000000.0,1610470000000.0,1,kVZ6WBYazFq,kVZ6WBYazFq,Final Decision,Reject,"The authors present CLIME, a variant of LIME which samples from user-defined subspaces specified by Boolean constraints. One motivation is to address the OOD sampling issue in regular LIME. They introduce a metric to quantify the severity of this issue and demonstrate empirically that CLIME helps to address it. In order to stay close to the data distribution, they use constraints based on Hamming distance to data points. They demonstrate that this approach helps to defend against the recent approach of Slack et al. 2020 to fool LIME explanations. + +The paper is close to borderline, though concerns remain about experimental validation and the extent of novel contribution, since the original LIME framework is more flexible than described here and allows a custom distance function. Rev 1 believes that the original LIME framework is sufficient to handle Hamming distance constraints though sampling will be less efficient. To their credit, authors engaged in discussion but this should be further elaborated in a revised version.",ICLR2021, +wteBshrWy,1576800000000.0,1576800000000.0,1,r1gIdySFPH,r1gIdySFPH,Paper Decision,Reject,"This paper tackles the problem of exploration in RL. In order to maximize coverage of the state space, the authors introduce an approach where the agent attempts to reach some self-set goals. The empirically show that agents using this method uniformly visit all valid states under certain conditions. They also show that these agents are able to learn behaviours without providing a manually-defined reward function. + +The drawback of this work is the combined lack of theoretical justification and limited (marginal) algorithmic novelty given other existing goal-directed techniques. Although they highlight the performance of the proposed approach, the current experiments do not convey a good enough understanding of why this approach works where other existing goal-directed techniques do not, which would be expected from a purely empirical paper. This dampers the contribution, hence I recommend to reject this paper.",ICLR2020, +kQUb0orLWBw,1642700000000.0,1642700000000.0,1,F72ximsx7C1,F72ximsx7C1,Paper Decision,Accept (Poster),"This paper argues that the widely adopted graph attention networks (GAT) have a shortcoming that with the static nature of the attention mechanism, they may fail to represent certain graphs. This paper presents an alternative, GATv2, a simple variant with the same time complexity as GAT but with more expressivity, able to represent the graphs that GAT fails to. This is shown both empirically and theoretically, with various tasks on synthetic as well as standard benchmark graphs. + +GATs are of high interest to the ICLR community, and this paper makes fundamental progress in how attention works in GNNs. This is one of the few papers that present both empirical and theoretical analyses, and these findings will motivate others in the community to make further advances in this field.",ICLR2022, +BwHOao28jfz,1610040000000.0,1610470000000.0,1,o21sjfFaU1,o21sjfFaU1,Final Decision,Reject,"Reviewers raised concerns about the paper's clarity (interchangeable use of subtly different terms, notation, typos), and how realistic/practical certain assumptions are. 
The authors are encouraged to incorporate the reviewers' detailed comments for a future submission.",ICLR2021, +x34tFDMSjk_,1610040000000.0,1610470000000.0,1,npkSFg-ktnW,npkSFg-ktnW,Final Decision,Reject,"Overall, the paper makes some interesting and intuitive observations regarding the autoencoders with a cycle consistency, and aims at achieving controllable synthesis via a disentangled representation. However, the overall consensus was that the manuscript needs further iterations: + +In particular: +The ideas should be made more precise using mathematical arguments, as it stands some ideas are (e.g. DEAE and UDV) disconnected. + +The scope needs to be clarified, e.g. respective contributions of GSL-AE and DEAE, use of label information + +More numerical/quantitative evaluations, the current experimentation is not convincing enough, needed for better justification (spurious and not convincing experimentations) + +The English of the manuscript could be improved as it occasionally hampers the flow. +",ICLR2021, +dDH2ohBuWnx,1610040000000.0,1610470000000.0,1,jPSYH47QSZL,jPSYH47QSZL,Final Decision,Reject,"The paper proposes an unsupervised pretraining approach for 3D recognition, which is based on point cloud completion. The initial review receives a mixed rating, with two reviewers rate the paper below the bar and two above the bar. After the rebuttal, R3 changes the opinion from above the bar to a rejection recommendation. While several reviewers recognize the simplicity of the proposed method, R2 and R4 consider the proposed method a straightforward extension of known approaches for NLP and vision tasks. A lack of novelty was also pointed out as a weakness by R3 and R4. After consolidating the reviews and the rebuttal, the AC finds the weakness claims convincing and determines the paper is not ready for publication in the current form. ",ICLR2021, +mBo5Aw0iNyA,1610040000000.0,1610470000000.0,1,TGFO0DbD_pk,TGFO0DbD_pk,Final Decision,Accept (Poster),"This paper proposes a deep reinforcement learning algorithm Supe-RL that combines model free RL with genetic updates. The idea is to periodically mutate and evaluate the actor and greedily choose the best performing child, and incorporate it in the main actor via Polyak averaging on a target policy network. The algorithm can be in principle combined with any gradient based deep RL method. Supe-RL was demonstrated by combining it with Rainbow and PPO and evaluated in navigation tasks as well as standard MuJoCo benchmarks. + +Overall, the reviewers found the idea interesting and to have value to the RL community. The reviewers raised some questions regarding technical rigor, evaluations, and the choice of base DL algorithms. As is, I find this a slightly above borderline submission, and thus recommend acceptance. However, I would encourage the authors to test their method also with a state-of-the-art off-policy algorithms, such as TD3 or SAC, in continuous domains, to better calibrate its overall performance.",ICLR2021, +BJlRNJprG,1517250000000.0,1517260000000.0,460,HJ1HFlZAb,HJ1HFlZAb,ICLR 2018 Conference Acceptance Decision,Reject,"Given that the paper proposes a new evaluation scheme for generative models, I agree with the reviewers that it is essential that the paper compare with existing metrics (even if they are imperfect). The choice of datasets was very limited as well, given the nature of the paper. 
I acknowledge that the authors took care to respond in detail to each of the reviews.",ICLR2018, +BkJsizLOg,1486400000000.0,1486400000000.0,1,HyQWFOVge,HyQWFOVge,ICLR committee final decision,Reject,"The paper aims to compare the representations learnt by metric learning and classification objectives. While this is an interesting topic, the presented evaluation is not sufficiently clear for the paper to be accepted.",ICLR2017, +H1Qd7kpBM,1517250000000.0,1517260000000.0,169,BJ_wN01C-,BJ_wN01C-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Clearly explained, well motivated and empirically supported algorithm for training deep networks while simultaneously learning their sparse connectivity. +The approach is similar to previous work (in particular Welling et al., Bayesian Learning via Stochastic Gradient Langevin Dynamics, ICML 2011) but is novel in that it satisfies a hard constraint on the network sparsity, which could be an advantage to match neuromorphic hardware limitations.",ICLR2018, +nCGwpkT7WLz,1610040000000.0,1610470000000.0,1,ADWd4TJO13G,ADWd4TJO13G,Final Decision,Accept (Poster),"The paper addresses lifelong/continual learning (CL) by combining reusable components. The algorithm is based on, updating components, updating how they are combined for a given task and adding new components. + +Reviewers had concerns about the learning workflow, how it could scale to harder CL streams and how it differs from existing LL/CL work. They also asked for clarifications about compositionality. They highlighted the experiments as a point of strength. After the rebuttal, all reviewers found the paper to be above the acceptance bar. ",ICLR2021, +l4GGJDsvu7fT,1642700000000.0,1642700000000.0,1,Cm08egNmrl3,Cm08egNmrl3,Paper Decision,Reject,"The paper studies the problem of OOD classification: the test data and training data distribution can have different spurious feature-class dependencies. + +The reviewers have stated that the proposed procedure is a natural choice, with simple implementation. Another positive point is that it could easily be incorporated in many off-the-shelf machine learning training algorithms. + +Yet, the technical novelty was mentioned to be limited. The bilevel optimization point of view and the connection with min-max optimization problems raised some concerns, as the vocabulary used could be misleading. +It was also raised that the paper lacks theoretical supports: no formal analysis, most explanations are ad hoc, etc.",ICLR2022, +xDJmoq_oxX,1642700000000.0,1642700000000.0,1,0rcbOaoBXbg,0rcbOaoBXbg,Paper Decision,Accept (Poster),"This paper proposes a self-exciting temporal point process model with a non-stationary triggering kernel to model complex dependencies in temporal and spatio-temporal event data. The kernel is represented by its finite rank decomposition and a set of neural basis functions (feature functions). The proposed model has superior performance in comparison to other state-of-the-arts methods. All the reviewers recognized that the model is interesting and advances the state of the art in a meaningful way. While they were some concerns regarding the experimental evaluation, particularly in terms of real data, and the presentation, the rebuttal/revision by the authors cleared up these concerns.",ICLR2022, +L0xQPkGySdV,1610040000000.0,1610470000000.0,1,XEw5Onu69uu,XEw5Onu69uu,Final Decision,Reject,"The paper proposes a graph aligning approach generating rich and detailed labels given normal labels. 
The authors cast the problem in a domain adaptation setting, considering a source domain where ""expensive"" labels are available, and a target domain where only normal labels are available. The application scenario is the prediction of chemical compound graphs from 2D images, where a fully mediating layer is introduced to represent the chemical graph structure to be predicted using a planar embedding.
They also pointed out several other issues around clarity and had several suggestions for improving the experiments which seem to have been taken to heart by the authors, who detailed their changes in response to this review. There was also an anonymous public comment that pointed out a “fatal mathematical flaw and weak experiments”. There was a lengthy exchange between this reviewer and the authors, and the paper was actually corrected and clarified in the process. This anonymous poster was rather demanding of the authors, asking for latex-formatted equations, pseudo-code, and giving direction on how to respond to his/her rebuttal. I don't agree with the point that the paper is flawed by ""only"" presenting a speed-up over ResNet, and furthermore the comment of ""not everyone has access to parallelization"" isn’t a fair criticism of the paper.",ICLR2018, +LY6xtkkCxEz,1610040000000.0,1610470000000.0,1,TQt98Ya7UMP,TQt98Ya7UMP,Final Decision,Accept (Poster),"The paper looks at the soft-constrained RL techniques and proposes a meta-gradient approach. +One of the biggest problems with the Lagrange Optimization-based CMDP algorithms is that the optimization of the Lagrange multiplier is tricky +The proposed solution and empirical results have promise. The reviewers broadly agree on their evaluation and the major concerns on comprehension, additional experiments and as well as comparison with baselines have been addressed in the rebuttal. + +- Convergence rate and quality of fixed point reached. +The authors mention convergence to local optima but omit the quality of this solution from perspective of safety. It would be useful to include a discussion on the topic, with potential references to concurrent work. +Other relevant and concurrent papers to potentially take note of: +- Risk-Averse Offline Reinforcement Learning (https://openreview.net/forum?id=TBIzh9b5eaz) +- Distributional Reinforcement Learning for Risk-Sensitive Policies (https://openreview.net/forum?id=19drPzGV691) +- Conservative Safety Critics for Exploration (https://openreview.net/forum?id=iaO86DUuKi) + +I would recommend acceptance of the paper based on empirical results, conditional on release of sufficiently documented and easy to use implementation. +Given the fact that the main argument is empirical utility of the method, it would be limit the impact of this work if readers cannot readily build on this method. ",ICLR2021, +SRTv8_N6-D,1576800000000.0,1576800000000.0,1,BylfTySYvB,BylfTySYvB,Paper Decision,Reject,"This paper proposes a modification of RNN that does not suffer from vanishing and exploding gradient problems. The proposed model, GATO partitions the RNN hidden state into two channels, and both are updated by the previous state. This model ensures that the state in one of the parts is time-independent by using residual connections. + +The reviews are mixed for this paper, but the general consensus was that the experiments could be better (baseline comparisons could have been fairer). The reviewers have low confidence in the revised/updated results. Moreover, it remains unclear what the critical components are that make things work. It would be great to read a paper and understand why something works and not that something works. + +Overall: Nice idea, but the paper is not quite ready yet. 
+ +",ICLR2020, +FpT3-bW3bd4,1610040000000.0,1610470000000.0,1,3AOj0RCNC2,3AOj0RCNC2,Final Decision,Accept (Oral),"The paper proposes a new approach to continual learning with known task boundaries that is scalable and highly performant, while preserving data privacy. To mitigate forgetting the proposed approach restricts gradient updates to fall in the orthogonal direction to the gradient space that are important for the past tasks. The main novelty of the approach is to estimate these subspaces by analysing the activations for the inputs linked for each given task. + +All reviewers give accepting scores. R2, R3 and R4 strongly recommend accepting the paper, while R1 considers it borderline. + +The authors provided an extensive response carefully considering all reviewers' comments. New experiments were introduced (training time analysis and comparisons with expansion-based methods), and several clarifications were added. + +All reviewers agree that the paper is well written and its literature review adequate. + +The main concern of R1 was the similarities with OGD (Farajtabar et al. 2020). R1 considered the authors’ response acceptable. R2, R3 and R4 consider the contribution well motivated and significant and highlight its simplicity. The AC agrees with this assessment. + +The empirical evaluation covers most of the typical benchmarks in CL. Very strong results are reported on a variety of tasks both in terms of performance and memory efficiency, as agreed by R2, R3 and R4. + +Overall the paper makes a strong contribution to the field of CL. +",ICLR2021, +rJAnNy6Hz,1517250000000.0,1517260000000.0,445,SkYibHlRb,SkYibHlRb,ICLR 2018 Conference Acceptance Decision,Reject,"The pros and cons of the paper cited by the reviewers can be summarized as follows: + +Pros: +- good problem, NL2SQL is an important task given how dominant SQL is +- incorporating a grammar (""sketch"") is a sensible improvement. + +Cons: +- The dataset used makes very strong simplification assumptions (that every token is an SQL keyword or appears in the NL) +- The use of a grammar in the context of semantic parsing is not novel, and no empirical comparison is made against other reasonable recent baselines that do so (e.g. Rabinovich et al. 2017). + +Overall, the paper seems to do some engineering for the task of generating SQL, but without an empirical comparison to other general-purpose architectures that incorporate grammars in a similar way, the results seem incomplete, and thus I cannot recommend that the paper be accepted at this time.",ICLR2018, +3l0RPCKgnw,1576800000000.0,1576800000000.0,1,rJxtgJBKDr,rJxtgJBKDr,Paper Decision,Accept (Poster),"This paper proposes a method, SNOW, for improving the speed of training and inference for transfer and lifelong learning by subscribing the target delta model to the knowledge of source pretrained model via channel pooling. + +Reviewers and AC agree that this paper is well written, with simple but sound technique towards an important problem and with promising empirical performance. The main critique is that the approach can only tackle transfer learning while failing in the lifelong setting. Authors provided convincing feedbacks on this key point. Details requested by the reviewers were all well addressed in the revision. + +Hence I recommend acceptance.",ICLR2020, +B1xoBMnkl4,1544700000000.0,1545350000000.0,1,Syxt2jC5FX,Syxt2jC5FX,Nice piece of work,Accept (Poster),"Dear authors, + +All reviewers liked your work. 
However, they also noted that the paper was hard to read, whether because of the notation or the lack of visualization. + +I strongly encourage you to spend the extra effort making your work more accessible for the final version.",ICLR2019,4: The area chair is confident but not absolutely certain +EVmSYYrq8pO,1642700000000.0,1642700000000.0,1,xbx7Hxjbd79,xbx7Hxjbd79,Paper Decision,Reject,"The main contribution of this paper is that it points out incorrect claims in the literature of multi-agent RL and provides new insight on the failure modes of current methods. Specifically, this paper investigates the inconsistency problem in LOLA (meaning it assumes the other agent as a naive learner, thus not converging to SFPs in some games). It then shows problems with two fixes in the literature: 1) HOLA addresses the inconsistency problem only when it converges; otherwise, HOLA does not resolve the issue. 2) GCD does not resolve the issue although it claims to do so. This paper then proposes a method COLA that fixes the inconsistency issue, which outperforms HOLA when it diverges. Reviewers generally agree that the insight from this work is interesting and important for the field. However, there were some concern on both the theory and the experiments. While the updated version addresses some of the concerns, it also made significant changes to both the theoretical and the empirical sections, and would benefit from another round of close review. Thus, I think the current version of this work is borderline.",ICLR2022, +hYGoJqwtkT2,1642700000000.0,1642700000000.0,1,oxC2IBx8OuZ,oxC2IBx8OuZ,Paper Decision,Reject,"This paper proposes “Continual Federated Learning (CFL)” to study time evolving heterogeneous data. To do this the authors introduce time-drift to capture data heterogeneity across time. The authors also present some preliminary convergence results. Finally, the authors carryout numerical experiments in time-varying and heterogeneous settings. The reviewers identified the following strengths: (1) combining FL and CL is interesting, (2) the development of a new algorithm and providing some initial analysis is a good step. They also identified weaknesses as follows: (1) limited technical novelty as the use of replay buffer is quite standard, (2) cumbersome and not easy to interpret results, (3) lack of time evolving patterns with a common component (4) lack of different metrics that demonstrate how the algorithm is able to maintain accuracy as time-shifts occur, (5) lack of questionable assumptions. The reviewers had a very bimodal view advocating acceptance with a score of 8 and 2 advocating a rejection and neither group changed their opinion. Although the authors thorough responses did alleviate the concerns IMO. My own reading of the paper is that this is an interesting paper working on an emerging area. However, I must agree with some of the reviewers that the final conclusions are not easy to interpret, and the assumptions are not fully motivated. After this is carried out, I think the novelty of the paper can also become much clear. Therefore, I cannot strongly advocate acceptance of the paper in its currently state given the scores. However, I very strongly encourage the authors to submit to a future ML venue after addressing the remaining comments of the reviewers. 
I would also like to commend the authors for a very strong rebuttal; I am sorry the final decision couldn’t be more favorable given the borderline ratings and the aforementioned issues.",ICLR2022,
In particular, although there was a lot of improvement through the rebuttal period, it is difficult to verify the superiority of the proposed method via the experiments in the paper; Direct comparisons with existing methods for fairness is essential, and it seems necessary to consider a hyperparameter selection strategy that can be taken in a practical scenario rather than simply choosing the best performing hyperparameter for the test set.",ICLR2022, +aPbpuYySfqI,1610040000000.0,1610470000000.0,1,j6rILItz4yr,j6rILItz4yr,Final Decision,Reject,"Adversarial training is usually done on the image space by directly optimizing the pixels. This paper suggests the adversarial training over intermediate feature spaces in the neural network. The idea is very simple. The authors have done extensive experiments to justify its performance. But the performance gain though this idea seems to be marginal. Further, the layer to conduct the adversarial training can be optimized within the framework, which aligns with the general autoML idea. The new version L-ALFA has been well introduced, but unfortunately, the practical result can be very straightforward, that is just to select the final layer. The more important ALFA hyperparameters that would most benefit from automatic tuning are not sufficiently treated. There have been extensive discussions between the authors and the reviewers. After incorporating the reviewers' comments, the paper will have a good chance to be accepted at another venue. + +",ICLR2021, +_4ZW1EXXYH,1642700000000.0,1642700000000.0,1,_gZf4NEuf0H,_gZf4NEuf0H,Paper Decision,Reject,"*Summary:* Study isolated orientations of weights for networks with small initialization depending on multiplicity of activation functions. + +*Strengths:* +- Interesting analysis of properties in early stages depending on activations. + +*Weaknesses:* +- Reviewers found the settings limited. +- Reviewers found experiments limited. + +*Discussion:* + +In response to ejGJ authors reiterate scope of covered cases and submit to consideration that their experiments should be adequate for basic research. Reviewer acknowledges the response, but maintains their assessment (limited scope of theory, limited experiments). KucV found the experimental part limited in scope, the settings unclear (notion of early stage, compatibility with theory), and review of previous works lacking. KucV’s sincerely acknowledged authors for their efforts to address their comments and improving the manuscript, and raised their score, but maintained the experimental analysis is not fully convincing and unclear, and the comparison with prior work insufficient. zuZq also expressed concerns with the experiments and the notions and settings under consideration. They also raised questions about the comparison with standard initialization. Authors made efforts to address zuZq concerns. zuZq acknowledged this but maintained initial position that the article is just marginally above threshold. jDJ5 found the paper well written and the conclusion insightful. However, also raised concerns about the experiments the settings under consideration. Authors made efforts to address jDJ5’s concerns, who appreciated this but was not convinced to raise their score. + +*Conclusion:* +Two reviewers consider this article marginally above and two more marginally below the acceptance threshold. I find the article draws an interesting connection pertaining an interesting topic. 
However, the reviews and discussion conclude that the article is lacking in several regards that in my view still could and should be improved. Therefore I am recommending rejection at this time. I encourage the authors to revise and resubmit.",ICLR2022,
The addressed theoretical issue is the stability of the network (i.e., that the network implements a contraction map). Specifically, it is assumed that a network is composed of a set of subnetworks that meet some stability condition by construction, and the problem is to design a mixing weight matrix, interconnecting the latent spaces of the subnetworks, that gives stability guarantees during and after training. Some novel stability conditions are proposed, as well as two different approaches to designing a successful mixing weight matrix. The originally submitted paper was not easy to read, and after revision major problems with the presentation have been resolved, although the current version looks more like an ordered collection of results/statements than a smooth and integrated flow of discourse. The revision has also addressed some concerns by reviewers on the role of the size and sparsity of the modules, and the sensitivity of the stabilization condition to the mixing weight matrix has been experimentally assessed, with interesting results. Overall the paper reports interesting results; however, the novelty of the contribution seems to be a bit weak, e.g., stability conditions on recurrent networks (although different from the reported ones) were already presented in the literature. Also, the idea of exploiting, in one of the proposed models, the fact that the matrix exponential of a skew-symmetric matrix is orthogonal in order to maintain the convergence condition during training is not novel. Moreover, the experimental assessment does not provide a direct comparison, under the same architectural/learning setting, of the novel stability results versus the ones already presented in the literature. Empirical results are obtained on simple tasks (using datasets with sequences of identical length) and relatively small networks, which limits the scope of the assessment somewhat, and it is not clear whether the observed improvements (where obtained) are statistically significant (especially when compared with results obtained by networks with the same order of parameters). The quality of the assessment would increase significantly by considering datasets with sequences of different lengths, and involving more challenging tasks that require larger networks.",ICLR2022,
Any resubmission will also have to highlight significance and make a stronger case for the novelty of the results.",ICLR2020, +ByKJrJ6Sz,1517250000000.0,1517260000000.0,481,HJrJpzZRZ,HJrJpzZRZ,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes adversarial flow-based neural network architecture with adversarial training for video prediction. Although the reported experimental results are promising, the paper seems below ICLR threshold due to limited novelty and issues in evaluation (e.g., mechanical turk experiment). No rebuttal was submitted.",ICLR2018, +THOeQizjK7K,1610040000000.0,1610470000000.0,1,8E1-f3VhX1o,8E1-f3VhX1o,Final Decision,Accept (Poster),After rebuttal the reviewers unanimously agree that this is a strong paper and should be accepted,ICLR2021, +KEgPBoofqD,1576800000000.0,1576800000000.0,1,S1gqraNKwB,S1gqraNKwB,Paper Decision,Reject,"The authors introduce a framework for inverse reinforcement learning tasks whose reward functions are dependent on context variables and provide a solution by formulating it as a convex optimization problem. Overall, the authors agreed that the method appears to be sound. However, after discussion there were lingering concerns about (1) in what situations this framework is useful or advantageous, (2) how it compares to existing, modern IRL algorithms that take context into account, and (3) if the theoretical and experimental results were truly useful in evaluating the algorithm. Given that these issues were not able to be fully resolved, I recommend that this paper be rejected at this time.",ICLR2020, +bjlplJKm-l1,1642700000000.0,1642700000000.0,1,8Z7-NG11HY,8Z7-NG11HY,Paper Decision,Reject,"The paper investigates various methods for cross-lingual alignment of contextual word embeddings. It also introduces a new method based on density matching via normalizing flows to align contextual representations in two languages. The paper has many strengths, but also reviewers identify several major weaknesses including the lack of strong baselines and the lack of extrinsic evaluation. These concerns were not addressed by the authors during the discussion period.",ICLR2022, +MTl1O7FNOe6,1610040000000.0,1610470000000.0,1,Zc36Mbb8G6,Zc36Mbb8G6,Final Decision,Reject,"The paper proposes to use a feature extractor (encoder) $C(x)$, pre-trained with label supervision or contrastive learning on a large image dataset, to both regularize the discriminator's last feature layer $D_f(x)$ and encode the data $x$ itself as the conditional input of the generator $G(z|G_{emb}(C(x)))$. The main purpose is to help the training of GANs when there is a limited number of images in the target domain. A clear concern of this approach is that to generate a fake image, one will need to first sample a true image, making the model unattractive if the training dataset size is large (need to store the whole training dataset even after training). To mitigate this issue, the authors propose to fit up to 200k randomly sampled $G_{emb}(C(x))$ with a GMM with 1k components. To validate the practice of requiring a GMM (a shallow generative model) to help a GAN (a deep generative model) to generate, the authors have done a rich set of experiments under state-of-the-art GAN architectures or training methods (SNGAN, BigGAN, StyleGAN2, DiffAugment) to illustrate the efficacy of the proposed data instance prior and its compatibility with the state-of-the-art methods in a variety of settings. 
In the AC's opinion, the paper is missing references to 1) related work that combines VAE (or some other type of auto-encoder) and GAN, which often helps stabilize the GAN training [1,2,3], 2) VAE with a VampPrior [4], and 3) more broadly speaking, empirical Bayes related methods where the prior model is learned from the observed data (see [5] and the references therein). The potential advantages of using a VAE rather than a GMM to help a GAN to generate include: 1) there is no need to store 1k GMM components, which may require a large amount of memory; 2) there is no need to subsample the training set; and 3) the VAE and GAN can be jointly trained. The AC recommends that the authors discuss the connections to these related works in their future submission.
The writing is very clear and the results very interesting.,ICLR2022, +7Bh4XajKP3X,1642700000000.0,1642700000000.0,1,QkRV50TZyP,QkRV50TZyP,Paper Decision,Accept (Poster),"This paper considers that the model's training data may be not accessible when learning the attacking model, and thus a more practical blackbox attack scheme, Beyond ImageNet Attack (BIA) framework, is designed. All the reviewers agreed that the setting in this paper is important and helpful when designing attack methods. However, the method is not totally new. Nevertheless, considering the importance of the problem investigated in this paper, the nice design of the overall framework, and the extensive experiments, the AC recommends accept for this paper.",ICLR2022, +UQM2-EKkyu,1576800000000.0,1576800000000.0,1,BJlowyHYPr,BJlowyHYPr,Paper Decision,Reject,"The paper presents an approach to forecasting over temporal streams of permutation-invariant data such as point clouds. The approach is based on an operator (DConv) that is related to continuous convolution operators such as X-Conv and others. The reviews are split. After the authors' responses, concerns remain and two ratings remain ""3"". The AC agrees with the concerns and recommends against accepting the paper.",ICLR2020, +ryyYm1arG,1517250000000.0,1517260000000.0,179,HkCsm6lRb,HkCsm6lRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All three reviewers recommend acceptance. Good work, accept",ICLR2018, +ySiNSkNwwH,1610040000000.0,1610470000000.0,1,pXi-zY262sE,pXi-zY262sE,Final Decision,Reject,"This paper proposes adding noise regularization, iteratively during training to word embeddings. +The method is evaluated on CNN-based text classification. +Overall, there is novelty in the proposed method, however there are concerns about the experiments and analysis of the proposed approach.",ICLR2021, +6FuzEXdJgfy,1610040000000.0,1610470000000.0,1,gkOYZpeGEK,gkOYZpeGEK,Final Decision,Reject,"Reviewers generally agree that the proposed method UMATO, a two-phase optimization dimensionality reduction algorithm based on UMAP, is interesting and has potential, and that the paper is well-written. However, there are several concerns with the current paper. In particular, R1 is not convinced by the performance of UMATO on real-world datasets compared with previous methods such as t-SNE (see the linked papers). Both R1 and R2 are concerned that given the 2-phase approach, UMATO might be much more adapted to clustered data than standard manifold embedding. They pointed out that in the Swiss roll/S-curve examples, UMATO stays very close to PCA, which is used for initialization, instead of globally unfolding the manifold as Isomap. These issues should be clarified/explored further for a better understanding and/or improvement of the current work.",ICLR2021, +H1xvUsAQlE,1544970000000.0,1545350000000.0,1,H1fF0iR9KX,H1fF0iR9KX,Area chair recommendation,Reject,"Strengths: + +This paper proposed to use graph-based deep learning methods to apply deep learning techniques to images coming from omnidirectional cameras. + +Weaknesses: + +The projected MNIST dataset looks very localized on the sphere and therefore does not seem to leverage that much of the global connectivity of the graph +All reviewers pointed out limitations in the experimental results. +There were significant concerns about the relation of the model to the existing literature. It was pointed out that both the comparison to other methodology, and empirical comparisons were lacking. 
+ + +The paper received three reject recommendations. There was some discussion with reviewers, which emphasized open issues in the comparison to and references to existing literature as highlighted by contributed comment from Michael Bronstein. Work is clearly not mature enough at this point for ICLR, insufficient comparisons / illustrations",ICLR2019,5: The area chair is absolutely certain +h-i98Q9fa5,1642700000000.0,1642700000000.0,1,l_amHf1oaK,l_amHf1oaK,Paper Decision,Accept (Poster),"The authors improve upon existing algorithms for complete neural network verification by combining recent advances in bounding algorithms (better bounding algorithms under branching constraints and relaxations involving multiple neurons) and developing novel branching heuristics. They show the efficacy of their method on a number of rigorous experiments, outperforming SOTA solvers for neural network verification on several benchmark datasets. + +All reviewers agree that the paper makes valuable contributions and minor concerns were addressed adequately during the rebuttal phase. Hence I recommend that the paper be accepted.",ICLR2022, +x6KfhoeumSBA,1642700000000.0,1642700000000.0,1,swbAS4OpXW,swbAS4OpXW,Paper Decision,Reject,"This work was the subject of significant back and forth (between authors and reviewers, but also between reviewers & myself) due to the wide range of opinions. Two of the reviewers have found this work below the bar: they have provided multiple reasonings that I would rather not repeat here. The third reviewer found this work more compelling and argued for its acceptance. My attempts at reaching a consensus have yielding the following conclusions: + + * There's agreement that one-shot generation is indeed a challenging task + * Some of the results are indeed impressive, but many results are not compelling. + * The rebuttal addressed some of the concerns (e.g. visualization of latents), but some issues are unaddressed (e.g. more motivation, explanation of why the proposed method works better) + * One of the reviewers has argued rather forcefully that the work doesn't quite do domain adaptation in the typically understood sense. Moving beyond definitions of domain adaptation, the same reviewer was not very convinced by the quality of the results themselves. + * The reviewer most positive about this work agrees that this work only explores a limited form of domain transfer. They argued that some of the potential applications of this work do make the submission interesting. + +Fundamentally, the discussion did not necessarily resolve the differences in opinion one way or another. Ultimately, all 3 reviewers believe that it would fine if this work was not accepted to ICLR at this time, despite some of the interesting results and promise. Given the discussion and this mildest consensus, I am inclined to recommend rejection too. I do think there's a substantial amount of constructive feedback in the reviews that would make a subsequent revision of this work quite a bit better.",ICLR2022, +H1gS2mfelE,1544720000000.0,1545350000000.0,1,BJxbYoC9FQ,BJxbYoC9FQ,interesting problem and method; insufficient discussion of and comparison to related work,Reject,"{418}; {Classifier-agnostic saliency map extraction}; {Avg: 4.33}; {} + +1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +The paper is well-written and the method is simple, effective, and well-justified. + +2. Describe the weaknesses of the paper. 
As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +1. The introduction, in particular the last row of pg 1, implies that this work is the first to show that a class-agnostic saliency estimation method can produce higher-quality saliency maps than class-dependent ones. However, Fan et al. have already shown this. For this reason, AR1 recommended that the authors reword the introduction to reflect prior work on this aspect but the authors declined to do so. The AC would have liked to see a discussion of how the different points of view of the two works (robustness to corruption vs class-agnosticism) both address the same issue (poor segmentation of the salient image regions). +2. The work of Fan et al has a very similar approach and a deeper comparison is needed. While the authors dedicated two paragraphs of discussion to this work, they should have gone further. For example, the work of Fan et al. uses a very simple saliency map extraction network and it's unclear how much this impacts their performance when compared to the proposed method, which uses ResNet50. The AC agrees with the authors that re-implementing the method of Fan et al. is asking a lot but a discussion of the potential impact would have sufficed. +3. The authors didn't mention at all the vast body of work on salient object detection (for a somewhat recent review see Borji et al. ""Salient object detection: A benchmark."" IEEE TIP). The differences to this line of work should have been discussed. + +Points 1 and 2 were particularly salient for the final decision. + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +Two major points of contention were: +- The discussion of differences between the proposed method and the method of Fan et al. +- The fairness of the comparison to Fan et al. +AR1 felt that the paper was deficient on both counts (AR2 had similar concerns) and the authors disagreed, arguing that the discussion was complete and the quantitative comparison fair. + +The AC was sympathetic to these concerns and found the authors' responses to be dismissive of those concerns. In particular, the AC agrees that the paper, as currently organized, minimizes the degree to which the work is derived from Fan et al. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers reached a consensus that the paper should be rejected. +",ICLR2019,3: The area chair is somewhat confident +5944Z8zbo4,1576800000000.0,1576800000000.0,1,ByxJjlHKwr,ByxJjlHKwr,Paper Decision,Reject,"The authors propose a model-based RL algorithm, consisting of learning a +deterministic multi-step reward prediction model and a vanilla CEM-based MPC +actor. +In contrast to prior work, the model does not attempt to learn from observations +nor is a value function learned. +The approach is tested on task from the mujoco control suit. + +The paper is below acceptance threshold. 
+It is a variation on previous work from Hafner et al. +Furthermore, I think the approach is fundamentally limited: All the learning +derives from the immediate, dense reward signal, whereas the main challenges in RL +are found in sparse reward settings that require planning over long horizons, where value +functions or similar methods to assign credit over long time windows are +absolutely essential.",ICLR2020,
Seems like a gap to fill.",ICLR2020, +szSFylL1gQ,1576800000000.0,1576800000000.0,1,SklVI1HKvH,SklVI1HKvH,Paper Decision,Reject,The reviewers have raised several important concerns about the paper that the authors decided not to address.,ICLR2020, +hX2-PlsnZeq,1610040000000.0,1610470000000.0,1,0zvfm-nZqQs,0zvfm-nZqQs,Final Decision,Accept (Spotlight),"The paper argued some viewpoint about knowledge distillation quite interesting to me: the technically good KD might surprisingly be socially bad in helping outsiders ""stealing"" commercial models, even if the models are released as black boxes. Then the paper proposed a way called self-undermining KD in order to turn a well trained model into a ""nasty teacher"" (i.e., an undistillable model), and by this way the commercial models and the corresponding intellectual properties for training them from insiders can be nicely protected. + +Overall, the quality is quite high. The argument is very conceptually novel and the method is still technically novel. The idea of the method is simple but works for the purpose --- that's great! Although the experimental significance seems not too impressive, the paper opens a door to a new world concerning model privacy instead of data privacy, and hence it is of social significance. In my opinion, the paper should have a potentially huge social impact to DL practitioners (and company owners), because KD is being used almost everywhere in the Internet industry to provide the standalone mode of Apps without clouds on personal devices. Based on the quality and the impact, I recommend to accept the paper as a spotlight presentation.",ICLR2021, +HyfLLypBf,1517250000000.0,1517260000000.0,785,H1kMMmb0-,H1kMMmb0-,ICLR 2018 Conference Acceptance Decision,Reject,"The consensus among the reviewers is that this paper is not quite ready for publication for reasons I will summarize in more detail below. However, I think there are some things that are really nice about this approach, and worth calling out: + +PROS: + +1. the idea of tackling tasks broadly all the way from perception through symbolic reasoning is an important direction. + +2. It certainly would be useful to have a ""plug and play"" framework in which various knowledge sources or skills can be assembled behind a simple interface designed by the ML practioner to solve a given problem or class of problems. + +3. Clearly finding ways to increase sample efficiency -- especially in a deep net approach -- is of great importance practically. + +4. The writing is good. + +CONS: + +1. The comparison to feedforward networks needs to be made fair in order to disentangle the benefit of the architecture from the benefit of pre-training the modules. + +2. Using the very limited 2x2 grid was too low a bar for the reviewers. The authors aim at a more general, efficient architecture useful for a variety of tasks, and perhaps you didn't want to devote too much time to this particular task, but I think having a slam-dunk example of the power of the approach is really necessary to be convincing. + +3. Given the similarity, I think more has to be done to show the intellectual contribution over Zaremba et al, the difference in motivation notwithstanding. One way to do this is to really prove out the increased sample efficiency claim.",ICLR2018, +8fMB1xADQdF,1642700000000.0,1642700000000.0,1,gRCCdgpVZf,gRCCdgpVZf,Paper Decision,Accept (Poster),"Thanks for your submission to ICLR. + +This paper explores zero-shot adaptation from a theoretical perspective. 
Three of the four reviewers are quite positive about the paper, particularly after the discussion phase. One reviewer was more negative, citing a lack of compelling experiments and some possibly restrictive assumptions. + +The authors responded to these concerns, as well as the concerns of the other reviewers. One of the more positive reviewers increased their score from 6 to 8. I did not hear from the negative reviewer, but my feeling is that I tend to agree with the authors that the focus of the paper is more on the theoretical side. Moreover, the authors did add some additional results to the main paper, so I am of the opinion that the paper should indeed be accepted to the conference. + +Even though this paper is on the theoretical side, please do include as strong a set of empirical results as possible in the final version. Also keep in mind the other suggestions from the reviewers when preparing the final manuscript.",ICLR2022,
However, there are two main concerns: 1) the goal creation process is largely rule-based and task-specific, therefore it's unclear how well this method would generalize to other tasks; 2) related to 1), the generated goals carry a significant amount of domain knowledge about the task that is not available to the baselines, making the comparison a bit unfair. A future submission would benefit from demonstrating the generalizability of the proposed approach, e.g., by using more generic resources such as game metadata or generic knowledge/commonsense bases.",ICLR2022,
The authors provided a rebuttal that addressed some concerns. +However, the reviewers agree that the experimental part still requires some extension to fully support the claims of the paper, as well as some improvement to the writing. +The paper was judged not ready for ICLR; thus, I recommend rejection. +",ICLR2021,
The authors are encouraged to repeat the experiment without disabling such dynamic addition of edges. + +",ICLR2019,5: The area chair is absolutely certain +_8soUrhEFu4,1642700000000.0,1642700000000.0,1,R0xRE2MU2uA,R0xRE2MU2uA,Paper Decision,Reject,"While the reviewers appreciated the new methodology and presentation of the paper the reviewers were concerned about the experimental section. Specifically they wanted to see optimization outside of penalized logP and QED, which are now viewed by the community as toy molecule optimization tasks (e.g., Penalized logP can always be improved by just adding a longer chain of carbon atoms). The authors responded that this would have taken too long to run Guacamol tasks in the rebuttal phase as all methods would need to be rerun for all tasks, but this is not true: many methods e.g., Ahn et al., 2020, already have reported these results and could be directly compared against (as this paper is near state of the art this would have been a convincing comparison). Another odd thing about the experimental setup is that the authors compare with Ahn et al., 2020 only for constrained property prediction. However Ahn et al., 2020 achieves a Penalized logP of 31.40 whereas the proposed method only achieves 13.95. It's suspicious that this result is missing in Table 2 of the current paper. If the authors are able to improve their work beyond Ahn et al., 2020 and related recent work on Guacamol and othe real-world tasks, the paper will make a much stronger submission.",ICLR2022, +HyzCUkpBf,1517250000000.0,1517260000000.0,893,By5ugjyCb,By5ugjyCb,ICLR 2018 Conference Acceptance Decision,Reject,"All of the reviewers agree that the experimental results are promising and the proposed activation function enables a decent degree of quantization. However, the main concern with the approach is its limited novelty compared to previous work on clipped activation functions. + +minor comments: +- Even though PACT is very similar to Relu, the names are very different. +- Please include a plot showing the proposed activation function as well. +",ICLR2018, +Sygd0qmJeN,1544660000000.0,1545350000000.0,1,SkeUG30cFQ,SkeUG30cFQ,Metareview,Reject,The paper conveys interesting study but the reviewers expressed concerns regarding the difference of this work compared to existing approaches and pointed a room for more thorough empirical evaluation.,ICLR2019,5: The area chair is absolutely certain +mIdUSpKIbxU,1642700000000.0,1642700000000.0,1,e95i1IHcWj,e95i1IHcWj,Paper Decision,Accept (Poster),"This work studies the question of increasing the expressive power of GNNs by adding positional encodings while preserving equivariance and stability to graph perturbations. +Reviewers were generally positive about this work, highlighting its judicious problem setup, identifying the right notion of stability and how it should drive the design of positional encodings. Despite some concerns about the discrepancy between the theoretical results and the empirical evaluation, the consensus was ultimately that this work is an interesting contribution, and therefore the AC recommends acceptance.",ICLR2022, +rU-ojt20foo,1610040000000.0,1610470000000.0,1,uFHwB6YTxXz,uFHwB6YTxXz,Final Decision,Reject,"This paper invariantizes distribution based deep networks by using pairwise embedding of the set’s elements. The idea is inspired from De Bie et al. (2019), which allows invariance to be incorporated through the interaction functional. 
Although the paper is well executed, with solid theoretical analysis and a solid response to the reviewers' comments, the novelty is limited, and reviewers have concerns about the experiments and presentation. + +",ICLR2021,
Ultimately, it seems as though the authors are simply training a neural network to generate these points, and the interesting contribution here comes from the application to PDE solving, not really from any advance on the NN/ML side itself. As such, it seems like the paper would be better appreciated (as a full conference paper or journal paper) within the controls community rather than at ICLR. However, I do think that, as an application, many at ICLR would be interested in seeing this work, even if it's likely to have relatively low impact on the community. Thus, I think the best avenue for this paper is probably as a workshop post at ICLR, hopefully with further submission and exposure in the controls community. + + Pros: + + Nice application of ML to a fun problem, generating sample points for PDE solutions + + Overall well-written and clearly presented + + Cons: + - Unclear contribution to the actual ML side of things + - Probably better suited to controls conferences",ICLR2017,
While I understand that many researchers are expecting theoretical and innovative results from ICLR papers, I find that this does not prevent acceptance. Indeed, the experimental findings in this paper are on a ""hot"" topic, could be of wide interest, and could lead to a change of paradigm in designing models towards more generic ones. On the other hand, it could just indicate that CNNs are not fully exploiting their potential, e.g., not exploiting the context well enough in the hidden layers. + +To get more insight, I am still wondering how the predictions behave if the input is shifted by a few pixels in CNNs and Transformers. It seems counterintuitive that making the first layers in ViT just an MLP of image patches is a good design. Furthermore, fully convolutional models allow taking input of an arbitrary size and averaging the predictions at the output if it happens to be larger than 1x1. + +Since convolutions are also used for, e.g., semantic segmentation and generative models, one should not (and the authors do not in the paper) discard them too fast. See also a recent work combining transformers and convolutional networks, +Chen et al. (ICCV 2021) Visformer: The Vision-friendly Transformer.",ICLR2022,
Thanks!",ICLR2020, +#NAME?,1642700000000.0,1642700000000.0,1,oMI9PjOb9Jl,oMI9PjOb9Jl,Paper Decision,Accept (Poster),"Somewhat borderline paper given the scores, but leaning on the side of accepting mostly because the positive (and weak positive) reviews are a little more persuasive. The negative review is a bit of an outlier; the main issues raised in the negative review are that the novelty is on the lower side or otherwise that the work is incremental. These complaints are largely not shared by the other reviewers, and furthermore seem not like deal-breakers. Still a borderline paper, but fairly safe to accept.",ICLR2022, +SJxpkVPxl4,1544740000000.0,1545350000000.0,1,ByecAoAqK7,ByecAoAqK7,"Reasonable improvements, but novelty incremental",Reject,"This paper is essentially an application of dual learning to multilingual NMT. The results are reasonable. + +However, reviewers noted that the methodological novelty is minimal, and there are not a large number of new insights to be gained from the main experiments. + +Thus, I am not recommending the paper for acceptance at this time.",ICLR2019,4: The area chair is confident but not absolutely certain +bgXE-rJSQON,1610040000000.0,1610470000000.0,1,HowQIZwD_42,HowQIZwD_42,Final Decision,Reject,"This paper proposes a method to quantify transference, which is a measure of information transfer across tasks, for multi-task learning framework. Specifically, the transference is measured as the change in the loss for a specific task after performing a gradient update for another. The proposed transference measure is used to both understand the optimization dynamics of MTL and improve the MTL performance, either by grouping tasks or combining task gradients based on the transference. The method is validated on multiple datasets and is shown to bring in some performance gains over the base MTL model (PCGrad, UW-MTL). + +The majority of the reviewers were negative about this paper (4, 4, 5), while one reviewer gave it a positive rating (6). The reviewers in general agreed that the idea of measuring transference as the change in the loss with gradient updates is novel and intuitive. Yet, the reviewers had common concerns on the 1) weak performance improvements, and the 2) high-cost of computing the transference. While computing the transference requires additional computations with linear time complexity, which may be problematic with a large number of tasks, the performance gains using it were rather marginal (less than 0.5% over the baselines). Another common concern from the reviewers was its insufficient experimental validation, as a comparative study against existing works that perform task grouping is missing. Both the authors and reviewers actively participated in the interactive discussion. However, the reviewers found that the two critical limitations persist even after the authors’ feedback, and in a subsequent internal discussion, they reached a consensus that the paper is not yet ready for publication. + +Thus, although the proposed method is novel and appears to be promising, it may need more developments to make it both more effective and efficient. Moreover, there should be more in-depth analysis of its time-efficiency, and other benefits (e.g. interpretability) that could be achieved with the proposed transference measure. Finally, while there exist many works on learning both hard or soft task grouping, the authors do not reference or compare against them. To name a few, [Kang et al. 
11] propose how to learn discrete task groupings, [Kumar and Daume III 12] propose to learn a soft grouping between tasks, [Lee et al. 16] propose to learn soft groupings based on the direction of asymmetric knowledge transfer across tasks, and [Lee et al. 18] propose an extension of [Lee et al. 16] to a deep learning framework. I suggest that the authors discuss and compare against the above-mentioned works, and fortify the related-work section by searching for more classical works on multi-task learning. + +- [Kang et al. 11] Learning with Whom to Share in Multi-task Feature Learning, ICML 2011 +- [Kumar and Daume III 12] Learning Task Grouping and Overlap in Multi-task Learning, ICML 2012 +- [Lee et al. 16] Asymmetric Multi-task Learning based on Task Relatedness and Confidence, ICML 2016 +- [Lee et al. 18] Deep Asymmetric Multi-task Feature Learning, ICML 2018.",ICLR2021,
The paper has promising results on several benchmark video prediction datasets. + +During the post-rebuttal discussion, the reviewer Wt6k and VMMf responded to the authors' rebuttal, but there was no discussion among them. The consensus is that even though the paper is a very strong engineering effort, it was not clear how the proposed architecture addresses the spatiotemporal mode collapse problem. T-SNE in Fig. 3/10/13 is insufficient to show disentangled feature space. In fact, PhyDnet was designed to disentangle different factors (physical vs unknown), hence not a good baseline. [Hsieh et al 2018] is a better fit. In addition, synthetic data examples would be helpful to explain the underlying mechanism of the model and provide more insights for the video prediction community. + +Based on this reason, I recommend rejecting this paper as it is now and encourage the authors to revise the draft and submit to future venues. + +Hsieh, J. T., Liu, B., Huang, D. A., Li, F. F., & Niebles, J. C. (2018, January). Learning to Decompose and Disentangle Representations for Video Prediction. In NeurIPS.",ICLR2022, +H1xxDiagxV,1544770000000.0,1545350000000.0,1,B1eEKi0qYQ,B1eEKi0qYQ,"Interesting direction, still significant concerns on positioning with respect to wider literature and significance of contribution",Reject,"The authors present a new method for leveraging multiple parallel agents to speed RL in continuous action spaces. By monitoring the best performers, that information can be shared in a soft way to speed policy search. The problem space is interesting and faster learning is important. However, multiple reviewers [R2, R1] had significant concerns with how the work is framed with respect to the wider literature (even after the revisions), and some concerns over the significance of the performance improvements which seem primarily to come from early boosts. There is also additional related work on concurrent RL (Guo and Brunskill 2015; Dimakopoulou, Van Roy 2018 ; Dimakopoulou, Osband, Van Roy 2018) which provides some more formal considerations of the setting the authors consider, which would be good to reference. +",ICLR2019,4: The area chair is confident but not absolutely certain +pHKV61x68YG,1610040000000.0,1610470000000.0,1,WesiCoRVQ15,WesiCoRVQ15,Final Decision,Accept (Poster),"The paper tackles a very important problem. The formulation of the paper is sound as under lightweight assumptions, the supervised loss follows an f-divergence formulation (see ""Information, Divergence and Risk for Binary Experiments"" by Reid and Williamson (JMLR 2011), in particular Section 4.7). It would make sense to dig in the loss in the context of label noise; the variational formulation provides an interesting direction along those lines. The rebuttal on the experimental concerns of reviewers is appreciated (Cf authors’ rebuttal summary). +",ICLR2021, +ry9zpMUue,1486400000000.0,1486400000000.0,1,rk5upnsxe,rk5upnsxe,ICLR committee final decision,Accept (Poster),"On the one hand, the topic is considered important and the paper is technically correct. On the ohter hand, novelty and theoretical depth are a bit lacking. Overall, this is a borderline paper. + +Still, the Program Chairs recommend it for a poster presentation given the importance of the topic.",ICLR2017, +aq_UxqOsU_1,1642700000000.0,1642700000000.0,1,HI99z0aLsl,HI99z0aLsl,Paper Decision,Reject,"The paper studies the benign overfitting phenomenon for linear models with adversarial training. 
The main issue is that the result is quite expected for experts versed in the benign overfitting papers, and indeed the reviewers pointed out that they could not see much technical novelty. However, even more importantly, the original benign overfitting papers had the advantage of proposing a simpler model (linear!) with the same behavior as the complex ones in practice. This is not the case here, as the results diverge from empirical observations on deep networks. The authors argue that it is a valuable finding that the empirical observation is not ""universal"", but this is a somewhat moot point, as linear models are a priori very different from the setting in which these empirical observations were made. For these reasons, I believe the paper does not meet the bar for ICLR (yet it could still be publishable elsewhere).",ICLR2022,
The paper has a promising idea, but the reviewers did not find the presentation and execution convincing in their current form. Unfortunately, the submission as it stands is not yet suitable for ICLR.",ICLR2020,
It first demonstrates that adversarial training with PGD attacks and randomized smoothing exhibit limited effectiveness against three of the highest-profile physical attacks. Then, it proposes a new abstract adversarial model, where an adversary places a small adversarially crafted rectangle in an image, and develops two approaches for efficiently computing the resulting adversarial examples. Empirical results show the effectiveness of the proposed approaches. Overall, a good paper. The rebuttal is convincing.",ICLR2020,
1, in which it is shown that even for quite small numbers of hidden units, spurious local optima do occur and are reached 40% of the time from random initializations, even with only 11 nodes. + +",ICLR2018,
The method is evaluated on relatively simple vision and language tasks. + +The idea is nice, but it seems to be a special case of previously published work, and the results are not convincing. Four of five reviewers agree that the work would benefit from improved comparisons with existing approaches, as well as from a stronger theoretical framework in light of competing approaches.",ICLR2019,5: The area chair is absolutely certain +zzj19Zi_Lz,1610040000000.0,1610470000000.0,1,FyucNzzMba-,FyucNzzMba-,Final Decision,Reject,"This paper evaluates several methods for physical prediction on the PHYRE benchmark, finding that while object-based methods (e.g. IN, Transformer) perform better in terms of predictive accuracy, pixel-based methods (e.g. STN, Deconv) perform better in terms of downstream task performance. The justification is that it is easier for the agent to evaluate good actions using an image-based representation rather than an object-based representation. + +Pros: +- Important attempt to catalogue the current state of the field of physical reasoning +- Improved baselines on PHYRE + +Cons: +- As pointed out by R5, there is a failure to evaluate any hybrid pixel-relational methods, such as OP3, R-NEM, C-SWM, etc. Given that the paper's main contribution is its assessment of the current state of the field (in the authors' own words: ""providing a realistic picture of the current state of the field""), this seems like a major oversight to me. +- As pointed out by several reviewers, the analysis itself is somewhat limited. I don't see it as a problem that the paper does not propose any new methods, but in that case it needs to present a more thorough picture of why certain methods work better in some cases. For example, I share R1's concern that the Dec model performs worse than the identity function. Can you provide more detailed analysis demonstrating why the latent space is more useful? Can you demonstrate in what cases the object-based classifiers struggle, and why? I think incorporating more careful hypotheses and ablations would help a lot in turning this into a much stronger paper. + +I don't think it's a problem that the paper relies solely on 2D, fully-observed environments (many other papers on physical reasoning do this, so I think it's a reasonable choice), and I don't think it's a problem that the paper does not propose a new method. But I do find myself agreeing with the reviewers that the evaluations done within this context are insufficient. In the rebuttal, the authors emphasize the various conclusions stemming from the results (regarding the effect of model error, the extent of generalization, what ""accuracy"" means), but these conclusions are not that surprising (model error is a well-known problem in MBRL, deep models are notorious for their failure to achieve strong generalization, and the limitations of pixel accuracy have spawned whole research areas, such as contrastive and adversarial approaches). Again, I don't think the lack of surprising conclusions is itself an issue. But the fact that the paper does not really make an attempt to explain any nuances or details regarding the conclusions makes it hard to identify a clear contribution; in that sense, I don't feel the paper really provides ""clear guidance"" as is argued in the rebuttal.
+ +I do think this paper is very close to being acceptable, and it could make a great submission to a future conference if the authors can spend a bit more time on (1) the baselines (i.e., incorporating hybrid models, and ensuring all methods pass basic gut checks) and (2) supporting their conclusions with more detailed analyses.",ICLR2021, +Cabxfw9TFFR,1642700000000.0,1642700000000.0,1,6yVvwR9H9Oj,6yVvwR9H9Oj,Paper Decision,Accept (Poster),"This paper addresses semi-supervised learning in the MNAR setting. Well-written paper. +Several additional experiments were reported in response to the reviewer questions. +General agreement amongst reviewers.",ICLR2022, +SafTnEejIfi,1642700000000.0,1642700000000.0,1,JEoDctbwCmP,JEoDctbwCmP,Paper Decision,Reject,"The paper introduces a framework for enforcing constraints in deep NNs used for modeling the spatio-temporal dynamics characterizing physical systems. The authors consider different types of constraints (pointwise, differential and integral). They start from a formulation approximating PDEs as a set of ODEs (method of lines). Their main idea is to approximate the solution of the equations using an interpolant between observations and to impose the constraints on this approximation function. The interpolant is built using basis functions located at observation points. The formalism considers irregular spatial grids and both soft and hard constraints. The main claim is then the introduction of a general formalism for considering different types of constraints on irregular grids. Experiments illustrate the behavior of the proposed method on different types of evolution equations and constraints. + +The reviewers agree that the proposed approach is interesting and that some of the ideas are original. However, they also consider that the paper is not convincing enough to demonstrate the interest and novelty of the approach compared to alternative methods. The experimental section mainly considers (except for one application) regular grids and constraints that could be handled by other methods as well. The authors should present cases where their method provides a clear advantage, distinct from existing solutions. The authors provided a well-argued rebuttal, clarifying several points. However, all reviewers retained their original scores and encourage the authors to further develop the experimental analysis to present a stronger paper. In addition, the presentation could be improved, and some technical aspects better explained (e.g., description of interpolation methods, and some advice on which interpolant to choose for a given problem).",ICLR2022, +kLCrV6CyN,1576800000000.0,1576800000000.0,1,BkevoJSYPB,BkevoJSYPB,Paper Decision,Accept (Spotlight),"This paper proposes a method for efficiently training neural networks combined with blackbox implementations of exact combinatorial solvers. + +Reviewers and AC agree that it is a well-written paper with a novel idea supported by good experimental results. The experimental results are of small scale and can be further improved, but the authors acknowledged this aspect well. + +Hence, I recommend acceptance.",ICLR2020, +YSxvkTOvq3M,1642700000000.0,1642700000000.0,1,ei3SY1_zYsE,ei3SY1_zYsE,Paper Decision,Accept (Poster),"The paper introduces a forget-and-relearn framework for iterative learning algorithms. It provides several new insights showing that forgetting can be favorable to learning, and validates these insights via image classification and language tasks. The idea is novel and inspiring.
Although there was some debate about the experiments and the generality of the proposed method, I think the authors answered those questions decently, and many researchers would be interested in this direction.",ICLR2022, +auFGi97Ew,1576800000000.0,1576800000000.0,1,HkxU2pNYPH,HkxU2pNYPH,Paper Decision,Reject,"This paper proposes to improve the faithfulness of data-to-text generation models through an attention-based confidence measure and a variational approach for learning the model. There is some reviewer disagreement on this paper. All agree that the problem is important and the ideas interesting, while some reviewers feel that the methods are insufficiently justified and/or the results unconvincing. In addition, there is not much technical novelty here from a machine learning perspective; the contribution is to a specific task. Overall I think this paper would fit in much better at an NLP conference/journal.",ICLR2020, +eVpRQgREJMJ,1610040000000.0,1610470000000.0,1,6fb4mex_pUT,6fb4mex_pUT,Final Decision,Reject,"**Problem significance** This paper proposes an attack mechanism in the latent space of a neural network f(x), which produces out-of-distribution examples. The AC agrees with the reviewers on the significance of the OOD detection problem; in particular, addressing the vulnerability aspect is relevant and of great interest to the community. + +**Technical contribution** The AC shares the concern of several reviewers about the limited technical novelty as well as the problem formulation. While the authors have clarified the difference between adversarial attacks vs. OOD attacks, the underlying attack mechanism is not new to the community (except for allowing a larger search space without being constrained by visual imperceptibility). In some sense, the search is made easier than the standard adversarial attack by removing the similarity constraint. Given the unrealistic nature of the created OOD examples (largely noisy patches), the AC thinks perhaps a more interesting problem is to look at naturally occurring OOD examples that would lead to similar latent encodings w.r.t. in-distribution data, or adversarial robustness w.r.t. the OOD detector. This, to me, would steer the community in the right direction. + +From a problem formulation perspective, the AC thinks it's useful to differentiate three highly related attacks (that are distinct but can cause confusion): + +- adversarial attack w.r.t. the classifier +- OOD attack w.r.t. the classifier +- adversarial attack w.r.t. the OOD detector (see recent works [1][2][3] which considered the robustness aspect of the OOD detector) + + +**Rebuttal feedback** The AC recognizes the effort made by the authors to address the concerns and comments raised by reviewers. The AC agrees with R1/R2/R3 that the additional experiments are valuable; however, the changes to the manuscript are substantial enough to warrant another round of review at a future venue. The paper can improve with better organization and presentation, moving the results in the appendix to the main paper. + +**Recommendation** The AC recommends rejection. + +References + +[1] Sehwag et al. Analyzing the robustness of open-world machine learning. 2019 + +[2] Hein et al. Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. 2019 + +[3] Chen et al. Informative Outlier Matters: Robustifying Out-of-distribution Detection Using Outlier Mining.
arXiv:2006.15207 + + + +",ICLR2021, +HJmb3G8ug,1486400000000.0,1486400000000.0,1,HyNxRZ9xg,HyNxRZ9xg,ICLR committee final decision,Reject,There is consensus among the three reviewers that (1) the originality of the proposed approach is limited and (2) the experimental evaluation is too limited in that it lacks strong baseline models as well as an ablation study that explores the different aspects of the proposed model.,ICLR2017, +oFyQHUGDhwa,1610040000000.0,1610470000000.0,1,xjXg0bnoDmS,xjXg0bnoDmS,Final Decision,Accept (Poster),"This paper studies the link between generalization behavior and ""flatness"" of the loss landscape in deep networks. Specifically, the authors study two measures of flatness (local entropy and local energy), and show that these two measurements are strongly correlated with one another. Moreover, they show via a careful set of numerical experiments that two previously proposed algorithms (entropy SGD and replica SGD) that optimize for local entropy tend to both find flatter minima and provide better generalization. + +Despite the fact that the paper proposes no new models or algorithms, the experiments are compelling and provide non-trivial insights into predicting generalization behavior of deep networks, as well as solid evidence on the benefits of entropy regularization in SGD. The authors also seem to have satisfactorily answered the (numerous) initial concerns raised by the reviewers. Overall, I recommend an accept.",ICLR2021, +#NAME?,1610040000000.0,1610470000000.0,1,LT0KSFnQDWF,LT0KSFnQDWF,Final Decision,Reject,"We thank the authors for their detailed responses and the revised version, which addresses several of the questions raised by the reviewers. + +The paper is correct and clearly written. All reviewers agree that the idea to add structural features in the message passing of graph neural networks is sensible. While different from previous work, the novelty is a bit incremental though, particularly given the previous work on colored graph neural networks. The significance of the work is weak, given 1) the need to select ""by hand"" the structural features that are passed as information, 2) the increased time complexity to compute the structural features compared to other GCNNs, and 3) the experimental results, which suggest that the benefit of the new approach is limited, particularly on challenging tasks. + +To summarize, this is not a bad paper, but we consider it below the standard of ICLR in terms of originality and significance.",ICLR2021, +OOECvSUdrjT,1642700000000.0,1642700000000.0,1,Yp4sR6rmgFt,Yp4sR6rmgFt,Paper Decision,Reject,"This paper was reviewed by four experts in the field and received mixed scores (1 borderline accept, 3 borderline reject). The reviewers raised concerns about the lack of novelty, unconvincing experiments, and the presentation of this paper. AC feels that this work has great potential, but it needs more work to better clarify the contribution and include additional ablation studies. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2022, +RmohApoe6a,1642700000000.0,1642700000000.0,1,3HJOA-1hb0e,3HJOA-1hb0e,Paper Decision,Accept (Poster),"This paper introduces a method to determine which precision to use for the weights, as well as a quantisation method using hysteresis to improve performance with low-precision weights, including 4 bits. +Reviewers tend to agree that the two points presented are useful and can have a large impact on the field.
+Generally, reviewers pointed out that the motivations, notations and experimental studies could be improved. This has been partly addressed by the authors. +I recommend accepting this paper for ICLR 2022.",ICLR2022, +HyF-IkaBz,1517250000000.0,1517260000000.0,725,rybDdHe0Z,rybDdHe0Z,ICLR 2018 Conference Acceptance Decision,Reject,"This paper tries to establish that LSTMs are suitable for modeling neural signals from the brain. However, the committee and most reviewers find that the results are inconclusive. Results are mixed across subjects. We think it would have been far more interesting to compare other types of sequence models for this task beyond the few simple baselines implemented here. It is also unclear what the LSTM learns beyond the other models presented in the paper.",ICLR2018, +v0hOoPNtCW,1576800000000.0,1576800000000.0,1,ByxoqJrtvr,ByxoqJrtvr,Paper Decision,Reject,"The authors present an algorithm that utilizes ideas from imitation learning to improve on goal-conditioned policy learning methods that rely on RL, such as hindsight experience replay. Several issues of clarity and the correctness of the main theoretical result were addressed during the rebuttal period in a way that satisfied the reviewers with respect to their concerns in these areas. However, after discussion, the reviewers still felt that there were some fundamental issues with the paper, namely that the applicability of this method to more general RL problems (complex reward functions rather than single state goals, time) is unclear. The basic idea seems interesting, but it needs further development, and non-trivial modifications, to be broadly applicable as an approach to problems that RL is typically used on. Thus, I recommend rejection of the paper at this time.",ICLR2020, +qJVJ_Hzi6Nf,1610040000000.0,1610470000000.0,1,UJRFjuJDsIO,UJRFjuJDsIO,Final Decision,Reject,"The reviewers raised a number of concerns, but +the authors provided no rebuttal to the reviewers' comments. + +One reviewer felt the experimental fitting was not thorough enough. +Suppose one used layers of oriented bandpass filters, separated by +non-linearities; would that perform well on the task convnets are +trained on? + +The AC doesn't agree with the arguments of R3. I hope the comments of +the reviewers, particularly the many specific comments of reviewers R1 and +R2, will be helpful to you as you revise the manuscript. + +The AC feels a more thorough experimental evaluation, and following up +on many of the suggestions of the reviewers, will lead to a strong +paper. As it stands, however, with 3 recommendations for rejection (1 +weak), and only 1 weak recommendation for acceptance, we need to reject. +",ICLR2021, +UYCME7I-FK1,1642700000000.0,1642700000000.0,1,5XmLzdslFNN,5XmLzdslFNN,Paper Decision,Accept (Poster),"The paper presents a method for compositional task learning in the continual RL setting, by composing and reconfiguring neural modules. The method is evaluated on mini-grids and simulated robot manipulation tasks. + +The reviewers agree, and I concur, that the paper proposes an interesting solution to a difficult and important problem. The paper is well presented and would make a good addition to the multi-task continual learning literature. The reviewers appreciate the authors' responses and the improvements to the manuscript, and in particular the extra experiments with the wrong number of modules.
+ +The final version of the paper should: + +- Clarify the explanation of functional modularity +- Move the relevant pieces to the main text. +- See Gur et al., NeurIPS 2021, https://openreview.net/forum?id=CeByDMy0YTL for a definition of learnable compositional tasks via the Petri Net formalism.",ICLR2022, +0TJZkVmQluM,1610040000000.0,1610470000000.0,1,4emQEegFhSy,4emQEegFhSy,Final Decision,Reject,"The paper extends previous work on intrinsic reward design based on curiosity or surprise toward multiple intrinsic rewards based on multiple model predictions, fusing the rewards using meta-gradient optimization. While most reviewers find the paper clearly written, several reviewers do bring up concerns about the limited contribution of the work on top of existing ones. Reviewers would also like to see experiments conducted in environments with sparse rewards rather than the delayed reward setting constructed from dense reward environments. More ablation studies on the different design choices would also be helpful. ",ICLR2021, +pzaA-1Wmgn,1576800000000.0,1576800000000.0,1,HylhuTEtwr,HylhuTEtwr,Paper Decision,Reject,"The consensus among all reviewers was to reject this paper, and the authors did not provide a rebuttal.",ICLR2020, +SkkE3GLde,1486400000000.0,1486400000000.0,1,SJ-uGHcee,SJ-uGHcee,ICLR committee final decision,Reject,"The reviewers generally agreed that exploring policy search methods of this type is interesting, but the results presented in the paper are not at the standard required for publication. There are no comparisons of any sort, and the only task that is tested is trivially simple, so it's impossible to conclude anything about the effectiveness of the method. Despite the author promising to add additional experiments, nothing was added in the current draft. Besides this, reviewers raised concerns about the relevance of this approach to ICLR. The crucial point here is that it's unclear if the method will scale -- while there is nothing wrong in principle in proposing a general policy search method, there isn't really a compelling argument that can be made that it is suitable for learning representations if there is no plausible story for how it will scale to sufficiently complex policy classes that can actually learn representations.",ICLR2017, +T_h2h6jJJIg,1610040000000.0,1610470000000.0,1,c5klJN-Bpq1,c5klJN-Bpq1,Final Decision,Reject,"The paper provides a neural generalization of decision trees with the idea of maintaining interpretability. The approach falls a bit short on theoretical grounds. For example, the main theorem portraying interpretability isn't properly defined and some definitions appear implicitly in the proof. The view of decision trees as a sequence of soft decisions appears to need to model how the full probability distribution over the nodes propagates at each depth. A much stronger case for interpretability (rather than assuming that each T_i is interpretable) should be made if this is kept as one of the main arguments for the architecture. Interpretability of decision trees does not directly carry over to these models. + ",ICLR2021, +1EWneJj9_7,1610040000000.0,1610470000000.0,1,_WnwtieRHxM,_WnwtieRHxM,Final Decision,Accept (Spotlight),"The paper studies the effect of importance weighting schemes on the implicit bias of gradient descent in deep learning models.
It provides several theoretical results which give important insights into the effect of the importance weighting scheme on the limit of convergence, as well as convergence rates. Results are presented for linear separators and deep learning models. A covariate shift setting is also studied. The theoretical results are supported with empirical demonstrations, and they also lead to useful insights regarding which weighting schemes are expected to be more helpful. They also explain some previously observed empirical phenomena. + + +Pros: +- New theoretical results which provide important insights on an important topic +- Empirical demonstrations which support the theoretical results + + +Cons: +- No significant issues. +",ICLR2021, +KyWqTRlHy3k,1642700000000.0,1642700000000.0,1,g5odb-gVVZY,g5odb-gVVZY,Paper Decision,Reject,"The paper develops an instance of physics-informed neural network inspired by multigrid methods for solving PDEs. The proposed framework describes the solution of a PDE problem as the sum of terms operating at different resolutions. Training is performed by an iterative optimization algorithm that alternates between the different resolution models. Experiments are performed on 1D and 2D problems. + +All the reviewers agree on the originality and the potential of the proposed method. However, they all consider that the current version of the work is too preliminary in both form and content. The experimental contribution should be developed further, with tests performed on more complex problems and complementary analyses. Some of the claims should be given more evidence or moderated. It also appeared during the discussion that the models are not well tuned, making the results inconclusive. The authors are encouraged to develop and strengthen their work.",ICLR2022, +99xg5bekPcw,1642700000000.0,1642700000000.0,1,IptBMO1AR5g,IptBMO1AR5g,Paper Decision,Reject,"This paper regularizes deep neural networks via the Hessian trace. The algorithm is based on Hutchinson’s method, further accelerated via dropout. The connection to the linear stability of dynamical systems is discussed. The proposed regularization performs favorably in the experimental results. + +The idea of the method is clear. The paper’s writing needs a lot of improvement because there are a number of grammatical errors. The major technical concerns include: a) the experimental results are still not convincing; b) the explanation that favoring instability in the dynamical system amounts to overfitting prevention (reviewer GDik). I’ve read the rebuttal, but remain unconvinced.",ICLR2022, +A4PZvJF0Opk,1610040000000.0,1610470000000.0,1,IVwXaHpiO0,IVwXaHpiO0,Final Decision,Reject,"The paper developed a method that estimates treatment effects with +longitudinal observational data under temporal confounding. It extends +the idea of the synthetic control method and offers flexibility and +ease of estimation. However, some major concerns remain after the +discussion among the reviewers. In particular, the proposed method +lacks a clear use case. Moreover, some arguments around +``trustworthiness`` (detecting unreliable ITE estimates) and ``avoid +over-matching`` need to be refined. The error bound for +``trustworthiness`` cannot detect hidden confounding. For overlap +issues, the rejection of units with larger error could be overly +conservative because the error bound may often be too loose.
Regarding +``avoid over-matching``, while SyncTwin uses a low-dimensional +representation as opposed to the whole x vector for matching, it is +unclear whether SyncTwin can avoid over-matching. It is possible that +using a low-dimensional representation makes it easier to find a match +in the data and may still over-match. Finally, the paper would benefit +from proper causal identification results.",ICLR2021, +G-gHXQR7rwV,1642700000000.0,1642700000000.0,1,d71n4ftoCBy,d71n4ftoCBy,Paper Decision,Accept (Poster),"Dear Authors, + +The response you have provided, based on the main concerns of the reviewers, has answered most of the questions raised. +As far as I understand from the added experiments you have provided, the proposed methodology shows resilience in being at least as good as state-of-the-art approaches, while at the same time it is a mathematically interesting approach. + +Your response has covered concerns like the comparison to other communication techniques (the comparison list is not complete, but your effort is appreciated), adding discussion on the rank parameter, adding comments on expressiveness and the connection with low-rank parameterization, etc. + +These efforts cannot be overlooked, and for that reason I suggest acceptance (poster). + +Best + +AC",ICLR2022, +BG97Z_03q3,1642700000000.0,1642700000000.0,1,ckZY7DGa7FQ,ckZY7DGa7FQ,Paper Decision,Accept (Poster),"The paper proposes to fine-tune the belief states of an MDP, for later using the learned model for decision-time planning, e.g. via search. +The contribution is well-presented, motivated and focused on a specific scenario, which is generally considered challenging in the literature. This scenario is exemplified by the cooperative card game Hanabi, which takes the role of the benchmarking setting for the empirical evaluation of the fine-tuning procedure. + +The major concerns raised in the review and discussion phases are about the limited evaluation, which is centered around only Hanabi, as well as the magnitudes of the improvements over previous baselines. However, three knowledgeable reviewers agreed that since the setting has been historically challenging, the reported improvements are in fact significant and potentially inspiring for future works in this direction. + +The paper is accepted provided that the authors include and polish in the camera-ready the additional experiments on the parameter sensitivities, the ablation tests and the discussions highlighted by the reviewers in the comments.",ICLR2022, +H10TrkaBf,1517250000000.0,1517260000000.0,675,SySpa-Z0Z,SySpa-Z0Z,ICLR 2018 Conference Acceptance Decision,Reject,All reviewers have acknowledged that the proposed regularization is novel and also results in some empirical improvements on the reported language modeling and image classification tasks. However there are serious concerns about the writing and rigor (reviewers Anon1 and Anon3) of the paper. The authors have not uploaded any revision of the paper to address these concerns.,ICLR2018, +vPw0IGr0r-,1610040000000.0,1610470000000.0,1,enhd0P_ERBO,enhd0P_ERBO,Final Decision,Reject,"This paper looks at a natural application of robust learning for vehicle routing. The paper introduces some new ideas for this RL problem, although the problem has been considered before. The paper gives a nice algorithm with extensive experimental contributions. + +The paper has some shortcomings. The reviewers found there to be a lack of clarity in the mathematical definitions.
Moreover, there were modeling choices that the reviewers felt needed more thorough explanation. For these reasons, this paper falls below the bar. The authors are encouraged to revise the manuscript taking these concerns into consideration. + +",ICLR2021, +SylDJXSygE,1544670000000.0,1545350000000.0,1,BJgvg30ctX,BJgvg30ctX,"Fascinating perspective with promising initial results, but needs more careful comparison to other regularization methods",Reject,"This paper proposes an approach to regularizing classifiers based on invertible networks using concepts from information bottleneck theory. Because mutual information is invariant under invertible maps, the regularizer only considers the latent representation produced by the last hidden layer in the network and the network parameters that transform that representation into a classification decision. This leads to a combined ℓ1 regularization on the final weights, W, and ℓ2 regularization on W^{T} F(x), where F(x) is the latent representation produced by the last hidden layer. Experiments on CIFAR-100 image classification show that the proposed regularization can improve test performance. The reviewers liked the theoretical analysis, especially proposition 2.1 and its proof, but even after discussion and revision they wanted a more careful empirical comparison to established forms of regularization to establish that the proposed approach has practical merit. The authors are encouraged to continue this line of research, building on the fruitful discussions they had with the reviewers.",ICLR2019,4: The area chair is confident but not absolutely certain +DqFCUHarBE,1576800000000.0,1576800000000.0,1,Hklr204Fvr,Hklr204Fvr,Paper Decision,Accept (Poster),"The AC has carefully looked at the paper/comments/discussion in order to arrive at this meta-review. + +Looking over the paper, the FGL layer is an interesting idea, but its utility is only evaluated in a limited setting (fMRI data), rather than other types of images/data. Also, the approach seems to work on some of the fMRI datasets, while on others the performance is on par with the baselines. + +Overall, the paper is borderline but the AC believes the paper would be a good contribution to the conference.",ICLR2020, +A8K8yT-c-Yo,1642700000000.0,1642700000000.0,1,0EXmFzUn5I,0EXmFzUn5I,Paper Decision,Accept (Oral),"The authors propose a multi-resolution pyramidal attention mechanism to capture long-range dependencies in time series forecasting, achieving linear time and space complexity. The authors conducted an extensive set of experiments and ablation studies demonstrating that the proposed method consistently outperforms the state of the art, and they provided evidence for the various components of the architecture. They also provided a proof guaranteeing the linear complexity of long sequence encoding and adequately addressed the concerns raised by the reviewers. The additional benchmarks conducted by the authors further demonstrated the strong performance of the method. All reviewers agreed that this work makes a solid contribution to the field.",ICLR2022, +JnybjR4XX91R,1642700000000.0,1642700000000.0,1,tgcAoUVHRIB,tgcAoUVHRIB,Paper Decision,Accept (Poster),"This paper focuses on answering complex logical queries over an incomplete KG and uses neural networks to do so, flexibly handling multiple operations from FOL. Overall, the reviews agree that the empirical performance is impressive. One reviewer gave a strong accept, one is leaning to accept and two are leaning to reject.
Overall, the reviewers who are leaning to reject had mostly clarity issues, which seem to have been addressed by the authors (without response from reviewers). +Given this, I recommend acceptance.",ICLR2022, +SJlJE16rf,1517250000000.0,1517260000000.0,259,BkrSv0lA-,BkrSv0lA-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"While novelty is not the main strength of this paper, there is consensus that the presentation is clear and the experimental results are convincing. Given the practical importance of designing and benchmarking methods to compactify deep nets, the paper deserves to be presented at ICLR-2018.",ICLR2018, +8CVG2PiTZf,1576800000000.0,1576800000000.0,1,Bke-6pVKvB,Bke-6pVKvB,Paper Decision,Reject,"This paper proposes a GAN-based approach to producing poisons for neural networks. While the approach is interesting and appreciated by the reviewers, it is a legitimate and recurring criticism that the method is only demonstrated on very toy problems (MNIST and Fashion MNIST). During the rebuttal stage, the authors added results on CIFAR, although the results on CIFAR were not convincing enough to change the reviewer scores; the SOTA in GANs is sufficient to generate realistic images of cars and trucks (even at the ImageNet scale), while the demonstrated images are sufficiently far from the natural image distribution on CIFAR-10 that it is not clear whether the method benefits from using a GAN. It should be noted that a range of poisoning methods exist that can effectively target CIFAR, and SOTA methods (e.g., poison polytope attacks and backdoor attacks) can even target datasets like ImageNet and CelebA.",ICLR2020, +5vwrmHgibj3,1610040000000.0,1610470000000.0,1,xHKVVHGDOEk,xHKVVHGDOEk,Final Decision,Accept (Poster),"This paper examines under what conditions influence estimation can be applied to deep networks and finds, among other things, that influence estimates are poorer for deeper architectures, perhaps due to poor inverse Hessian-vector approximations for deeper models. The authors provide an extensive experimental evaluation across datasets and architectures, and demonstrate the fragility of influence estimates in a number of conditions. Although the reviewers noted that these issues are now ""folk knowledge"", there has been less scientific effort in identifying these failures. + +Of course, more theoretical understanding would help the community better understand where these fragilities lie, but the experimental evaluation is sufficiently strong to be of broad interest to the community.",ICLR2021, +srmqhfBkrDV,1610040000000.0,1610470000000.0,1,U850oxFSKmN,U850oxFSKmN,Final Decision,Reject,"This paper sits right at the borderline: the reviewers agree that it is interesting and addresses a relevant problem. On the negative side, the presentation could be improved (including some incorrect claims), and the experiments could be strengthened (both in terms of baselines and datasets used). Ultimately, the paper will probably require another round of reviews before it is ready for publication.",ICLR2021, +GC2bFzlPlj,1610040000000.0,1610470000000.0,1,Fmg_fQYUejf,Fmg_fQYUejf,Final Decision,Accept (Poster),"The paper presents an important empirical finding: when the learning algorithms are initialized from the same point, the continual and multitask solutions are connected by linear and low-error paths. Motivated by this finding, the paper proposes a new continual learning algorithm based on path regularization. The paper received unanimously good scores.
I agree with the reviews and recommend acceptance. ",ICLR2021, +H1eiDZSyxV,1544670000000.0,1545350000000.0,1,HklbTjRcKX,HklbTjRcKX,"Neat approach, but more validation is needed",Reject,"This paper explores an approach to testing the information bottleneck hypothesis of deep learning, specifically the idea that layers in a deep model successively discard information about the input which is irrelevant to the task being performed by the model, in full-scale ResNet models that are too large to admit the more standard binning-based estimators used in other work. Instead, to lower-bound I(x;h), the authors propose using the log-likelihood of a generative model (PixelCNN++). They also attempt to visualize what sort of information is lost and what is retained by examining PixelCNN++ reconstructions from the hidden representation at different positions in a ResNet trained to perform image classification on the CINIC-10 task. To lower-bound I(y;h), they perform classification. In the experiments, the evolution of the bounds on I(x;h) and I(y;h) is tracked as a function of training epoch, and visualizations (reconstructions of the input) are shown to support the argument that color-invariance and diversity of samples increase during the compression phase of training. These tests are done on models trained to perform either image classification or autoencoding. This paper enjoyed a good discussion between the reviewers and the authors. The reviewers liked the quantitative analysis of ""usable information"" using PixelCNN++, though R2 wanted additional experiments to better quantify the limitations of the PixelCNN++ model to provide the reader with a better understanding of the plots in Fig. 3, as well as more points sampled during training. Both R2 and R3 had reservations about the qualitative analysis based on the visualizations, which constitute the bulk of the paper. Unfortunately, the PixelCNN++ training is computationally intensive enough that these requests could not be fulfilled during the ICLR discussion phase. While the AC recommends that this submission be rejected from ICLR, this is a promising line of research. The authors should address the constructive suggestions of R2 and R3 and submit this work elsewhere.",ICLR2019,4: The area chair is confident but not absolutely certain +Z6vmGRsibc8,1610040000000.0,1610470000000.0,1,Px7xIKHjmMS,Px7xIKHjmMS,Final Decision,Reject,"The paper presents a new GNN+ architecture and provides interesting theoretical observations about the architecture. The paper is quite promising and has several interesting insights. However, most of the reviewers believe that the paper is not ready for publication and can be significantly improved by: a) more formal and precise statements, b) clarifying the key points of the paper, c) more thorough experimental validation of the framework on real-world datasets. + +",ICLR2021, +wmHErBaNaFW,1610040000000.0,1610470000000.0,1,RcjRb9pEQ-Q,RcjRb9pEQ-Q,Final Decision,Reject,"The paper received two borderline accept recommendations and one accept recommendation from three reviewers with low confidence, and a reject recommendation from an expert reviewer. + +All reviewers found that the paper addresses the important and challenging problem of semantically constraining adversarial attacks, as opposed to constraining them by an artificial norm ball. However, during the discussion phase it was pointed out that there were some important weaknesses indicating that the paper may need one more evaluation round.
The meta-reviewer recommends rejection based on the following observations. + +In terms of evaluation, while it is understandable that the authors were unable to compare to Gowal et al. due to the lack of a publicly available implementation, showing that Song et al.'s adversarials hurt performance and are farther from the image manifold has been found puzzling, as this was done by Song et al. only to keep human prediction the same while changing model prediction. Furthermore, the paper did not contain a user study similar to Song et al. for a fair comparison. Finally, the discussion revealed that the comparison to ""norm-bounded adversarial inputs"" may not have clarified whether this experiment faithfully demonstrates an advantage for the contribution, as the norm could be constrained to a point where accuracy is not reduced, and the discussion on the certified defense being ""broken"" was inconclusive.",ICLR2021, +pF1ROwBEl7o5,1642700000000.0,1642700000000.0,1,WxuE_JWxjkW,WxuE_JWxjkW,Paper Decision,Accept (Poster),"This paper studies the expressivity, complexity and unpredictability of emergent languages in referential games. The authors defined measures of complexity and unpredictability and empirically showed that the expressivity of emergent languages is a trade-off between the complexity and unpredictability of the context that the languages are used in. They introduced a contrastive-loss-based training method that alleviates the collapse of message types seen with standard referential loss functions. + +The paper is controversial among the reviewers. On the positive side, most liked how the paper has a clearly stated hypothesis and extensive evaluations, which makes a clear contribution to the field of emergent languages. On the negative side, the paper only shows the results in an artificial setting where the key variables are highly simplified (e.g. size of candidates). The main negative review argues the authors used an inappropriate definition of unpredictability and that the batch size is actually the key independent variable instead of what is claimed. The paper does somewhat equate batch size with the candidate size that is so important to their results (after eq (1)), but they seem to measure candidate size in the key figures. Perhaps an experiment controlling for batch size independently of the candidate size can address this issue. On the point of defining unpredictability, the other reviewers and I find the given definition to be reasonable and at least defensible. However, the reviewer remained unconvinced. More generally, the paper relies on one definition of the concepts measured in one setting to make a general claim, which is at risk of missing other important variables. Overall, most reviewers found the scope to be sufficient, and two improved their scores after the discussion. + +Recommendation: accept",ICLR2022, +e6MAbpQaL6,1642700000000.0,1642700000000.0,1,7HhX4mbern,7HhX4mbern,Paper Decision,Reject,"This paper empirically evaluates the performance (in time and accuracy) of randomized signatures for time series, an idea that was developed theoretically in a series of recent papers.
While reviewers acknowledge that implementing and testing this idea is relevant, they also consider that the lack of methodological and theoretical novelty, combined with the fact that the experimental results do not convincingly show that randomized signatures outperform existing methods on a variety of tasks, puts the paper below the acceptance bar.",ICLR2022, +RDDEBlKs3P,1576800000000.0,1576800000000.0,1,S1ltg1rFDS,S1ltg1rFDS,Paper Decision,Accept (Poster),"This paper addresses an important and relevant problem in reinforcement learning: learning from off-policy data, taking into account the offsets in the visitation distribution of states. This has the promise of lowering variance even with long-horizon roll-outs. Existing methods have required access to the behavior policy (or have required data from the stationary distribution). The novel proposed approach instead uses an alternative method, based on the fixed point of the ""backward flow"" operator, to calculate the importance ratios required for policy evaluation in discrete and continuous environments. + +In the initial version of the submission, several concerns were expressed regarding both the quality of the paper and clarity. The authors have updated the paper to address these concerns to the satisfaction of the reviewers, who are now unanimously in favor of acceptance. ",ICLR2020, +zRMbXp6-mGV,1642700000000.0,1642700000000.0,1,5o7lEUYRvM,5o7lEUYRvM,Paper Decision,Reject,"A variational function-space prior is proposed, resulting in a variational Dirichlet posterior. After the rebuttal, reviewers still had many remaining questions or concerns about the paper. For instance, rF5E outlines several concerns, many relating to factorization assumptions. Reviewer 7nPR also provides several suggestions. I will not repeat them here, but do encourage the authors to look closely at these questions and suggestions. At this particular time the paper is not strongly resonating with reviewers, but could be updated so that the value of the contributions is more obvious.",ICLR2022, +BJew6nYge4,1544750000000.0,1545350000000.0,1,ByxkCj09Fm,ByxkCj09Fm,meta-review,Reject,"The paper proposes to take into account the label structure for classification +tasks, instead of a flat N-way softmax. This also leads to a zero-shot setting +to consider novel classes. Reviewers point to a lack of reference to prior +work and comparisons. Authors have tried to justify their choices, but the +overall sentiment is that it lacks novelty with respect to previous approaches. +All reviewers recommend to reject, and so do I.",ICLR2019,4: The area chair is confident but not absolutely certain +b-sCkxegnsX,1610040000000.0,1610470000000.0,1,yvuk0RsLoP7,yvuk0RsLoP7,Final Decision,Reject,"This paper presents a framework for adversarial robustness by incorporating local and global structures of the data manifold. In particular, the authors use a discriminator-classifier model, where the discriminator tries to differentiate between the original and adversarial spaces and the classifier aims to classify between them. The authors implement the proposed approach on several datasets and the experimental results demonstrate performance improvements. The idea of using the global data manifold in addressing the robustness of the learning model is interesting.
However, the technical contribution and novelty have not been explained very well.",ICLR2021, +frgNg4eJunf,1610040000000.0,1610470000000.0,1,1Kxxduqpd3E,1Kxxduqpd3E,Final Decision,Reject,"The paper proposes a novel representation of GradNorm. GradNorm is presented as a Stackelberg game, and its theory is used to understand and improve the convergence of GradNorm. Moreover, in addition to the magnitude normalization, a direction normalization objective is added to the leader, and a rotation matrix and a translation are used for this alignment. The paper was reviewed by three knowledgeable reviewers and they unanimously agree on rejection. Here are the major issues raised by the reviewers and the area chair: +- The motivation behind the rotation matrix layers is not clear. It should be motivated in more detail and explained better with additional illustrations and analyses. +- The empirical study is weak. More state-of-the-art approaches from MTL and more realistic datasets should be included. +- The proposed method is not properly explained with respect to existing methods. There are MTL methods beyond GradNorm like PCGrad and MGDA (MTL as MOO). These methods also fix directions. Hence, it is not clear what the relationship of the proposed method is to these. + +I strongly recommend the authors improve their paper by fixing these major issues and submit to the next venue.",ICLR2021, +HkYdnzL_x,1486400000000.0,1486400000000.0,1,BkJsCIcgl,BkJsCIcgl,ICLR committee final decision,Reject,"There is potential here for a great paper; unfortunately, in its current form there is too deep of a disconnect between the framing and promise of the presentation, and the empirical validation actually delivered by the experiments. + The choice of the experimental setting (iid input from fixed distributions in a pattern recognition setting where targets are between 0 and 1) is too narrow to allow for convincing validation of the central hypothesis of the paper, namely, that the architecture (a recurrent convnet with sigmoid gating) can be useful for problems involving planning. Instead, it simply shows that the architecture is better at outputting the targets than other deep architectures without sigmoid gating. + I strongly encourage the authors to add more ambitious experiments to keep the empirical arm more in step with the stated promises of the set-up.",ICLR2017, +AO_5bl5KLcG,1642700000000.0,1642700000000.0,1,AB2r0YKBSpD,AB2r0YKBSpD,Paper Decision,Reject,"This paper analyzes the data scaling laws in NMT tasks with different network architectures and data qualities. The main purpose of this paper is to investigate how such differences in experimental setup affect the scaling law. The authors found that those differences do not have a strong impact on the scaling exponent, and that a small difference in model architecture or data noise can be compensated by a larger data size. + +This paper gives a nice justification of the data scaling law from several different aspects, which is instructive to some extent. On the other hand, the paper has some weaknesses, as listed in the following: (1) The scaling law itself has been analyzed by many papers, and its novelty is rather limited. I acknowledge that this paper investigates different aspects of the data scaling law and the scale of the experiments is larger than in existing work. However, the result is rather unsurprising.
(2) The experiments are conducted mostly on one language pair (English-to-German), so it is still unclear whether the findings are universal across language pairs. As the authors responded, exhaustive experiments over all language pairs are unrealistic, but some more investigation on more general data sets could be conducted to strengthen the paper. + +This paper is around the borderline. Some reviewers were rather positive about this paper. However, they also pointed out the concerns I listed above and do not show strong support for the paper. +In summary, although this paper shows some instructive findings, it is still a bit below the threshold of acceptance.",ICLR2022, +53t9z6n6hII,1610040000000.0,1610470000000.0,1,o81ZyBCojoA,o81ZyBCojoA,Final Decision,Accept (Poster),"The reviewers' main concern was a lack of experiments, and additional experiments were provided by the authors. While the rebuttal was not addressed by the reviewers, the AC feels that the rebuttal did address a number of experimental concerns well enough to justify accepting this paper.",ICLR2021, +rJe8jTzWx4,1544790000000.0,1545350000000.0,1,ryEkcsActX,ryEkcsActX,motivation could be improved,Reject,"The authors propose to accelerate neural architecture search by using feature similarity with a given teacher network to measure how good a new candidate architecture is. The experiments show that the method accelerates architecture search and has competitive performance. However, both Reviewers 1 and 3 noted questionable motivation behind the approach, as the method assumes that there already exists a strong teacher network in the domain where the architecture search is performed, which is not always the case. The rebuttal and the revised version of the paper addressed some of the reviewers' concerns, but overall the paper remained below the acceptance bar. I suggest that the authors further expand the evaluation and motivate their approach better before re-submitting to another venue. +",ICLR2019,5: The area chair is absolutely certain +ExJmOU7UHz,1576800000000.0,1576800000000.0,1,SygkSkSFDB,SygkSkSFDB,Paper Decision,Reject,"The authors made no response to the reviewers. Based on the current reviews, the majority recommendation is rejection.",ICLR2020, +SJkaofUOe,1486400000000.0,1486400000000.0,1,ry54RWtxx,ry54RWtxx,ICLR committee final decision,Reject,"There is a general consensus that, though the idea is interesting, the work is not mature enough for a conference publication (e.g., the problem is too toy, and it is not clear that it really solves any problem, even an artificial one, better than existing techniques).",ICLR2017, +f1fISHK8zZ,1610040000000.0,1610470000000.0,1,uCY5MuAxcxU,uCY5MuAxcxU,Final Decision,Accept (Oral),"The paper analyzes the sample complexity of convolutional architectures, proving a gap between it and that of fully connected (fc) networks.
The approach builds on certain invariances of fc nets. The reviewers appreciated the technical content and its contribution to understanding the relative advantages of different architectures, as well as the role of invariance. ",ICLR2021, +DWWN_6LZOi,1576800000000.0,1576800000000.0,1,HJlyLgrFvB,HJlyLgrFvB,Paper Decision,Reject,"A method is introduced to estimate the hidden state in imperfect-information multiplayer games, in particular Bridge. This is interesting, but the paper falls short in various ways. Several reviewers complained about the readability of the paper, and also about the quality and presentation of the otherwise interesting results. + +It seems that this paper represents an interesting idea, but is not yet ready for publication.",ICLR2020, +QJDjSmFlu4,1610040000000.0,1610470000000.0,1,ysti0DEWTSo,ysti0DEWTSo,Final Decision,Reject,"Reviewers found the construction to be very clever and the empirical results interesting. However, a more thorough theoretical explanation is needed for acceptance. ",ICLR2021, +r1_knzL_x,1486400000000.0,1486400000000.0,1,BJlxmAKlg,BJlxmAKlg,ICLR committee final decision,Reject,"This paper introduces a method for estimating the number of iterations of an attention mechanism in a neural machine reading module using REINFORCE and a custom baseline, which is estimated on the data. Estimating a baseline from data, multi-hop attention, and modelling latent variable models with REINFORCE are not new, so their composition, while sensible, is slightly incremental. Several of the reviewers had concerns about how well the experiments validate the model changes. Improvements on ""real"" tasks such as CNN/DailyMail did not impress reviewers, some of whom believe the benchmarks compared to were not representative of the best comparable architectures tried on these datasets (e.g. pointer networks, which definitely can be used). Overall, I do not find this paper strong enough to recommend acceptance to the main conference in its current state. With better evaluation, it could be a decently strong paper if the results come through.",ICLR2017, +Z1n_2e_E2G,1576800000000.0,1576800000000.0,1,B1elCp4KwH,B1elCp4KwH,Paper Decision,Accept (Talk),"The paper is extremely well-written with a clear motivation (Section 1). The approach is novel. But I think the paper's biggest strength is in its very thorough experimental investigation. Their approach is compared to other very recent speech discretization methods on the same data using the same (ABX) evaluation metric. But the work goes further in that it systematically attempts to actually understand what types of structures are captured in the intermediate discrete layers, and it is able to answer this question convincingly.
Finally, very good results on standard benchmarks are achieved. + +To authors: Please do include the additional discussions and results in the final paper. ",ICLR2020, +40oFdJAtnOTu,1642700000000.0,1642700000000.0,1,d_2lcDh0Y9c,d_2lcDh0Y9c,Paper Decision,Accept (Poster),"This paper presents a novel method for identifying stimuli-induced +patterns in MEG and EEG signals. The authors develop a novel statistical +point process model and a fast EM algorithm to learn the parameters. + +Discussion of this paper centered around how to fit hyperparameters, +the similarity and comparison with other algorithms, especially ICA, as +well as the small number of subjects. + +Comparison to other methods would make the work stronger, as would +adding more datasets, but this novel algorithm seems worth publishing. +I recommend acceptance as a poster.",ICLR2022, +rsg7Mx14yE,1576800000000.0,1576800000000.0,1,Skx82ySYPH,Skx82ySYPH,Paper Decision,Accept (Poster),This paper proposes a solid (if somewhat incremental) improvement on an interesting and well-studied problem. I suggest accepting it.,ICLR2020, +r1b8hzLdg,1486400000000.0,1486400000000.0,1,rkmDI85ge,rkmDI85ge,ICLR committee final decision,Invite to Workshop Track,"This is a solidly executed paper that received good reviews. However, the originality is a bit lacking. In addition, the paper would have been stronger with a comparison to the method proposed in Zweig et al. (2013). We recommend this paper for the workshop.",ICLR2017, +NAadhlTAJEx,1642700000000.0,1642700000000.0,1,0HkFxvSRDSW,0HkFxvSRDSW,Paper Decision,Reject,"This paper proposes a Role Diversity metric, meant to quantify how different roles are in a multi-agent RL setting. There are actually three versions of this metric, or three aspects (the distinction is not entirely clear to this area chair). + +The reviewers are generally not very enthusiastic about the paper, with scores hovering at or just below the acceptance threshold. There has been extensive discussion between reviewers and authors, but there is a sense that there is confusion about the exact purpose and contribution of the paper. This is reinforced by the authors' ""letter to area chair"", which outlines several ways the reviewers have not gotten the message. Reading the paper, it appears to me that the root cause is that the authors are indeed not communicating clearly what the paper contributes and why. It is, after all, the authors' responsibility that the reviewers understand the work. My own impression is that the text is dense and not particularly easy to get through. Perhaps the authors are simply trying to cram too many contributions into a single conference paper? This is a classic error which leads to hard-to-read papers. In addition to this, there is a lingering concern about the generalizability of the proposed methods. + +I think the authors need to work more on their presentation, and perhaps reconsider which parts to include in their paper and exactly which message they want to send, before they submit to another venue.",ICLR2022, +QJz8AjucXB_,1610040000000.0,1610470000000.0,1,8q_ca26L1fz,8q_ca26L1fz,Final Decision,Reject,"The authors study the expressive power of Graph Neural Network architectures for the link prediction problem and provide theoretical justification for the strong performance of SEAL on link prediction benchmarks. However, the reviewers think the paper needs to improve in several aspects before it can be published: 1. More clearly explain the theoretical analysis and contribution. 2.
Extensive and in-depth discussion of the similarities to and differences from the work of Li et al. to show the novelty of the current work. ",ICLR2021,
AMKp_qAFhg,1576800000000.0,1576800000000.0,1,rJecbgHtDH,rJecbgHtDH,Paper Decision,Reject,"This paper considers the situation where a set of reinforcement learning tasks are related by means of a Boolean algebra. The tasks considered are restricted to stochastic shortest path problems. The paper shows that learning goal-oriented value functions for subtasks enables the agent to solve new tasks (specified with boolean operations on the goal sets) in a zero-shot fashion. Furthermore, the Boolean operations on tasks are transformed into simple arithmetic operations on the optimal action-value functions, enabling the zero-shot transfer to a new task to be computationally efficient. This approach to zero-shot transfer is tested in the four-room domain without function approximation and in a small video game with function approximation.

The reviewers found several strengths and weaknesses in the paper. The paper was clearly written. The experiments support the claim that the method supports zero-shot composition of goal-specified tasks. The weaknesses lie in the restrictive assumptions. These assumptions require deterministic transition dynamics, reward functions that only differ on the terminal absorbing states, and having only two different terminal reward values possible across all tasks. These assumptions greatly restrict the applicability of the proposed method. The author response and reviewer comments indicated that some aspects of these restrictions can be softened in practice, but the form of composition described in this paper is restrictive. The task restrictions also seem to limit the method's utility on general reinforcement learning problems.

The paper falls short of being ready for publication at ICLR. Further justification of the restrictive assumptions is required to convince the readers that the forms of composition considered in this paper are adequately general. ",ICLR2020,
_bSbiPBqP,1576800000000.0,1576800000000.0,1,HJlyLgrFvB,HJlyLgrFvB,Paper Decision,Reject,"This paper improves DeepBugs by borrowing the NLP method ELMo as new representations. The effectiveness of the embedding is investigated using the downstream task of bug detection.

Two reviewers rejected the paper, citing two main concerns:
1. The novelty of the paper is not strong enough for ICLR, as this paper mainly uses a standard context embedding technique from NLP.
2. The experimental results are not convincing enough, and a more comprehensive evaluation is needed.

Overall, the novelty of this paper does not meet the standard of ICLR.
",ICLR2020,
Bkg1sk6Ng4,1545030000000.0,1545350000000.0,1,rJgTTjA9tX,rJgTTjA9tX,Interesting new analysis of function approximation in the presence of sparse latent structure,Accept (Poster),"This paper makes a substantial contribution to the understanding of the approximation ability of deep networks in comparison to classical approximation classes, such as polynomials. Strong results are given that show fundamental advantages for neural network function approximators in the presence of a natural form of latent structure. The analysis techniques required to achieve these results are novel and worth reporting to the community. 
The reviewers are uniformly supportive.",ICLR2019,5: The area chair is absolutely certain
pBtxlpnTizW,1642700000000.0,1642700000000.0,1,P-gDXxGYCib,P-gDXxGYCib,Paper Decision,Reject,"This paper received a majority vote for rejection. During the internal discussion, all reviewers insisted on their original scores. I have read all the materials of this paper, including the manuscript, appendix, comments, and responses. Based on the information collected from all reviewers and my personal judgement, I make the initial recommendation on this paper: *rejection*. Here are the comments that I summarized, which include my opinion and evidence.

**Research Problem**

In this paper, the authors consider a novel scenario, feature selection in the contrastive setting, where an extra *background* dataset is utilized to remove noisy background features. However, this problem can be easily handled with a fully supervised feature selection method, where the samples in the *target* datasets are annotated as 1 and the samples in the *background* datasets are annotated as 0. Therefore, the research problem addressed in this paper is not novel. Reviewers UFq8 and ft7b held the same opinion.

**Technical Points**

The technical part could be more informative. The whole framework is based on auto-encoder based self-reconstruction, where the feature selection is performed by the recent CAE model. In my eyes, the major contribution of this paper lies in learning $g_z$, the background representation function. To achieve this, the authors proposed three strategies: *joint*, *pretraining*, and *gates*. The *pretraining* idea does not involve any information from the target dataset, so the background representation function is a general one that has no relationship with the target dataset. I believe the concept of background should be defined based on the target dataset. The *joint* idea suffers from information leakage, as pointed out by the authors. We can also see the inferior performance of the joint model, compared with the two other models. Unfortunately, the philosophy of *gates* is unclear.

**Experimental Evaluation**

(1) The authors only compared with one supervised method on the semi-synthetic dataset. No results of supervised methods on real-world datasets were reported. (2) The performance with different numbers of selected features was not reported.",ICLR2022,
aN2tKg-2-_,1576800000000.0,1576800000000.0,1,ryxnJlSKvr,ryxnJlSKvr,Paper Decision,Reject,"Nice start but unfortunately not ripe. The issues raised by the reviewers were only partly addressed, and an improved version of the paper should be submitted to a future venue.",ICLR2020,
nPEjACZvlTj,1610040000000.0,1610470000000.0,1,dpuLRRQ7zC,dpuLRRQ7zC,Final Decision,Reject,"The paper considers ensembling of smooth classifiers to improve certified robustness. Theoretical results are provided showing that taking ensembles of a large number of models is useful, while experiments show that combining only a small number of models improves performance. On the negative side, the experiments are somewhat inconclusive, as the base models are not state-of-the-art, and the combined results do not achieve state-of-the-art performance. In this respect, further studies would be necessary to explore the effectiveness of the proposed technique.

In summary, while the topic of the paper is interesting and timely, the proposed ensembling technique is not especially exciting (as it is what one would naturally expect). 
On the other hand, the problem is reasonably well investigated (e.g., details are worked out well, both theoretical and experimental results are presented), although further experiments are needed (as recommended by the reviewers) to properly assess the potential and limitations of the approach. Accordingly, all reviewers agreed in the discussion that this is a borderline paper. Therefore, unfortunately, it cannot be accepted this time due to the heavy competition at the conference. The authors are encouraged to resubmit a revised version to the next venue, taking into consideration the reviewers' recommendations.",ICLR2021,
ryl6OpaZe4,1544830000000.0,1545350000000.0,1,rklhb2R9Y7,rklhb2R9Y7,Metareview,Reject,"This paper proposes to combine rewards obtained through IRL with rewards coming from the environment, and evaluates the algorithm on grid world environments. The problem setting is important and of interest to the ICLR community. While the revised paper addresses the concerns about the lack of a stochastic environment problem, the reviewers still have major concerns regarding the novelty and significance of the algorithmic contribution, as well as the limited complexity of the experimental domains. As such, the paper does not meet the bar for publication at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain
YthJDdlm0Ra,1610040000000.0,1610470000000.0,1,lXoWPoi_40,lXoWPoi_40,Final Decision,Reject,"The reviewers are concerned about the novelty of the proposed learning rate schedule, the rigor of the empirical validation, and the relationship between the results and the discussion of sharp vs. local minima. I invite the authors to incorporate reviewers' comments and resubmit to other ML venues.",ICLR2021,
2fPvAqo50zO,1642700000000.0,1642700000000.0,1,htWIlvDcY8,htWIlvDcY8,Paper Decision,Accept (Poster),"This paper presents a meta learning framework to learn novel visual concepts with few examples. The proposed FALCON model uses an embedding prediction module to infer novel concept embeddings. This is done via paired image and text data as well as supplementary sentences. The resulting system shows improvements on a series of datasets with synthetic and real images. The reviewers were supportive of this submission and praised the novelty, central ideas and experimental setups.

Concerns included:\
(a) [2P5Z] Justifying the formulations in this paper and situating it with past work -- ""Why is this an ecologically valid problem formulation?"", ""why a meta-learning approach is the best formulation to tackle the problem?"", ""Why the box embedding space?""\
(b) [W3YC] More details required about the dataset and approach.\
(c) [98FU] Failure patterns

The authors provided detailed responses to these concerns. Concerns (a), (b), and (c) were well addressed in the rebuttal and paper, and led to an increase in the reviewers' ratings.

Given the above, I recommend acceptance. But I do urge the authors to add the details provided in the rebuttal into the main paper. In particular, the concerns/suggestions by reviewer 2P5Z can hugely help in improving the paper and informing the reader.",ICLR2022,
LBNnCAztZ,1576800000000.0,1576800000000.0,1,H1lTRJBtwB,H1lTRJBtwB,Paper Decision,Reject,"This paper is concerned with improving data-efficiency in multitask reinforcement learning problems. This is achieved by taking a hierarchical approach, and learning commonalities across tasks for reuse. 
The authors present an off-policy actor-critic algorithm to learn and reuse these hierarchical policies.

This is an interesting and promising paper, particularly with the ability to work with robots. The reviewers did, however, note issues with the novelty and with making the contributions clear. Additionally, it was felt that the results demonstrated the benefits of hierarchy rather than of this specific approach, and that further comparisons to other approaches are required. As such, this paper is a weak reject at this point.",ICLR2020,
KRHatu7d_q8,1642700000000.0,1642700000000.0,1,73MEhZ0anV,73MEhZ0anV,Paper Decision,Accept (Poster),"This paper introduces a technique to generate L0 adversarial examples in a black-box manner. The reviews are largely positive, with the reviewers especially commenting on the paper being well written and clearly explaining the method. The main drawback raised by the reviewers is that the method is not clearly compared to some prior work, but in the rebuttal the authors provide many of these numbers. On the whole this is a useful and interesting attack that would be worth accepting.",ICLR2022,
lHFP-u8Ac2C-,1642700000000.0,1642700000000.0,1,y7tKDxxTo8T,y7tKDxxTo8T,Paper Decision,Reject,"The authors propose zero-shot recommendations, a scenario in which knowledge from a recommender system enables a second recommender system to provide recommendations in a new domain (i.e. new users & new items). The idea developed by the authors is to transfer knowledge through the item content information and the user behaviors.

The initial assessment of the reviewers indicated that this paper was likely not yet ready for publication. The reviewers all recognized the potential usefulness of zero-shot recommendations but argued that the implications of the proposed setup were somewhat unclear. Most notably, the reviewers raised the issue of how widely applicable this was in terms of distance between source and target domains (presumably the quality of the zero-shot recommendations depends on the distance).

The reviewers also noted that this was an application paper. This is of course within the CFP, and recommender systems papers have been published at ICLR in the past (for example one of the initial Session-based RecSys papers with RNNs) but the potential audience for this work is somewhat lower at ICLR. I should also add that I agree with the authors that their model is novel, but it's very much tailored to this application and it was unclear to me how it might be impactful on its own. All in all, this did not play a significant role in my recommendation.

During the discussion, there were significant, yet respectful, disagreements between the authors and the reviewers. It also seems like perhaps the authors missed an important reply from reviewer hJB8 made available through their updated review (see ""Reply to rebuttal""). So the discussion between reviewers and authors did not converge. Having said that, even the two most positive reviewers have scores that would make this paper a very borderline one (a 6 and a 5).

Further, I do find that reviewer hJB8's arguments have merit and require another round of review. In particular, I think the role and effect of your simulated online scenario should be further discussed (note that I did read the new paragraph on it from your latest manuscript). 
For example, comparing to a baseline that can train with the data from this new domain would be useful even if at some point it ends up being an upper bound on the performance of your approach. I also found the question raised by the reviewer around the MIND results to be pertinent. Further characterizing pairs of domains in which the approach works or fails (even if empirically) would add depth to this paper.

All in all, this paper has interesting ideas and I strongly encourage the authors to provide a more thorough experimental setup that fully explores the benefits and limitations of their zero-shot approach.",ICLR2022,
eNoHWxhexaF,1610040000000.0,1610470000000.0,1,_zHHAZOLTVh,_zHHAZOLTVh,Final Decision,Reject,"Overview:
This paper introduces a maximum mutual information method for helping to coordinate RL agents without communication.

Discussion:
Some reviewers leaned towards accept, but I found the two reviewers recommending rejection to be more convincing.

Recommendation:
This is an important research topic and I'm glad this paper is focusing on the problem. Hopefully the reviews will help improve a future version of this paper. I agree that this is a new way of using mutual information, but it seems more like a small improvement rather than a very significant step forward.

In addition, I think the setting needs to be better motivated. This is a centralized training with decentralized execution (CTDE) setting, and this paper helps the agents coordinate. In CTDE, the agents work in the environment and then pool their information to train before deploying on the next episode. I don't understand why, e.g., in multiwalker, agents would not be able to communicate while walking, can communicate after they succeed or drop the object (the episode ends), and then cannot communicate once the next episode starts.",ICLR2021,
VbIIy0sBW5,1576800000000.0,1576800000000.0,1,SkgjKR4YwH,SkgjKR4YwH,Paper Decision,Reject,"This paper builds a connection between MixUp and adversarial training. It introduces untied MixUp (UMixUp), which generalizes the methods of MixUp. Then, it also shows that DAT and UMixUp use the same method of MixUp for generating samples but use different label mixing ratios. Though it has some valuable theoretical contributions, I agree with the reviewers that it's important to include results on adversarial robustness, where both adversarial training and MixUp are playing an important role.",ICLR2020,
#NAME?,1576800000000.0,1576800000000.0,1,H1livgrFvr,H1livgrFvr,Paper Decision,Reject,"This paper proposes an out-of-distribution detection (OOD) method without assuming OOD in validation.

As reviewers mentioned, I think the idea is interesting and the proposed method has potential. However, I think the paper can be much improved and is not ready for publication, due to the following points raised in the reviewers' comments:

(a) The prior work also has some experiments without OOD in validation, i.e., using adversarial examples (AEs) in validation instead. Hence, the main motivation of this paper becomes weak unless the authors sufficiently justify why AEs are dangerous to use in validation.

(b) The performance of their replication of the prior method is far lower than reported. I understand that sometimes it is not easy to reproduce the prior results. In this case, one can put the numbers in the original paper. Or, one can provide a detailed analysis of why the prior method should fail in some cases.

(c) The authors follow exactly the same experimental settings as the prior works. 
But, the reported score of the prior method is already very high in the settings, and the gain can be marginal. Namely, the considered settings are more or less ""easy problems"". Hence, additional harder interesting OOD settings, e.g., motivated by autonomous driving, would strengthen the paper.

Hence, I recommend rejection.",ICLR2020,
KtVpihTjd,1576800000000.0,1576800000000.0,1,rkgAb1Btvr,rkgAb1Btvr,Paper Decision,Reject,"This paper presents a new method for detecting out-of-distribution (OOD) samples.

A reviewer pointed out that the paper discovers an interesting finding and the addressed problem is important. On the other hand, other reviewers pointed out theoretical/empirical justifications are limited.

In particular, I think that the experimental support for why the proposed method is superior to the existing ones is limited. I encourage the authors to consider more scenarios of OOD detection (e.g., datasets and architectures) and more baselines, as the problem of measuring the confidence of neural networks or detecting outliers has a rich literature. This would enable a more comprehensive understanding of the proposed method.

Hence, I recommend rejection.

",ICLR2020,
gonCIz3lhC5,1642700000000.0,1642700000000.0,1,zIUyj55nXR,zIUyj55nXR,Paper Decision,Accept (Oral),"The submission proposes a method to make a pre-existing model equivariant to desired symmetries: frame averaging. The strategy relies on a significant reduction of the number of symmetries to average over (with respect to the Reynolds operator) and uniform subsampling. The paper also demonstrates the usefulness of this method theoretically (universal approximation result) and practically (competitive performance). The contributions are clear and the core idea is simple.
I recommend this paper for acceptance with spotlight.",ICLR2022,
r1gm6z0egV,1544770000000.0,1545350000000.0,1,r1lM_sA5Fm,r1lM_sA5Fm,relatively weak contributions and novelty,Reject,"This paper investigates copying mechanisms and reward functions in sequence to sequence models for question generation. The key findings are threefold: (1) when the alignments between input and output are weak, it is better to use a latent copying mechanism to soften the model bias toward copying, (2) while policy gradient methods might be able to improve automatic scores, their results poorly align with human evaluation, and (3) the use of an adversarial objective also does not lead to useful training signals.

Pros:
The task is well motivated and the paper presents potentially useful negative results on policy gradient and adversarial training.

Cons:
All reviewers found that the clarity and organization of the paper require improvement. Also, the proposed methods are relatively incremental and the empirical results are not strong. While the rebuttal answered some of the clarification questions, it does not address major concerns about the novelty and contributions.

Verdict:
Reject due to relatively weak contributions and novelty.",ICLR2019,5: The area chair is absolutely certain
M50OfUtyVCF,1610040000000.0,1610470000000.0,1,4Un_FnHiN8C,4Un_FnHiN8C,Final Decision,Reject,"This paper explores methods for pruning binary neural networks. The authors provide algorithms for developing sparse binary networks that perform okay on some basic ML benchmarks. They frame this as providing insights into synaptic pruning in the brain, and potentially providing a method for more efficient edge computing in the future. 
+ +All four reviews placed the paper below the acceptance threshold. The reviewers noted that the paper was hard to follow in several places and were unsure as to the motivations. The authors attempted to address these concerns in their replies, but the Area Chair felt that these were insufficient. + +As well, the Area Chair notes that some of the claimed contributions of the paper are questionable. Specifically: + +(1) The claim that there is anything biologically plausible about the algorithms presented here is very suspect. The brain cannot use a search and test system for synaptic pruning like the algorithms proposed here. Thus, it is unclear how this paper provides any insight for neuroscience. In fact, the authors do not even really try to provide any neuroscience insights in the results or discussion. Moreover, they don't actually appear to use any neuroscience insights to develop their algorithms, other than the stochasticity of the pruning (though note: it is not actually clear in neuroscience data whether pruning is stochastic). Given the ultimately very poor performance on ML tasks, the paper doesn't seem to provide anything particularly useful for application in ML either. + +(2) The claim that the provide, ""The demonstration that network families with common architectural properties share similar accuracies and structural properties."" is odd. Surely this is the null hypothesis anyone would have about ANNs? It would be surprising if networks with common connectivity profiles (which is what the authors mean by ""architecture"") didn't share similar performance! + +(3) The claim that searching in architecture space like this leads to ""architecture agnostic networks"" is odd... As noted by Reviewer 2, the authors are really just specifying algorithms for sparsifying binary neural networks, which they frame as being ""architecture agnosticism"" according to a rather strained definition. There are other ways of approaching the sparsification of neural networks, and of doing architecture optimization, but the paper is not framed as contributing to this literature. + +Altogether, given these considerations, and the four reviews, a ""Reject"" decision was delivered.",ICLR2021, +UZSQLxHM-z,1576800000000.0,1576800000000.0,1,ryxPRpEtvH,ryxPRpEtvH,Paper Decision,Reject,"The paper investigates how to improve the performance of dropout and proposes an omnibus dropout strategy to reduce the correlation between the individual models. + +All the reviewers felt that the paper requires more work before it can be accepted. In particular, the reviewers raised several concerns about novelty of the method relative to existing methods, significance of performance improvements and clarity of the presentation. + +I encourage the authors to revise the draft based on the reviewers’ feedback and resubmit to a different venue. +",ICLR2020, +4WZsUiW01,1576800000000.0,1576800000000.0,1,HyxPIyrFvH,HyxPIyrFvH,Paper Decision,Reject,"The authors show that models trained to satisfy adversarial robustness properties do not possess robustness to naturally occuring distribution shifts. The majority of the reviewers agree that this is not a surprising result especially for the choice of natural distribution shifts chosen by the authors (for instance it would be better if the authors compare to natural distribution shifts that look similar to the adversarial corruptions). 
Moreover, this is a survey study and no novel algorithms are presented, so the paper cannot be accepted on that merit either.",ICLR2020,
r1lTV16Hf,1517250000000.0,1517260000000.0,447,BkUDW_lCb,BkUDW_lCb,ICLR 2018 Conference Acceptance Decision,Reject,"The pros and cons of the paper can be summarized below:

Pro:
* The improvements afforded by the method are significant over baselines, although these baselines are very preliminary baselines on a new dataset.

Con:
* There is already a significant amount of work in using grammars to guide semantic parsing or code generation, as rightfully noted by the authors, and thus the approach in the paper is not extremely novel.
* Because there is no empirical comparison with these methods, the relative utility of the proposed method is not clear.

As a result, I recommend that the paper not be accepted at this time.",ICLR2018,
uUklvaDR9OR,1610040000000.0,1610470000000.0,1,lVgB2FUbzuQ,lVgB2FUbzuQ,Final Decision,Accept (Spotlight),"The reviewers unanimously agree that the paper is timely, well motivated and correct, with potential to significantly impact digital contact tracing. ",ICLR2021,
SkeiPjWllE,1544720000000.0,1545350000000.0,1,HJlNpoA5YQ,HJlNpoA5YQ,Well-written paper and a useful extension to approximating the eigenvectors of the Laplacian,Accept (Poster),"This paper provides a novel and non-trivial method for approximating the eigenvectors of the Laplacian, in large or continuous state environments. Eigenvectors of the Laplacian have been used for proto-value functions and eigenoptions, but it has remained an open problem to extend their use to the non-tabular case. This paper makes an important advance towards this goal, and will be of interest to many that would like to learn state representations based on the geometric information given by the Laplacian.

The paper could be made stronger by including a short discussion of the limitations of this approach. It's an important new direction, but there must still be open questions (e.g., issues with the approach used to approximate the orthogonality constraint). It will be beneficial to readers to understand these issues.",ICLR2019,4: The area chair is confident but not absolutely certain
SMltj4yOvN,1576800000000.0,1576800000000.0,1,Hkeh21BKPH,Hkeh21BKPH,Paper Decision,Reject,"This paper proposes a curriculum-based reinforcement learning approach to improve theorem proving towards longer proofs. While the authors are tackling an important problem, and their method appears to work on the environment it was tested in, the reviewers found the experimental section too narrow and not convincing enough. In particular, the authors are encouraged to apply their methods to more complex domains beyond Robinson arithmetic. It would also be helpful to get a more in-depth analysis of the role of the curriculum. The discussion period did not lead to improvements in the reviewers' scores, hence I recommend that this paper be rejected at this time.
",ICLR2020,
H1g9SUNgxN,1544730000000.0,1545350000000.0,1,S1zz2i0cY7,S1zz2i0cY7,Interesting solution to a practical problem,Accept (Poster),"This paper addresses the issue of numerical rounding-off errors that can arise when using latent variable models for data compression, e.g., because of differences in floating point arithmetic across different platforms (sender and receiver). The authors propose using neural networks that perform integer arithmetic (integer networks) to mitigate this issue. 
The problem statement is well described, and the presentation is generally OK, although it could be improved in certain aspects as pointed out by the reviewers. The experiments are properly carried out, and the experimental results are good.
Thank you for addressing the questions raised by the reviewers. After taking into account the authors' responses, there is consensus that the paper is worthy of publication. I therefore recommend acceptance. ",ICLR2019,4: The area chair is confident but not absolutely certain
UnyU8aCPLwU,1610040000000.0,1610470000000.0,1,JbuYF437WB6,JbuYF437WB6,Final Decision,Accept (Poster),"This paper proposes a graph neural network architecture to learn representations for directed acyclic graphs. Specifically, the proposed method performs the aggregation of the representations from neighboring nodes in the topological order defined by the DAG, with a novel topological batching scheme, which allows the message passing operations to be processed in parallel. The authors provide a theoretical analysis of the proposed method, showing that it is invariant to node indexing and learns an injective mapping to discriminate between two different graphs. The proposed method is further experimentally validated on multiple tasks involving DAGs, and the results show that it outperforms existing GNNs, including existing methods that can capture DAGs such as D-VAE (encoder).

The reviewers were unanimously positive about the paper. All reviewers find the performance improvements and time-efficiency obtained with the proposed method to be satisfactory or promising, and one of the reviewers (R4) mentions that the tackled problem is important and the paper is well-written. However, there were concerns regarding insufficient explanations, missing ablation studies, and missing details of some parts of the proposed method. Yet, most of the issues have been satisfactorily addressed during the interactive discussion period. I agree with the reviewers that the paper is tackling an important problem, find the paper well-written, and consider the proposed DAGNN practically useful. Thus I recommend acceptance.

However, the contributions of the proposed work over D-VAE, which also deals with DAGs, should be better described, as also noted by R2. The DAGNN uses attention, and can stack multiple layers as it is a more general GNN framework while D-VAE is a generative model, but these seem like incremental differences over D-VAE, and it is not clear which contributes to DAGNN's superior performance over D-VAE. Topological batching is a clear advantage of DAGNN over D-VAE, but the experimental results showing its advantage over D-VAE's sequential training were missing in the original paper (though they were added later to the appendix). I suggest that the authors introduce D-VAE in the introduction, acknowledge that it also tackles DAGs, and clearly describe how the proposed method differs from the D-VAE encoder in a separate section. Also, there needs to be an analysis of why the proposed DAGNN outperforms D-VAE, as well as a time-efficiency comparison with the original D-VAE in the main text.
",ICLR2021,
3CrCJMaaNFg,1610040000000.0,1610470000000.0,1,S2UB9PkrEjF,S2UB9PkrEjF,Final Decision,Reject,"This paper makes an interesting connection between density matching in imitation learning and reaching the goal state in goal-oriented reinforcement learning. Reviewers generally expressed that the paper proposes an interesting approach, but some aspects of the paper leave room for improvement. 
By strengthening the experiments to address the reviewers' various concerns, this paper will make a good contribution to reinforcement learning research.",ICLR2021,
DRuPOdTs5st,1610040000000.0,1610470000000.0,1,NX1He-aFO_F,NX1He-aFO_F,Final Decision,Accept (Poster),"This paper is accepted; however, it could be much stronger if the concerns below were addressed.

The theoretical analysis of the proposed method is weak.
* As far as I can tell, the proposition has more to do with the compatible feature assumption than their method. Furthermore, the compatible feature assumption is very strong and not satisfied in any of their experiments.
* Sec 4.2 does not provide strong support for their method. R2 points out issues with their statements about variance, and the next subsection argues from an overly simplistic diagram.

The experimental results are promising; however, R3 brought up important issues in the private discussion:
* Their implementation of SAC systematically produces results worse than reported in the original paper (they use a version of SAC with automatically tuned temperature https://arxiv.org/pdf/1812.05905.pdf); 1a) Their SAC gets average returns of 2.5k at 500k steps while the original implementation gets 3k at 500k steps; 1b) Their SAC on HalfCheetah 10k at 1M steps, original paper - 11k at 1M steps; 1c) The same applies to Humanoid, there is no improvement with respect to the original SAC;
* Their approach degrades performance on Hopper.
* They use non-standard hyperparameters for SAC. 0.98 instead of 0.99 for the discount and 0.01 instead of 0.005 for the soft target updates. That might be the main reason why their SAC works worse than the original implementation.
* The authors use the hyperparameters suggested for HalfCheetahBulletEnv for all continuous control tasks. For HalfCheetah, however, the authors of the stable-baselines repository (which this paper uses) suggest using the hyperparameters from the original SAC paper (https://github.com/araffin/rl-baselines-zoo/blob/master/hyperparams/sac.yml#L48). Nonetheless, the results for the unmodified SAC reported in this work for HalfCheetah/Hopper/Walker/Ant are subpar compared to the original results, suggesting that the hyperparameters for HalfCheetahBulletEnv are suboptimal for these tasks.

Given the simplicity of the change and the promising experimental results (with some caveats), I believe the community will find this paper interesting and that it will lead to follow-up work that can patch the theoretical gaps.",ICLR2021,
r157QkaBM,1517250000000.0,1517260000000.0,107,BJRZzFlRb,BJRZzFlRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes an offline neural method using concrete/gumbel for learning a sparse codebook for use in NLP tasks such as sentiment analysis and MT. The method outperforms other methods using pruning and other sparse coding methods, and also produces somewhat interpretable codes. Reviewers found the paper to be simple, clear, and effective. There was particular praise for the strength of the results and the practicality of application. There were some issues, such as only being applicable to input layers, and not being able to be applied end-to-end. The authors also did a very admirable job of responding to questions about analysis with clear and comprehensive additional experiments. 
",ICLR2018, +kTDpMBiNQS,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"This paper proposes a GAN for video generation based on stagewise training over different resolutions, addressing scalability issues with previous approaches. Reviewers noted that the paper is clearly written, proposes a method that improves upon the DVD-GAN architecture by reducing training time and memory consumption, and has competitive quantitative results. + +On the other hand, the more negative reviewers are concerned that the empirical improvements demonstrated are somewhat incremental, and that there is not much novelty as the proposed approach is similar to other methods that decompose the generation process into multiple stages at different temporal window lengths and/or spatial resolutions. The authors argue that these criticisms are subjective and non-actionable. I sympathize with their frustration, but an acceptance decision for a competitive conference like ICLR does involve some subjective judgment as to whether the method and/or results meet a high bar beyond mere correctness. For this submission that's a close call, but between the novelty/incrementality concerns and the other more minor issues raised by reviewers (e.g., missing frame-conditional evaluation) I believe this paper could benefit from another round of revisions and improvements and recommend rejection. + +I hope the authors will consider improving the submission based on the reviewers' feedback and resubmitting to a future venue, as the paper certainly has merit. To this end I have a few concrete recommendations for the authors which could have flipped my recommendation to an accept if implemented: + +* Report results in the frame-conditional setting for comparison with DVD-GAN and other methods that operate in this setting. +* Proofread the paper more thoroughly. I noticed several typos while skimming the paper, e.g. in the theory section, the second term of eq. 6 confusingly uses $\rho$ instead of $\log$. (Relatedly, given that appendix B.1 reports that the hinge loss is used, I'm not sure whether $\log$ is correct in the first place -- this probably deserves further explanation or correction.) +* Demonstrate/argue more convincingly (in one way or another) that SSW-GAN's improved efficiency really expands the frontier of what was possible before. It is true that the 128x128/100 video samples contain 2x as many total pixels as DVD-GAN's 256x256/12 samples, but this isn't a *strict* improvement as the spatial resolution is smaller, and a 2x difference leaves space for reviewers to reasonably wonder whether previous methods really couldn't have matched this if pushed. Some possible examples of this: show that SSW-GAN can generate longer 256x256 videos (a strict improvement over what was possible with DVD-GAN), or orders of magnitude longer (e.g., 1 minute) but still temporally coherent videos at 128x128, or videos with substantially improved subjective sample quality at the same (or higher) resolution. +* The paper notes that ""DVD-GAN models do not unroll well and tend to produce samples that become motionless past its training horizon"". If this were quantified, e.g. 
by additionally reporting IS/FID/FVD separately for different timestep ranges, it could make a more compelling argument in favor of SSW-GAN.",ICLR2021,
Bkz23fIug,1486400000000.0,1486400000000.0,1,ryrGawqex,ryrGawqex,ICLR committee final decision,Accept (Poster),"All reviewers viewed the paper favorably as a nice/helpful contribution to the implementation of this important class of methods.",ICLR2017,
Sybn2f8Ox,1486400000000.0,1486400000000.0,1,HyWWpw5ex,HyWWpw5ex,ICLR committee final decision,Reject,"A nice paper, with sufficient experimental validation, and the idea of incorporating a form of change point detection is good. However, the technical contribution relative to the NIPS paper by the same authors is not significant, in that it primarily involves using an RNN instead of a Hawkes process to model the temporal dynamics. The results are significantly better than this earlier paper -- the authors should explore if this is due only to the RNN, or to the optimization method.",ICLR2017,
rklZ0sllg4,1544720000000.0,1545350000000.0,1,r1gVqsA9tQ,r1gVqsA9tQ,"interesting idea, evaluation lacking",Reject,"The paper presents a GAN-based generative model, where the generator consists of the base generator followed by several editors, each trained separately with its own discriminator. The reviewers found the idea interesting, but the evaluation insufficient. No rebuttal was provided.",ICLR2019,5: The area chair is absolutely certain
YXNa7eLLuq,1610040000000.0,1610470000000.0,1,33rtZ4Sjwjn,33rtZ4Sjwjn,Final Decision,Accept (Poster),"This paper studies the robustness of CapsNets under adversarial attacks. It is found that the votes from primary capsules in CapsNets are manipulated by adversarial examples and that the routing mechanism in CapsNets incurs a high computational cost. As such, a new adversarial attack is specially designed by attacking the votes of CapsNets without having to involve the routing mechanism, making the method both effective and efficient.

**Strengths:**
 * This is the first work which proposes an attack specifically designed for CapsNets by exploiting their special properties.
 * The proposed vote attack is more effective and efficient than the other attacks originally proposed for CNNs rather than CapsNets.
 * The paper is generally well written.
 * The experimental study is quite comprehensive.
 * The code will be made available to facilitate reproducibility.

**Weaknesses:**
 * The study is mostly for only one type of CapsNets. It is not clear whether the observations in this paper still hold generally for other types of CapsNets even after some additional experiments have been added.
 * The presentation of the paper has room for improvement.

The authors are recommended to proofread the references thoroughly to ensure style consistency such as the consistent use of capitalization, e.g.
 * ""Star-caps"" -> ""STAR-Caps""
 * ""ieee symposium on security and privacy (sp)"" -> ""IEEE Symposium on Security and Privacy (SP)""

Despite its weaknesses, especially those pointed out by Reviewer 2, this paper would be of interest to other researchers as it is the first paper that studies adversarial attacks on CapsNets.
",ICLR2021,
qZ-mP6r9EC,1610040000000.0,1610470000000.0,1,2234Pp-9ikZ,2234Pp-9ikZ,Final Decision,Reject,"The paper proposes a new approach to knowledge distillation by searching for a family of student models instead of a specific model. 
The key idea is that given an optimal family of student models, any model sampled from this family is expected to perform well when trained using knowledge distillation. Overall this is an interesting idea and an important direction of research. However, the reviewers raised several concerns regarding novelty and experimental evaluation. There was a clear consensus among the reviewers that the paper is not yet ready for publication. The specific reasons for rejection include the following: (i) the proposed method is somewhat incremental, and the paper's contributions should be adjusted accordingly; (ii) the experimental results in the paper do not provide a clear/fair comparison with existing approaches, and additional baselines should be considered. The reviewers have provided detailed feedback in their reviews, and we hope that the authors can incorporate this feedback when preparing future revisions of the paper.",ICLR2021, +SJTYhMUde,1486400000000.0,1486400000000.0,1,SkC_7v5gx,SkC_7v5gx,ICLR committee final decision,Reject,"The reviewers agreed that the main contribution is the first empirical analysis on large-scale convolutional networks concerning layer-to-layer sparsity. The main concerns were that of novelty (connection-wise sparsity being explored previously but not in large-scale domains) and the importance given the current state of fast implementations of sparsely connected CNNs. The authors argued that the point of their paper was to drive software/hardware co-evolution and guide the next generation of, e.g. CUDA tools development. The confident scores were borderline while the less confident reviewer was pleased with the paper so I engaged the reviewers in a discussion post author-feedback. The consensus was that the paper presented promising but not fully developed nor convincing research.",ICLR2017, +7Mx6qIKa1wp,1610040000000.0,1610470000000.0,1,9z_dNsC4B5t,9z_dNsC4B5t,Final Decision,Accept (Poster),"This paper proposes an lightweight method for cross-domain few-shot learning, using a meta-learning approach to predict batch normalization statistics. +After the extensive paper revisions and discussion, the reviewers all agreed that this paper is above the bar for acceptance, assuming that the authors will include results for both the standard and expanded target set size in the final version of the paper. The authors are strongly encouraged to include these results in the camera-ready version of the paper.",ICLR2021, +Sylig4qxl4,1544750000000.0,1545350000000.0,1,r1zmVhCqKm,r1zmVhCqKm,rejection,Reject,"although the problem of text infilling itself is interesting, all the reviewers were not certain about the extent of experiments and how they shed light on whether, how and why the proposed approach is better than existing approaches. ",ICLR2019,4: The area chair is confident but not absolutely certain +6fRtMhG0UmE,1642700000000.0,1642700000000.0,1,9NVd-DMtThY,9NVd-DMtThY,Paper Decision,Accept (Poster),"This paper considers the problem of distributionally robust fair PCA for binary sensitive variables. The main modeling contribution of the paper is the consideration of fairness and robustness of the PCA simultaneously, and the main technical contribution of the paper is the provision of a Riemannian subgradient descent algorithm for this problem and proof that it reaches local optima of this non-convex optimization problem. 
The results will be of interest to those working at the intersection of fair and robust learning.",ICLR2022, +Cn3Dz3xhVR,1610040000000.0,1610470000000.0,1,Ms9zjhVB5R,Ms9zjhVB5R,Final Decision,Reject,"This paper proposes a regularization approach based on the second-order Taylor expansion of the loss objective to improve robustness of the trained models against \ell_inf and \ell_2 attacks. It is interesting to explore the second order-based regularization approach for network robustness. However, as pointed out by the reviewer, a major drawback of this approach is that SOAR is broken under a stronger attack - AutoPGD-DLR. In addition, the theoretical bound seems very loose in the \ell_inf case. ",ICLR2021, +r_QVtoaa34e,1610040000000.0,1610470000000.0,1,kBVJ2NtiY-,kBVJ2NtiY-,Final Decision,Accept (Poster),"First as a procedural point, the paper got 7, 7, 5, 5. AnonReviewer3 gave it a 5, but seemed satisfied by the discussion and promised to raise their score. They did not do so, but I must interpret their last messages as indicating they now support the paper. AnonReviewer2, the other 5, had some concerns that other reviewers seem to have helped address during rebuttal. They did not update their score, but were happy to leave their certainty low and defer to other reviewers' recommendation. As such, although the average score looks low in the system, the paper is of an acceptable standard according to reviews. + +The paper adapts a method from tabular RL to Deep RL, allowing (as the title aptly says), agents to learn What to do by simulating the past. Reviewers speaking in support of the paper found that the paper was clear and sound in its evaluation, providing interesting results and a useful and reusable method. It is my feeling that after discussion, the case for the paper has been clearly made, and in the absence of any strong objections from the reviewers, I am happy to go with the consensus and recommend acceptance.",ICLR2021, +SkeamY_i14,1544420000000.0,1545350000000.0,1,ByePUo05K7,ByePUo05K7,metareview: fundamentally flawed approach,Reject,"This paper claims to demonstrate that CNNs, unlike human vision, do not have a bias towards reliance on shape for object recognition. Both AnonReviewer1 and AnonReviewer2 point to fundamental flaws in the paper's argument, which the rebuttal fails to resolve. (AnonReviewer1's criticisms are unfortunately conflated with AnonReviewer1's reluctance to view neuroscience or biological vision as an appropriate topic for ICLR; nonetheless AnonReviewer1's technical criticism stands). + +These observations are: + +AnonReviewer2: + +""Authors have carefully designed a set of experiments which shows CNNs will [overfit] to non-shape features that they added to training images. However, this outcome is not surprising."" + +AnonReviewer1: + +""The experiments don't seem to effectively demonstrate the main claim of the paper that categorization CNNs do not have inductive shape bias"" + +""The best way to demonstrate this would have been to subject a trained image-categorization CNN to test data with object shapes in a way that the appearance information couldn’t be used to predict the object label. The paper doesn’t do this. None of the experiments logically imply that with an unaltered training regime, a trained network would not be predictive of the category label if shapes corresponding to that category are presented."" + +The AC agrees with both of these observations. CNN behavior is partially a product of the training regime. 
To examine the scientific question of whether CNNs have similar biases as human vision, the training regimes should be similar. Conversely, if human vision evolved in an environment in which shortcut recognition cues were available via indicator pixels, perhaps it would not have a shape bias. + +This paper appears fundamentally flawed in its approach. The results are not informative about differences between human vision and CNNs, nor are they surprising to machine learning practitioners.",ICLR2019,5: The area chair is absolutely certain +9ZQjmu77rusV,1642700000000.0,1642700000000.0,1,DYaFB19z1ig,DYaFB19z1ig,Paper Decision,Reject,"This article presents novel distillation-based methods for neural network training and uncertainty estimation. While the idea is interesting, there is a general agreement amongst reviewers that the paper lacks clarity, adequate discussion of the relevant literature and comparisons to existing work. Although the revision uploaded by the authors goes in the right direction by adding some experiments and clarifying some of the issues raised by the reviewers, further work is needed to make the submission stronger.",ICLR2022, +iStwBXNh1Wm,1642700000000.0,1642700000000.0,1,EMLJ_mTz_z,EMLJ_mTz_z,Paper Decision,Reject,"This paper proposes to represent a deep neural network as a graph and analyze its learning dynamics as a time series of weighted graphs corresponding to the neural network. As the graph representations, the authors propose to use a rolled representation in addition to a unrolled representation. Then, they proposed to utilize the graph features of the representations for predicting its predictive accuracy. + +This paper presents an interesting idea which could be used for predicting the test accuracy from the first few epochs training. However, there are also several weaknesses as pointed out by the reviewers. First, the justification of using the graph structure to predict the accuracy is weak (indeed, the graph structure can be used for prediction, but its necessity is not well supported), and there is no theory to support the proposed method. Second, the problem setting is a bit wired. The training data is generated by using the same architecture and data set. Although the authors gave additional experiments on the architecture generalization, it is still difficult to see how convincing the method is for more general settings. Third, baseline methods are not shown in their experiments. +In addition to that, the thresholds for the classification tasks seem to be too small (like 40% in CIFAR10) which would make the problem too easy. Therefore, the practicality of the method is rather unclear. + +This paper is quite on the borderline, but for the reasons listed above, it is a bit below the acceptance threshold.",ICLR2022, +HylF_4LbgV,1544800000000.0,1545350000000.0,1,ryfcCo0ctQ,ryfcCo0ctQ,"Solid paper, but unclear significance",Reject,"The paper gives an bilevel optimization view for several standard RL algorithms, and proves their asymptotic convergence with function approximation under some assumptions. The analysis is a two-time scale one, and some empirical study is included. + +It's a difficult decision to make for this paper. It clearly has a few things to be liked: (1) the bilevel view seems new in the RL literature (although the view has been implicitly used throughout the literature); (2) the paper is solid and gives rigorous, nontrivial analyses. 
+ +On the other hand, reviewers are not convinced it's ready for publication in its current stage: +(1) Technical novelty, in the context of published works: extra challenges needed on top of Borkar; similarity to and differences from Dai et al.; ... +(2) The practical significance is somewhat limited. Does the analysis provide additional insight into how to improve existing approaches? How restricted are the assumptions? Are the online-vs-batch distinction from Dai et al. really important in practice? +(3) What does the paper want to show in the experiments, since no new algorithms are developed? Some claims are made based on very limited empirical evidence. It'd be much better to run algorithms on more controlled situations to show, say, the significance of two timescale updates. Also, as those algorithms are classic Q-learning and actor-critic (quote the authors in responses), how well do the algorithms solve the well-known divergent examples when function approximation is used? +(4) Presentation needs to be improved. Reviewers pointed out some over claims and imprecise statements. + +While the author responses were helpful in clarifying some of the questions, reviewers felt that the remaining questions needed to be addressed and the changes would be large enough that another full review cycle is needed.",ICLR2019,4: The area chair is confident but not absolutely certain +qG6UPU71Zgu,1642700000000.0,1642700000000.0,1,74cDdRwm4NV,74cDdRwm4NV,Paper Decision,Reject,This paper tackled the reward shaping problem under the framework of Markov games. The authors proposed reward shaping algorithms for RL with mild theoretical guarantees. The AC agrees with the reviewers that the empirical performance is ambiguous. The paper should be substantially improved before being accepted.,ICLR2022, +Bt3yyyvWgG,1610040000000.0,1610470000000.0,1,zQTezqCCtNx,zQTezqCCtNx,Final Decision,Accept (Spotlight),"This paper focuses on two new characteristics of adversarial examples from the channel-wise activation perspective, namely the activation magnitudes and the activated channels. The philosophy behind sounds quite interesting to me, namely, suppressing redundant activations from being activated by adversarial perturbations. This philosophy leads to a novel algorithm design I have never seen, i.e., Channel-wise Activation Suppressing (CAS) training strategy. + +The clarity and novelty are clearly above the bar of ICLR. While the reviewers had some concerns on the significance, the authors did a particularly good job in their rebuttal. Thus, all of us have agreed to accept this paper for publication! Please carefully address all comments in the final version.",ICLR2021, +Q2fKFbtLYe,1576800000000.0,1576800000000.0,1,B1gjs6EtDr,B1gjs6EtDr,Paper Decision,Reject,"This paper proposes a new model, the Routing Transformer, which endows self-attention with a sparse routing module based on online k-means while reducing the overall complexity of attention from O(n^2) to O(n^1.5). The model attained very good performance on WikiText-103 (in terms of perplexity) and similar performance to baselines (published numbers) in two other tasks. + +Even though the problem addressed (reducing the quadratic complexity of self-attention) is extremely relevant and the proposed approach is very intuitive and interesting, the reviewers raised some concerns, notably: +- How efficient is the proposed approach in practice. 
Even though the theoretical complexity is reduced, more modules were introduced (e.g., forced clustering, mix of local heads and clustering heads, sorting, etc.) +- Why is W_R fixed random? Since W_R is orthogonal, it's just a random (generalized) ""rotation"" (performed on the word embedding space). Does this really provide sensible ""routing""? +- The experimental section can be improved to better understand the impact of the proposed method. Adding ablations, as suggested by the reviewers, would be an important part of this work. +- Not clear why the work needs to be motivated through NMF, since the proposed method uses k-means. + +Unfortunately several points raised by the reviewers (except R2) were not addressed in the author rebuttal, and therefore it is not clear if some of the raised issues are fixable in camera ready time, which prevents me from recommend this paper to be accepted. + +However, I *do* think the proposed approach is very interesting and has great potential, once these points are clarified. The gains obtained in WikiText-103 are promising. Therefore, I strongly encourage the authors to resubmit this paper taking into account the suggestions made by the reviewers. ",ICLR2020, +Guus-yZTSh,1610040000000.0,1610470000000.0,1,8cpHIfgY4Dj,8cpHIfgY4Dj,Final Decision,Accept (Poster),"Meta-learning for offline RL is an understudied topic with lots of potential impact in the research community. This paper takes the first stab against that challenging problem by proposing a solution similar based on PEARL and distance metric learning. The results look good and it seems like the authors have addressed some of the concerns raised by the reviewers. As a result, I suggest to accept this paper. + +However, this paper still has some shortcomings as reviewers suggested more baselines with more experiments on standardized benchmarks such as D4RL or RL Unplugged could make the paper stronger.",ICLR2021, +BJe-DRE4l4,1544990000000.0,1545350000000.0,1,S1giVsRcYm,S1giVsRcYm,"Interesting idea, still some more to show potential",Reject,"This paper was on the borderline. I am sympathetic to the authors' point about computational resources. It is helpful to demonstrate performance gains that offer ""jump start"" performance benefits, as the authors argue. However, the empirical results even on this part are still somewhat mixed-- for example, the proposed approach struggles on Private Eye (doing far worse than DQN) in Table 2. In addition, while it is beneficial to remove the need for training a density model, it would be good to show a place where a density model fails (perhaps because it is so hard to find a good one) compared to their proposed approach. ",ICLR2019,4: The area chair is confident but not absolutely certain +AXhAZWyphcw,1610040000000.0,1610470000000.0,1,lXW6Sk1075v,lXW6Sk1075v,Final Decision,Reject,"All reviewers agreed that the novelty of the method was not at the level expected for publication, and also raised a number of technical concerns regarding the approach. There was no response from the authors on these issues, hence the reviewer consensus is that the paper is not ready for publication at this time.",ICLR2021, +ImrrBNyB8EN,1610040000000.0,1610470000000.0,1,Jq8JGA89sDa,Jq8JGA89sDa,Final Decision,Reject," +The paper proposes the novel task of detecting hallucinated tokens in sequence +generation, and a strategy to train such models using artificially generated +samples. The methods show reasonable correlation with human judgements. 
+
+The expert reviewers are unanimous in their lack of enthusiasm
+about this work, with overall borderline assessments. The
+reviewers provided some suggestions for improvement, and it is worth remarking
+that the authors provided an impressive amount of work in the revised version,
+addressing the suggestions. Specifically, they added baselines that validate that
+the task is non-trivial, and the case study on improving machine translation.
+
+In the discussion period, the reviewers appreciated the additions, and some
+increased their rating, but the overall assessment remains borderline. The
+reviewers find the work lacks the expected amount of depth. Some concerns
+emphasized in the discussion period involve insufficient empirical analysis
+(e.g., more NMT datasets and analysis); understandable as this work was
+added after submission, but still important. A reviewer stresses concerns about
+the definition of the task itself, which I agree is vague (""... cannot be
+entailed by the sentence"") and does not match the synthetic data generation
+entirely, leading to unfortunate edge cases involving synonyms or -- worse --
+slight narrowing that technically would still be entailment but maybe should be
+considered unfaithful. This casts doubt on the human evaluations and on
+considering the task itself a main contribution, therefore leading to the
+empirical framing that the reviewers perceive and expect. It also seems to me
+that there is an incremental, cat-and-mouse spirit to predicting automatically
+generated hallucinations. In short, it seems like this paper is caught
+between trying to be a significant empirical contribution and a linguistically
+well-motivated task and annotation project, and I understand that the reviewers
+would prefer committing to one of these directions.
+
+While I encourage the authors to pursue this direction more deeply,
+in light of the borderline reviews, I do not recommend acceptance.",ICLR2021,
gdm5DfP-yk,1576800000000.0,1576800000000.0,1,HyxJ1xBYDH,HyxJ1xBYDH,Paper Decision,Accept (Poster),"This paper theoretically analyzes the use of an oracle to predict various quantities in data stream models. Building upon Hsu et al. (2019), the overriding goal is to examine the degree to which such an oracle can provide memory and time improvements across broad streaming regimes. In doing so, optimal bounds are derived in conjunction with a heavy hitter oracle.

Although the rebuttal and discussion period did not lead to a consensus in the scoring of this paper, two reviewers were highly supportive. However, the primary criticism from the lone dissenting reviewer was based on the high-level presentation and motivation, and in particular, the impression that the paper read more like a STOC theory paper. In this regard though, my belief is that the authors can easily tailor a revision to increase the accessibility to a wider ICLR audience.",ICLR2020,
U9MHnrhUHD,1610040000000.0,1610470000000.0,1,HZcDljfUljt,HZcDljfUljt,Final Decision,Reject,"Four reviewers rate this article borderline. R3 finds the paper clearly presented and the method effective, but finds that it misses a quantitative analysis of the dynamic range problem as well as novelty. Following the discussion and revision, she/he considers the paper improved and updated the score to 5, still being concerned about the novelty. R1 considers the paper makes an important observation but has concerns about experiments, rating it 6. 
R2 considers that the paper contributes a clear idea, but indicates that more analysis and supporting results are needed. She/he indicated a number of shortcomings in the initial review, and found the update good, hence tending to rate the paper higher after the responses (6). R4 considers the paper well motivated and the method valid. However, he/she found the writing poor and the results over-claimed, and suggested that more rigorous mathematical notation would help. After the discussion and revision, he/she found the paper better and increased the score to 5, but still found issues preventing the paper from being accepted. In summary, the reviewers agree that the paper contains an interesting and well motivated method, but they also point to a number of shortcomings. The revision improved several of them but others persisted. Although the ratings improved after the discussion, the overall rating is borderline. This is a very competitive call, and hence I have to recommend rejection at this time. ",ICLR2021,
HPgGVnsgLjl,1610040000000.0,1610470000000.0,1,cTbIjyrUVwJ,cTbIjyrUVwJ,Final Decision,Accept (Poster),"This paper received moderately good reviews, 3 positives (6, 6, 7) and 1 negative (5). The reviewers are generally positive about the main idea but identified several limitations: performance improvement is marginal compared to existing approaches, the proposed method incurs higher computational complexity, and the presentation is not clear enough. Some of these issues are addressed in the rebuttal, though. Overall, the merits of this work outweigh the drawbacks and I recommend accepting this paper.",ICLR2021,
WGvdHqD34S,1576800000000.0,1576800000000.0,1,SJlgTJHKwB,SJlgTJHKwB,Paper Decision,Reject,"This paper claims to present a model-agnostic continual learning framework which uses a queue to work with delayed feedback. All reviewers agree that the paper is difficult to follow. I also have a difficult time reading the paper.

In addition, all reviewers mentioned there is no baseline in the experiments, which makes it difficult to empirically analyze the strengths and weaknesses of the proposed model. R2 and R3 also have some concerns regarding the motivation and claims made in the paper, especially in relation to previous work in this area.

The authors did not respond to any of the concerns raised by the reviewers. It is very clear that the paper is not ready for publication at a venue such as ICLR in its current state, so I recommend rejecting the paper.",ICLR2020,
RG1EvURgtL,1576800000000.0,1576800000000.0,1,SJg1lxrYwS,SJg1lxrYwS,Paper Decision,Reject,"The paper derives results for nonnegative-matrix factorization along the lines of recent results on SGD for DNNs, showing that the loss is star-convex towards randomized planted solutions. 
+
+Overall, the paper is relatively well written and fairly clear. The reviewers agree that the theoretical contribution of the paper could be improved (tighten bounds) and that the experiments can be improved as well. In the context of other papers submitted to ICLR, I therefore recommend rejecting the paper.
+
+",ICLR2020,
g-YhRcFdh,1576800000000.0,1576800000000.0,1,Bke02gHYwB,Bke02gHYwB,Paper Decision,Reject,"The paper presents an approach to learning interpretable word embeddings. The reviewers put this in the lower half of the submissions. One reason seems to be the size of the training corpora used in the experiments, as well as the limited number of experiments; another that the claim of interpretability seems over-stated. There's also a lack of comparison to related work. I also think it would be interesting to move beyond the standard benchmarks - and either use word embeddings downstream or learn word embeddings for multiple languages [you should do this, regardless] and use Procrustes analysis or the like to learn a mapping: A good embedding algorithm should induce more linearly alignable embedding spaces.

NB: While the authors cite other work by these authors, [0] seems relevant, too. Other related work: [1-4].

[0] https://www.aclweb.org/anthology/Q15-1016.pdf
[1] https://www.aclweb.org/anthology/Q16-1020.pdf
[2] https://www.aclweb.org/anthology/W19-4329.pdf
[3] https://www.aclweb.org/anthology/D17-1198/
[4] https://www.aclweb.org/anthology/D15-1183.pdf",ICLR2020,
Hkvt2M8dx,1486400000000.0,1486400000000.0,1,BJ_MGwqlg,BJ_MGwqlg,ICLR committee final decision,Reject,"The reviewers feel that this is a well written paper on floating and fixed point representations for inference with several state of the art deep learning architectures. At the same time, in order for results to be more convincing, they recommend using 16-bit floats as a more proper baseline for comparison, and analyzing tradeoffs in overall workload speedup, i.e., broader system-level issues surrounding the implementation of custom floating point units.",ICLR2017,
L79S-TI9yw,1576800000000.0,1576800000000.0,1,rJl05AVtwB,rJl05AVtwB,Paper Decision,Reject,The submission is recommended for rejection based on the majority of reviews.,ICLR2020,
3fneH0dndv,1610040000000.0,1610470000000.0,1,DMxOBm06HUx,DMxOBm06HUx,Final Decision,Reject,"This paper suggests extending pre-trained contextual language models to use both fine-grained and coarse-grained tokenizations of the sentence. A sentence is tokenized twice and then each is passed through a transformer block, with shared parameters except the embeddings. Having 2 granularities shows gains.

Pros

- Easy to read paper, straightforward method
- Gets experimental gains from using word/phrase combo
- Evaluates on a range of tasks

Cons

- Novelty is limited, since other models like SpanBERT and ZEN already explore different tokenization granularities
- Improvements may come as much from the ensemble of two models as from the two tokenization granularities
- Number of parameters or amount of computation is increased by the method, though the authors do significantly address this in their revised paper.
- Some overclaiming of results when there are modest incremental gains on small models (the abstract sentence ""outperforms the existing best performing models in almost all cases"" suggests that we are going to get results of a new model outperforming the state of the art models on tasks, but really we get improvements over baseline models at the BERT-base size. 
I believe this is fine for experiments to show the scientific value of ideas, but it should not be described as it presently is in this abstract.)
- Gains are more for Chinese than English

On the better results for Chinese: Isn't the reason that the results are more impressive for Chinese that in Chinese the fine-grained version is just single characters, which is more fine-grained than standard BERT word pieces in English, where the word pieces are already commonly words, most of which would be sequences of two or more characters in Chinese (whether for common words like, say, ""fishing"" or ""vault"" or place names like ""Mississippi""), so the fine-grained Chinese here is more fine-grained than the standard English wordpieces, and so not too surprisingly there are bigger gains from using the Chinese word segmenter granularity? But really this is sort of equivalent to how the original BERT authors showed that you could get gains by masking whole words, not individual word pieces. And at any rate, the value of word segmentation for Chinese was already shown by Yiming Cui et al.'s paper on Chinese BERT, no?

Overall the strong majority of reviewers were unconvinced that this paper was suitable for ICLR 2021. They mainly emphasized concerns of novelty, missed or unfair comparisons, concerns of extra parameters or computation, and the fact that the paper is somewhat incremental. I would add that, to the extent that this paper is primarily an examination of the value of different granularities, it feels much more like a linguistic question for an NLP conference than an ML question well suited in particular to a conference on learning representations like ICLR. That is, the choice of granularities is hand-specified, and/or the grouping is done by simple n-gram statistics, not by learning representations. As such, I do not think the paper should be accepted to ICLR at this time, and in general think that an NLP venue may be more appropriate for it. ",ICLR2021,
A5QD2HHQS,1576800000000.0,1576800000000.0,1,BJlNs0VYPB,BJlNs0VYPB,Paper Decision,Reject,This paper does extensive experiments to understand the lottery ticket hypothesis. The lottery ticket hypothesis is that there exist sparse sub-networks inside dense large models that achieve as good accuracy as the original model. The reviewers have issues with the novelty and significance of these experiments. They felt that it didn't shed new scientific light. They also felt that the number of epochs needed for early detection was still expensive. 
I recommend doing further studies and submitting it to another venue.,ICLR2020,
VgHcd0kOv,1576800000000.0,1576800000000.0,1,S1lvn0NtwH,S1lvn0NtwH,Paper Decision,Reject,"This paper presents an understudied bias known to exist in the learning patterns of children, but not present in trained NN models. This bias is the mutual exclusivity bias: if the child already knows the word for an object, they can recognize that the object is likely not the referent when a new word is introduced. That is, the names of objects are mutually exclusive.

The authors and reviewers had a healthy discussion. In particular, Reviewer 3 would have liked to have seen a new algorithm or model proposed, as well as an analysis of when ME would help or hurt. I hope these ideas can be incorporated into a future submission of this paper.",ICLR2020,
f3j3SVgMBm,1576800000000.0,1576800000000.0,1,B1lyZpEYvH,B1lyZpEYvH,Paper Decision,Reject,"This paper proposes a neural network model for predicting multi-aspect sentiment and generating masks that can justify the predictions. The positive aspects of the paper include improved results over the state-of-the-art.

Reviewers found the technical novelty limited, and the experiments short of being fully convincing. After the author rebuttal, there were discussions between the reviewers and the AC, and the reviewers still thought the paper was not fully convincing given these limitations.

I thank the authors for their submission and detailed responses to the reviewers and hope to see this research in a future venue.",ICLR2020,
B1-CUk6rf,1517250000000.0,1517260000000.0,892,SyPMT6gAb,SyPMT6gAb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree that the paper studies an interesting problem with an interesting approach. The reviewers raised some concerns regarding the theoretical and empirical results. The authors have made changes to the paper, but given the theoretical nature of the paper and the extent of the changes, another review is needed before publication.",ICLR2018,
SyxT2fngxV,1544760000000.0,1545350000000.0,1,HJxYwiC5tm,HJxYwiC5tm,empirical study of invariance on modern CNNs,Reject,"This paper attempts to answer its suggestive title by arguing that this generic lack of invariance in large CNN architectures is due to aliasing introduced during the downsampling stages. 
+This paper received mixed reviews. 
Positive aspects include the clarity and exhaustive empirical setups, whereas negative aspects focused on the lack of substance behind some of the claims. Ultimately, the AC took these considerations into account and made his/her own assessment, summarized here.

The main claim of this paper implies the following: modern CNNs are unable to build invariance to small shifts, but somehow are able to learn far more complex invariances involving lighting, pose, texture, etc. This must be empirically verified beyond reasonable doubt, and the AC thinks that the current experimental setup does not achieve this threshold. As mentioned by reviewers and by public comments, the preprocessing pipeline is a key factor that may be confounding the analysis, and this should be better analysed. For example, as mentioned in the reviews below, the shift in the image can be done either by inpainting, cropping, or using a fixed background. The authors claim that there are no qualitative differences between those preprocessing choices, but by inspecting Figures 2B and 8C, the AC notices a severe change in 'jaggedness'; in other words, the choice of preprocessing *does* affect the quantitative measures of (in)stability, even though the qualitative assessment (unstable in all setups) is the same. In particular, using non-centered crops should be the default setup, since it requires no preprocessing. It is confusing that it appears in the appendix instead of the inpainting version in figure 2b. This is important, since it implies that the analysis is mixing two perturbations: the actual action of the translation group and the choice of preprocessing, and that the latter is by no means negligible. I would suggest that the authors perform the following experiment to disentangle the effect of translation from the effect of preprocessing. Since the translations form a group, for any shift applied to the image, one can 'undo' it by applying the inverse shift. Say one applies a shift of d pixels to image x and obtains x'=T(x,+d) as a result (using whatever border handling procedure). If border effects were negligible, then x''=T(x',-d) should give us back x, so a good measure of how unstable the network is would be the difference in prediction between x, x', and x''. If predicting x'' is as unstable as predicting x', it follows that the network is actually unstable to the border effect introduced by T.

Given this, the AC recommends rejection at this time, and encourages the authors to resubmit their work after addressing the above point. ",ICLR2019,5: The area chair is absolutely certain
HbBlKsMRoZ,1576800000000.0,1576800000000.0,1,SkloDJSFPH,SkloDJSFPH,Paper Decision,Reject,"This paper presents a technique for approximately sampling from autoregressive models using something like a proposal distribution and a critic. The idea is to chunk the output into blocks and, for each block, predict each element in the block independently from a proposal network, ask a critic network whether the block looks sensible and, if not, resample the block using the autoregressive model itself. 
+
The idea in the paper is interesting, but the paper would benefit from
- a better relation to existing methods
- a better experimental section, which details the hyper-parameters of the algorithm (and how they were chosen) and which provides error bars on all plots (and tables)",ICLR2020,
rklUcATSxN,1545100000000.0,1545350000000.0,1,rJ4km2R5t7,rJ4km2R5t7,Multitask learning is one of the most important problems in AI,Accept (Poster),"This paper provides an interesting benchmark for multitask learning in NLP.
I wish the dataset included language generation tasks instead of just classification, but it's still a step in the right direction.
",ICLR2019,5: The area chair is absolutely certain
nzmRo__rhpb,1610040000000.0,1610470000000.0,1,kdm4Lm9rgB,kdm4Lm9rgB,Final Decision,Reject,"The paper tackles the problem of mitigating the effect of model discrepancies between the learning and deployment environments. In particular, the authors focus on the worst-case possible performance. The paper has both an empirical and a theoretical flavor. The algorithm they derived is backed by theoretical guarantees. There exists a gap between the theory presented and the final practical algorithm, which generated some elements of concern from the reviewers. 
Some of these issues (choice and sensitivity of the Lipschitz constant, in what cases we can make that assumption, choice of p_w, discrepancy between the theoretical proposal and the practical algorithm) are well addressed in the rebuttal. However, after careful examination of the reviews, the meta-reviewer is still not convinced that the paper meets the minimum requirements for acceptance, as many of the reviewers' initial concerns still remain.",ICLR2021,
EGnsdhZhxxG,1642700000000.0,1642700000000.0,1,sPfB2PI87BZ,sPfB2PI87BZ,Paper Decision,Accept (Poster),"This paper considers the generalized target shift setting for domain adaptation and proposes an optimal transport map-based approach to it. The considered setting for domain adaptation is rather general and of practical use. The proposed method seems sensible, as supported by the theoretical identifiability and empirical results.

It is worth noting that the way previous work is cited could be improved. For instance, in the first paragraph of the Introduction, the authors reviewed various settings for domain adaptation. For model shift, the authors cited previous work. However, when discussing covariate shift, target shift, and generalized target shift, the authors did not cite the original work that provides the categorization. For completeness, the authors may want to consider including the setting of conditional shift as well, which has received a number of applications in domain adaptation in computer vision. I believe the categorization of target shift, conditional shift, and generalized target shift was provided by Zhang et al. (2013). This work should also be cited when the authors give the problem definition in Section 2.1. The quality of the paper will be even better if the authors cite previous work in all the right places--this may also make the authors' contribution clearer.",ICLR2022,
1SPWSeErLCn,1642700000000.0,1642700000000.0,1,7Rnf1F7rQhR,7Rnf1F7rQhR,Paper Decision,Reject,"The authors study the training settings that may affect active deep learning performance, including cold/warm start, leveraging unlabeled data, and initial set selection, for each active learning strategy. The findings on several data sets help understand AL better, with some insights to inspire future research.

The reviewers were at best lukewarm about the work prior to the rebuttal. Some turned more positive but none were willing to strongly champion the paper's acceptance, even after the authors provided a decent rebuttal. This leaves the paper as a borderline case, and the recommendation comes from carefully checking the latest revision and calibrating its score with other submissions.

The reviewers are generally positive about the breadth of the study, the potential impact of the codebase, and the systematic study that can inspire future work. Some clarified issues include comments on future research directions and the labeling efficiency plot (which is, however, not analyzed deeper in the main text), and results on additional settings like transfer learning (somewhat preliminary). In the end, two remaining concerns surround whether the technical contribution and the conclusions are sufficiently solid, including

* limited insights: Some reviewers comment that the insights are on the lighter side. The authors identify several issues that may affect the performance of the underlying tasks of active learning, and find that the best setting differs across different active learning strategies. 
But given that the paper offers at best ""best practices of training models on actively-queried labels"", it is not clear whether the authors achieve their claimed goal of ""compare different strategies in a fair way""---in particular, the conclusion for this particular comparison seems to be missing (e.g., which is recommended in practice, BADGE or LL4AL or others?). Also, given that only three data sets (5 after rebuttal) have been studied in this work (see item below), the ""generalization ability"" of the conclusions in this paper cannot be clearly established. While the authors provided some additional pieces in the rebuttal, the pieces need more study to be fully conclusive. Some reviewers are also concerned that the conclusions are rather scattered.

From a practical perspective, it appears to be a chicken-and-egg problem of whether to fix the active learning strategy first (and then train the model with the best setting/practice), or fix the training setting first (and then select the best strategy). The authors may want to add more arguments on why they focus on the former rather than the latter.

* limited experiments: several reviewers point out that the few data sets used could not fully justify the ""best practice"", and demand data sets like ImageNet. The authors offered some new results on TinyImageNet and CIFAR100, but those are not studied as deeply as the other data sets at the current point. A more careful study on the two (and other) data sets is thus strongly recommended.",ICLR2022,
HJps8JTBz,1517250000000.0,1517260000000.0,862,H1Nyf7W0Z,H1Nyf7W0Z,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agreed that this paper is not quite ready for publication at ICLR. One of the reviewers thought the paper was well written and easy to follow while the two others said the opposite. One of the main criticisms concerned issues with the composition. The paper seems to lack a clear formal explanation of the problem and the proposed methodology. The reviewers in general weren't convinced by the experiments, complaining about the lack of a required baseline and that the proposed method doesn't seem to significantly help in the experiment presented.

Pros:
- The proposed idea is interesting
- The problem is timely and of interest to the community
- Addresses multiple important problems at the intersection of ML and RL in sequence generation

Cons:
- Novel but somewhat incremental
- The experiments are not compelling (i.e., the results are not strong)
- A necessary baseline is missing
- Significant issues with the writing - both in terms of clarity and correctness.",ICLR2018,
S1gAoe9TyN,1544560000000.0,1545350000000.0,1,rkxoNnC5FQ,rkxoNnC5FQ,A new approach to learning from simulated data with privileged information,Accept (Poster),"The paper proposes an unsupervised domain adaptation solution applied for semantic segmentation from simulated to real world driving scenes. The main contribution consists of introducing an auxiliary loss based on depth information from the simulator. All reviewers agree that the solution offers a new idea and contribution to the adaptation literature. The ablations provided effectively address the concern that the privileged information does in fact aid in transfer. The additional ablation on the perceptual loss done during the rebuttal is also valuable and should be included in the final version. 
+
+The work would benefit from application of the method across other sim2real dataset tasks so as to be compared to the recent approaches mentioned by the reviewers, but the current evaluation is sufficient to demonstrate the effectiveness of the approach over baseline solutions. ",ICLR2019,5: The area chair is absolutely certain
__SmApa4YhS,1642700000000.0,1642700000000.0,1,IvepFxYRDG,IvepFxYRDG,Paper Decision,Accept (Poster),"This work presents a new sample-based policy extragradient algorithm for finding an approximate Nash equilibrium in tabular two-player zero-sum Markov games with improved sample complexity guarantees. While originally the reviewers had concerns regarding the novelty and technical difficulty of the paper, these were successfully resolved during the rebuttal, and now all reviewers agree that this is an interesting contribution. Hence, I recommend acceptance of the paper.

In the final version the authors should make the following changes:
- Please mention early on (e.g., in the abstract and the introduction, as well as in the definition of the Markov game) that you consider a tabular problem (finite state and action spaces). Furthermore, it would be important to define informally the quantities in the bound in the abstract and when presenting Table 1.
- While not entirely uncommon, Assumption 1 is quite strong, requiring mixing for any policies. It would be great if the authors could also add a comment on this, emphasizing that this is the case, as well as explaining how weakening the assumption would introduce problems (as explained in the response to Reviewer RwGu).
- The comparison to the lower bound of Zhang et al. (2020) should also be included, as discussed in the response to Reviewer 5TU3.
- Please discuss Assumption 2 in relation to the work of Wei et al. (2021), and rephrase the relation to the latter paper accordingly, as promised in the discussion with Reviewer Hsr5.",ICLR2022,
9V42Pvwd2t,1642700000000.0,1642700000000.0,1,vEZyTBRPP6o,vEZyTBRPP6o,Paper Decision,Accept (Poster),"This paper presents a lightweight hybrid model using both convolutions and Transformer layers, resulting in models with lower computational cost and good performance. Reviewers find the paper interesting and agree that the paper did a good job in presenting convincing experimental results. There were questions about the role of different components in the proposed model, which the authors addressed in the response with additional ablation studies. One reviewer expressed concerns about the lack of theoretical foundations for the proposed approach. However, they also agree that the paper presents a good and useful experimental study. I think overall the paper has good contributions that others can build on in the future and recommend acceptance.",ICLR2022,
ryeUlbflxE,1544720000000.0,1545350000000.0,1,SJfZKiC5FX,SJfZKiC5FX,novel approach with convincing results,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion.
+
+- The approach is novel
+- The experimental results are convincing. 
+
+2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision.
+
+- The authors didn't show results with non-Gaussian noise
+- Some details that could help the understanding of the method are missing.
+
+3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it's a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately.
+
+No major points of contention.
+
+4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another.
+
+The reviewers reached a consensus that the paper should be accepted.
+",ICLR2019,4: The area chair is confident but not absolutely certain
rylSGFEleE,1544730000000.0,1545350000000.0,1,BJe-Sn0ctm,BJe-Sn0ctm,More experimental rigour is needed,Reject,"The paper presents a new neural program synthesis architecture, SAPS, which seems to produce accuracy improvements in some synthesis tasks. The reviewer consensus, even after discussion with the authors, was that the paper is not acceptable at the conference. Two concerns emerged during discussion, even considering the authors' efforts to improve the paper. First, the system seems to have many ""moving parts"", but there is a lack of rigorous ablation studies to demonstrate which components of the system (or combination thereof) make significant contributions to the results. I agree with this assessment: it is not sufficient to demonstrate increased scores, even if the experimental protocol is clear and sound (more on this later), but there must be some evidence as to why this increase happens, both in the discussion and in the empirical segment of the paper, by conducting a thorough ablation study. Second, all reviewers had issues with proper and fair comparison with prior work, with the consensus being that the model is not adequately compared to convincing benchmarks in the paper.
Moreover, the paper does not provide comparison to VAEs with other more sophisticated priors, such as the VampPrior, and it is unclear whether using the ISA prior makes it difficult to scale to high-dimensional observations. Therefore, it is difficult to evaluate the significance of ISA-VAE. The authors are encouraged to carefully revise their paper to address these concerns. ",ICLR2019,5: The area chair is absolutely certain +NzjTr9_wiT4,1642700000000.0,1642700000000.0,1,NK5hHymegzo,NK5hHymegzo,Paper Decision,Reject,"This paper studies the convergence of Adam-type algorithms (two variants of AMSGrad in particular) in min-max problems that satisfy a one-sided ""Minty variational inequality"" condition. + +The reviewers identified several weaknesses in the paper and the authors did not provide a rebuttal to these concerns so there was consensus to reject the paper.",ICLR2022, +jpCj9mth2O,1576800000000.0,1576800000000.0,1,SJxmfgSYDB,SJxmfgSYDB,Paper Decision,Reject,"Main summary: Paper is about generating feature representations for set elements using weighted multiset automata + +Discussion: +reviewer 1: paper is well written but experimental results are not convincing +reviewer 2: well written but weak motivation +reviewer 3: well written but reviewer has some questions around the motivation of weighted automata machinery. +Recommendation: all the reviewers agree its well written but the paper could be stronger with motivation and experiments, all reviewers agree. I vote Reject.",ICLR2020, +rkxhLlrylN,1544670000000.0,1545350000000.0,1,HJxpDiC5tX,HJxpDiC5tX,"Excellent engineering work, but it's hard to see how others can build on it",Reject,"This paper describes the development of a large-scale continuous visual speech recognition (lipreading) system, including an audiovisual processing pipeline that is used to extract stabilized videos of lips and corresponding phone sequences from YouTube videos, a deep network architecture trained with CTC loss that maps video sequences to sequences of distributions over phones, and an FST-based decoder that produces word sequences from the phone score sequences. A performance evaluation shows that the proposed system outperforms other models described in the literature, as well as professional lipreaders. A number of ablation experiments compare the performance of the proposed architecture to the previously proposed LipNet and ""Watch, Attend, and Spell"" architectures, explore the performance differences caused by using phone- or character-based CTC models, and some variations on the proposed architecture. This paper was extremely controversial and received a robust discussion between the authors and reviewers, with the primary point of contention being the suitability of the paper for ICLR. All reviewers agree that the quality of the work in the paper is excellent and that the reported results are impressive, but there was strong disagreement on whether or not this was sufficient for an ICLR paper. 
One reviewer thought so, while the other two reviewers argued that this is insufficient, and that to appear in ICLR the paper either (1) should have focused more on the preparation of the dataset, included public release of the data so other researchers could build on the work, and put forth the V2P model as a (very) strong baseline for the task; or (2) done a more in-depth exploration of the representation learning aspects of the work by comparing phoneme and viseme units and providing more (admittedly costly) ablation experiments to shed more light on what aspects of the V2P architecture lead to the reported improvements in performance. The AC finds the arguments of the two negative reviewers to be persuasive. It is quite clear at this point that many supervised classification tasks (even structured classification tasks like lipreading) can be effectively tackled by a combination of a sufficiently flexible learning architecture and collection of a massive, annotated dataset, and the modeling techniques used in this paper are not new, per se, even if their application to lipreading is. Moreover, if the dataset is not publicly available, it is impossible for anyone else to build on this work. The paper, as it currently stands, would be appropriate in a more applications-oriented venue.",ICLR2019,3: The area chair is somewhat confident +ak-sUlTipR4,1610040000000.0,1610470000000.0,1,6t_dLShIUyZ,6t_dLShIUyZ,Final Decision,Accept (Poster),"**Overview** This paper provides a way to combine SVRG and greedy-GQ to improve the algorithm performance. In particular, the finite iteration complexity is improved from $\epsilon^{-3}$ to $\epsilon^{-2}$. + +**Pros** The paper is well-written. Reviewers believe this is a solid theoretical work on advancing value-based algorithms for off-policy optimal control. It has sufficient theoretical advancement and experiments demonstrations of the methods. + +**Cons** Some reviewers are concerned that SVRG is not SOTA. SVRG is not used in practice. The techniques appear to be similar to some existing works. + +**Recommendation** The meta-reviewer believes that the paper has solid theoretical contributions. SVRG is a component in the new algorithm to improve the complexity. It does not need to be ""useful"" or ""SOTA"". The paper is also well-written. Hence the recommendation is accept.",ICLR2021, +N5Gk1GNQkDE,1642700000000.0,1642700000000.0,1,DIjCrlsu6Z,DIjCrlsu6Z,Paper Decision,Accept (Spotlight),"This paper introduces the concept of classifier orthogonalization. This is a generalization of orthogonality of linear classifiers (linear classifiers with orthogonal weights) to the non-linear setting. It introduces the notion of a full and principal classifier, where the full classifier is one that minimizes the empirical risk, and the principal classifier is one that uses only partial information. The orthogonalization procedure assumes that the input domain, X can be divided into two sets of latent random variables Z1 and Z2 via a bijective mapping. The random variables Z1 are the principal random variables, and Z2 contains all other information. Z1 and Z2 are assumed to be conditionally independent given the target label. The paper outlines two approaches to construct orthogonal classifiers that operate only on Z2. The approach is highlighted in three applications: controlled style transfer, domain adaptation, and fair classification. + +The reviewers all found the proposed method to be principled and compelling. 
Beyond clarification questions and some discussion on related work, the reviewers raised a few issues that were subsequently addressed: 1) Additional baselines for domain adaptation and fairness. 2) Controlled style transfer being a new task with no established baselines, and 3) The feasibility of training a proper “full classifier” that minimizes the empirical risk, and its necessity in the approach. The authors addressed these concerns and updated the paper, to the satisfaction of the reviewers. All of them unanimously recommend acceptance.",ICLR2022, +4Is4Bmr7Gg3,1610040000000.0,1610470000000.0,1,VG3i3CfFN__,VG3i3CfFN__,Final Decision,Reject,"This paper proposes an attention mechanism that works at the phrase level for semantic parsing. +Reviewrs agree that the idea has been previously explored outside semantic parsing, that the gains should be shown on less saturated datasets, and that there are issues in the experimental design (observing test set results for many experiments). Thus, at this point I recommend that the paper is rejected.",ICLR2021, +nQLtb0CXrB,1610040000000.0,1610470000000.0,1,kEnBH98BGs5,kEnBH98BGs5,Final Decision,Accept (Poster),"This paper proposes methods to estimate how informative a single training data is wrt the weights and output of the neural network. All reviewers think this is an interesting problem and the proposed method is easy to implement. On the other hand, the reviewers also raise a few questions: +1. There is a large body of work analyzing the informativeness of a feature wrt the model. The authors should compare their work to the feature importance analysis. +2. The derived informativeness of a data depends not only on the network architecture, but also depends on the training algorithm, such as initialization and number of epochs. This makes the notion of data informativeness less general. +3. The writing should be substantially improved. +",ICLR2021, +SkyfTfUux,1486400000000.0,1486400000000.0,1,H1Go7Koex,H1Go7Koex,ICLR committee final decision,Reject,"The paper introduces some interesting architectural ideas for character-aware sequence modelling. However, as pointed out by reviewers and from my own reading of the paper, this paper fails badly on the evaluation front. First, some of the evaluation tasks are poorly defined (e.g. question task). Second, the tasks look fairly simple, whereas there are ""standard"" tasks such as language modelling datasets (one of the reviewers suggests TREC, but other datasets such as NANT, PTB, or even the Billion Word Corpus) which could be used here. Finally, the benchmarks presented against are weak. There are several character-aware language models which obtain robust results on LM data which could readily be adapted to sentence representation learning, eg. Ling et al. 2016, or Chung et al. 2016, which should have been compared against. The authors should look at the evaluations in these papers and consider them for a future version of this paper. As it stands, I cannot recommend acceptance in its current form.",ICLR2017, +ry0ErkaHz,1517250000000.0,1517260000000.0,552,BkQCGzZ0-,BkQCGzZ0-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a different method for learning autoencoders with discrete hidden states (compared to recent discrete-like VAE type models). The reviewers in general like the method being proposed and are convinced that there is worth to the underlying proposal. However there are several shared complaints about the setup and writing of the paper. 
+ +- Several reviewers complained about the use of qualitative evaluation, particularly in the ""Deciphering the latent code"" section of the paper. +- One reviewer in particular had significant issues with the experimental setup of the paper and felt that there was insignificant quantitative evaluation, particularly using standard metrics for the task (compared to the metric introduced in the paper). +- There were further critiques about the ""procedural"" nature of the writing and the lack of formal justifications for the ideas introduced. ",ICLR2018, +V-ZsX1vziw1-,1642700000000.0,1642700000000.0,1,qSTEPv2uLR8,qSTEPv2uLR8,Paper Decision,Reject,"This work proposes to define densities via the pushforward of a base density through the gradient field of a convex potential as studied in OT theory and, in particular, inspired by Brenier's theorem. + +More concretely, it proposes to use ICNNs to parametrize the convex potentials and considers two mechanisms to match a target density: 1) with a known (normalized) target approximately solve the Monge-Ampere equation via optimization; 2) with only samples available, they propose to use the maximum-likelihood approach. + +While the paper is overall well-written, the idea is very close to existing work that was not mentioned or discussed in the paper. The paper would benefit from a substantial revision to incorporate the missing references and emphasize the relative novelty.",ICLR2022, +SkSsnfU_l,1486400000000.0,1486400000000.0,1,B186cP9gx,B186cP9gx,ICLR committee final decision,Reject,"This is quite an important topic to understand, and I think the spectrum of the Hessian in deep learning deserves more attention. However, all 3 official reviewers (and the public reviewer) comment that the paper needs more work. In particular, there are some concerns that the experiments are too preliminary/controlled and about whether the algorithm has actually converged. One reviewer also comments that the work is lacking a key insight/conclusion. I like the topic of the paper and would encourage the authors to pursue it more deeply, but at this time all reviewers have recommended rejection.",ICLR2017, +8yuOuoixLg,1610040000000.0,1610470000000.0,1,o3iritJHLfO,o3iritJHLfO,Final Decision,Accept (Poster),Non autoregressive modelling for text to speech (TTS) is an important and challenging problem. This paper proposes a deep VAE approach and show promising results. Both the reviewers and the authors have engaged in a constructive discussion on the merits and claims of the paper. This paper will not be the final VAE contribution to TTS but represents a significant enough contribution to the field to warrant publication. It is highly recommended that the authors take into account the reviewers' comments.,ICLR2021, +SmwvDyUXdoO,1610040000000.0,1610470000000.0,1,Vfs_2RnOD0H,Vfs_2RnOD0H,Final Decision,Accept (Spotlight),"The paper presents an online algorithm for dynamic tensor rematerialization. The theoretic analysis on the tensor operation and memory budget bound of the proposed method, as well as on the relationship between the proposed method and optimal static analysis method is novel and interesting. It covers a pretty comprehensive study across theory, simulation and system implementation. In addition, the paper is well written. 
",ICLR2021, +S10TiGI_x,1486400000000.0,1486400000000.0,1,r10FA8Kxg,r10FA8Kxg,ICLR committee final decision,Accept (Poster),The reviewers unanimously recommend accepting this paper.,ICLR2017, +fm9lufeU0m-,1610040000000.0,1610470000000.0,1,po-DLlBuAuz,po-DLlBuAuz,Final Decision,Accept (Poster),"The paper got a quite high disagreement in the scores from the reviewers. R2 voted for rejecting the paper as he did not see the connection of the algorithm to the continuation method and also that the continuation method does not address the distributional shift, which is one of the main problems for offlline RL. Yet, these concerns have been properly answered in the rebuttal of the authors and the distributional shift is also addressed by the continuation method by reducing the error in policy evaluation. Further concerns from the reviewers were raised in terms of related work to a similar algorithm (BRAC), which is also addressed in the revision of the paper. + +The reviewers also identified the following strong points of the paper: +- The algorithm is a simple and very effective adaptation to SAC +- The presented results are exhaustive and convincing +- The paper provides strong theoretical results for the presented algorithm +- The authors did a very good job with their revision, adding more comparisons and ablation studies. + +I agree that this paper very interesting and recommend acceptance.",ICLR2021, +dh14GiB-7,1576800000000.0,1576800000000.0,1,HklRKpEKDr,HklRKpEKDr,Paper Decision,Reject,"This work extends previous work (Castellini et al) with parameter sharing and low-rank approximations, for pairwise communication between agents. +However the work as presented here is still considered too incremental, in particular when compared to Castellini et al. +The advances such as parameter sharing and low-rank approximation are good but not enough of a contribution. Authors' efforts to address this concern did not change reviewers' judgment. +Therefore, we recommend rejection.",ICLR2020, +rqDAhSsALdU,1642700000000.0,1642700000000.0,1,vh-0sUt8HlG,vh-0sUt8HlG,Paper Decision,Accept (Poster),"This paper presents a light weight hybrid model using both convolutions and Transformer layers resulting in models with lower computational cost and good performance. Reviewers find the paper interesting and agree that the paper did a good job in presenting convincing experimental results. There were questions about role of different components in the proposed model, which author's addressed in the response with additional ablation studies. One reviewer expressed concerns about lack of theoretical foundations for the proposed approach. However they also agree that the paper presents a good and useful experimental study. I think overall the paper has good contributions that others can build on in the future and recommend acceptance.",ICLR2022, +Hk8fhMLOg,1486400000000.0,1486400000000.0,1,S1oWlN9ll,S1oWlN9ll,ICLR committee final decision,Accept (Poster),"It's a simple contribution supported by empirical and theoretical analyses. After some discussion, all reviewers viewed the paper favourably.",ICLR2017, +Hk5kB1pHM,1517250000000.0,1517260000000.0,482,Hy8hkYeRb,Hy8hkYeRb,ICLR 2018 Conference Acceptance Decision,Reject,"The paper attempts to develop a method for learning latent representations using deep predictive coding and deconvolutional networks. 
However, the theoretical motivation for the proposed model in relation to existing methods (such as original predictive coding, deconvolutional networks, ladder networks, etc.), as well as the empirical comparison against them is unclear. The experimental results on the CIFAR10 dataset do not provide much insight on what kind of meaningful/improved representations can be learned in comparison to existing methods, both qualitatively and quantitatively. No rebuttal was provided. ",ICLR2018, +lCUVPPi_Kq,1576800000000.0,1576800000000.0,1,Bkxd9JBYPH,Bkxd9JBYPH,Paper Decision,Reject,"This paper presents a variant of recently developed Kronecker-factored approximations to BNN posteriors. It corrects the diagonal entries of the approximate Hessian, and in order to make this scalable, approximates the Kronecker factors as low-rank. + +The approach seems reasonable, and is a natural thing to try. The novelty is fairly limited, however, and the calculations are mostly routine. In terms of the experiments: it seems like it improved the Frobenius norm of the error, though it's not clear to me that this would be a good measure of practical effectiveness. On the toy regression experiment, it's hard for me to tell the difference from the other variational methods. It looks like it helped a bit in the quantitative comparisons, though the improvement over K-FAC doesn't seem significant enough to justify acceptance purely based on the results. + +Reviewers felt like there was a potentially useful idea here and didn't spot any serious red flags, but didn't feel like the novelty or the experimental results were enough to justify acceptance. I tend to agree with this assessment. +",ICLR2020, +BytiiGL_x,1486400000000.0,1486400000000.0,1,ryuxYmvel,ryuxYmvel,ICLR committee final decision,Accept (Poster),"The paper presents a new dataset and initial machine-learning results for an interesting problem, namely, higher-order logic theorem proving. This dataset is of great potential value in the development of deep-learning approaches for (mathematical) reasoning. + + As a personal side note: It would be great if the camera-ready version of the paper would provide somewhat more context on how the state-of-the-art approaches in automatic theorem proving perform on the conjectures in HolStep. Also, it would be good to clarify how the dataset makes sure there is no ""overlap"" between the training and test set: for instance, a typical proof of the Cauchy-Schwarz inequality employs the Pythagorean theorem: how can we be sure that we don't have Cauchy-Schwarz in the training set and Pythagoras in the test set?",ICLR2017, +xbpRimfgipP,1610040000000.0,1610470000000.0,1,g6OrH2oT5so,g6OrH2oT5so,Final Decision,Reject,"Reviewers acknowledged that the problem addressed in this paper is interesting and is not solved by the existing literature. They appreciated that the setup was well defined and the paper was clearly written. Yet they kept several concerns after the rebuttal. Especially, they expected the comparison to be done with algorithms using both demonstrations and rewards and the current empirical evaluation was not judged as fair. Also, the simple baseline consisting of adding an LSTM to BC to integrate past observations has not been considered either. This baseline is still missing to assess the quality of the proposed method. ",ICLR2021, +#NAME?,1642700000000.0,1642700000000.0,1,UgNQM-LcVpN,UgNQM-LcVpN,Paper Decision,Reject,"The paper proposes a modulation layer to address the problem of missing data. 
+ +The results do not show that the approach outperforms existing sota approaches. +The results do not demonstrate that the proposed modulation layer is an improvement over attention layer. +Many smaller errors (spelling, etc.) where found in the manuscript. +Experimental details are insufficient to make the results reproducible. +The authors have not provided a response to the reviewers.",ICLR2022, +PI5wHzc5MN,1576800000000.0,1576800000000.0,1,Byekm0VtwS,Byekm0VtwS,Paper Decision,Reject,"The paper is proposing uncertainty of the NN’s in the training process on analog-circuits based chips. As one reviewer emphasized, the paper addresses important and unique research problem to run NN on chips. Unfortunately, a few issues are raised by reviewers including presentation, novelly and experiments. This might be partially be mitigated by 1) writing motivation/intro in most lay person possible way 2) give easy contrast to normal NN (on computers) to emphasize the unique and interesting challenges in this setting. We encourage authors to take a few cycles of edition, and hope this paper to see the light soon. +",ICLR2020, +OMRvTio2ym2,1642700000000.0,1642700000000.0,1,boJy41J-tnQ,boJy41J-tnQ,Paper Decision,Accept (Poster),"The paper proposes a subspace regularization technique that encourages the new class weight vector to be in the subspace spanned by those of the base classes for few-shot class incremental learning. Even though similar techniques exist in few-shot learning literature, reviewers appreciate the simplicity of the method and thorough experiments. The authors have revised the paper to include missing references suggested by reviewers during the rebuttal. They were not able to add experiment comparisons to Tao et al. (2020) and Chen & Lee (2021) as requested by reviewer Vrap due to missing code release. Please consider adding them in your draft later.",ICLR2022, +R92XOH8Hyl,1576800000000.0,1576800000000.0,1,rJeidA4KvS,rJeidA4KvS,Paper Decision,Reject,"This paper studies Population-Based Augmentation in the context of knowledge distillation (KD) and proposes a role-wise data augmentation schemes for improved KD. While the reviewers believe that there is some merit in the proposed approach, its incremental nature and inherent complexity require a cleaner exposition and a stronger empirical evaluation on additional data sets. I will hence recommend the rejection of this manuscript in the current state. Nevertheless, applying PBA to KD seems to be an interesting direction and we encourage the authors to add the missing experiments and to carefully incorporate the reviewer feedback to improve the manuscript.",ICLR2020, +Pckrzrm8G,1576800000000.0,1576800000000.0,1,ByxloeHFPS,ByxloeHFPS,Paper Decision,Reject,"This paper pursues an ambitious goal to provide a theoretical analysis HRL in terms of regret bounds. However, the exposition of the ideas has severe clarity issues and the assumptions about HMDPs used are overly simplistic to have an impact in RL research. +Finally, there is agreement between the reviewers and AC that the novelty of the proposed ideas is a weak factor and that the paper needs substantial revision.",ICLR2020, +LTYVxGqw2O,1576800000000.0,1576800000000.0,1,r1xHxgrKwr,r1xHxgrKwr,Paper Decision,Reject,"The paper presents AnoDM (Anomaly detection based on unsupervised Disentangled representation learning and Manifold learning) that combine beta-VAE and t-SNE for anomaly detection. 
Experimental results on both image and time series data are shown to demonstrate the effectiveness of the proposed solution. +The paper aims to attack a challenging problem, and the proposed solution is reasonable. The authors did a good job at addressing some of the concerns raised in the reviews. However, two major concerns remain: (1) the novelty in the proposed model (a combination of two existing models) is not clear; (2) the experimental results are not fully convincing. While theoretical analysis is not a must for all models, it would be useful to conduct thorough experiments to fully understand how the model works, which is missing in the current version. +Given the two reasons above, the paper did not attract enough enthusiasm from the reviewers during the discussion. We hope the reviews can help improve the paper for a better publication in the future. + + +",ICLR2020, +PZ91G6umkmK,1610040000000.0,1610470000000.0,1,IkYEJ5Cps5H,IkYEJ5Cps5H,Final Decision,Reject,"This paper proposed a new optimization framework for pruning CNNs, considering the coupling between channels in neighboring layers. Two reviewers suggested acceptance and two suggested rejection. The main concerns of the negative reviewers are (a) limited novelty, (b) limited performance metrics and (c) limited baselines. The authors' response did not fully clarify the reviewers' concerns during the discussion phase, and AC also agrees that they should be resolved to meet the high standard of ICLR. Hence, AC recommends rejection. +Here is an additional thought from AC. The authors propose ours-c and ours-cs. The latter is reported to outperform the former in terms of FLOPs, but AC thinks the former may have merits in other, more important performance metrics, e.g., the actual latency and/or memory consumption on a target device. 
More discussions and results for this would strengthen the paper.",ICLR2021, +FSkewIv756,1576800000000.0,1576800000000.0,1,B1e9Y2NYvS,B1e9Y2NYvS,Paper Decision,Accept (Spotlight),"This paper studies the robustness of NeuralODE, as well as propose a new variant. The results suggest that the neuralODE can be used as a building block to build robust deep networks. The reviewers agree that this is a good paper for ICLR, and based on their recommendation I suggest to accept this paper.",ICLR2020, +Sy--pfU_e,1486400000000.0,1486400000000.0,1,Skn9Shcxe,Skn9Shcxe,ICLR committee final decision,Accept (Poster),"The paper provides interesting new interpretations of highway and residual networks, which should be of great interest to the community.",ICLR2017, +HfeiuWl9H,1576800000000.0,1576800000000.0,1,SJlpYJBKvH,SJlpYJBKvH,Paper Decision,Accept (Spotlight),"Main content: + +This paper provides a unified way to provide robust statistics in evaluating the reliability of RL algorithms, especially deep RL algorithms. Though the metrics are not particularly novel, the investigation should be useful to the broader community as it compares seven specific evaluation metrics, including 'Dispersion across Time (DT): IQR across Time', 'Short-term Risk across Time (SRT): CVaR on Differences', 'Long-term Risk across Time (LRT): CVaR on Drawdown', 'Dispersion across Runs (DR): IQR across Runs', 'Risk across Runs (RR): CVaR across Runs', 'Dispersion across Fixed-Policy Rollouts (DF): IQR across Rollouts' and 'Risk across Fixed-Policy Rollouts (RF): CVaR across Rollouts'. The paper further proposed ranking and also confidence intervals based on bootstrapped samples, and compared against continuous control and discrete actions algorithms on Atari and OpenAI Gym. + +-- + +Discussion: + +The reviews clearly agree on accepting the paper, with a weak accept coming from a reviewer who does not know much about this subarea. Comments are mostly just directed at clarifications and completeness of description, which the authors have addressed. + +-- + +Recommendation and justification: + +This paper should be accepted due to its useful contributions toward doing a better job of measuring performance of RL.",ICLR2020, +2gfExF5g4B,1642700000000.0,1642700000000.0,1,vaRCHVj0uGI,vaRCHVj0uGI,Paper Decision,Accept (Poster),"This is an interesting paper on improving score-based conditional sampling and its use in solving inverse problems. The current method of sampling from NCSNv2 is somewhat inefficient and the authors propose a different SDE that seems to work better for conditional generation. + +The paper is applied to Computational imaging and MRI and shows very good results and reasonable comparisons with the recent state of the art. One limitation is that the measurement process is artificial and ignores specifics of MRI (real measurements and multi-coils would strengthen the paper). In any case since this is a fundamental methods paper with a solid technical innovation on score-based sampling, I recommend acceptance.",ICLR2022, +xxJgAvD97D,1576800000000.0,1576800000000.0,1,SkeGvaEtPr,SkeGvaEtPr,Paper Decision,Reject,"This paper on extending MLNs using NNs is borderline acceptable: one reviewer is strongly opposed, although I confess I don't really understand their response to the rebuttal or see what the issue with novelty is (a position shared by the other reviewers). I'm not sure how to weigh this review, but there is not a lot of signal in favour of rejection aside from the rating. 
+ +The remaining two reviews are in favour of acceptance, with their enthusiasm only bounded by the lack of scalability of the method, something they appreciate the authors are upfront about. My view is this paper brings something new to the table which will interest the community, but doesn't oversell the result. + +Given the distribution of papers in my area, this one is just a little too borderline to accept, but this is primarily a reflection of the number of high-quality papers reviewed and the limited space of the conference. I have no doubt this paper will be successful at another conference, and it's a bit of a shame we were not in a position to accept it to this one.",ICLR2020, +vtNR7E8-VFz,1642700000000.0,1642700000000.0,1,hcQHRHKfN_,hcQHRHKfN_,Paper Decision,Accept (Poster),"In this paper, a new method is proposed to discover diverse policies solving a given task. The key ideas are to (1) learn one policy at a time, with each new policy trying to be different enough from the previous ones, and (2) switch between two rewards on a per-trajectory basis: the ""normal"" reward on trajectories that are unlikely enough under previoiusly discovered policies, and a ""diversity-inducing"" reward on trajectories that are too likely (so as to push the policy being learned away from the previous ones). The main benefit of this switching mechanism is to ensure that the new policy will be optimal, because the reward signal isn't ""diluted"" by the diversity-inducing signal as long as the policy stays far away from the previous ones. + +After the discussion period, most reviewers clearly recommended acceptance of the paper. One reviewer remained on the ""reject"" side though, especially due to an unconvincing theoretical analysis of the method, in spite of several back and forth with authors. I also had my own concerns regarding that part after reading the paper, and further discussions with authors eventualy led to a significant rewrite of the corresponding theorems and proofs. I believe the final version (shared in comments by authors after the dealine for paper revisions) to at least be technically correct, though the relevance of the theory w.r.t. practical usage of the method is still not entirely convincing (e.g., assumptions regarding the number of distinct global optima, and the need for positive rewards). + +That being said, in spite of these concerns regarding the practical significance of the theoretical analysis, I believe the paper has a strong enough empirical validation, and the method is (1) simple, (2) intuitively reasonable, (3) original due to the trajectory-switching mechanism, which makes me recommend acceptance.",ICLR2022, +w-C7OPuhim,1576800000000.0,1576800000000.0,1,rJgqalBKvH,rJgqalBKvH,Paper Decision,Reject,This paper has been withdrawn by the authors.,ICLR2020, +rJDtH1THG,1517250000000.0,1517260000000.0,613,HkeJVllRW,HkeJVllRW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper studies factorizations of convolutional kernels. The proposed kernels lead to theoretical and practical efficiency improvements, but these improvements are very, very limited (for instance, Figure 5). It remains unclear how they compare to popular alternative approaches such as group convolutions (used in ResNeXt) or depth-separable convolutions (used in MobileNet). 
The reviewers identify a variety of smaller issues with the manuscript.",ICLR2018, +53C9SLfYZrw,1610040000000.0,1610470000000.0,1,pBqLS-7KYAF,pBqLS-7KYAF,Final Decision,Accept (Spotlight),"The paper presents a nice analysis of the spectrum of a matrix that is obtained by applying non-linear functions to a random matrix. The paper is mostly well-written, the result is novel and interesting, and has clear implications for ML problems like spectral clustering. +So I would enthusiastically recommend the paper for acceptance at ICLR. +It would be important for authors to take into account reviewer comments. In particular, instantiating the theorems for simple ML-centric examples would be very useful. ",ICLR2021, +Abbs6rgJHt,1576800000000.0,1576800000000.0,1,SJxIkkSKwB,SJxIkkSKwB,Paper Decision,Reject,"This paper proposes a new active learning algorithm based on clustering and then sampling based on an uncertainty-based metric. This active learning method is not particular to deep learning. The authors also propose a new de-noising layer specific to deep learning to remove noise from possibly noisy labels that are provided. These two proposals are orthogonal to one another and its not clear why they appear in the same paper. + +Reviewers were underwhelmed by the novelty of either contribution. With respect to active learning, there is years of work on first performing unsupervised learning (e.g., clustering) and then different forms of active sampling. + +This work lacks sufficient novelty for acceptance at a top tier venue. Reject",ICLR2020, +6n5hzSh9ma,1576800000000.0,1576800000000.0,1,HyeEIyBtvr,HyeEIyBtvr,Paper Decision,Reject,"This paper proposes a neural architecture search method that uses balanced sampling of architectures from the one-shot model and drops operators whose importance drops below a certain weight. + +The reviewers agreed that the paper's approach is intuitive, but main points of criticism were: +- Lack of good baselines +- Potentially unfair comparison, not using the same training pipeline +- Lack of available code and thus of reproducibility. (The authors promised code in response, which is much appreciated. If the open-sourcing process has completed in time for the next version of the paper, I encourage the authors to include an anonymized version of the code in the submission to avoid this criticism.) + +The reviewers appreciated the authors' rebuttal, but it did not suffice for them to change their ratings. +I agree with the reviewers that this work may be a solid contribution, but that additional evaluation is needed to demonstrate this. I therefore recommend rejection and encourage resubmission to a different venue after addressing the issues pointed out by the reviewers.",ICLR2020, +FBYDmXnMSN,1576800000000.0,1576800000000.0,1,Hkx7_1rKwS,Hkx7_1rKwS,Paper Decision,Accept (Poster),"The submission proposes a novel solution for minimax optimization which has strong theoretical and empirical results as well as broad relevance for the community. The approach, Follow-the-Ridge, has theoretical guarantees and is compatible with preconditioning and momentum optimization strategies. + +The paper is well-written and the authors engaged in a lengthy discussion with the reviewers, leading to a clearer understanding of the paper for all. The reviews all recommend acceptance. ",ICLR2020, +Sp46ehZnQBR,1610040000000.0,1610470000000.0,1,o6ndFLB1DST,o6ndFLB1DST,Final Decision,Reject,"There is a general consensus on the fact that the paper is not yet ready for publication. 
I encourage the authors to carefully address the detailed concerns raised by the reviewers, which include among other: i) the incompleteness of the literature overview, which should include the references provided by the reviewers, ii) poor (or bias towards the proposed approach) experimental evaluation, and iii) a vague treatment of key terms in the interpretability literature like feasibility (e.g., to make sure that the counterfactual lie in high data density regions). ",ICLR2021, +rhEuoMUMAx,1576800000000.0,1576800000000.0,1,r1geR1BKPr,r1geR1BKPr,Paper Decision,Reject,"This paper extends the idea of influence functions (aka the implicit function theorem) to multi-stage training pipelines, and also adds an L2 penalty to approximate the effect of training for a limited number of iterations. + +I think this paper is borderline. I also think that R3 had the best take and questions on this paper. + +Pros: + - The main idea makes sense, and could be used to understand real training pipelines better. + - The experiments, while mostly small-scale, answer most of the immediate questions about this model. + +Cons: + - The paper still isn't all that polished. E.g. on page 4: ""Algorithm 1 shows how to compute the influence score in (11). The pseudocode for computing the influence function in (11) is shown in Algorithm 1"" + - I wish the image dataset experiments had been done with larger images and models. + +Ultimately, the straightforwardness of the extension and the relative niche applications mean that although the main idea is sound, the quality and the overall impact of this paper don't quite meet the bar.",ICLR2020, +B1Z0oMUdl,1486400000000.0,1486400000000.0,1,H1kjdOYlx,H1kjdOYlx,ICLR committee final decision,Invite to Workshop Track,"As per all the reviews, the work is clearly promising, but is seen to need additional discussion / formalization / experimental comparison with related work, and stronger demonstrations of the application of this technique. + Further back-and-forth with the reviewers would have been useful, but there should be enough to go on in terms of directions. This work would benefit from being part of the workshop track.",ICLR2017, +8eg96tALJJ,1576800000000.0,1576800000000.0,1,B1eY_pVYvB,B1eY_pVYvB,Paper Decision,Accept (Poster),"This paper introduces a new approach that consists of the invertible autoencoder and a reversible predictive module (RPM) for video future-frame prediction. + +Reviewers agree that the paper is well-written and the contributions are clear. It achieves new state-of-the-art results on a diverse set of video prediction datasets and with techniques that enable more efficient computation and memory footprint. Also, the video representation learned in a self-supervised way by the approach can have good generalization ability on downstream tasks such as object detection. The concerns of the paper were relatively minor, and were successfully addressed in the rebuttal. + +AC feels that this work makes a solid contribution with well-designed model and strong empirical performance, which will attain wide interests in the area of video future-frame prediction and self-supervised video representation learning. + +Hence, I recommend accepting this paper.",ICLR2020, +q-T6nSNejIs,1642700000000.0,1642700000000.0,1,HMR-7-4-Zr,HMR-7-4-Zr,Paper Decision,Reject,"The reviewers initially struggled to position this contribution in terms of usefulness. 
During the discussion phase, it became (more) clear that the proposed method is best used to reduce the communication overhead of ZeRO3. While the integration of this work and ZeRO hasn't been attempted yet, the authors claim that this work ""clears the theoretical barrier"". From that point of view, the reviewers were not satisfied with the guarantees of the method, arguing that the resulting algorithm is slower than standard EF and could suffer in terms of runtime (when one factors the cost of compression) even when compared to standard uncompressed SGD. Overall, the discussion greatly improved the paper, although directly integrating ConEF with ZeRO could be even more convincing.",ICLR2022, +HJOywyaHM,1517250000000.0,1517260000000.0,912,S14EogZAZ,S14EogZAZ,ICLR 2018 Conference Acceptance Decision,Reject,"The authors present a toy stacking task where the goal is to stack blocks to match a given configuration, and a method that is a slightly modified DQN algorithm where the target configuration is observed by the network as well as the current state. There are a few problems with this paper. First, the method lacks novelty - it is very similar to DQN. Second, the claims of learning physical intuitions is not borne out by the method or experimental results. Third, the tasks are very simple and there is no held-out test set of target configurations. ",ICLR2018, +hJA_pMYGUYi,1642700000000.0,1642700000000.0,1,_B8Jd7Nqs7R,_B8Jd7Nqs7R,Paper Decision,Reject,"The paper provides a new geometric functional analysis perspective for the generalization bounds for neural networks. As the AC, I actually quite liked the twist the authors are providing for this particular work. Unfortunately, the current presentation is too crude to provide an elementary picture for the developments and I strongly encourage the authors to revise the paper for the next deadline based on the remarks from the reviewers.",ICLR2022, +HJgWDYRUlV,1545170000000.0,1545350000000.0,1,ByxHb3R5tX,ByxHb3R5tX,Reasonable but somewhat incremental result,Reject,"In considering the reviews and the author response, I would summarize the evaluation of the paper as following: The main idea in the paper -- to combine goal-conditioning with successor features -- is an interesting direction for research, but is somewhat incremental in light of the prior work in the area. Most of the reviewers generally agreed on this point. While a relatively incremental technical contribution could still result in a successful paper with a thorough empirical analysis and compelling results, the evaluation in the paper is unfortunately not very extensive: the provided tasks are very simple, and the difference from prior methods is not very large. All of the tasks are equivalent to either grid worlds or reaching, which are very simple. Without a deeper technical contribution or a more extensive empirical evaluation, I do not think the paper is ready for publication in ICLR.",ICLR2019,5: The area chair is absolutely certain +4F48beLOXiE,1642700000000.0,1642700000000.0,1,WQIdU90Gsu,WQIdU90Gsu,Paper Decision,Reject,"All reviewers have substantial concerns regarding this work including novelty and experimental validation. The authors do not provide a rebuttal for the raised concerns. 
As such, the area chair agrees with the reviewers and does not recommend it be accepted at this conference.",ICLR2022, +u9zn7lAkapZ,1642700000000.0,1642700000000.0,1,jT1EwXu-4hj,jT1EwXu-4hj,Paper Decision,Accept (Poster),"This paper presented a domain transportation perspective on optimizing recommender systems. The basic motivation is to view recommendation as applying some form of intervention, implying a distributional shift after the recommendation/intervention. Distributional shift brings tremendous difficulty to the traditional causal inference or missing-data perspectives on recommender systems, as it violates the distributional overlap assumption: in simple terms, if the model recommends a radically different set of items, there isn't much you can say about its generalization ability; on the other hand, if the model only recommends items that it already observed during training (no distribution shift at all), it would inherit all the biases which already exist in the data. To that end, this paper proposed a domain transportation perspective by introducing a Wasserstein distance constrained risk minimization to find interventions that can best transport the patterns it learns from the observed domain to the post-intervention domain. +The paper received overall borderline scores. All the reviewers acknowledged that the proposed perspective is novel and has the potential to spark a new direction for future work. The reviewers raised concerns, ranging from the bounds in the paper and the sensitivity of the optimization w.r.t. the hyperparameter, to some relevant but missing baselines. The authors provided very detailed responses and revised the paper quite substantially to address most of the feedback. I also read the paper myself given the borderline scores, and I think the authors did a reasonably good job improving the paper; I agree this paper provides an interesting and novel perspective on viewing recommendation, though I also agree with one reviewer that the idea of ""partial extrapolation"" can be further explored. +My major complaint is around experimental evaluation. 
It seems to me that only the semi-synthetic experiment actually makes sense in this context (where the measure is based on the unobserved relevance as opposed to observed clicks), as the traditional random-split-on-clicks evaluation would inevitably favor models with little distributional shift (the training and test data essentially come from the same distribution; maybe not so in a sequential setting, but still close). Furthermore, the inclusion of the Yahoo R3 and Coat datasets is even more confusing, as the associated test sets imply random exposure, which is certainly not what this paper aims to address, unless I am missing something, in which case more clarification would be nice. +My overall assessment of the paper is still leaning towards positive, but I also wouldn't be too upset if this paper doesn't end up making it. However, if accepted, I do want the authors to carefully revise the presentation of the experimental results for the final version.",ICLR2022, +_fHS1cKvUUn,1642700000000.0,1642700000000.0,1,XpmTU4k-5uf,XpmTU4k-5uf,Paper Decision,Reject,"This paper has been reviewed by four experts. Their independent evaluations were all below the acceptance threshold, citing various issues ranging from a disconnection between the stated goals of the presented work and the means by which the approach was evaluated, to doubts about the scalability of the proposed approach, to a lack of clarity regarding the actual novelty of the approach given some key missed references, to name a few items of criticism. 
Most reviewers were impressed with the empirical performance achieved in the conducted experiments, and one of the reviewers raised their mark in response to the authors' rebuttal. Yet, the overall evaluation places this work, as it stands now, below the threshold for ICLR acceptance. I would like to encourage the authors to continue pushing their promising endeavor, systematically incorporating the feedback received here to improve the overall quality of this work.",ICLR2022, +8Jp-ogWVfL,1642700000000.0,1642700000000.0,1,AlPBx2zq7Jt,AlPBx2zq7Jt,Paper Decision,Reject,"## A Brief Summary +This paper proposes two critical modifications to the original RUDDER algorithm: +1. It proposes the Align-RUDDER method, which assumes that episodes with high rewards can be used as demonstrations. +2. It uses a profile model from the multiple sequence alignment (MSA) approach to align the demonstrations and redistribute the rewards according to how frequently events in the demos are shared across different demonstrators. MSA is used as a profile model instead of an LSTM. +The paper uses successor features to represent state-action pairs, which are then used to compute the similarity matrix used for MSA afterward. +The paper shows promising results in the Minecraft environment (ObtainDiamond task) as well as synthetic grid-world environments. +## Reviewer bJbP +*Strengths:* +- Empirical evaluation is well done. +*Weaknesses:* +- The writing requires more work. +- Limited experiments: mostly on toy grid-world/navigation environments; it is not clear if the results will generalize to control problems. +## Reviewer mK3T +*Strengths:* +- Simple and effective technique for identifying sub-goals. +- Large improvements over original RUDDER. +- Impressive results on Minecraft. +*Weaknesses:* +- More thorough ablations on the importance of different elements of Align-RUDDER. 
+- Presentation and writing need more improvements. +- Assumption of a single underlying successful strategy is an important limitation. +- Figure 1 is problematic and confusing because of the way it explains the RUDDER algorithm. + +## Reviewer nk2L +*Strengths:* +- Impressive results on Minecraft. +*Weaknesses:* +- Poor justification and motivation. +- RUDDER vs Align-RUDDER comparisons are only done on two grid-world environments. +- More ablations are required to justify the approach. +- Writing requires more work, some important concepts require more clarity. Some undefined concepts... +- Incorrect claims such as: +> Q-function of an optimal policy resembles a step function + +## Reviewer YcqX +*Strengths:* +- Strong motivation. +- MSA for demos is novel. +- Strong experimental results. +*Weaknesses:* +- Several grammatical errors. +- The method is not explained well in the paper, the writing needs more work to improve the clarity. +- Lack of sufficient analysis and ablations on the Align-RUDDER approach. + +## Key Takeaways and Thoughts +Overall, the result provided in this paper in the Minecraft environment is impressive. The motivation for the Align-RUDDER is clear for me. I like the paper; in particular, the application of the MSA for the alignments across the demos is novel. However, as all the reviewers of this paper agreed that the paper is unclear, especially the method description requires more work. The paper needs to present more ablations and analysis to justify which components of Align-RUDDER algorithm are essential. I agree with both insights, the authors have made improvements in the paper to improve the exposition of the algorithms, but still, the paper feels a bit rushed. I would recommend the authors reconsider the paper's current structure and improve the writing further, especially the description of the method can be further improved. I would recommend that the authors fix those essential issues with the paper and the other comments reviewers made in a future resubmission.",ICLR2022, +yXMoxJc7CTT,1642700000000.0,1642700000000.0,1,ZWykq5n4zx,ZWykq5n4zx,Paper Decision,Reject,"Confidence boosting via aggregating multiple run of algorithms has been used before. The main result of the paper relies on a generic confidence boosting trick. The authors for instance cite Shalev-Schwartz et al 2010 theorem 26 in remark 4 of their paper and correctly point out that for deterministic algorithms like ERM one can use that for confidence boosting. While that theorem there is proved for excess risk and for deterministic algorithms, the main idea there to me seems like what is used in the authors paper as well. + +The basic idea: +Property A holds in expectation, Hence use Markov inequality to get a low grade probability version of it in each of the K pieces +Now probability that at least one of the pieces is good is high since each piece is independent of the other +Finally aggregate with simple concentration with union bound. + +In Shalev-Schwartz et al 2010 this is done with property being excess risk, here it is done with generalization error. + +(Oh and I should add, the fact that the algorithm is randomized does not affect this line of reasoning as long as we use fresh randomness for each of the K blocks). + +Now the missing piece covered is that on-average stability implies generalization in expectation. But isn’t this already known to be true in the stability literature? + +To me it seems like the main technical contribution of the paper is not as novel. 
Further, as one of the reviewers points out, the main goal should be to prove a high-probability guarantee for popularly used algorithms like SGD, not for the confidence-boosted version of them. +Nonetheless, the application of the result to SGD seems interesting and somewhat new. +I am reluctant to propose an accept here.",ICLR2022, +dobXbcTo3lX,1642700000000.0,1642700000000.0,1,OqHtVOo-zy,OqHtVOo-zy,Paper Decision,Reject,"This paper received a majority vote for rejection. In the internal discussion, one reviewer updated his/her score from 1 to 3 according to the author response. I have read all the materials of this paper, including the manuscript, appendix, comments, and responses. 
Based on collected information from all reviewers and my personal judgement, I can make the recommendation on this paper, *rejection*. Here are the comments that I summarized, which include my opinion and evidence. + +**Interesting Idea** + +Every reviewer including me agree that the idea of modelling Bayes label transition is novel and interesting. + +**The motivation lacks of supportive evidence** + +The second motivation that ""the feasible solution space of the Bayes label transition matrix is much smaller than that of the clean label transition matrix"" is not well supported. The authors should theoretically or empirically demonstrate this point. The current description on uncertainty is not strong enough. Moreover, if so, the benefits are not illustrated. The feasible solution space, even with a small coverage area is continuous with infinite solutions. + +**A new concept** + +The authors tried to sell the concept of a new transition matrix, but failed. I believe it might result from the organization and presentation. The authors spent too much pages introducing others' work. At least, a formal definition of the new concept should be given. In the current version, Definition 1 is from Cheng et al., 2020 on distilled examples. + +**Title** + +Literally from title, I guess DNN is a key component or a selling point of this paper. Actually no. We expect the authors could provide the insights on what benefits are using DNN over other techniques and how to apply DNN to estimate the transition matrix. If this is not a selling point, this word might be removed from the title. + +**Algorithm 1** + +I am a little surprised that the only algorithm listed in this paper is label noise generation. Instead the proposed algorithm of this paper is expected. + +**Experimental Evaluation** + +The experimental results look much better than other baselines. It is a little confusing that some best results are bold, some not. + +**Presentation** + +Although I did not notice obvious grammar errors, some sentences are very long (3 lines). They made difficulties to follow the idea. I have to read these sentences several times. In my eyes, this is the biggest one! Presentation means how to sell the idea to audience (not only reviewers, but also future readers) in an easy understood way. The current version spent much space introducing others' work; on the contrary, the original or key part is not well illustrated. + +Although this paper has a novel idea and good experimental support, other issues listed above demonstrate the current version is not ready for a top-tier conference. No objection from reviewers was raised to again this recommendation.",ICLR2022, +HkqS7ypSf,1517250000000.0,1517260000000.0,133,BydLzGb0Z,BydLzGb0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Simple idea (which is a positive) to regularize RNNs, broad applicability, well-written paper. Initially, there were concerns about comparisons, but he authors have provided additional experiments that have made the paper stronger. ",ICLR2018, +SyxvKhdxxN,1544750000000.0,1545350000000.0,1,ryxhB3CcK7,ryxhB3CcK7,Very borderline,Reject,"This paper proposes a latent variable approach to the neural module networks of Andreas et al, whereby the program determining the structure of a module network is a structured discrete latent variable. The authors explore inference mechanisms over such programs and evaluate them on SHAPES. 
+ +This paper may seem acceptable on the basis of its scores, but R1 (in particular) and R3 did a shambolic job of reviewing: their reviews are extremely short, and offer no substance to justify their scores. R2 has admirably engaged in discussion and upped their score to 6, but continue to find the paper fairly borderline, as do I. Weighing the reviews by the confidence I have in the reviewers based on their engagement, I would have to concur with R2 that this paper is very borderline. I like the core idea, but agree that the presentation of the inference techniques for V-NMN is complex and its presentation could stand to be significantly improved. I appreciate that the authors have made some updates on the basis of R2's feedback, but unfortunately due to the competitive nature of this year's ICLR and the number of acceptable paper, I cannot fully recommend acceptance at this time. + +As a complete side note, it is surprising not to see the Kingma & Welling (2013) VAE paper cited here, given the topic.",ICLR2019,3: The area chair is somewhat confident +iDumLBWDOjT,1642700000000.0,1642700000000.0,1,3rULBvOJ8D2,3rULBvOJ8D2,Paper Decision,Accept (Poster),"After carefully reading all reviews and rebuttal, I actually think the paper provides sufficient new insight in understanding MAML that is worth being accepted. I want to thank the authors for actively engaging with the reviewers, and providing sufficient changes to the paper in order to clarify and improve its contributions. + +Theoretical results tend to be harder to judge, as they often need to happen under assumptions that make them tractable. Nevertheless they provide intuitions and understanding of the underlying principle that end up having an impact even in more realistic scenarios where these assumptions might not hold. I think this is such a scenario, and I think better understanding the relationship between ERM and approaches as meta-learning is important for the field moving forward.",ICLR2022, +H1lRBJZHxE,1545040000000.0,1545350000000.0,1,rkMD73A5FX,rkMD73A5FX,"Expensive approach, unclear writing",Reject,"This paper introduces Mahe, a model-agnostic hierarchical explanation technique, that constructs a hierarchy of explanations, from local, context-dependent ones (like LIME) to global, context-free ones. The reviewers found the proposed work to be a quite interesting application of the neural interaction detection (NID) framework, and overall found the results to be quite extensive and promising. + +The reviewers and the AC note the following as the primary concerns of the paper: (1) a crucial concern with the proposed work is the clarity of writing in the paper, and (2) the proposed work is quite expensive, computationally, as the exhaustive search is needed over local interactions. + +The reviewers appreciated the detailed comments and the revision, and felt the revised the manuscript was much improved by the additional editing, details in the papers, and the additional experiments. However, both reviewer 1 and 3 have strong reservations about the computational complexity of the approach, and the additional experiments did not alleviate it. Further, reviewer 1 is still concerned about the clarity of the work, finding much of the proposed work to be unclear, and recommends further revisions. + +Given these considerations, everyone felt that the idea is strong and most of the experiments are quite promising. However, without further editing and some efficiency strategies, it barely misses the bar of acceptance. 
+",ICLR2019,3: The area chair is somewhat confident +BJlvuxYAJE,1544620000000.0,1545350000000.0,1,rkeMHjR9Ym,rkeMHjR9Ym,ICLR 2019 decision,Reject,"This paper shows convergence of stochastic gradient descent for the problem of learning weight matrices for a linear dynamical system with non-linear activation. Reviewers agree that the problem considered is both interesting and challenging. However the paper makes many simplifying assumptions - 1) both input and hidden state are observed, a very non standard assumption, 2) analysis requires increasing activation functions, cannot handle ReLU functions. I agree with R2 and think these assumptions make the results significantly weaker. R1 and R3 are more optimistic, but authors response does not give an insight into how one might extend this analysis to the setting where hidden state is not observed. Relaxing these assumptions will make the paper more interesting. ",ICLR2019,4: The area chair is confident but not absolutely certain +BkeKq8AggV,1544770000000.0,1545350000000.0,1,HyzdRiR9Y7,HyzdRiR9Y7,Universal Transformers (with optional dynamic halting/ACT),Accept (Poster),"This paper presents Universal Transformers that generalizes Transformers with recurrent connections. The goal of Universal Transformers is to combine the strength of feed-forward convolutional architectures (parallelizability and global receptive fields) with the strength of recurrent neural networks (sequential inductive bias). In addition, the paper investigates a dynamic halting scheme (by adapting Adaptive Computation Time (ACT) of Graves 2016) to allow each individual subsequence to stop recurrent computation dynamically. + +Pros: +The paper presents a new generalized architecture that brings a reasonable novelty over the previous Transformers when combined with the dynamic halting scheme. Empirical results are reasonably comprehensive and the codebase is publicly available. + +Cons: +Unlike RNNs, the network recurs T times over the entire sequence of length M, thus it is not a literal combination of Transformers with RNNs, but only inspired by RNNs. Thus the proposed architecture does not precisely replicate the sequential inductive bias of RNNs. Furthermore, depending on how one views it, the network architecture is not entirely novel in that it is reminiscent of the previous memory network extensions with multi-hop reasoning (--- a point raised by R1 and R2). While several datasets are covered in the empirical study, the selected datasets may be biased toward simpler/easier tasks (--- R1). + +Verdict: +While key ideas might not be entirely novel (R1/R2), the novelty comes from the fact that these ideas have not been combined and experimented in this exact form of Universal Transformers (with optional dynamic halting/ACT), and that the empirical results are reasonably broad and strong, while not entirely impressive (R1). Sufficient novelty and substance overall, and no issues that are dealbreakers. ",ICLR2019,4: The area chair is confident but not absolutely certain +3GC4oMIvYo,1610040000000.0,1610470000000.0,1,yEnaS6yOkxy,yEnaS6yOkxy,Final Decision,Reject,"The authors have provided very detailed responses and added additional experimental results, which have helped address some of the referees' concerns. 
However, since the modification made to a vanilla GAN algorithm is relatively small, the reviewers are hoping to see experiments on more appropriate real-world datasets (not artificially created imbalanced datasets with relatively few classes), more/stronger baselines, and a rigorous theoretical/empirical analysis of the method's sensitivity to the quality of the pre-trained classifier. The paper is not ready for publication without these improvements. ",ICLR2021, +B1e6qoJbgE,1544780000000.0,1545350000000.0,1,H1gZV30qKQ,H1gZV30qKQ,The paper needs to be improved,Reject,"The paper studies whether the best strategy for transfer learning in RL is to transfer value estimates or policy probabilities. The paper also presents a model-based value-centric (MVC) framework for continuous RL. The reviewers raised concerns regarding (1) the coherence of the story, (2) the novelty and importance of the MVC framework, and (3) the significance of the experiments. I encourage the authors to either focus on the algorithmic aspect or the transfer learning aspect, and to expand on the experimental results to make them more convincing. I appreciate the changes made to improve the paper, but in its current form the paper is still below the acceptance threshold at ICLR. +PS: in my view one can think of value as (shifted and scaled) log of policy. 
Hence, it is a bit ambiguous to ask whether to transfer value or policy.",ICLR2019,4: The area chair is confident but not absolutely certain +FAfHTpGZOkt,1610040000000.0,1610470000000.0,1,NPab8GcO5Pw,NPab8GcO5Pw,Final Decision,Reject,"The paper studies optimization landscapes arising in the fitting of sparse linear networks to data. It argues that for scalar outputs, every local minimum is global, while for d >= 3 dimensional outputs, there can be spurious local minimizers. The paper also argues that similar results hold for deep networks. Counterexamples establishing the existence of non-global local minimizers are constructed analytically and corroborated by probing the optimization landscape experimentally. +Pros and cons: +[+] Network sparsification is an important practical problem, and there are relatively few theoretical guidelines on when and how sparsification can be achieved. In particular, results that help to explain why trained networks are often sparsifiable and/or provide theoretical guarantees for sparsification algorithms would be significant. 
+ +Pros and cons: + +[+] Network sparsification is an important practical problem, and there are relatively few theoretical guidelines on when and how sparsification can be achieved. In particular, results that help to explain why trained networks are often sparsifiable and/or provide theoretical guarantees for sparsification algorithms would be significant. + +[-] Reviewers raised concerns about the significance of the paper’s results. In particular, they found it difficult to connect the paper’s analysis of landscapes of linear networks to the question of when and why practically occurring networks can be pruned. They also had difficulty isolating new ideas in the mathematical analysis beyond previous works on the landscape of linear networks. Finally, several reviewers expressed concerns about the extent to which the paper’s observations on linear networks generalize to nonlinear networks. + +[-] Reviewers raised concerns about the clarity of the paper. The mathematical exposition is unclear in places: conditions are not clearly stated, terminology is occasionally vague. Moreover the paper’s handling of optimality conditions for constrained optimization problems is unclear (e.g., the proof of Theorem 6 uses the unconstrained optimality condition $\ell^T X = 0$ in the inductive step, even though the inductive hypothesis pertains to a constrained problem). + +[-] The technical results make assumptions that are occasionally quite strong. For example, Theorem 7 requires orthogonality of the data matrices, when restricted to indices where the weights are nonzero. As reviewers note, this assumption seems highly restrictive. + +Overall, the paper addresses a topic that is important to the ICLR community: developing theoretical analyses of network sparsification. However, the significance of its results is not clear, and the technical exposition would need significant improvement to meet the bar for publication. ",ICLR2021, +eOT8yYYYkl,1642700000000.0,1642700000000.0,1,size4UxXVCY,size4UxXVCY,Paper Decision,Reject,"First, I would like to thank all the reviewers for their efforts in reading and understanding this paper. I tried to read the paper as well and I also find it's really difficult (if possible) for me to understand the ideas presented here. The most important task in writing a paper (as Reviewer Svha also suggested in his/her review) in the field of machine learning is to explain to your peers what is the problem you are trying to solve and how you solve (or partially solve) that problem. I think there is a consensus among the reviewers that the paper did not do a great job of that. I am not questioning the quality of the idea or the research here, but I think the paper here will need to do a significantly better job here in explaining the idea before it can be a good ICLR publication.",ICLR2022, +bmukdcHZsa,1576800000000.0,1576800000000.0,1,ryxgsCVYPr,ryxgsCVYPr,Paper Decision,Accept (Poster),"This paper extracts a list of conditions from the question, each of which should be satisfied by the candidate answer generated by an MRC model. All reviewers agree that this approach is interesting (verification and validation) and experiments are solid. One of the reviewers raised concerns are promptly answered by authors, raising the average score to accept. +",ICLR2020, +QZ6WKTPIYUi,1642700000000.0,1642700000000.0,1,VXqNHWh3LL,VXqNHWh3LL,Paper Decision,Reject,"This paper introduces a perceptual similarity on top on the commonly used perceptual loss in the literature (LPIPS). 
The authors present experiments highlighting that human perceptual similarity is invariant to small shifts, whereas standard metrics are not. The paper studies several factors (anti-aliasing, pooling, striding, padding, skip connection) in order to propose a measure on top of LPIPS achieving shift-invariance. + +This paper initially received mixed reviews. RLHuY was positive about the submission, pointing out the relevance of the real human data and the studied factors for measuring the impact on shift invariance. RGQvy was slightly positive, but also raised several concerns on the justification of the claimed properties, the human perception experiments, and the positioning with respect to data augmentation (PIM). RLHuY, an expert in the field, recommended clear rejection, pointing out missing references (including DISTS) and the limited scope of the paper (shift invariance and tiny shifts). After rebuttal, RLHuY and RLHuY stuck to their positions; RGQvy was inclined to borderline reject because of unconvincing answers on the comparison to data augmentation techniques. + +The AC's own reading confirmed the concerns raised by RGQvy and RLHuY, and points to the following limitations: + +- The submission includes limited contributions and expected results: the studied modifications of the neural networks' architecture, although meaningful, directly follow ideas borrowed from the literature. They are not supported by stronger theoretical analysis, and several insights related to accuracy or robustness remain unclear. +- Experimental results are mixed, e.g. compared to data augmentation: although these approaches are more demanding at train time, they do not induce any overhead at test time - in contrast to the proposed approach. + +Therefore, the AC recommends rejection.",ICLR2022, +rkV28JaHG,1517250000000.0,1517260000000.0,868,SkffVjUaW,SkffVjUaW,ICLR 2018 Conference Acceptance Decision,Reject,"Regarding clarity, while the paper definitely needs work if it is to be resubmitted to an ML venue, different revisions would be appropriate for a physics audience. And given the above comment, any suggested changes are likely to be superfluous.",ICLR2018, +KT8aePozXB,1576800000000.0,1576800000000.0,1,Syx79eBKwr,Syx79eBKwr,Paper Decision,Accept (Spotlight),"This paper explores several embedding models (Skip-gram, BERT, XLNet) and describes a framework for comparing, and in the end, unifying them. The framework is such that it actually suggests new ways of creating embeddings, and draws connections to methodology from computer vision. + +One of the reviewers had several questions about the derivations in your paper and was worried about the paper's clarity. But all of the reviewers appreciated the contributions of the paper, which joins multiple seemingly disparate models into one theoretical framework. + +The reviewers were positive about the paper, and in particular were happy to see the active response of the authors to their questions and their willingness to update the paper with the suggested improvements.",ICLR2020, +b5RAHLmJLaW,1610040000000.0,1610470000000.0,1,1AyPW2Emp6,1AyPW2Emp6,Final Decision,Reject,"The authors develop a novel robustness certificate based on randomized smoothing that accounts for second-order smoothness of functions smoothed with Gaussian noise. They develop a variant of Gaussian smoothing based on these insights that improves the sample-efficiency of randomized smoothing using gradient information.
+ +While the ideas presented were interesting, reviewers were concerned about the quality of presentation of the paper (confused positioning of results relative to prior work) as well as the lack of significant improvements upon existing methods in the experimental section. Overall, the paper is borderline based on the reviewers' comments and ratings - however, there is not sufficient evidence to justify acceptance. + +I would encourage the authors to consider a significant revision to improve the clarity of contributions made and strengthen experimental results to demonstrate significant improvements, which would validate the power of the theoretical ideas presented.",ICLR2021, +b1AfX7AT80,1642700000000.0,1642700000000.0,1,OqlohL9sVO,OqlohL9sVO,Paper Decision,Reject,"The paper proposes a new method to combine global and local image features, targeted at image retrieval applications. The main idea is a model branch where both spatial and channel attention are used. The local feature branch undergoes supervision directly and this branch’s output is concatenated to the global feature branch’s output in order to eventually produce the final image embedding. + +The reviewers appreciated the care in the evaluation (ablative analysis) and the promise of the approach compared to existing baselines. The reviewers also expressed concerns about several claims, for instance that the proposed approach is able to learn homography transformations, the quality of the exposition, and missing baselines. The reviewers also pointed out that several parts of the paper were hard to follow and important details were missing. + +The authors submitted responses to the reviewers' comments. After reading the response, updating the reviews, and discussion, the reviewers considered that ‘the concatenation of local features with global ones works does not mean at all that some geometric transformation is learned’ and the justification provided for omitting baselines (suggested by the reviewers) were unconvincing. The feedback provided was already fruitful, yet major issues still remain. + +We encourage the paper to pursue their approach further taking into account the reviewers' comments, encouragements, and suggestions. The detailed feedback lays out a clear path to generate a stronger submission to a future venue. + +Reject.",ICLR2022, +6lGlipFgr3U,1642700000000.0,1642700000000.0,1,pVU7Gp7Nq4k,pVU7Gp7Nq4k,Paper Decision,Reject,"This paper undertakes an empirical investigation of overparameterized neural networks, studying the last hidden representation and identifying ""representation mitosis,"" a cloning effect whereby neurons split into groups that carry the same information. The effect is observed for a variety of architectural configurations/datasets, and a detailed set of experiments are performed to investigate the behavior. + +The reviewers had split opinions about this paper, with most reviewers appreciating the novelty and salience of the observations, but with some reviewers expressing skepticism about the generality of the effect. While the experiments are thorough and revealing, the practical importance of representation mitosis remains somewhat unconvincing. 
+ +A primary motivating factor for the analysis is the search for an explanation of the unexpectedly good generalization behavior of overparameterized networks and the origin of ""benign overfitting."" As highlighted in the reviews, the sensitivity of the mitosis effect to (1) training to zero loss and (2) optimal regularization suggests that it cannot be the sole explanation for benign overfitting, since the latter can and does occur without these conditions. The authors acknowledge this situation, and respond that their focus is on state-of-the-art models used by the community, rather than on toy settings. For this to be a persuasive response, more compelling results in these state-of-the-art situations should be evidenced -- in particular, as several reviewers pointed out, the negative results on ImageNet undermine this point to some extent. + +Overall, representation mitosis does seem like an interesting and potentially important phenomenon, but further work is needed to develop persuasive evidence in support of the interpretations and implications. While this is a borderline submission, I believe it falls just short of the mark, and cannot recommend acceptance.",ICLR2022, +ypZwAstWcOz,1642700000000.0,1642700000000.0,1,_PHymLIxuI,_PHymLIxuI,Paper Decision,Accept (Poster),"The paper proposes several modifications to vision transformers: multiscale features, a variant of factorized attention, and ""dynamic position bias"". The proposed architecture with these modifications achieves strong results on classification, detection, and segmentation. + +After considering the authors' responses, all reviewers are positive about the paper (reviewer K7wS mentioned upgrading to weak accept, but apparently forgot to do so). Main pros include the clean architecture and strong empirical results. The main con is the somewhat limited novelty. + +Overall, I recommend acceptance. While each of the proposed modifications might not be that unique, they are reasonably new in the context of transformers and their combination makes for a clean architecture that performs very well in practice.",ICLR2022, +S_lNsZUai9W,1642700000000.0,1642700000000.0,1,twv2QlJhXzo,twv2QlJhXzo,Paper Decision,Accept (Poster),"The submitted paper considers the very interesting problem of imitation learning from observations under transition model disparity. The reviewers recommended 2x weak accept and 1x weak reject for the paper. Main concerns about the paper regarded the clarity of the presentation, the complexity of the proposed method, and the experimental validation. During the discussion phase, the authors addressed some of the comments and provided an update of the paper with additional details.
While some of the reviewers' concerns still stand, I think the addressed problem is very relevant and the proposed method can be (with clarifications and improvements of the presentation) be interesting to parts of the community. Hence I am recommending acceptance of the paper. Nevertheless, I strongly urge the authors to carefully revise their paper, and taking the reviewers' concerns carefully into account when preparing the camera ready version of the paper.",ICLR2022, +tMseedRHnf,1610040000000.0,1610470000000.0,1,vNw0Gzw8oki,vNw0Gzw8oki,Final Decision,Reject,"The paper presents a framework for incorporating physics knowledge (through, potentially incomplete, differential equations) into the deep kernel learning approach of Wilson et al. The reviewers found the paper addresses an important problem and presents good results. However, one of the main issues raised by R1 is that, although the proposed method can be applied to broader settings such as that of incomplete differential equations, there are still regimes where the comparison is not only possible but perhaps insightful. An example baseline is the work of Lorenzi and Filippone, “Constraining the Dynamics of Deep Probabilistic Models” (ICML, 2018). Another critical issue, raised by R4, is the insufficient clarity in the presentation. Many of the concerns raised by this reviewer were clarified in the discussion and I thank the authors for their engagement. However, the AC believes some of the points raised by R4 in this regard were left unaddressed in the paper and the manuscript does indeed require at least one more iteration. + +The format violation concerns raised during the reviewing process did not affect the decision on this paper, as the PCs confirmed that they did not meet the bar for desk rejection and recommended to assess the paper on its technical merits.",ICLR2021, +sIlKqsCbOK6,1610040000000.0,1610470000000.0,1,HP-tcf48fT,HP-tcf48fT,Final Decision,Reject,"The paper present a new learning-based approach` to solve the Maximum Common Subgraph problem. All the reviewers find the idea of using GCN and RL to guide the branch and bound interesting although, even after reading the rebuttal, there are some important concerns about the paper. + +The main issue raised by many reviewers are on scalability of the methods and motivation of the problem. It would be nice to add a scalability experiments on large networks(>1M nodes) to show that the method could potentially scale. In fact, the original motivation based on drug discovery, chemoinformatics etc. application is a bit weak because in those area domain specific heuristic should work better. + +Overall, the paper is interesting but it does not meet the high publication bar of ICLR.",ICLR2021, +O5W-znner8Q,1642700000000.0,1642700000000.0,1,6sh3pIzKS-,6sh3pIzKS-,Paper Decision,Accept (Poster),"This paper uses chemical reaction data as a means to help train molecule embeddings, by requiring embeddings to satisfy known reaction equations. The idea is nice and clear, and the paper includes strong empirical evaluation. 
All four reviewers agreed the paper could be accepted, with two of them raising their scores after a detailed author rebuttal and discussion, which included additional experiments.",ICLR2022, +G6tBqlTaAAM,1642700000000.0,1642700000000.0,1,JKRVarUs3A1,JKRVarUs3A1,Paper Decision,Reject,"While some of the reviewers find that the paper proposes a solid contribution to a problem, I tend to agree with the other reviewers that the proposed approach has limited novelty and limited potential for improvement over baselines. In addition, the simulations are rather weak due to a lack of comparisons to strong baselines and a lack of clarity.",ICLR2022, +IXp32sYV6MH,1642700000000.0,1642700000000.0,1,RAoBtzlwtCC,RAoBtzlwtCC,Paper Decision,Reject,"The reviewers had a number of concerns which seem to remain after the authors' response. In particular, the reviewers were concerned about the validity of the paper's assumptions in real-world applications and the lack of experimental results. Also, while the reviewers acknowledge the novelty in the technical contributions, they suggested that the authors explain more clearly how the results of this paper are distinguishable from prior art.",ICLR2022, +E2TGMvqJfn,1610040000000.0,1610470000000.0,1,j0yLJ-MsgJ,j0yLJ-MsgJ,Final Decision,Reject,"The paper studies the effectiveness of few-shot learning techniques in settings where the training labels are imbalanced. While addressing an interesting practical problem, reviewers raised concerns about the paper's technical depth, insufficient distinction from existing techniques for coping with label imbalance, and limited qualitative conclusions from the results.
The authors incorporated some of these comments in their revision, but a more comprehensive update on the latter two points appears appropriate.",ICLR2021, +Sk1CV1TrM,1517250000000.0,1517260000000.0,459,H1vCXOe0b,H1vCXOe0b,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a new method for interpreting the hidden units of neural networks by employing an Indian Buffet Process. The reviewers felt that the approach was interesting, but at times hard to follow, and that more analysis was needed. In particular, it was difficult to glean any advantage of this method over others. The authors did not provide a response to the reviews.",ICLR2018, +BkgtA2LelV,1544740000000.0,1545350000000.0,1,HkxWrsC5FQ,HkxWrsC5FQ,Meta-review,Reject,"This manuscript proposes a generative model for images, then proposes a training procedure for fitting a convolutional neural network based on this model. One novelty of this result is that the generative procedure seems to be more complex than the generative assumptions required by previous work.
It is clear that the problem addressed -- training methods that may improve on SGD, with convergence guarantees -- is of significant interest to the community. + +The reviewers and AC note several issue (i) the initial version of the manuscript includes several assumptions that are not clearly stated. This seems to have been fixed in the updated manuscript (ii) reviewers suspect that the accumulation of stated assumptions may result in an easily separable generative model -- limiting the generality of the results (iii) experiemental results are underwhelming, and only comparable to much older published results.",ICLR2019,3: The area chair is somewhat confident +60S2eLVFVa,1576800000000.0,1576800000000.0,1,BylT8RNKPH,BylT8RNKPH,Paper Decision,Reject,"This paper proposes to speed up finetuning of pretrained deep image classification networks by predicting the success rate of a zoom of pre-trained networks without completely running them on the test set. The idea is that a sensible measure from the output layer might well correlate with the performance of the network. All reviewers consider this is an important problem and a good direction to make the effort. However, various concerns are raised and all reviewers unanimously rate weak reject. The major concerns include the unclear relationship between the metrics and the fine-tuning performance, non- comprehensive experiments, poor writing quality. The authors respond to Reviewers’ concerns but did not change the major concerns. The ACs concur the concerns and the paper can not be accepted at its current state.",ICLR2020, +CiCsKPUSHX,1576800000000.0,1576800000000.0,1,SyxhaxBKPS,SyxhaxBKPS,Paper Decision,Reject,"This paper studies mixed-precision quantization in deep networks where each layer can be either binarized or ternarized. The proposed regularization method is simple and straightforward. However, many details and equations are not stated clearly. Experiments are performed on small-scale image classification data sets. It will also be more convincing to try larger networks or data sets. More importantly, many recent methods that can train mixed-precision networks are not cited nor compared. Figures 3 and 4 are difficult to interpret, and sensitivity on the new hyper-parameters should be studied. The use of ""best validation accuracy"" as performance metric may not be fair. Finally, writing can be improved. Overall, the proposed idea might have merit, but does not seem to have been developed enough.",ICLR2020, +oecYLGMfCE,1576800000000.0,1576800000000.0,1,SkgQwpVYwH,SkgQwpVYwH,Paper Decision,Reject,"The primary contribution of this manuscript is a conceptual and theoretical solution to the sample elicitation problem, where agents are asked to report samples. The procedure is implemented using score functions to evaluate the quality of the samples. + +The reviewers and AC agree that the problem studied is timely and interesting, as there is limited work on credible sample elicitation in the literature. However, the reviewers were unconvinced about the motivation of the work, and the clarity of the conceptual results. There is also a lack of empirical evaluation. 
In the opinion of the AC, this manuscript, while interesting, can be improved by significant revision for clarity and context, and revisions should ideally include some empirical evaluation.",ICLR2020, +1YYTsovr96,1576800000000.0,1576800000000.0,1,Bye6weHFvB,Bye6weHFvB,Paper Decision,Reject,"The paper proposes a representation learning objective that makes the learned representation amenable to planning. + +The initial submission contained clear holes, such as missing related work and only containing very simplistic baselines. The authors have substantially updated the paper based on this feedback, resulting in a clear improvement. + +Nevertheless, while the new version is a good step in the right direction, there is some additional work needed to fully address the reviewers' complaints. For example, the improved baselines are only evaluated in the simplest domain, while the more complex domains still only contain simplistic baselines that are destined to fail. There are also some unaddressed questions regarding the correctness of Eq. 4. Finally, the substantial rewrites have given the paper a less-than-polished feel. + +In short, while the work is interesting, it still needs a few iterations before it's ready for publication.",ICLR2020, +O784iqxFDER,1642700000000.0,1642700000000.0,1,F5Em8ASCosV,F5Em8ASCosV,Paper Decision,Accept (Poster),"This paper considers a new setting of contextual bandits where the learning agent has the ability to perform interventions on targeted subsets of the population. The problem is motivated by software product experimentation but has more general applicability. The paper provides a method under this setting, with both empirical and theoretical support. Reviewers agree that this is an interesting setting, and the paper contributes new results. The initial concerns on assumptions, correctness, and experiments were addressed in the rebuttal. I thus recommend accept.
The authors should include the response carefully in the final version.",ICLR2022, +touKdQTkHhyd,1642700000000.0,1642700000000.0,1,hzmQ4wOnSb,hzmQ4wOnSb,Paper Decision,Accept (Poster),"This paper proposes a graph soft counter (GSC) model which is very simple and lightweight compared to the conventional graph neural network for solving QA tasks that benefit from knowledge graphs. Compared to the conventional KG-GNN combination, the proposed method is much simpler but produces better results for QA tasks. The paper originally dealt only with multiple-choice QA tasks, but during the rebuttal process, the authors added more complex QA tasks which the reviewers appreciated. Additionally, there was (and still remains) some concern over the exact reasons and mechanisms behind this ""too good to be true"" result, and the authors addressed this with additional ablation studies, to be included in the appendix. With the publicly released code, others will be able to try GSC and its too-good-to-be-true performance and figure out how it actually works.",ICLR2022, +rJxAc-gelV,1544710000000.0,1545350000000.0,1,SJeT_oRcY7,SJeT_oRcY7,Interesting demonstration of a biologically plausible neural network but analysis and compelling experiments are lacking,Reject,"This paper presents a biologically plausible architecture and learning algorithm for deep neural networks. The authors then go on to show that the proposed approach achieves competitive results on the MNIST dataset. In general, the reviewers found that the paper was well written and the motivation compelling. However, they were not convinced by the experiments, analysis or comparison to existing literature. In particular, they did not find MNIST to be a particularly interesting problem and had questions about the novelty of this approach over past literature. Perhaps the paper would be more impactful and convincing if the authors demonstrated competitive performance on a more challenging problem (e.g. machine translation, speech recognition or imagenet) using a biologically plausible approach. ",ICLR2019,5: The area chair is absolutely certain +msg0-I6hDj,1610040000000.0,1610470000000.0,1,65MxtdJwEnl,65MxtdJwEnl,Final Decision,Reject,"This paper presents a method for improving the learning of neural controlled differential equation (CDE) models. Neural CDE models provide a number of advantages over neural ODE models in terms of their ability to incorporate continuous-time observations. The primary strength of this paper is that it proposes a mathematically rigorous approach to enable neural CDE models to be learned more efficiently from long time series by converting the CDE to an ODE via the log-ODE method. The results are promising in that the method is able to simultaneously improve accuracy, reduce running time and reduce memory required during learning. + +The paper has two main weaknesses. First, the authors claim that due to the problems they are solving (time series with up to 17,000 steps), there are no viable baselines outside of the family of methods that they are proposing. As was noted in the reviews, it would be advisable to consider even very basic baselines for these experiments in addition to current benchmark results. For example, the EigenWorms data set was used in the time series classification benchmark described in Bagnall et al. 
and there are benchmark results available that appear to outperform those shown in Table 2 (see mean test accuracy results reported here: http://www.timeseriesclassification.com/results/AllAccuracies.zip). The authors are also encouraged to consider even coarse RNN approximations such as partitioning the time series into tractable blocks for learning. It is not clear that the data sets actually have long-range dependencies despite being long. + +The second weakness is that the representation that underlies the log-ODE method (the log-signature transform) has been used in previous work in conjunction with discrete-time RNNs. It can be viewed as a preprocessing method in a sense, as was noted by a reviewer. However, it is much more fundamentally integrated with methods for solving CDE's than its prior application to RNNs indicates. + +Overall, support for the paper did not rise to the bar required for acceptance, but we encourage the authors to revise and re-submit the work to a future venue. +",ICLR2021, +_VPl4ALFrBe,1610040000000.0,1610470000000.0,1,N33d7wjgzde,N33d7wjgzde,Final Decision,Accept (Poster),"This article proposes a novel weakly supervised segmentation method that unifies several annotation types using contractive/metric learning. This method clearly outperforms the current SOTA. While the unified framework itself is not novel enough, the reviewers agree that the contrastive loss formulation is interesting and the extensive experiments show its effectiveness. Overall, I consider that this unified framework is well engineered, the formulations are insightful, and the results advance the SOTA of weakly supervised segmentation. Accordingly, I propose to accept this paper at ICLR 2021.",ICLR2021, +B1gaF9L4gE,1545000000000.0,1545350000000.0,1,ryfMLoCqtQ,ryfMLoCqtQ,interesting result,Accept (Poster),"The authors provide a new analysis of generalization in deep linear networks, provide new insight through the role of ""task structure"". Empirical findings are used to cast light on the general case. This work seems interesting and worthy of publication.",ICLR2019,4: The area chair is confident but not absolutely certain +B1lPQ2VreE,1545060000000.0,1545350000000.0,1,ryetZ20ctX,ryetZ20ctX,"Good paper, accept.",Accept (Poster),The reviewers agree the paper brings a novel perspective by controlling the conditioning of the model when performing quantization. The experiments are convincing experiments. We encourage the authors to incorporate additional references suggested in the reviews. We recommend acceptance. ,ICLR2019,4: The area chair is confident but not absolutely certain +2fp9CQzhQz,1610040000000.0,1610470000000.0,1,XOuAOv_-5Fx,XOuAOv_-5Fx,Final Decision,Reject,"This work proposes a novel metric for measuring calibration error in classification models. + +Pros: +* Novel calibration metric addressing limitations of previously used metrics such as ECE + +Cons: +* Limited experimental validation on CIFAR-10/CIFAR-100 only +* Unclear impact beyond proposing a new calibration metric +* Unclear value of using the proposed UCE metric for regularization and OOD detection + +All reviewers appreciate the aim of the work to produce a calibration metric that addresses shortcomings of commonly used existing metrics such as expected calibration error (ECE), which is known to be sensitive to discretization choices. However, all reviewers remain in doubt whether the proposed metric (uncertainty calibration error, UCE) is truly a better metric of calibration than previous proposals. 
This doubt comes from two sources: 1. limited experiments that do not convincingly show the usefulness of UCE; and 2. the interpretability of UCE not being intuitive to the reviewers. The experiments also use UCE as a regularizer, but the benefit of doing so over simple entropy regularization is not clear. + +Overall, the work is well motivated and well written, and the proposed UCE measure is interesting. However, the reviewers remain unconvinced of the claimed benefits and the potential impact for measuring or improving calibration.",ICLR2021, +FQxot4BXoVb,1610040000000.0,1610470000000.0,1,dcktlmtcM7,dcktlmtcM7,Final Decision,Reject,"The paper introduces an approach for learning the dynamics of PDEs. It makes use of bi-directional LSTMs trained to regress future values from past observations, up to a given horizon. Experiments are performed on data generated from numerical solvers on two examples, inviscid Burgers and a Navier-Stokes system. While the topic is fine, the solution is nothing more than regression with sequence models and only shows that RNNs could learn to predict the data generated by these PDEs. The reviewers also highlight that the comparison with the baselines is not appropriate.",ICLR2021, +R8fkTtsXlPf,1642700000000.0,1642700000000.0,1,rS9-7AuPKWK,rS9-7AuPKWK,Paper Decision,Accept (Poster),"The main contribution is a way of analyzing the generalization error of neural nets by breaking it down into bias and variance components, and using separate principles to analyze each of the two components. The submission first proves rigorous generalization bounds for overparameterized linear regression (motivated in a general sense by the NTK); there are settings where this improves upon existing bounds. It extends the analysis to a matrix recovery model, showing that it's not limited to the linear regime. Finally, experimental results show that the risk decomposition holds empirically for neural nets. + +The numerical scores would place this paper slightly below the cutoff. The reviewers feel that the paper is well written and have not identified anything that looks like a critical flaw. They have a variety of concerns, mostly centered around whether the results apply to practical situations. Specifically, they're worried about (1) the theory not applying directly to neural nets, (2) the high-noise setting being less relevant for modern deep learning, and (3) whether there's a realistic situation where it improves over past bounds. Regarding (1), the theory covers not only the linear regime, but also the nonlinear matrix recovery regime; combined with the empirical results, this seems pretty solid by the standards of a DL theory paper. Regarding (2), even though the most common benchmarks indeed have low label noise, the high-noise regime still seems worth understanding (after all, we'd like our nets to work in domains like medicine). I haven't dug deeply enough to properly evaluate (3), but the author response seems believable to me. + +Overall, the paper strikes me as creative and well-executed.
Regardless of whether the theory is easily extendable to neural nets, this seems like an interesting paper that can be built on in future work. I recommend acceptance.",ICLR2022, +kbo3W1p37qV,1610040000000.0,1610470000000.0,1,L7Irrt5sMQa,L7Irrt5sMQa,Final Decision,Reject,"In this paper, the authors show the effect of RNI on the expressive power of GNN for the first time, where the RNI was initially proposed in Sato et al. 2020. Overall, I like the idea of random node initialization because it is simple, effective, and theoretically well-founded. The key concern was that the novelty over the Sato's paper and the reviewers were still not convinced by the response. Therefore, the paper is still below the acceptance threshold. I strongly encourage authors to revise the paper based on the reviewer's comments and resubmit it to a future venue. +",ICLR2021, +V1tA0y4XRl,1576800000000.0,1576800000000.0,1,BJlaG0VFDH,BJlaG0VFDH,Paper Decision,Reject,"This paper proposes to apply regularizers such as weight decay or weight noise only periodically, rather than every epoch. It investigates how the ""non-regularization period"", or period between regularization steps, interacts with other hyperparameters. + +Overall, the writing feels somewhat scattered, and it is hard to identify a clear argument for why the NRP should help. Certainly one could save computation this way, but regularizers like weight decay or weight noise incur only a small computational cost anyway. One explicit claim from the paper is that a higher NRP allows larger regularization. There's a sense in which this is demonstrated, though not a very interesting sense: Figure 4 shows that the weight decay strength should be adjusted proportionally to the NRP. But varying the parameters in this way simply results in an unbiased (but noisier) estimate of gradients of exactly the same regularization penalty, so I don't think there's much surprising here. + +Similarly, Section 3 argues that a higher NRP allows for larger stochastic perturbations, which makes it easier to escape local optima. But this isn't demonstrated experimentally, nor does it seem obvious that stochasticity will help find a better local optimum. + +Overall, I think this paper needs substantial cleanup before it's ready to be published at a venue such as ICLR. +",ICLR2020, +PAYB-UNwI8,1576800000000.0,1576800000000.0,1,HJeiDpVFPr,HJeiDpVFPr,Paper Decision,Accept (Poster),"This paper proposes a neural network approach to approximate distances, based on a representation of norms in terms of convex homogeneous functions. The authors show universal approximation of norm-induced metrics and present applications to value-function approximation in RL and graph distance problems. + +Reviewers were in general agreement that this is a solid paper, well-written and with compelling results. The AC shares this positive assessment and therefore recommends acceptance. ",ICLR2020, +SyPxafLOx,1486400000000.0,1486400000000.0,1,Bkul3t9ee,Bkul3t9ee,ICLR committee final decision,Invite to Workshop Track,"Quality, Clarity: + + The work is well motivated and clearly written -- no issues there. + + Originality, Significance: + + The idea is simple and well motivated, i.e., the learning of reward functions based on feature selection from identified subtasks in videos. 
+ + pros: + - the problem is difficult and relevant: good solutions would have impact + + cons: + - the benefit with respect to other baselines for various choices is unclear, although the latest version does contain updated baselines + - the influence of the initial controller on the results + - the work may gain better appreciation at a robotics conference + + I am very much on the fence about this paper. + It straddles a number of recent advances in video segmentation, robotics, and RL, which makes the specific technical contributions harder to identify. I do think that a robotics conference would be appreciative of the work, but better learning of reward functions is surely a bottleneck and therefore of interest to ICLR. + Given the lukewarm support for this paper by reviewers, the PCs decided not to accept the paper, but invite the authors to present it in the workshop track.",ICLR2017, +B1sqjzIdg,1486400000000.0,1486400000000.0,1,rkpACe1lx,rkpACe1lx,ICLR committee final decision,Accept (Poster),"The paper contains an interesting idea, and after the revision of 3rd Jan, the presentation is clear enough as well. (Although I find it now contains an odd repetition where related work is presented first in section 2, and then later in section 3.2.)",ICLR2017, +rJJXBJpHf,1517250000000.0,1517260000000.0,527,HkbJTYyAb,HkbJTYyAb,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting your paper to ICLR.
Although there revision has improved the paper, the consensus from the reviewers is that this is not quite ready for publication.",ICLR2018, +3F2UcL_nRN7,1610040000000.0,1610470000000.0,1,zWy1uxjDdZJ,zWy1uxjDdZJ,Final Decision,Accept (Spotlight),"The paper presents a sound and efficient (but not complete) algorithm for verifying that a piecewise-linear neural network is constant in an Lp ball around a given point. This is a significant contribution towards practical protection from adversarial attacks with theoretical guarantees. The proposed algorithm is shown to be sound (that is, when it returns a result, that result is guaranteed to be correct) and efficient (it is easily parallelizable and can scale to large networks), but is not complete (there exist cases where the algorithm will return ""I don't know""). The experiments show good results in practice. The reviewers are positive about the paper, and most initial concerns have been addressed in the rebuttal, with the paper improving as a result. Overall, this is an important contribution worth communicating to the ICLR community, so I'm happy to recommend acceptance.",ICLR2021, +5q198mPtb8,1610040000000.0,1610470000000.0,1,MmcywoW7PbJ,MmcywoW7PbJ,Final Decision,Reject,"This work extends previous work on unsupervised learning of goal-conditioned policies: an abstract skill policy, which drives exploration of the state space, is used to propose goals as well as derive rewards for a goal conditioned policy. + +Reviewers agreed the approach was novel and interesting. All reviewers raised significant concerns about clarity and/or lack of details, as well as a lack of comparison to DIAYN/DISCERN, though these points were adequately addressed in revisions. One remaining issue raised by two reviewers are that the content related to the information bottleneck/disentangled representation learning seems out of place and ill-justified. Detailed discussion of this aspect of the work has been relegated to the appendix. + +This is an important problem and a growing area of study, and while the submission has potential, improvements needed are not minor, and given the short process, we can only accept papers as is, rather than expecting certain changes. We urge the authors to further improve the focus of the work and perhaps plan to investigate the role and importance of disentanglement with IB in this setting in follow up work wherein they have the space to properly do justice to the topic in its own right.",ICLR2021, +w1wvfqTjQ1I5,1642700000000.0,1642700000000.0,1,6gLEKETxUWp,6gLEKETxUWp,Paper Decision,Reject,"The reviewers find the work to address an interesting and important problem but have several critical concerns about its insufficient treatment of prior work in this area, lack of novelty in relation to the body of existing literature.",ICLR2022, +TpQ9jgFqicq,1610040000000.0,1610470000000.0,1,nPVlVsBTiJ,nPVlVsBTiJ,Final Decision,Reject,"Although the connection between randomized smoothing and PDE revealed in this paper is an interesting direction to explore, the method proposed unfortunately is not certified. The method could work as a good empirical defense since the smoothed classifier could be learned more efficiently. ",ICLR2021, +vodTpHk4B,1576800000000.0,1576800000000.0,1,BJe-unNYPr,BJe-unNYPr,Paper Decision,Reject,"The paper makes its contribution by deriving an accelerated gradient flow for the Wasserstein distances. 
It is technically strong and demonstrates its applicability using examples of Gaussian distributions and logistic regression. + +Reviewer 3 provided a deep technical assessment, pointing out the relevance to our ML community since these ideas are not yet widespread, but had concerns about the clarity of the paper. Reviewer 2 had similar concerns about clarity, and was also positive about its relevance to the ML community. The authors provided detailed responses to the technical questions posed by the reviewers. The AC believes that such work is a good fit for the conference. The reviewers felt that this paper does not yet achieve the aim of making this work more widespread and needs more focus on communication. + +This is a strong paper and the authors are encouraged to address the accessibility questions. We hope the review offers useful points of feedback for their future work.",ICLR2020, +HJg2QAH4xE,1545000000000.0,1545350000000.0,1,HJMjW3RqtX,HJMjW3RqtX,"a novel approach for a novel task, not sufficiently grounded in prior work",Reject,"The paper introduces a setting called high-fidelity imitation where the goal is one-shot generalization to new trajectories in a given environment. The authors contrast this with more standard one-shot imitation approaches where one-shot generalization is to a task rather than a precise trajectory. The authors propose a technique that works off of only state information, which is coupled with an RL algorithm that learns from a replay buffer that is populated by the imitator. The authors emphasize that their approach can leverage very large deep learning models, and demonstrate strong empirical performance in a (simulated) robotics setting. + +A key weakness of the paper is its clarity. All reviewers were unclear about the precise setting as well as its relation to prior work in one-shot imitation learning. As a result, there were substantial challenges in assessing the technical contribution of the paper. There were many requests for clarification, including for the motivation, the difference between the present setting and those addressed in previous work, algorithmic details, and experiment details. + +I believe that a further concern was the lack of a wide range of baselines. The authors construct several baselines that are relevant in the given setting, but did not consider ""naive baseline"" approaches proposed by the reviewers. For example, behavior cloning is mentioned as a potential baseline several times. The authors argue that this is not applicable as it would require expert actions. Instead of considering it a baseline, BC could be used as an ""oracle"" - performance that could be achieved if demonstration actions were known. As long as the access to additional information is clearly marked, such a comparison with a privileged oracle can be properly placed by the reader.
Without including such commonly known reference approaches, it is very challenging to assess the proposed method's performance in the context of the difficulty of the task. Generally, whenever a paper introduces both a new task and a new approach, a lot of care needs to be taken to build up insights into whether the task appropriately reflects the domain / challenge the paper claims to address, how challenging the task is in comparison to those addressed in prior work, and to place the performance of the novel proposed method in the context of prior work. In the present paper, on top of the task and approach being novel, the pure RL baseline D4PG is not yet widely known in the community and its performance relative to common approaches is not well understood. Including commonly known RL approaches would help put all these results in context. + +The authors took great care to respond to the reviewer comments, providing thorough discussion of related work and clarifications of the task and approach, and these were very helpful to the AC to understand the paper. The AC believes that the paper has excellent potential. At the same time, a much more thorough empirical evaluation is needed to demonstrate the value of the proposed approach in this novel setting, as well as to provide additional conceptual insights into why and in what kinds of settings the algorithm performs well, or where its limitations are.",ICLR2019,4: The area chair is confident but not absolutely certain +of-GIksfNYr,1642700000000.0,1642700000000.0,1,xo_5lb5ond,xo_5lb5ond,Paper Decision,Reject,"I do not recommend accepting this paper, although I make this decision with reservations. The review quality for this paper was not particularly strong, and I wish to emphasize to the authors that I read the paper myself in detail in the process of writing this metareview. + +This paper proposes a new structured pruning technique called LEAN. It involves computing an operator norm of the convolutions in a convolutional neural network, multiplying these norms over paths through the network, and keeping the paths with the highest such values (and pruning everything else). This paper makes the argument that this metric is robust to scaling and prevents network discontinuities. (In this way, the technique is very reminiscent of SynFlow (Tanaka et al., Pruning Neural Networks without any data by iteratively conserving synaptic flow) in terms of motivation and resulting technique, although SynFlow is unstructured. I do not mean this as a criticism - just a suggestion for the authors of a connection they might be able to make in the future.) + +One big concern I have about this paper based on the methodology alone is as follows: the paper states a number of hypotheses about why this is a sensible way to prune (e.g., in the beginning of Section 4). I see no reason why any of these hypotheses are wrong, but the paper never makes an effort to evaluate any of them. I don't mean a theoretical justification here - that's difficult and unlikely to yield useful information about what happens in practice. I mean experiments to ablate the salient properties of the heuristic mentioned in the paragraph at the beginning of Section 4 (Does scaling invariance actually matter in practice? Is network disconnectivity actually a risk in practice?). + +My biggest concerns about the paper, however, are in the evaluation. I share two major reviewer concerns that were mentioned:
I share two major reviewer concerns that were mentioned: +(1) The paper compares to a very limited set of baseline pruning methods, and relatively older ones at that (2019 is indeed old in the world of pruning). +(2) The paper does not look at standard, ""large-scale"" benchmarks for computer vision - namely, ResNet-50 on ImageNet. + +Neither of these concerns is necessarily decisive in my view. For example, with respect to Concern 1, the reviewers unhelpfully do not suggest very many additional structured pruning benchmarks to consider, and I think the additional baselines added during the revision process have softened this concern. I would also recommend taking a look at ""Growing Efficient Deep Networks by Structured Sparsification"" (Yuan et al.) for a useful method and a good set of baselines. There are an arbitrary number of baselines one could add and the structured pruning space is a confusing mess, but I think the claims in this paper merit more than are currently present. + +With respect to Concern 2, I'm even more conflicted. On the one hand, I have rarely seen any pruning techniques proposed for or evaluated on vision tasks beyond image classification, despite the fact that - in the real world - segmentation is much more popular than it would seem by reading the ICLR proceedings. To that end, I applaud the authors for focusing on those settings and I see substantial value in a paper that does so. On the other hand, ResNet-50 on ImageNet (among other standard classification benchmarks) is the de facto measuring stick for evaluating pruning methods in computer vision, and the exclusive focus on segmentation here means it is very difficult to compare the proposed technique to other benchmarks. If the paper is to focus on segmentation alone, this places a higher burden on adding many additional comparisons to other methods (i.e., Concern 1). Finally, I don't see any reason why the paper *couldn't* also include ResNet-50 on ImageNet or the like in addition to segmentation; if it is a limitation on the compute available to the authors (something I empathize with), they did not say so in any of the author responses. Upon reading the author responses, I was left asking, ""Why not both?"" + +For those reasons, I do not recommend accepting the paper, although I think there are some good reasons to value the paper's contributions. At the end of the day, there are some relatively simple things that could be changed to make the paper much easier to contextualize within the pruning literature. As of right now, it would be very difficult for me to say whether or in what contexts this method should actually be used in practice. + +(P.S. I agree with the reviewers that Figure 3 is exceptionally hard to parse.)",ICLR2022,
#NAME?,1610040000000.0,1610470000000.0,1,BnokSKnhC7F,BnokSKnhC7F,Final Decision,Reject,"The paper considers an alternative to the standard MDP formulation, motivated by the de novo drug design problem. The formulation is meant to optimize a notion of expected maximum reward along the trajectory rather than the expected sum of rewards. The formulation is presented through a variation of the Bellman equation. This mode of presentation does not make it entirely clear what the fundamental problem is and whether it is the right formulation for the application. The reviewers point out some problems with the analysis. Experiments compare the proposed max-Q algorithm to Q-learning and demonstrate that it achieves higher maximum reward. Experiments involving de novo drug design show promise. 
+ +This looks like an interesting idea and direction, but the consensus view is that the project deserves further work and polish.",ICLR2021,
rygnSaeNx4,1544980000000.0,1545350000000.0,1,S1eEmn05tQ,S1eEmn05tQ,Presentation shortcomings,Reject,"This paper presents a meta-learning approach which relies on a learned prior over neural networks for different tasks. + +The reviewers found this work to be well-motivated and timely. While there are some concerns regarding experiments, the results in the miniImageNet experiment seem to have impressed some reviewers. + +However, all reviewers found the presentation to be inaccurate in more than one point. R1 points to ""issues with presentation"" for the hierarchical Bayes motivation, R2 mentions that the motivation and derivation in Section 2 are ""misleading"" and R3 talks about ""short presentation shortcomings"". + +R3 also raises important concerns about correctness of the derivation. The authors have replied to the correctness critique by explaining that the paper has been proofread by strong mathematicians, however they do not specifically rebut R3's points. The authors requested R3 to more specifically point to the location of the error, however it seems that R3 had already explained in a very detailed manner the source of the concern, including detailed equations. + +There have been other raised issues, such as concerns about experimental evaluation. However, the reviewers' almost complete agreement on the presentation issue is a clear signal that this paper needs to be substantially re-worked. ",ICLR2019,5: The area chair is absolutely certain
3XMc45HVYXn,1642700000000.0,1642700000000.0,1,gLtMe3vpfZa,gLtMe3vpfZa,Paper Decision,Reject,"This paper provides some novel ideas combining neural processes, active learning, and deep sequence models toward accelerating the computationally intensive task of stochastic simulations for epidemic models. The authors incorporate aspects such as spatiotemporal dependence and age structure in the models they consider, and propose an active learning framework to leverage a neural process that can serve as a proxy for direct simulation of the stochastic dynamics. Most of the reviewers agreed that some of these ideas are novel, but the assessments together present a borderline case, and one reviewer remains convinced that the paper needs refinement and a more focused exposition to make clear the contributions and relative efficacy compared to existing state of the art. I tend to agree with some of the concerns raised by that reviewer, who responded favorably to the author response but remained not fully convinced. I also agree that important parts of the paper feel rushed, if I am to interpret ""feeling rushed"" as a statement on the ideas being spread thin in exposition due to the amount of ground the authors try to cover (rather than a statement on the quality of the presentation). There are many moving parts that combine good intuitions toward accelerating simulation-based methods for stochastic epidemic models. Several reviewers mentioned improving the empirical comparisons with recent existing methods, including likelihood-based methods, and though I agree with the authors too that it is not possible to exhaustively explore outcomes in stochastic process approaches, the largely empirically supported contributions can be better motivated with a more complete comparison and streamlined exposition. 
I encourage the authors to continue to revise and refine this manuscript to maximize the potential behind the ideas they propose here.",ICLR2022,
ByxdoUzegV,1544720000000.0,1545350000000.0,1,S1eOHo09KX,S1eOHo09KX,Meta-Review,Accept (Poster),"This paper presents a reinforcement learning approach for online cost-aware feature acquisition. The utility of each feature is measured in terms of expected variations of the model uncertainty (using MC dropout sampling as an estimate of certainty) which is subsequently used as a reward function in the reinforcement learning formulation. The empirical evaluations show improvements over prior approaches in terms of accuracy-cost trade-off on three datasets. AC can confirm that all three reviewers have read the author responses and have significantly contributed to the revision of the manuscript. + +Initially, R1 and R2 raised important concerns regarding low technical novelty. R1 requested an ablation study to understand which of the following components gives the most improvement: 1) using proper certainty estimation; 2) using immediate reward; 3) new policy architecture. Pleased to report that the authors addressed the ablation study in their rebuttal and confirmed that MC-dropout certainty plays a crucial role in the performance of the proposed method. R1 subsequently increased the assigned score to 6. R2 raised concerns about related prior work Contardo et al. (2016), which similarly evaluates the most informative features given budget constraints with a recurrent neural network approach. After a long discussion and a detailed rebuttal, R2 upgraded the rating from below the threshold to 7, albeit acknowledging an incremental technical contribution. R3 raised important concerns regarding presentation clarity that were subsequently addressed by the authors. In conclusion, all three reviewers were convinced by the authors' rebuttal and have upgraded their initial rating, and AC recommends acceptance of this paper – congratulations to the authors! +",ICLR2019,5: The area chair is absolutely certain
rJiNIJpBM,1517250000000.0,1517260000000.0,764,SJahqJZAW,SJahqJZAW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes to use multiple discriminators to stabilize the GAN training process. Additionally, the discriminators only see randomly projected real and generated samples. + +Some valid concerns raised by the reviewers which make the paper weak: + - Multiple discriminators have been tried before and the authors do not clearly show experimentally / theoretically if the random projection is adding any value. +- Authors compare only with DCGAN and the results are mostly subjective. How much improvement the proposed approach provides when compared to other GAN models that are developed with stability as the main goal is hence not clear.",ICLR2018,
dj0h512fORx,1610040000000.0,1610470000000.0,1,fGF8qAqpXXG,fGF8qAqpXXG,Final Decision,Accept (Poster),"This paper extends an earlier work with scalar output to vector output. It establishes a relationship between two-layer ReLU networks and a convex program. The result can be used to design training algorithms for ReLU networks with provable computational complexity. Overall, this is an interesting idea, leading to better theoretical insights into computational issues of two-layer ReLU networks. ",ICLR2021,
uuRm_sl0SrT,1610040000000.0,1610470000000.0,1,3UTezOEABr,3UTezOEABr,Final Decision,Reject,"The paper introduces an AutoML method for irregular multivariate time series. 
The method automates the selection of the configuration as well as the hyperparameter optimization depending on the task. A Bayesian approach handles the network structure search while VAEs + attention is used to learn representations from irregularly sampled data. There is an additional contribution: anomaly detection via a sample energy function from a GMM on time windows. + +While there is some novelty in the proposed approach, mostly in the way in which existing techniques are combined, the paper also has some limitations: +- running the framework over the set of possible models is computationally intensive; in their response, the authors indicate the search space can be constrained, however, doing so would also decrease the performance; in AutoML, added complexity cannot be avoided, but there is no notion of how much longer it takes to find suitable models compared to taking off-the-shelf methods. +- although the paper is geared towards irregularly sampled time series, there are no experiments where the data is naturally irregularly sampled; artificially introduced patterns are no substitute for this; (PhysioNet, as suggested by one of the reviewers, or MIMIC-III both have this type of data and are frequently used in benchmarks) +- AutoML is presented as a general framework, but mostly handles clustering and anomaly detection; it is unclear how useful it would be for forecasting or regression; classification results are shown in Appendix F against simple baselines (GRU-D is not considered, for instance) and even so AutoML does not achieve state of the art results in half of the cases",ICLR2021,
6zSzCZ0AP27,1610040000000.0,1610470000000.0,1,9QLRCVysdlO,9QLRCVysdlO,Final Decision,Accept (Poster),"The authors propose techniques to deal with binarization of 3D point clouds and propose EMA and layer-wise scale recovery that improve results across the board for PointNet-style models. +An accept.",ICLR2021,
SkxZRW51x4,1544690000000.0,1545350000000.0,1,ryzHXnR5Y7,ryzHXnR5Y7,"Interesting task, but clear consensus among reviewers to reject this paper based on limited experiments",Reject,"Three reviewers unanimously recommend rejecting this paper and, although I believe the submission is close to something that should be accepted, I concur with their recommendation. + +This paper should be improved and published elsewhere, but the improvements needed are too extensive to justify accepting it in this conference. I believe the authors are studying a very promising algorithm and it is irrelevant that the algorithm is a relatively obvious one. Ideally, the contribution would be a clear experimental investigation of the utility of this algorithm in realistic conditions. Unfortunately, the existing experiments are not quite there. + +I agree with reviewer 2 that the method is not particularly novel. However, I disagree that this is a problem, so it was not a factor in my decision. Novelty can be overrated and it would be fine if the experiments were sufficiently insightful and comprehensive. + +I believe experiments that train for a single epoch on the reduced dataset are absolutely essential in order to understand the potential usefulness of the algorithm. Although it would of course be better, I do not think it is necessary to find datasets traditionally trained in a single pass. You can do single epoch training on other datasets even though it will likely degrade the final validation error reached. 
This is the type of small scale experiment the paper should include; additional apples-to-apples baselines just need to be added. Also, there are many large language modeling datasets where it is reasonable to make only a single pass over the training set. The goal should be to simulate, as closely as is possible, the sort of conditions that would actually justify using the algorithm in practice. + +Another issue with the experimental protocol is that, when claiming a potential speedup, one must tune the baseline to get a particular result in the fewest steps. Most baselines get tuned to produce the best final validation error given a fixed number of steps. But when studying training speed, we should fix a competitive goal error rate and then tune for speed. Careful attention to these experimental protocol issues would be important. +",ICLR2019,5: The area chair is absolutely certain
p7Y45IxCLaF8,1642700000000.0,1642700000000.0,1,HY6i9FYBeFG,HY6i9FYBeFG,Paper Decision,Reject,"This work describes a two-stage method for learning with noisy labels. The crux of the reviews, discussions with the authors and post-rebuttal discussions between reviewers (and myself) was related to the novelty of this work. The main concern is that while this body of work presents a relatively solid method (from an empirical point of view), the underlying components are not altogether that novel, and have been used in the context of learning with noisy labels before. Fundamentally, the proposed S3 method did not feel *convincingly* better, given its relative lack of novel technical insights. I appreciate that this is a frustrating line of reasoning to receive -- after all, much of what we do in empirical ML is combinations of existing things. Ultimately, there was consensus amongst the reviewers that the work did not have sufficient insights or such outstanding empirical results so as to overcome this relative lack of technical novelty. + +All the reviewers have engaged meaningfully in discussions, provided constructive feedback, and I hope that this will make subsequent iterations of this work better in many dimensions.",ICLR2022,
ryl8ns2-l4,1544830000000.0,1545350000000.0,1,HylTBhA5tQ,HylTBhA5tQ,Paper decision,Accept (Poster),"Reviewers are in consensus and recommended to accept after engaging with the authors. Please take reviewers' comments into consideration to improve your submission for the camera-ready. +",ICLR2019,4: The area chair is confident but not absolutely certain
xqjoDy__Cc,1610040000000.0,1610470000000.0,1,wXoHN-Zoel,wXoHN-Zoel,Final Decision,Reject,"The problem as formalized in this paper is essentially a domain adaptation problem. There is a training distribution P and a test distribution P*. The learner gets training data generated by P and aims to minimize the loss of its hypothesis w.r.t. P*. How is it related to fairness? The authors add the assumption ""we assume the unbiased Bayes decision rule is algorithmically fair in some sense and hope that enforcing the correct notion of fairness allows us to recover h∗ from P"". Under such an assumption, almost by definition ""enforcing fairness may improve accuracy"". By a similar logic, if we assume the unbiased Bayes decision rule is biased against a certain group, then enforcing bias against that group will improve accuracy ....",ICLR2021,
6t9wBKXaJnC,1642700000000.0,1642700000000.0,1,qLqeb9AjD2o,qLqeb9AjD2o,Paper Decision,Reject,"The authors seek to improve upon previous work on randomized smoothing for certifiably robust models. 
They develop loss functions inspired by the notion of distinguishing hard and easy samples while training the base classifier that is randomly smoothed and conduct experiments evaluating their proposed losses on benchmark datasets. + +While the reviewers agree that the paper contains interesting ideas, the paper in its current form is unacceptable for publication because: +1) Missing large scale experiments: All prior work on randomized smoothing reports results on ImageNet, and this was seen as one of the main advantages of randomized smoothing. Since the authors do not report this, it brings into question the robustness and scalability of improvements obtained. +2) Computational complexity and improvements: The authors' approach has significant computational complexity and the final improvements obtained are marginal. This makes it difficult to justify the use of a more expensive method.",ICLR2022,
DC1oomAsbc-,1642700000000.0,1642700000000.0,1,bjy5Zb2fo2,bjy5Zb2fo2,Paper Decision,Accept (Poster),"The submission develops a rotationally equivariant scattering transform on the sphere. Many developments in deep learning make use of spherical representations, and the development of a rotationally equivariant scattering transform is an important if not unexpected development. The reviews are split, with half of the reviewers believing it to be slightly above the threshold for acceptance, and half believing it to be slightly below the threshold for acceptance. In the paper's favor, it solves an important case of the scattering transform framework, which has been demonstrated to be important in diverse machine learning applications such as learning with small data sets, differentially private learning, and network initialization. As such, continued fundamental development in this area is valuable, especially in the context of representation learning, the focus of ICLR.",ICLR2022,
nkEhVPRTig,1576800000000.0,1576800000000.0,1,BJedt6VKPS,BJedt6VKPS,Paper Decision,Reject,"This paper proposes a new design space for initialization of neural networks motivated by balancing the singular values of the Hessian. Reviewers found the problem well motivated and agreed that the proposed method has merit; however, more rigorous experiments are required to demonstrate that the ideas in this work are significant progress over current known techniques. As noted by Reviewer 2, there has been substantial prior work on initialization and conditioning that needs to be discussed as it relates to the proposed method. The AC notes two additional, closely related initialization schemes that should be discussed [1,2]. Comparing with stronger baselines on more recent modern architectures would improve this work significantly. + +[1]: https://nips.cc/Conferences/2019/Schedule?showEvent=14216 +[2]: https://arxiv.org/abs/1901.09321.",ICLR2020,
ZEbtWjZuMc,1576800000000.0,1576800000000.0,1,B1elqkrKPH,B1elqkrKPH,Paper Decision,Reject,"This paper introduces an unsupervised learning objective that attempts to improve the robustness of the learnt representations. This approach is empirically demonstrated on CIFAR-10 and Tiny ImageNet with different network architectures including all convolutional net, wide residual net and dense net. Two of three reviewers felt that the paper was not suitable for publication at ICLR in its current form. 
Nonetheless, I think the paper would be a interesting addition to ICLR and recommend acceptance.",ICLR2018, +rknpnGUdg,1486400000000.0,1486400000000.0,1,SywUHFcge,SywUHFcge,ICLR committee final decision,Invite to Workshop Track,"The authors propose a framework to analyze ""robustness"" to adversarial perturbations using topological concepts. The authors conduct an empirical study using a siamese networks. + + The paper generated extensive discussions. The authors implemented many changes following the reviewers' suggestions. The resulting version of the paper is rather dense. It is unclear whether a conference is appropriate for reviewing such material in limited time. + + We invite the authors to present their main results in the workshop track. Revising the material of the paper will generate a stronger submission to a future venue such as a journal.",ICLR2017, +sRVubiMCbc,1610040000000.0,1610470000000.0,1,JydXRRDoDTv,JydXRRDoDTv,Final Decision,Reject,"While the reviewers appreciated the aim of the work, they found the technical contribution to be too incremental to be of sufficient interest and the exploration of the problem and its significance to be incomplete in the paper's current state.",ICLR2021, +2CvExPRm3Kp,1642700000000.0,1642700000000.0,1,SgEhFeRyzEZ,SgEhFeRyzEZ,Paper Decision,Reject,"This paper provides a theoretical analysis for the Feedback Alignment (FA) algorithm, an alternative to backpropagation for training deep linear neural networks. The main drawback of the analysis is that it assumes that the initial weight matrices are diagonal, which makes the dynamics of the algorithm reduce to K independent one-dimensional dynamic. Most of the reviewers feel that this assumption is too strong. Note that in many existing papers on the implicit bias/regularization of gradient descent (GD) for optimizing deep linear networks, they do not assume the initial weight matrices are diagonal. The authors provide some additional proof in appendix B during the rebuttal to try to relax the assumption on the diagonal initialization, but it is not critical clear if the same results still hold. While this paper studies a very important problem, I suggest the authors take into account the reviewers’ comments and improve the presentation/results. In addition, a comparison with the implicit bias of GD would help better position this work, as one of the reviewers suggested.",ICLR2022, +RDIQtXNkno,1576800000000.0,1576800000000.0,1,SJxRKT4Fwr,SJxRKT4Fwr,Paper Decision,Reject,"The paper proposes a solution based on self-attention RNN to addressing the missing value in spatiotemporal data. + +I myself read through the paper, followed by a discussion with the reviewers. We agree that the model is reasonable, and the results are promising. However, there is still some room for improvement: +1. The self-attention mechanism is not new. The specific way proposed in the paper is an interesting tweak of existing models, but not brand new per se. Most importantly, it is unclear if the proposed way is the optimal one and where the performance improvement comes from. As the reviewer suggested, more thorough empirical analysis should be performed for deeper insights of the model. + +2. The datasets were adopted from existing work, but most of them do not have such complex models as the one proposed in the paper. Therefore, the suggestion for bigger datasets is valid. + +Given the considerations above, we agree that while the paper has a lot of good materials, the current version is not ready yet. 
Addressing the issues above could lead to a good publication in the future. ",ICLR2020, +siMZ_mcnrNNp,1642700000000.0,1642700000000.0,1,7QDPaL-Yl8U,7QDPaL-Yl8U,Paper Decision,Reject,"The paper shows how to make use of a linear program for extracting logical rules for knowledge graph completion. Overall, the reviewers and I agree that this is an interesting and important direction for research. Moreover, the presented approach shows good performance with rather small sets of rules extracted. However, all reviewers point out that the related work is not well discussed. While the authors have improved the related work sections during the rolling discussion, overall the positioning of the new method has still to be improved, including a better empirical comparison across different datasets. Overall, we would like to encourage the authors to polish their line of research based on the feedback from the reviews.",ICLR2022, +X0YE3ALDygc,1642700000000.0,1642700000000.0,1,rOGm97YR22N,rOGm97YR22N,Paper Decision,Reject,"After carefully reading the reviews and the rebuttal I feel the paper fails slightly short. + +Unfortunately some of the issues that I have are aligned with the feedback from reviewer 6YwU and pULY. +A significant part of the paper is the formalism and theory introduced by this work, followed then by the empirical evaluation. The theory I feel is not sufficiently well formulated. I understand this is a complex topic, and one can only make minimal statements about a system (particularly when learning is involved). And I understand that the authors are looking at a slightly different phenomena, and not the traditional vanishing/exploding gradient problem, where they consider a per-unit scenario. And I believe one can make a case that this alternative definition holds value and should be investigated. + +However, I believe being more explicit of this alternate view, and make sure that one does not go into the theory with the wrong preconception of what these results are about is important. And secondly making sure the claims are adequate is important and not overly strong (or over claiming). I think this is important particularly in such works, dealing with systems that do not allow a full mathematical analysis. In particular, just to give some examples: + 1. Thm 3, pointed out by the reviewers as well. I don't understand the point of this thm. It basically says that around initialization things are well behaved. The same can be said or proven for many other methods. You argue that this is different, as in other models beside initialization forgetting is not controlled, while you could potentially control it by a forgetting gate. However this is not a theoretical, precise argument. The forgetting gate is learned as well. If we go back to the LSTM scenario, LSTM suffer for vanishing gradients. Also Gers et al. paper does not prove that trying to preserve error has to harm learning (it provides some empirical evidence that is the case, but there have been many other things that affected this results). The point here is not that forget gates are not useful, nor that the gating mechanism proposed by LSTM are not extremely useful. They are. Is that the Thm 3 can not prove or show that using mmRNN is a better way of mitigating (and trading of) vanishing gradient than another model. You do that through your empirical evidence, and I think that is how most of ML works. But is not clear what the point of the theorem is. + 2. I do not understand how one reasons theoretically about epsilon in Def 1. 
I don't see how an empirical observation by Gers et al resolves this. It justifies maybe why vanishing gradients are not always problematic, but that should not affect the definition of what vanishing now means. In the current form, if T goes to infinite, even if technically the network does not suffer from vanishing gradients, the gradients go to 0. Or at least T and epsilon should somehow be tied together to make the definition work. + The issue of defining the vanishing / exploding gradient per unit is also that now is not clear what is problematic or not. Probably having exploding gradient for any given unit is bad, as it might affect the overall gradient. But having a few units suffering from vanishing gradient, is that problematic? This things need to be quantified better. + +I think overall to me the problem is that some of this mathematical statements do not seem to be strong enough or contextualized enough to be properly understood by the reader. I would have understood the formalism if it was trying to correct some misconception in the community, case in which it is important to formalize just to be precise. But I don't think this is what is happening here. As in stands it just feels sloppy. + +And I think this retracts considerably from the empirical side of the work and reduces the space you had to give it enough attention. Which should have played main stage. I think the empirical work would have benefited from more analysis (showcasing some of the arguments you were making using the theory), which would have made for a much stronger and convincing paper. The current framing of the paper is unfortunately not the right one.",ICLR2022, +sc2OJE_vuy,1610040000000.0,1610470000000.0,1,5jRVa89sZk,5jRVa89sZk,Final Decision,Accept (Poster),"This paper studies the unlabeled entity problem in NER. Specifically, performance degradation in training of NER models due to unlabeled entities. It analyzes the reason through evaluation on synthetic datasets and finds that it is due to the fact that all the unlabeled entities are treated as negative examples. To cope with the problem, it proposes a negative sampling method which considers the use of only a small subset of unlabeled entities. Experimental results show that the proposed method achieves better performances than the baselines on real-world datasets and achieves competitive performances compared with the state-of-the-art methods on well-annotated datasets. + +Pros +• The paper is clearly written. +• The proposed method appears to be technically sound. +• Experimental results support the main claims. +• The findings in the paper are useful for the field. + +Cons +• Novelty of the work might not be enough. + +The authors have addressed some clarity and reference issues pointed out by the reviewers in the rebuttal. Discussions have been made among the reviewers. +",ICLR2021, +hbFXR9cGSZo,1610040000000.0,1610470000000.0,1,p5uylG94S68,p5uylG94S68,Final Decision,Accept (Poster),"**Overview** This paper performs detailed ablation studies over different dynamics prediction methods for MBRL. It proposes metrics for models to evaluate how different types of uncertainty impact predictions. The paper also measures control performance with random shooting MPC. The paper further implements a new hyper parameter schedule to achieve new SOTA performance on the acrobat task. + +**Pro** +- The paper is well-written. +- The analysis in this paper is very warranted. +- The paper provides a very detailed ablation study. 
+- The authors do a great job defining and arguing for evaluation metrics. +- The seven properties and metrics are mostly well-motivated and well-defined. +- The authors discussed the results clearly with implications. +- The result of the necessity of probabilistic vs. deterministic models in different scenarios is a good contribution to this field. + +**Con** +- The methodology might be hard generalizable, i.e., there is difficulty in matching the paper to the literature based on its own defined metric. +- The scope might be limited. + +**Recommendation** The paper provides a significant contribution to MBRL by providing a detailed empirical study. During the rebuttal phase, the authors addresses many reviewers' concerns in a satisfactory way. The paper is well-written and easy to read. The recommendation is an accept. +",ICLR2021, +r2CJ1BDQx2h,1642700000000.0,1642700000000.0,1,j30wC0JM39Q,j30wC0JM39Q,Paper Decision,Reject,"The authors of this work introduced new metrics for node embedding that can measure the evolution of the embeddings, and compare them with existing graph embedding approaches, and experimented on real datasets. + +All reviewers agreed that the work addresses interesting problem and that the proposed measures are nove, but there are too many flaws in the initial version of the paper, and despite the thorough responses of the authors, it is believed that there are still too many open questions for this paper to be accepted this year ICLR.",ICLR2022, +wSEyBuSwHw,1576800000000.0,1576800000000.0,1,rJxyqkSYDH,rJxyqkSYDH,Paper Decision,Reject,"This paper proposes an automatic tuning procedure for the learning rate of SGD. Reviewers were in agreement over several of the shortcomings of the paper, in particular its heuristic nature. They also took the time to provide several ways of improving the work which I suggest the authors follow should they decide to resubmit it to a later conference.",ICLR2020, +82r0lcJRaCS,1642700000000.0,1642700000000.0,1,H4EXaI6HR2,H4EXaI6HR2,Paper Decision,Reject,"The reviews are of good quality. The responses by the authors are commendable, but ICLR is selective and reviewers continue to feel that important choices in the research are not sufficiently clear and fully justified.",ICLR2022, +ByD6iGU_g,1486400000000.0,1486400000000.0,1,SyEiHNKxx,SyEiHNKxx,ICLR committee final decision,Invite to Workshop Track,"Originality, significance: + The paper implements a physics-based simulator directly using Theano. This avoids the type of finite differentiation that physics engines such as MuJoCo use to compute derivatives. It is quite an interesting idea, and is demonstrated using learned control for several models. + + Quality, clarity: + The original version was somewhat loosely written; the current version is improved. + + Pros: + - The nice idea of implementing a physics engine in a language such as Theano, and showing that this is quite feasible. + - May inspire further work in this direction. + + Cons: + - The speed is not systematically evaluated, as compared to finite-difference-based engines. It is thought to be ""in the same ballpark"" as other more full-featured engines. It is not clear that those using simulators will care whether it uses the true derivatives or finite differences.",ICLR2017, +AZc4_CBOt,1576800000000.0,1576800000000.0,1,SklibJBFDB,SklibJBFDB,Paper Decision,Reject,"This paper presents a dataset to evaluate the quality of embeddings learnt for source code. 
The dataset consists of three different subtasks: relatedness, similarity, and contextual similarity. The main contribution of the paper is the construction of these datasets which should be useful to the community. However, there are valid concerns raised about the size of the datasets (which is pretty small) and the baselines used to evaluate the embeddings -- there should be a baselines using a contextual embeddings model like BERT which could have been fine-tuned on the source code data. If these comments are addressed, the paper can be a good contribution in an NLP conference. As of now, I recommend a Rejection.",ICLR2020, +4Ea1-4ujV,1576800000000.0,1576800000000.0,1,rygHe64FDS,rygHe64FDS,Paper Decision,Reject,"Main content: + +Blind review #2 summarizes it well: + +This paper investigates the security of distributed asynchronous SGD. Authors propose Zeno++, worker-server asynchronous implementation of SGD which is robust to Byzantine failures. To ensure that the gradients sent by the workers are correct, Zeno++ server scores each worker gradients using a “reference” gradient computed on a “secret” validation set. If the score is under a given threshold, then the worker gradient is discarded. + +Authors provide convergence guarantee for the Zeno++ optimizer for non-convex function. In addition, they provide an empirical evaluation of Zeno++ on the CIFAR10 datasets and compare with various baselines. + +-- + +Discussion: + +Reviews are generally weak on the limited novelty of the approach compared with Zeno, but the rebuttal of the authors on Nov 15 is fair (too long to summarize here). + +-- + +Recommendation and justification: + +I do not feel strongly enough to override the weak reviews (but if there is room in the program I would support a weak accept).",ICLR2020, +rJg_hlaex4,1544770000000.0,1545350000000.0,1,H1l-SjA5t7,H1l-SjA5t7,"Interesting idea, but the novelty is not high and the experimental analysis is weak",Reject,"While the paper has good quality and clarity and the proposed idea seems interesting, all three reviewers agree that the paper needs more challenging experiments to justify the proposed idea. The authors are not able to include additional experiments (such as these based on different transformations) into their revision to better convince the reviewers. In addition, the AC feels that the technical novelty of the paper is rather minor (some incremental change to VAE). In particular, related to some concerns of Reviewer 3, the AC feels the proposed idea is not too much different than introducing certain kind of side-information for supervision; the main novelty seems to be distorting the data itself somehow to provide these side information (which does not seems to be that novel). +",ICLR2019,5: The area chair is absolutely certain +H1mr7J6rf,1517250000000.0,1517260000000.0,128,ByJIWUnpW,ByJIWUnpW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"With an 8-6-6 rating all reviewers agreed that this paper is past the threshold for acceptance. + +The quality of the paper appears to have increased during the review cycle due to interactions with the reviewers. The paper addresses issues related to the quality of heterogeneous data sources. The paper does this through the framework of graph convolutional networks (GCNs). The work proposes a data quality level concept defined at each vertex in a graph based on a local variation of the vertex. The quality level is used as a regularizer constant in the objective function. 
Experimental work shows that this formulation is important in the context of time-series prediction. + +Experiments are performed on a dataset that is less prominent in the ML and ICLR community, from two commercial weather services Weather Underground and WeatherBug; however, experiments with reasonable baseline models using a ""Forecasting mean absolute error (MAE)"" metric seem to be well done. + +The biggest weakness of this work was a lack of comparison with some more traditional time-series modelling approaches. However, the authors added an auto-regressive model into the baselines used for comparison. Some more details on this model would help. + +I tend to agree with the author's assertion that: ""there is limited work in ICLR on data quality, but it is definitely one essential hurdle for any representation learning model to work in practice. "". + +For these reasons I recommend a poster. + +",ICLR2018, +SJTX7ypHM,1517250000000.0,1517260000000.0,110,Sy2ogebAW,Sy2ogebAW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This work presents new results on unsupervised machine translation using a clever combination of techniques. In terms of originality, the reviewers find that the paper over-claims, and promises a breakthrough, which they do not feel is justified. +However there is ""more than enough new content"" and ""preliminary"" results on a new task. The experimental quality also has some issues, there is a lack of good qualitative analysis, and reviewers felt the claims about semi-supervised work had issues. Still the main number is a good start, and the authors are correct to note that there is another work with similarly promising results. Of the two works, the reviewers found the other more clearly written, and with better experimental analysis, noting that they both over claim in terms of novelty. The most promising aspect of the work, will likely be the significance of this task going forward, as there is now more interest in the use of multi-lingual embeddings and nmt as a benchmark task. ",ICLR2018, +2QdfhKRESc,1576800000000.0,1576800000000.0,1,BJedt6VKPS,BJedt6VKPS,Paper Decision,Reject,"This paper proposes a new design space for initialization of neural networks motivated by balancing the singular values of the Hessian. Reviewers found the problem well motivated and agreed that the proposed method has merit, however more rigorous experiments are required to demonstrate that the ideas in this work are significant progress over current known techniques. As noted by Reviewer 2, there has been substantial prior work on initialization and conditioning that needs to be discussed as they relate to the proposed method. The AC notes two additional, closely related initialization schemes that should be discussed [1,2]. Comparing with stronger baselines on more recent modern architectures would improve this work significantly. + +[1]: https://nips.cc/Conferences/2019/Schedule?showEvent=14216 +[2]: https://arxiv.org/abs/1901.09321.",ICLR2020, +ZEbtWjZuMc,1576800000000.0,1576800000000.0,1,B1elqkrKPH,B1elqkrKPH,Paper Decision,Reject,"This paper introduces an unsupervised learning objective that attempts to improve the robustness of the learnt representations. This approach is empirically demonstrated on cifar10 and tiny imagenet with different network architectures including all convolutional net, wide residual net and dense net. Two of three reviewers felt that the paper was not suitable for publication at ICLR in its current form. 
Self supervision based on preserving network outputs despite data transformations is a relatively minor contribution, the framing of the approach as inspired by biological vision notwithstanding. Several references, including at a past ICLR include: +http://openaccess.thecvf.com/content_CVPR_2019/papers/Kolesnikov_Revisiting_Self-Supervised_Visual_Representation_Learning_CVPR_2019_paper.pdf +and +Gidaris, P. Singh, and N. Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations (ICLR), 2018.",ICLR2020, +mqNRPLgvuAH,1610040000000.0,1610470000000.0,1,TtYSU29zgR,TtYSU29zgR,Final Decision,Accept (Poster),"It is common in imitation learning to measure and minimize the differences between the agent’s and expert’s visitation distributions. This paper proposes using Wasserstein distance for this, named PWIL, by considering the upper bound of its primal form and taking it as the optimization objective. The effectiveness of the approach is demonstrated by an extensive set of experiments. + +Overall, reviewers reached general agreement that this paper makes a good contribution to the conference, and given the overall positive reviews, I also recommend accepting the paper. +",ICLR2021, +BJxo5-bLeV,1545110000000.0,1545350000000.0,1,ryf6Fs09YX,ryf6Fs09YX,A comprehensive mathematical framework for unbiased low variance gradient estimator that applies to continuous and discrete random variables ,Accept (Poster),"This clearly written paper develops a novel, sound and comprehensive mathematical framework for computing low variance gradients of expectation-based objectives. The approach generalizes and encompasses several previous approaches for continuous random variables (reparametrization trick, Implicit Rep, pathwise gradients), and conveys novel insights. +Importantly, and originally, it extends to discrete random variables, and to chains of continuous random variables with optionally discrete terminal variables. These contributions are well exposed, and supported by convincing experiments. +Questions from reviewers were well addressed in the rebuttal and helped significantly clarify and improve the paper, in particular for delineating the novel contribution against prior related work. +",ICLR2019,4: The area chair is confident but not absolutely certain +CjzLx08ckf,1576800000000.0,1576800000000.0,1,Skeh-xBYDH,Skeh-xBYDH,Paper Decision,Reject,"The two main concerns raised by reviewers is that whether the results are significant, and a potential issue in the proof. While the rebuttal clarified some steps in the proof, the main concerns about the significance remain. The authors are encouraged to make this significance more clear. + +Note that one reviewer argued theoretical papers are not suitable for ICLR. This is false, as a theoretical understanding of neural networks remains a key research area that is of wide interest to the community. Consequently, this review was not considered in the final evaluation.",ICLR2020, +BkC_HkTrf,1517250000000.0,1517260000000.0,604,H11lAfbCW,H11lAfbCW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper attempts to connect the expressivity of neural networks with a measure of topological complexity. The authors present some empirical results on simplified datasets. +All reviewers agreed that this is an intriguing line of research, but that the current manuscript is still presenting preliminary results, and that further work is needed before it can be published. 
",ICLR2018, +hHSAbsHqDEx,1610040000000.0,1610470000000.0,1,YCXrx6rRCXO,YCXrx6rRCXO,Final Decision,Accept (Poster),"The paper provides a new distance preserving embedding based on a recent result called sigma-delta quantization. The authors notice that in many realistic scenarios, the input vectors are well-spread and under assumptions regarding the spreadness provide a fast technique to convert the input vectors into binary vectors, possibly of lower dimension. For completeness, the authors analyze the setting where the vectors are not spread and show that by using a randomized Walsh-Hadamard transform, their results still apply. +The authors do not provide a completely novel approach, to quote R2 “On a technical level the results in this paper are hardly too surprising for the JL community, but it is nice to see this analysis worked out in detail”. That being said, they show that a natural idea indeed works out by providing both a theoretical analysis and experimental results. The experiments can be more thorough but do convey the point that the result indeed works and moreover is somewhat robust in that it works well even when the formal requirements do not entirely hold. +There are a few issues mentioned by the reviewers that should be addressed: A clearer exposition of the guarantees and assumptions, some comparison with previous papers. However given the responses and discussions these seem minor and fixable towards a camera ready version. I recommend accepting the paper +",ICLR2021, +r1qMpM8_x,1486400000000.0,1486400000000.0,1,HkEI22jeg,HkEI22jeg,ICLR committee final decision,Accept (Poster),"This work is an important step in developing the tools for understanding the nonlinear response properties of visual neurons. The methods are sound and the results are meaningful. Reviewer 3 gave a much lower score than the other two reviewers because Rev 3 does not appreciate the improvement of prediction performance as an advance in itself. For understanding of the visual algorithms in the brain, however, prediction performance is the most critical success criterion. The paper provides convincing evidence that the approach is promising and likely to facilitate further advances towards achieving this long-term goal. + + I am confident enough to defend acceptance of this paper for a poster.",ICLR2017, +IpSroQfbIN,1610040000000.0,1610470000000.0,1,9GUTgHZgKCH,9GUTgHZgKCH,Final Decision,Reject,This is a clear reject. None of the reviewers supports publication of this work. The concerns of the reviewers are largely valid.,ICLR2021, +auWeXmrFQ8_,1642700000000.0,1642700000000.0,1,dn4B7Mes2z,dn4B7Mes2z,Paper Decision,Reject,"This paper experimentally investigate the inductive bias of deep neural networks tending to produce low rank embeddings of data, which is important to explain why over-parameterized DNN can generalize. In particular, the paper empirically finds that deeper networks are more likely to produce lower rank embedding through thorough numerical experiments with different network architectures, hyperparameters and so on. The authors also proposed a linear over-parameterization technique to induce low-rank bias and empirically justifies its effectiveness. + +Overall, this paper is well written, and the numerical experiments are carefully executed. However, the main drawback of the paper is that the low-rank inductive bias itself is a well known phenomenon and this paper gives a kind of additional confirmation to it. 
I acknowledge that there are several differences from existing papers, but overall we see rather limited insight from the results. Indeed, some of existing studies gave theoretical analyses to understand ""why it happens"", but this paper does not give a sufficiently novel insight to reveal the reason. + +To summarize the decision, this paper lacks deeper insight compared with existing work although the authors did a good job to execute through experiments. Therefore, it is a bit below the acceptance threshold.",ICLR2022, +IWMijNANQw,1576800000000.0,1576800000000.0,1,HJlRFlHFPS,HJlRFlHFPS,Paper Decision,Reject,"This paper aims to disentangle semantics and syntax inside of popular contextualized word embedding models. They use the model to generate sentences which are structurally similar but semantically different. + +This paper generated a lot of discussion. The reviewers do like the method for generating structurally similar sentences, and the triplet loss. They felt the evaluation methods were clever. However, one reviewer raised several issues. First, they thought the idea of syntax had not been well defined. They also thought the evaluation did not support the claims. The reviewer also argued very hard for the need to compare performance to SOTA models. The authors argued that beating SOTA is not the goal of their work, rather it is to understand what SOTA models are doing. The reviewers also argue that nearest neighbors is not a good method for evaluating the syntactic information in the representations. + +I hope all of the comments of the reviewers will help improve the paper as it is revised for a future submission.",ICLR2020, +BkQsEkaBG,1517250000000.0,1517260000000.0,422,BJjquybCW,BJjquybCW,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Dear authors, + +While I appreciate the result that a convolutional layer can have full rank output, this allowing a dataset to be classified perfectly under mild conditions, the fact that all reviewers expressed concern about the statement is an indication that the presentation sill needs quite a bit of work. + +Thus, I recommend it as an ICLR workshop paper.",ICLR2018, +pomwaSThW7,1576800000000.0,1576800000000.0,1,SJeC2TNYwB,SJeC2TNYwB,Paper Decision,Reject,"The authors observe that batch normalization using the statistics computed from a *test* batch significantly improves out-of-distribution detection with generative models. Essentially, normalizing an OOD test batch using the test batch statistics decreases the likelihood of that batch and thus improves detection of OOD examples. The reviewers seemed concerned with this setting and they felt that it gives a significant advantage over existing methods since they typically deal with single test example. The reviewers thus wanted empirical comparisons to methods designed for this setting, i.e. traditional statistical tests for comparing distributions. Despite some positive discussion, this paper unfortunately falls below the bar for acceptance. The authors added significant experiments and hopefully adding these and additional analysis providing some insight into how the batchnorm is helping would make for a stronger submission to a future conference.",ICLR2020, +8uFQZhbNxq,1576800000000.0,1576800000000.0,1,HJe6uANtwH,HJe6uANtwH,Paper Decision,Accept (Poster),"This work presents a routing algorithm for capsule networks, and demonstrates empirical evaluation on CIFAR-10 and CIFAR-100. The results outperform existing capsule networks and are at-par with CNNs. 
Reviewers appreciated the novelty, introducing a new simpler routing mechanism, and achieving good performance on real world datasets. In particular, removing the squash function and experimenting with concurrent routing was highlighted as significant progress. There were some concerns (e.g. claiming novelty for inverted dot-product attention) and clarification questions (e.g. same learning rate schedule for all models). The authors provided a response and revised the submission , which addresses most of these concerns. At the end, majority of reviewers recommended accept. Alongside with them, I acknowledge the novelty of using layer norm and parallel execution, and recommend accept. +",ICLR2020, +B1xjFqiye4,1544690000000.0,1545350000000.0,1,rJlWOj0qF7,rJlWOj0qF7,Interesting paper on the much studied subject of word vectors,Accept (Poster),"The authors provide an interesting method to infuse hierarchical information into existing word vectors. This could help with a variety of tasks that require both knowledge base information and textual co-occurrence counts. +Despite some of the shortcomings that the reviewers point out, I believe this could be one missing puzzle piece of connecting symbolic information/sets/logic/KBs with neural nets and hence I recommend acceptance of this paper.",ICLR2019,4: The area chair is confident but not absolutely certain +We2t4Uliay,1610040000000.0,1610470000000.0,1,5L8XMh667qz,5L8XMh667qz,Final Decision,Reject,"All three referees have provided detailed comments, both before and after the author response period. While the authors have carefully revised the paper and provided detailed responses, leading to clearly improved clarity and quality, there remain clear concerns on novelty (at least not sufficiently supported with ablation study) and experiments (neither strong enough nor sufficient to support the main hypotheses). The authors are encouraged to further improve their paper for a future submission.",ICLR2021, +NFVC7kgzNHe,1610040000000.0,1610470000000.0,1,VyDYSMx1sFU,VyDYSMx1sFU,Final Decision,Reject,"This paper proposes the use of federated learning to the application of steering wheel prediction for autonomous driving. While the application is new and interesting, the reviewers felt that the approach and results were mostly empirical. I suggest that the authors improve the conceptual/algorithmic contribution of the paper in a revised draft. Another suggestion is to include a better explanation of hyper-parameter optimization used in the experiments. I hope that the reviewers' constructive comments will help the authors revise the draft adequately for submission to a future venue!",ICLR2021, +bnOKzwzNM37,1610040000000.0,1610470000000.0,1,St1giarCHLP,St1giarCHLP,Final Decision,Accept (Poster),"This work provides additional insights into a class of generative models that is rapidly gaining traction, and extends it by potentially providing a faster sampling mechanism, as well as a way to meaningfully interpolate between samples (an ability which adversarial models, currently the most popular class of generative models, also have). The revised manuscript includes an extension to discrete data, which could potentially amplify the impact of this work. The authors have also run additional experiments in response to the reviewers' comments. + +Reviewer 1 raised several concerns about the choice of language (i.e. referring to the proposed model as a diffusion model, and the precise meaning of 'implicit' in the context of generative models). 
This is a fair point, as the authors introduce changes that affect the Markovian nature of the ""diffusion"" process, and a diffusion process is supposed to be Markovian by definition. + +However, I think there is something to be said for the authors' argument of using the word 'diffusion' to clearly link this work to the prior work on which it is based. Given that technically speaking, the original DDPM work already 'abuses' the term to refer to a discrete-time process, it is difficult to argue compellingly that 'diffusion' should not feature in the name of the proposed model. Referring to 'non-Markovian diffusion processes' however seems more problematic, as this is a direct contradiction. If the authors wish to use this phrase, adding a few sentences to the introduction that justify this use would be helpful, and personally I feel this would be sufficient to address the issue (I noted that Section 4.1 already acknowledges that the forward process is no longer a diffusion). Plenty of work in our field abuses notation and this is justified simply with the phrase ""with (slight) abuse of notation...""; I don't think this would be any different. + +Reviewer 1 is technically correct that 'stochastic' is an absolute adjective, i.e. something can only be stochastic or deterministic, there is nothing in between, and there are no degrees/levels of stochasticity or determinism. In practice however, it is quite often used in a comparative sense, and I believe I have in fact been guilty of this myself! I do not feel that it causes any ambiguity in this case. Indeed, the phrase 'degree of stochasticity' seems to be in relatively common use in literature. While there may be more correct terms to use, I subscribe to the descriptivist view on language, and I do not think the comparative use of 'stochastic' is a major issue here. The alternatives I can think of seem potentially more cumbersome (e.g. I wager that 'more/less entropic' would be more poorly understood than 'more/less stochastic'). Still, I recommend that the authors consider potential alternatives in the future, to avoid any confusion. + +Overall, I think the reviewers' major concerns have been addressed in the revised manuscript. Given that all reviewers consider the idea worthwhile, I will join them in recommending acceptance.",ICLR2021, +GwC8L3Cjbeq,1610040000000.0,1610470000000.0,1,rsf1z-JSj87,rsf1z-JSj87,Final Decision,Accept (Oral),"This paper investigates a speech synthesis approach that directly generates raw audios from text or phoneme inputs in an end-to-end fashion. The approach first maps the input texts/phonemes into a representation sequence that is aligned with the output at a lower sampling frequency by a differentiable aligner and then upsamples the representation sequence to the full audio frequency by a decoder. A number of techniques including adversarial training and soft DTW are applied to improve the training. The experimental results are good. There are raised concerns from the reviewers which are mostly cleared by the rebuttal of the authors. After the rebuttal and discussion, all reviewers are supportive on accepting the paper. ",ICLR2021, +hjWENaStAsd,1642700000000.0,1642700000000.0,1,HmFBdvBkUUY,HmFBdvBkUUY,Paper Decision,Reject,"This paper proposes a novel approach to include graph information into Transformers. Reviewers expressed concerns on 2 main issues - + +1) The exact architecture proposed in the paper is not well motivated. 
In the words of one of the reviewers: 'I still do not understand why the authors learn the spectral GCN filter weights from the attention matrix of the transformer, which can have a completely different sparsity pattern than the input graph, instead of learning the filter weights from the graph itself, e.g., by using a GNN.' The authors tried to provide an explanation in the response; however, I think it needs to be made much more rigorous for it to be well motivated. + +2) The interplay between existing position encoding schemes and the proposed method. This point also confused a couple of reviewers, as the empirical results seem to be strongly influenced by the choice of position encoding. The authors, I think, did a great job of addressing this concern by providing additional results during the response period. + +Given the weak experimental results and lack of clear motivation, I think the paper is not currently ready for acceptance.",ICLR2022, +SJxIWB4gl4,1544730000000.0,1545350000000.0,1,H1xwNhCcYm,H1xwNhCcYm,Interesting empirical observation and analysis,Accept (Poster),"This paper makes the intriguing observation that a density model trained on CIFAR10 has higher likelihood on SVHN than CIFAR10, i.e., it assigns higher probability to inputs that are out of the training distribution. This phenomenon is also shown to occur for several other dataset pairs. This finding is surprising and interesting, and the exposition is generally clear. The authors provide empirical and theoretical analysis, although based on rather strong assumptions. Overall, there's consensus among the reviewers that the paper would make a valuable contribution to the proceedings, and should therefore be accepted for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +7y4AWxOKPt,1610040000000.0,1610470000000.0,1,3GYfIYvNNhL,3GYfIYvNNhL,Final Decision,Reject,"This paper proposes the c-score, which is the aggregation of a ""consistency profile"" that measures per-instance generalization. Naive computation of the c-score is expensive and thus requires an approximation. The paper then uses the c-score to analyze several image benchmarks and their learning dynamics. + +While the reviewers found the experiments to be well-done, their primary concern was over the novelty and ultimate usefulness of the c-score. As R1 and R4 point out, the c-score correlates with other known measures such as accuracy and training speed. The authors claim this is a contribution. In turn, it is hard to tell if the c-score is a true metric of interest or a recapitulation of what is already known. No reviewer was in favor of acceptance.",ICLR2021, +OgpZffbQDDU,1642700000000.0,1642700000000.0,1,nBU_u6DLvoK,nBU_u6DLvoK,Paper Decision,Accept (Poster),"The paper presents an approach for spatio-temporal representation learning using Transformers. It introduces a particular architecture design, which shows impressive computational efficiency. The reviewers agree that the experimental results are strong, and unanimously recommend the paper for acceptance. The reviewers find their concerns regarding the details of the approach/setting addressed after the authors' response. + +We recommend accepting the paper.",ICLR2022, +kpwFO1UVobQ,1642700000000.0,1642700000000.0,1,1R_PRbQK2eu,1R_PRbQK2eu,Paper Decision,Reject,"In this paper, the authors generalized the Fenchel duality formulation of maximum likelihood for F1-EBM, which leads to a min-max optimization formulation. 
Meanwhile, the optimization reveals a new connection between primal-dual MLE and score matching. These contributions are significant and make the paper interesting to the community. + +However, there are several issues that need to be addressed. + +- *Experiments*: most of the reviewers are concerned about the empirical study, which is conducted on synthetic data. The paper will be much stronger if comparisons on real-world datasets, e.g., MNIST and CIFAR10, can be conducted. + +- *Clarification of the paper*: I totally understand that, because this is a theoretically oriented paper, it must be notation and derivation heavy. However, the paper will be much easier for readers if more discussion is added, e.g., on the implementation of the proposed algorithms, and more explanation of the comparison between the primal-dual algorithm and score matching and of the experimental results for broader audiences. + + +In fact, I personally like the paper very much; it provides a promising, solid algorithm for EBM estimation, and the connection to score matching is also novel and different from the current understanding. However, unfortunately, the authors gave up on the rebuttal and did not successfully address the concerns from the reviewers. I strongly encourage the authors to revise the draft according to the reviewers' suggestions and resubmit to another venue.",ICLR2022, +vYHO8j_5Bv,1576800000000.0,1576800000000.0,1,S1xCuTNYDr,S1xCuTNYDr,Paper Decision,Reject,"This paper investigates a promising direction on the important topic of interpretability; the reviewers find a variety of issues with the work, and I urge the authors to refine and extend their investigations.",ICLR2020, +TDp6c47p54bw,1642700000000.0,1642700000000.0,1,45Mr7LeKR9,45Mr7LeKR9,Paper Decision,Accept (Spotlight),"In this paper, the authors generalize the univariate Shapley method to a bivariate Shapley method. The authors first build a directed graph based on the asymmetric bivariate Shapley value (adding feature j to all sets containing feature i). Then several graph algorithms are applied to analyze the directed graph to derive (1) the univariate feature importance available in univariate approaches and (2) relations like mutual redundancy, available only in bivariate approaches. Experiments on several datasets, with comparison to existing methods, demonstrated the superiority of the proposed method. All reviews are positive.",ICLR2022, +pY8hYC4_1PO,1610040000000.0,1610470000000.0,1,O7ms4LFdsX,O7ms4LFdsX,Final Decision,Accept (Spotlight),"This paper presents an approach for learning disentangled static and dynamic latent variables for sequence data. In terms of the learning objective, the paper extends the Wasserstein autoencoder to sequential data; this approach is novel and well-motivated, and the aggregated posterior for static variables comes out naturally and plays an important role for regularization (this appears to be new for sequence data). The authors also study how to model additional categorical variables for weakly supervised learning in real scenarios. The main steps (generation and inference) were illustrated by graphical models with clarity, and rigorous statements are provided to back them up. Experimental results demonstrate the advantages of the proposed method, in terms of disentanglement performance and generation quality. 
+ +The reviewers think this paper makes nice contributions to the sequential generative model community.",ICLR2021, +nPz0qpRdgq,1576800000000.0,1576800000000.0,1,SJxSOJStPr,SJxSOJStPr,Paper Decision,Accept (Poster),"This paper proposes an expansion-based approach for task-free continual learning, using a Bayesian nonparametric framework (a Dirichlet process mixture model). + +It was well-reviewed, with reviewers agreeing that the paper is well-written, the experiments are thorough, and the results are impressive. Another positive is that the code has been released, meaning it’s likely to be reproducible. + +The main concern shared among reviewers is the limited novelty of the approach, which I also share. Reviewers all mentioned that the approach itself isn’t novel, but they like the contribution of applying it to task-free continual learning. This wasn’t mentioned, but I’m concerned about the overlap between this approach and CURL (Rao et al 2019), published in NeurIPS 2019, which also deals with task-free continual learning using a generative, nonparametric approach. Could the authors comment on this in their final version? + +In sum, it seems that this paper is well-done, with reproducible experiments and impressive results, but limited novelty. Given that reviewers are all satisfied with this, I’m willing to recommend acceptance. + +",ICLR2020, +vXYmswWNnw,1642700000000.0,1642700000000.0,1,WZ3yjh8coDg,WZ3yjh8coDg,Paper Decision,Accept (Poster),"This paper follows the recent line of work of theoretically analyzing the Neural Collapse phenomenon, by making certain simplifying assumptions on the problem setup. In this case, the authors use cross-entropy loss on an unconstrained model where second-to-last representations become free variables. Their main results characterise the NC as the only global minimiser. +Reviewers were positive about this work, and concluded it presents a valuable addition to the growing analysis of NC. They also pointed out several clarity issues that should be addressed in the final revision, including a more objective comparison to prior work. Ultimately this work will be an interesting addition to the conference, and therefore the AC recommends acceptance.",ICLR2022, +B1eKjl23k4,1544500000000.0,1545350000000.0,1,HJgTHnActQ,HJgTHnActQ,metareview,Reject,"The paper received mixed ratings. The proposed idea is quite reasonable but also sounds somewhat incremental. While the idea of separating foreground/background is reasonable, it also limits the applicability of the proposed method (i.e., the method is only demonstrated on aligned face images). In addition, combining AdaIN with a foreground mask is a reasonable idea but doesn’t sound groundbreakingly novel. The comparison against StarGAN looks quite anecdotal, and the proposed method seems to cause only hairstyle changes (but transfer of other attributes is not obvious). In addition, please refer to the reviewers’ detailed comments for other concerns. Overall, it sounds like a good engineering paper that might be a better fit for a computer vision venue, but the experimental validation seems somewhat preliminary and it’s unclear how much novel insight and general technical contribution this work provides. +",ICLR2019,3: The area chair is somewhat confident +yhW4aQ5-sTi,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This paper shows minimax lower bounds on transfer learning for binary classification, in terms of a notion of transfer distance defined in this paper. 
Experimental results try to show the validity of the proved minimax lower bounds. + +All reviewers acknowledge that the lower bounds are worthy contributions; however, none of the reviewers felt strongly enough to champion this work, for the following reasons: +- the theoretical sharpness of the lower bounds is not discussed in detail (Reviewers TfQT, pmYF, and dGvD). Remark 6 only discusses the regime of a small amount of source data and a large transfer distance, which is fairly limited. +- it is unclear to what extent the experiments validate the theory (Reviewer WXNa). Note that for a minimax lower bound, for any algorithm, there are corresponding ""worst-case"" datasets such that the algorithm does not do well; it is unclear if the datasets considered here are worst-case at all. +- the lower bound techniques are fairly standard. +- the comparisons between the lower bounds in this work and prior lower bounds (e.g. those in [1,2]) need to be discussed more thoroughly. + +We encourage the authors to take into account the reviewers' feedback and revise the paper. + +[1] Hanneke and Kpotufe. On the value of target data in transfer learning. NeurIPS 2019. +[2] Mansour, Mohri, Ro, Suresh, and Wu. A theory of multiple source adaptation with limited target labeled data. AISTATS 2021.",ICLR2022, +miwWQqXxm,1576800000000.0,1576800000000.0,1,B1eX_a4twH,B1eX_a4twH,Paper Decision,Reject,"This paper proposes constraints to tackle the problems of dead neurons and dead points. The reviewers point out that the experiments are only done on small datasets and it is not clear if the experiments will scale further. I encourage the authors to carry out further experiments and submit to another venue.",ICLR2020, +SCUCEuju-4,1576800000000.0,1576800000000.0,1,r1lfF2NYvH,r1lfF2NYvH,Paper Decision,Accept (Spotlight),"This paper proposes a graph embedding method for whole graphs under both unsupervised and semi-supervised settings. It can extract a fixed-length graph-level representation with good generalization capability. All reviewers provided a unanimous rating of weak accept. The reviewers praise that the paper is well written and of value to different fields dealing with graph learning. There was some discussion of the novelty of the approach, which was better clarified after the response from the authors. Overall, this paper presents a new effort in the active topic of graph representation learning, with potentially large impact across multiple fields. Therefore, the ACs recommend it to be an oral paper.",ICLR2020, +eyyz8v77gt_,1642700000000.0,1642700000000.0,1,dPyRNUlttBv,dPyRNUlttBv,Paper Decision,Accept (Poster),"This paper goes beyond the NTK setting in analyzing optimization and generalization in ReLU networks. It nicely generalizes NTK by showing that generalization depends on a family of kernels rather than the single NTK. The reviewers appreciated the results. One thing that is missing is a clear separation between NTK results and the ones proposed here. Although it is ok to defer this to future work, a discussion of this point in the paper would be helpful.",ICLR2022, +HJxAG7V1lN,1544660000000.0,1545350000000.0,1,r1l9Nj09YQ,r1l9Nj09YQ,"Interesting technical work, but serious issues with framing ",Reject,"This paper addresses a clear open problem in representation learning for language: the learning of language-agnostic representations for zero-shot cross-lingual transfer. 
All three reviewers agree that it makes some progress on that problem, and my understanding is that a straightforward presentation of these results would likely have been accepted to this conference. However, there were serious issues with the framing and presentation of the paper. + +One reviewer expressed serious concerns about clarity and detail, and two others expressed serious concerns about the paper's framing. I'm more worried about the framing issue: The paper opens with a sweeping discussion about the nature of language and universal grammar and, in the original version, also claims (in vague terms) to have made substantial progress on understanding the nature of language. The most problematic claims have since been removed, but the sweeping introduction remains, and it serves as the only introduction to the paper, leaving little discussion of the substantial points that the paper is trying to make. + +I reluctantly have to recommend rejection. These problems should be fixable with a substantial re-write of the paper, but the reviewers were not satisfied with the progress made in that direction so far.",ICLR2019,4: The area chair is confident but not absolutely certain +oehMcb1TWkv,1642700000000.0,1642700000000.0,1,7Z7u2z1Ornl,7Z7u2z1Ornl,Paper Decision,Reject,"This paper proposes techniques for improving the scalability of set-to-hypergraph models. +The main issue with the submission is that all reviewers found the clarity of the paper to be problematic, including the proofs, the experimental conditions, and many other parts. +The authors responded, but some reviewers explicitly state that their questions have only partially been answered, and some reviewers did not respond to the authors. Unfortunately, given the number of clarity issues raised by the reviewers, it makes more sense to re-submit this paper after re-writing based on all the suggestions from the reviewers.",ICLR2022, +7WoB890vBHz,1610040000000.0,1610470000000.0,1,TiXl51SCNw8,TiXl51SCNw8,Final Decision,Accept (Poster),"The paper explores a solution for mixed precision quantization. The authors view the weights in their binary format, and suggest pruning the bits in a structured way. Namely, all weights in the same layer should have the same precision, and the bits should be pruned from the least significant to the most significant. This point of view allows the authors to exploit techniques used for weight pruning, such as L1 and group lasso regularization. + +Although the field of quantization and model compression/acceleration is quite mature by now and has a large body of work, this paper is novel in its approach. Although the improvements provided over SoTA results are not very large, I believe that the novelty of the approach would make this paper a welcome addition to ICLR. + +There are a few issues to be dealt with, pointed out by the reviewers, such as confusing terminology and required clarifications, but these are minor revisions that I trust the authors will be able to add to their paper. +",ICLR2021, +jduIZvEsMCG,1610040000000.0,1610470000000.0,1,Z4R1vxLbRLO,Z4R1vxLbRLO,Final Decision,Accept (Poster),"This paper provides a clear and useful empirical study of how the initialization scale and activation function affect the generalization capability of neural networks. Previous works showing the effect of the initialization scale (Chizat and Bach (2018), Geiger et al. (2019), Woodworth et al. (2020)) had a more limited set of experiments. 
Moreover, here an extreme case is shown where, with a sin activation function, no generalization is possible at a large init scale (there the kernel regime is useless for generalization, since the hidden layer output becomes very sensitive to any small perturbation in the input). Lastly, two alignment measures are suggested, which are correlated with generalization across several architectures and initialization scales. + +All the reviewers argued for acceptance, and one strongly so. I agree that the paper is sufficiently interesting and clear to be accepted. However, despite the high scores, I only recommend a poster and not spotlight/oral: I think the novelty of the empirical study is not groundbreaking, given the experiments in previous works, and the usefulness of the suggested measures is not completely clear without a thorough comparison against previously suggested measures.",ICLR2021, +KULlcy6KBMj,1642700000000.0,1642700000000.0,1,g2LCQwG7Of,g2LCQwG7Of,Paper Decision,Accept (Poster),"The authors introduce a novel probabilistic hierarchical clustering method for graphs. In particular, they design an end-to-end gradient-based learning scheme to optimize the Dasgupta cost and the Tree Sampling Divergence cost at the same time. + +Overall, the paper presents solid results from both a theoretical and an experimental perspective, so I think it is a good fit for the conference and I suggest accepting it.",ICLR2022, +lJQgl8qViW7,1642700000000.0,1642700000000.0,1,MTex8qKavoS,MTex8qKavoS,Paper Decision,Accept (Poster),"This work studies the impact of distribution shift via a collection of datasets, MetaShift. Reviewers all agreed that this work is simple, effective, and well-motivated, has key implications, and will be quite useful to the community. There were some concerns about the lack of analysis of MetaShift, and about the binary classification setting, which were addressed by the authors' responses. Thus, I recommend acceptance.",ICLR2022, +WF6a5QTw0Us,1610040000000.0,1610470000000.0,1,jxdXSW9Doc,jxdXSW9Doc,Final Decision,Accept (Poster),"The focus of the submission is kernel ridge regression in the distributed setting. Particularly, the authors present optimal learning rates under this assumption, both in expectation and in probability, while they relax previous restrictions on the number of partitions taken. The effectiveness of the approach is demonstrated in synthetic and real-world settings. + +As summarized by the reviewers, the submission is well-organized and clearly written, the authors focus on an important problem, and they present a fundamental theoretical contribution which also has clear practical impact. As such, the submission could be of interest to the ICLR and ML community.",ICLR2021, +lZUKw3nCEG,1576800000000.0,1576800000000.0,1,HJlHzJBFwB,HJlHzJBFwB,Paper Decision,Reject,"This paper proposes to speed up Bayesian deep learning at test time by training a student network to approximate the BNN's output distribution. The idea is certainly a reasonable thing to try, and the writing is mostly good (though as some reviewers point out, certain sections might not be necessary). The idea is fairly obvious, though, so the question is whether the experimental results are impressive enough by themselves to justify acceptance. The method is able to get close to the performance achieved by Monte Carlo estimators with much lower cost, although there is a nontrivial drop in accuracy. 
This cost is probably worth paying if it achieves the 500x computation reduction claimed in the paper, though the practical gains are probably much smaller, since Monte Carlo methods are rarely used with 500 samples. Overall, this seems a bit below the bar for ICLR. +",ICLR2020, +uRkIkKqcX6B,1642700000000.0,1642700000000.0,1,Ihxw4h-JnC,Ihxw4h-JnC,Paper Decision,Reject,"All reviewers are very consistent in their evaluation of the paper. The discussion phase did not change their initial evaluation. Therefore, I also recommend rejecting the paper.",ICLR2022, +kzMG-CO53zQ,1642700000000.0,1642700000000.0,1,Ve0Wth3ptT_,Ve0Wth3ptT_,Paper Decision,Accept (Poster),"This paper proposes a decomposition-based explanation method for graph neural networks. The motivation of this paper is that existing works based on approximation and perturbation suffer from various drawbacks. To address the challenges of existing works, the authors directly decompose the influence of node groups in the forward pass. The decomposition rules are designed for GCN and GAT. Further, to efficiently select subgraph groups from all possible combinations, the authors propose a greedy approach to search for maximally influential node sets. Experiments on synthetic and real-world datasets verify the improvements over existing works. In their initial reviews, reviewers suggested that the authors experiment with more baselines and also clarify some of the technical details. The authors revised their manuscript to address several of these comments. So, I am tentatively assigning an accept to this paper.",ICLR2022, +OKl3mzLMswx,1610040000000.0,1610470000000.0,1,loe6h28yoq,loe6h28yoq,Final Decision,Reject,"Some reviewers expressed concerns about the soundness of the theory in the paper. Specifically, Theorem 3 does not seem to be correct. There are other concerns, such as the significance of the theoretical contributions, little empirical value, and the existence of much stronger results. Unfortunately, the authors did not provide responses to the concerns raised by the reviewers. + +",ICLR2021, +uo2ai16padK,1642700000000.0,1642700000000.0,1,VBZJ_3tz-t,VBZJ_3tz-t,Paper Decision,Accept (Poster),"### Summary + +This paper builds on previous work on sparse training showing that many modern sparse training techniques do no better than a random pruning technique that selects layer-wise ratios, but otherwise randomly selects which weights within a layer to remove. The key difference in this work is to take these existing results and scale the size of the network, to show that as the size of the network increases, the smaller -- as measured in pruning ratio -- a matching subnetwork becomes. + +### Discussion + +#### Strengths + +Places an emphasis on simple techniques + +#### Weaknesses + +Significant overlap with previous work. Prior work already demonstrated the equivalence of random pruning and contemporary pruning-at-initialization techniques. + +### Recommendation + +I recommend Accept (poster). However, I do want to stress that there is significant overlap with previous work. The paper does appropriately attribute observations to previous work. However, there is some risk that readers may misinterpret the title and claim the results as a wholly new observation about random pruning, where the reality is instead much more nuanced. 
Given that the work points to new methodological directions, treating network scale as an additional parameter to consider in pruning studies, I do believe these results -- even if narrower in scope than they may appear -- provide value to the community.",ICLR2022, +HQeIbPdXnnO7,1642700000000.0,1642700000000.0,1,7WVAI3dRwhR,7WVAI3dRwhR,Paper Decision,Reject,"This paper addresses the identification of physical systems defined on graphs. The authors introduce the Adversarial Twin Neural Network (ATN), which consists of augmenting a simple linear model (PNN) with a virtual neural network (VNN). Some regularization terms are used to enforce maximum prediction from the PNN, and to enforce diverse outputs between the PNN and VNN. + +The paper initially received three rejection recommendations. The main limitations pointed out by reviewers relate to the limited contributions, the limiting assumption of using a linear model for the PNN, the lack of positioning with respect to related works, and clarifications on experiments. The authors' rebuttal answered some of the reviewers' concerns: Rdem1 increased their grade from 3 to 5, and another reviewer from 5 to 6 - although not willing to champion the paper. R8dT9, who provided a very detailed review and feedback after the rebuttal, still voted for rejection, especially because he was not convinced by the positioning with respect to recent related works and the answers on the experiments. + +The AC's own reading confirmed the issues raised by R8dT9 and other reviewers. In particular, the AC considers that: +- The contributions for driving proper cooperation between the PNN and VNN models are weak, since the approach reduces to using a simple skip connection and adversarial training. +- The importance of these aspects has not been analysed in depth in the revised version of the paper, neither theoretically nor experimentally: for example, the difference with respect to [Yin+ 2021] for a proper augmentation, the discussion of alternative methods for representing diversity as done in [Rame & Cord 2021], or the positioning with respect to Wasserstein distance-based objectives. +- There remain ambiguities in the cross-validation process, which have not been addressed in the rebuttal. + +Therefore, the AC recommends rejection.",ICLR2022, +EEnOfnTd8Xe,1642700000000.0,1642700000000.0,1,xNO7OEIcJc6,xNO7OEIcJc6,Paper Decision,Accept (Poster),"This paper presents a new benchmark task for CLIP-like models, evaluating how visual word forms interfere with the visual recognition of objects in images when the former are superimposed on the latter. Specifically, by superimposing words belonging to different categories (e.g., hypernyms vs basic labels), the authors study the misclassification rates of CLIP under different degrees of similarity between the original and superimposed labels. + +All reviewers agreed that this is a novel and interesting study which, by productively using insights from the cognitive science literature on language biases, aims at shedding light on the inner workings of a popular artificial model. The main concern raised by reviewer P83Y was regarding the claims around misclassification rates. Indeed, since CLIP was not taught (e.g., by fine-tuning or few-shot prompting) which of the two labels (i.e., the written or the visual) is the correct one, it is not fair to assess its performance in this way. While this is strictly true, the experimental protocols presented in Sections 4.3/4/5 are still a valid way to assess representational interference. 
Moreover, the authors have followed P83Y's suggestions and incorporated a few-shot prompting experiment in Section 4.6. + +All in all, I think this will make for a good addition to the ICLR program, and thus I'm recommending accepting this paper. + +(Minor comment: WKSS rightly points out that this paper has, at best, a loose connection to compositionality. The authors changed compositionality -> representations, which is a better fit, so please make sure to change the title also in Openreview when prompted.)",ICLR2022, +TR3U5b0i6c,1642700000000.0,1642700000000.0,1,6Pe99Juo9gd,6Pe99Juo9gd,Paper Decision,Accept (Poster),"The paper proposes a method for learning state value functions from (s,s',r) tuples, founded on theoretical analysis in the MDP setting. The extensive evaluation in several environments shows the benefit of the algorithm. + +The consensus among the reviewers, with which I concur, is that the paper proposes an interesting and novel method. It is cleanly presented, and well-founded. The evaluations across a range of environments, including robot manipulation, validate the method. + +During the rebuttal, the authors provided additional evaluation, added a discussion on latent MDPs, and made numerous clarifications, addressing most / all reviewers' questions.",ICLR2022, +5NszmFbdJ_,1610040000000.0,1610470000000.0,1,1-Mh-cWROZ,1-Mh-cWROZ,Final Decision,Reject,"This is a promising idea, without enough empirics to substantiate its potential utility, and also with a lack of clarity on the importance of the outlined task itself (fold-based rather than structure-based conditional sequence generation). There remain concerns about the lack of a more comprehensive comparison to methods for structure-to-sequence (e.g. Ingraham was added during the revision but only in a limited capacity), or easy generalizations of them, and about the quality of some of the presented results. Additionally, the concern about sensitivity to rotational invariances, and related issues wrt the fixed-size cubic grid, were not satisfactorily addressed. As a side-note, the quality of the manuscript in terms of scholarliness of presentation was overall lacking.",ICLR2021, +DroAYKPIcE,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"The reviewers found the work interesting but have concerns about the correctness of some of the claims in the paper. Also, some reviewers would like to see more experiments, and some have concerns about the theoretical results. Overall, I see the work as promising, but it requires a major revision and some improvements to pass the bar. I recommend that the authors use the reviewers' comments to prepare the paper for future venues.",ICLR2022, +6AaP-OaeA4c,1610040000000.0,1610470000000.0,1,dhQHk8ShEmF,dhQHk8ShEmF,Final Decision,Reject,"In this paper, the authors studied a robust method for detecting out-of-distribution (OOD) instances. OOD instance detection is an important practical problem, and multiple reviewers recognized that the proposed approach is interesting. However, it was the common opinion of several reviewers that the main theoretical analysis was imported from existing studies, and the novelty is not sufficiently high. It was also observed that the relationship between the proposed method and closely related studies was not properly discussed. Although this point has been improved in the revision, a reviewer and the area chair are still concerned that enough evidence is not provided for some of the points the authors claim as advantages over existing studies. 
Although the proposed method is interesting and could be an important contribution to the ICLR community, the current paper needs non-trivial revision before publication.",ICLR2021, +H1meU1arM,1517250000000.0,1517260000000.0,707,HJSA_e1AW,HJSA_e1AW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a modification to Adam which is intended to ensure that the direction of the weight update lies in the span of the historical gradients, and to ensure that the effective learning rate does not decrease as the magnitudes of the weights increase. The reviewers wanted a clearer justification of the changes made to Adam and a more extensive evaluation, and held to this opinion after reading the authors' rebuttal and revisions. + +Pros: ++ The basic idea of treating the direction and magnitude separately in the optimization is interesting. + +Cons: +- Insufficient evaluation of the new method. +- More justification and analysis needed for the modifications. For example, are there circumstances under which they will fail? +- The modification to Adam and the batch-normalized softmax idea are orthogonal to one another, making for a less coherent story. +- The proposed method does not have better generalization performance than SGD. +- Concern that constraining weight vectors to the unit sphere can harm generalization. +",ICLR2018, +Qv7Vlkc_d5s,1642700000000.0,1642700000000.0,1,VqzXzA9hjaX,VqzXzA9hjaX,Paper Decision,Accept (Poster),"This paper introduces a meta-learning approach to ""amalgamate"" optimizers. The reviewers all found the idea interesting and unanimously found it to be acceptable for publication. In particular, I appreciate that the authors expanded their results to include larger problems. One of the outstanding questions that would be interesting to address in future work is the use of tuned learning rate schedules.",ICLR2022, +SX4a6Rv1JtD,1610040000000.0,1610470000000.0,1,UvBPbpvHRj-,UvBPbpvHRj-,Final Decision,Accept (Poster),"This paper presents a new approach to model uncertainty in DNNs, based on deterministic weights and simple stochastic non-linearities, where the stochasticity is encoded via a GP prior with a triangular kernel inspired by ReLU. The empirical results are promising. The comments were properly addressed. Overall, a good paper.",ICLR2021, +SJ9srkTHG,1517250000000.0,1517260000000.0,645,SkrHeXbCW,SkrHeXbCW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper received three good-quality reviews which were in agreement that the paper was below the acceptance threshold. The authors are encouraged to follow the suggestions from the reviews to revise the paper and resubmit to another venue.",ICLR2018, +BJeIuCzlxE,1544720000000.0,1545350000000.0,1,rJf0BjAqYX,rJf0BjAqYX,Meta-Review,Reject," The paper presents a sensible algorithm for knowledge distillation (KD) from a larger teacher network to a smaller student network by minimizing the Maximum Mean Discrepancy (MMD) between the distributions over student and teacher network activations. As rightly acknowledged by R3, the benefits of the proposed approach are encouraging in the object detection task, and are less obvious in classification (R1 and R2). 
+ +The reviewers and AC note the following potential weaknesses: +(1) low technical novelty in light of the prior works “Demystifying Neural Style Transfer” by Li et al 2017 and “Deep Transfer Learning with Joint Adaptation Networks” by Long et al 2017 -- see R2’s detailed explanations; (2) lack of empirical evidence that the proposed method is better than the seminal work on KD by Hinton et al., 2014; (3) important practical issues are not justified (e.g. kernel specifications, as requested by R3 and R2; accuracy-efficiency trade-off, as suggested by R1); (4) presentation clarity. +R3 has raised questions regarding deploying the proposed student models on mobile devices without a proper comparison with the MobileNet and ShuffleNet light architectures. This can be seen as a suggestion for future revisions. + +There is reviewer disagreement on this paper and no author rebuttal. The reviewer with a positive view on the manuscript (R3) was reluctant to champion the paper, as the authors did not respond to the concerns of the reviewers. +The AC suggests that in its current state the manuscript is not ready for publication. We hope the reviews are useful for improving and revising the paper. ",ICLR2019,5: The area chair is absolutely certain +H1xYgN8HxV,1545070000000.0,1545350000000.0,1,SJe8DsR9tm,SJe8DsR9tm,no practical speedup,Reject,"This paper proposes a new method for speeding up convolutional neural networks. It uses the idea of early termination of the computation of convolutional layers. It saves FLOPs, but the reviewers raised a critical concern that it doesn't save wall-clock time. The time overhead is about 4 or 5 times that of the original model. Execution time is not reduced; it is much longer. The authors agreed that ""the overhead on the inference time is certainly an issue of our method"". The work is not mature or practical. I recommend rejection. ",ICLR2019,4: The area chair is confident but not absolutely certain +h0rVqvk4aZ,1610040000000.0,1610470000000.0,1,6c6KZUdm1Nq,6c6KZUdm1Nq,Final Decision,Reject,"The paper addresses regression in a weakly supervised setting where the correct labels are only available for examples whose predictions lie above some threshold. The paper proposes a method using a gradient that is unbiased and consistent. 
+ +Pros: +- The problem setting is new and this paper is one of the first works exploring it. +- The procedure comes with some unbiasedness and consistency guarantees. +- Experimental results on a wide variety of datasets and domains. + +Cons: +- Novelty and technical contribution are limited. +- Motivation of the problem setting was found to be unclear. +- Some gaps in the experimental section (i.e., the need to use synthetic data or synthetic modifications of the real data). + +Overall, the reviewers felt that, as presented, the paper did not convincingly motivate the proposed upper one-sided regression problem as important or relevant in practice, which was a key reason for rejection. The paper may contain some nice ideas, and I recommend taking the reviewer feedback into account to improve the presentation. ",ICLR2021, +cbUhLT-yvNt,1642700000000.0,1642700000000.0,1,rHMaBYbkkRJ,rHMaBYbkkRJ,Paper Decision,Accept (Poster),"This review paper presents a way of comparatively assessing continual learning. Reviewers all agreed that this work is interesting and unique, with comprehensive coverage of the CL space. The proposed categorization, CLEVA-Compass, and its GUI have great potential to facilitate future CL work.",ICLR2022, +r3T5OLjge,1576800000000.0,1576800000000.0,1,B1xm3RVtwB,B1xm3RVtwB,Paper Decision,Accept (Spotlight),"The method presented, the simplified action decoder, is a clever way of addressing the influence of exploratory actions in multi-agent RL. It's shown to enable state-of-the-art performance in Hanabi, an interesting and relatively novel cooperative AI challenge. It seems, however, that the method has wider applicability than that. + +All reviewers agree that this is good and interesting work. Reviewer 2 had some issues with the presentation of the results and certain assumptions, but the authors responded so as to alleviate any concerns. + +This paper should definitely be accepted, if possible as oral. + + + + + +",ICLR2020, +0Mhcm6GTUWM,1610040000000.0,1610470000000.0,1,CHTHamtufWN,CHTHamtufWN,Final Decision,Reject,"The authors address the problem of learning environment-invariant representations in the case where environments are observed sequentially. +This is done by using a variational Bayesian and bilevel framework. + +The paper is borderline, with two reviewers (R2 and R3) slightly favoring acceptance and two reviewers (R4 and R1) favoring rejection. + +R4 points out that the current experiments do not do a good job of reflecting a continual learning setup, and that simple modifications of existing IRM-based methods could outperform the method proposed by the authors. The authors are encouraged to take into account the reviewer's +suggestions to improve the paper. 
+ +The authors should improve the work taking into account the reviewrs' comments.",ICLR2021, +0rBVuVY_jjh,1642700000000.0,1642700000000.0,1,SCSonHu4p0W,SCSonHu4p0W,Paper Decision,Reject,This paper studies the important problem of adding structured knowledge (in this case from Wikidata) to pretrained language models. The reviewers do not see this paper as ready for ICLR and recommend a number of revisions. Unfortunately the authors did not respond during the author response period. The area chair hence agrees with the reviewers.,ICLR2022, +c8lmtADCE,1576800000000.0,1576800000000.0,1,B1xwv1StvS,B1xwv1StvS,Paper Decision,Reject,"Main content: + +[Blind review #3] The authors propose a metric based model for few-shot learning. The goal of the proposed technique is to incorporate a prior that highlight better the dissimilarity between closely related class prototype. Thus, the proposed paper is related to prototypical neural network (use of prototype to represent a class) but differ from it by using inner product scoring as a similarity measure instead of the use of euclidean distance. There is also close similarity between the proposed method and matching network. + +[Blind review #2] The stated contributions of the paper are: (1) a method for performing few-shot learning and (2) an approach for building harder few-shot learning datasets from existing datasets. The authors describe a model for creating a task-aware embedding for different novel sets (for different image classification settings) using a nonlinear self-attention-like mechanism applied to the centroid of the global embeddings for each class. The resulting embeddings are used per class with an additional attention layer applied on the embeddings from the other classes to identify closely-related classes and consider the part of the embedding orthogonal to the attention-weighted-average of these closely-related classes. They compare the accuracy of their model vs others in the 1-shot and 5-shot setting on various datasets, including a derived dataset from CIFAR which they call Hierarchical-CIFAR. + +-- + +Discussion: + +All reviews agree on a weak reject. + +-- + +Recommendation and justification: + +While the ideas appear to be on a good track, the paper itself is poorly written - as one review put it, more like notes to themselves, rather than a well-written document to the ICLR audience.",ICLR2020, +LhxIGI1UJB,1576800000000.0,1576800000000.0,1,SyxiRJStwr,SyxiRJStwr,Paper Decision,Reject,"This paper constitutes interesting progress on an important problem. I urge the authors to continue to refine their investigations, with the help of the reviewer comments; e.g., the quantitative analysis recommended by AnonReviewer4.",ICLR2020, +6LcTI-6wbKV,1642700000000.0,1642700000000.0,1,DrZXuTGg2A-,DrZXuTGg2A-,Paper Decision,Accept (Poster),"This work is on stochastic convex optimization (SCO) in shuffle differential privacy (DP) models. In SCO, a learner receives a convex loss function L: Theta x X -> Reals, where Theta is a d-dimensional vector of parameters and X is a set of data points. The objective is to use samples x1, x2, …, xn to find a parameter theta that minimizes the loss E_{x ~ D}[L(theta,x)], where the distribution D on X is unknown. 
The shuffle models considered are a ``sequential"" model where the analyzer operates in rounds (and where a new set of users participate in a local DP protocol in every round), and a new, stronger ""full"" model in which the analyzer can request a specific subset of users to participate in a round, which in particular allows users' data to be queried more than once. This work shows that in the full model, one can develop excess population loss bounds matching the known best-possible bounds in centralized DP; it is also shown that even the weaker sequential model offers improved excess population loss bounds over the best-possible bound of sqrt(d/n) in the local setting. + +The reviewers appreciated the novelty and technical depth of this work (despite concerns about part of the work being taking “off the shelf” results).",ICLR2022, +ryxo7v6rg4,1545090000000.0,1545350000000.0,1,B1epooR5FX,B1epooR5FX,"innovative idea, contributions insufficient",Reject,"The paper proposes a framework at the intersection of programming and machine learning, where some variables in a program are replaced by PVars - variables whose values are learned using machine learning from data. The paper presents an API that is designed to support this scenario, as well as three case studies: binary search, quick sort, and caching - all implemented with PVars. + +The reviewers and the AC agree that the paper presents and potentially valuable new idea, and shows concrete applications in the presented case studies. They provide example code in the paper, and present a detailed analysis of the obtained results. + +The reviewers and AC also not several potential weaknesses - the AC will focus on a subset for the present discussion. The paper is unusual in that it presents a programming API rather than e.g., a thorough empirical comparison, a novel approach, or new theoretical insights. Paper at the intersection of systems and machine learning can make valuable contributions to the ICLR community, but need to provide a clear contributions which are supported in the paper by empirical or theoretical results. The research contributions of the present paper are vague, even after the revision phase. The main contribution claimed is the introduction of the API, and that such an API / system is feasible. This is an extremely weak claim. A stronger claim would be if e.g., the present approach would advance the state of the art beyond an existing such framework, e.g., probabilistic programming, either conceptually or empirically. I want to particularly highlight probabilistic programming here, as it is mentioned by the authors - this is a well developed research area, with existing approaches and widely used tools. The authors dismiss this approach in their related work section, saying that probabilistic programming is ""specialized on working with distributions"". Many would see the latter as a benefit, so the authors should clearly motivate how their approach improves over these existing methods, and how it would enable novel uses or otherwise provide benefits. At the current stage, the paper is not ready for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +r1F9mkpSG,1517250000000.0,1517260000000.0,203,rytNfI1AZ,rytNfI1AZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper presents a way of training 1bit wide resnet to reduce the model footprint while maintaining good performance. The revisions added more comparisons and discussions, which make it much better. 
Overall, the committee feels this work will bring value to the conference.",ICLR2018, +xSCEilcai4q,1610040000000.0,1610470000000.0,1,vYeQQ29Tbvx,vYeQQ29Tbvx,Final Decision,Accept (Poster),"The paper provides an astonishingly simple experiment: the parameters in the network are fixed, but only the parameters in the BatchNorm (taking less than 1% of the total number of parameters) are trained and also the last linear layer is trained. + The resulting networks provide better accuracies than training a random subset of the network. Another part of this work is the study of the effect of $\beta$ and $\gamma$ when doing full training. + +Pros: - All the reviewers agree this is an interesting and important observation. + - Contribution is clear and paper is well-written + - In future, better understanding of different parameters may + +Cons: A concern has been raised by one of the reviewers that it is more like a technical report + Some previous work which studies the effect of $\gamma$ was not mentioned. + +I think, the most interesting part is training only $\beta$ and $\gamma$. It will provide a ground for theoretical investigations of the properties of deep neural network models, and maybe lead to more efficient training algorithms. + + +",ICLR2021, +udC7ZDObalI,1610040000000.0,1610470000000.0,1,OHgnfSrn2jv,OHgnfSrn2jv,Final Decision,Accept (Poster),"This paper explores the Wasserstein natural gradient in the context of reinforcement learning. R5 rated the paper marginally below the acceptance threshold, but is not very confident about the correctness of his/her assessment. His/her main criticism was the experimental evaluation. This concern was shared by a confident R1. R1 found the paper well structured and that it contains encouraging empirical results, but low technical novelty and (initially) insufficient experiments. His/her initial recommendation was reject, but following an extensive discussion and improvements of the manuscript by the authors, he/she was more convinced about the empirical significance and applicability of the method, and raised his/her score to 6, indicating that the interpretation and presentation improved but that the paper might be interesting only to a moderate number of readers. A confident R2 found this paper very good, although only providing a short review. Two other unfinished or not sufficiently confident reports were not taken into account. Weighing the reports by contents, confidence, and participation in the discussion, the paper scores marginally above the acceptance threshold. In view of the authors' responses, I am discounting R5's criticism about lack of comparison with the PPO baseline. I personally consider the paper very well written, that it presents a natural and potentially useful application of the Wasserstein natural gradient to the context of reinforcement learning, and enjoyed the discussion of behavioral geometry. I am recommending a borderline accept. However, I also appreciate the concern of the referees about the limited technical innovation and how some of the strengths of the method could be presented more convincingly. Please take these comments carefully into consideration when preparing the final version of the paper. ",ICLR2021, +Nwnc8IkpPkA,1642700000000.0,1642700000000.0,1,QFNIpIrkANz,QFNIpIrkANz,Paper Decision,Reject,"This work addresses the issue of learning reward functions that overfit less/are invariant to irrelevant features of expert demonstrations. 
+The proposed algorithm builds on top of adversarial imitation learning (AIRL) and proposes to include a regularization principle that is based on invariant risk minimization. The proposed algorithm is evaluated both in grid worlds and in continuous control tasks. Both zero-shot policy transfer and transfer of the reward function to learn out-of-distribution tasks from scratch are evaluated. + +**Strengths** +This work is well motivated and addresses an important problem. +The proposed method is well motivated, and provides theoretical foundations. + +**Weaknesses** +The manuscript had many missing details/no appendix. +Only one baseline is provided, while many relevant IRL algorithms exist. +The evaluation is very limited in actually evaluating the invariance properties of the learned reward function. +Poor alignment between how the proposed algorithm is motivated (learning invariant reward functions) and what most of the experimental evaluation is focussed on (zero-shot transfer of policy) (more details on this below). + +**Rebuttal** +The authors have updated the manuscript to include an appendix and were able to address most structural issues and provided many of the missing details. +No additional baselines were provided, and the experimental evaluation remains limited/poorly aligned with the initial motivation. + +**Summary** +This manuscript addresses an important problem and proposes a promising algorithm. My major remaining concern is the experimental evaluation, which seems not well aligned with the main contribution of this paper. As the authors state in their rebuttal, the main supporting evidence for their claim is provided in Section 5.3, with only one set of experiments on using the reward function to learn policies on OOD tasks and very little analysis (< a quarter of a page). Meanwhile, the majority of the evaluation (Section 5.2) is focussed on zero-shot transfer of the learned policy (which is trained during the IRL training phase). These zero-shot transfer experiments are not motivated in the context of ""learning invariant reward functions"", so it's unclear what these results show. If these results are still relevant in showing that the proposed algorithm learns ""invariant rewards"", then this needs to be explained. Furthermore, more baselines would have been required (e.g. algorithms that are focussed on learning a good policy by learning a ""pseudo""-reward - such as GAIL). +Because of this, my recommendation is that this manuscript is not quite ready yet for publication.",ICLR2022, +cc_SYnxnz8,1610040000000.0,1610470000000.0,1,TTLwOwNkOfx,TTLwOwNkOfx,Final Decision,Reject,"This application paper applies hyperbolic convolutions in VAE learning to perform unsupervised 3D segmentation. 
+Addition of these components enables performance improvements in the unsupervised segmentation task. +Overall, the paper is borderline and the reviewers mention the limited novelty of the approach, which largely uses components that have been developed before. Even though the paper presents an application of these methods to a relevant and significantly more challenging task than prior work, I recommend rejection from ICLR due to concerns about novelty.",ICLR2021, +94V32Jp2Sy8,1610040000000.0,1610470000000.0,1,kuqBCnJuD4Z,kuqBCnJuD4Z,Final Decision,Reject,"The paper studies the benefit of having multiple servers (with partial coverage) in increase the training speed and latency in Federated Learning. Of course optimization/learning in the multi-server setting comes with a number of challenges which the authors seek to address via novel algorithmic procedures (e.g. FedMes). I believe the paper is suggesting an important, and potentially impactful, methodology to improve the training speedup/latency of FL. I also acknowledge the additional experiments provided by the reviewers which were quite helpful in addressing some of the concerns. However, as the paper mainly relies on experimental studies to evaluate the performance of the proposed methods, the reviewers (and myself) believe that the paper needs some more investigation in which (i) some of the assumptions (e.g. faster communication between the servers) are either removed or validated; and (ii) more complicated topologies are considered. ",ICLR2021, +3D4dUZ7ofx,1610040000000.0,1610470000000.0,1,PKubaeJkw3,PKubaeJkw3,Final Decision,Accept (Oral),"This paper proposes a new selection paradigm for selecting the optimal architecture in neural architecture search (NAS), in particular for methods that involve a one-shot model and that deploy gradient-based methods for the search. Basically, the paper focuses on examining the max selection very closely and found the magnitude of architecture weights are misleading. Instead, the paper proposes much more intuitive finalization step, pick the operator that has the largest drop in validation if the edge is removed. All reviewers agreed that the idea is interesting, the paper is well-written, and the results found in the paper are interesting. In addition, author response satisfactorily addressed most of the points raised by the reviewers, and most of them increased their original score. Therefore, I recommend acceptance.",ICLR2021, +uwARXPuaRKW,1642700000000.0,1642700000000.0,1,Oxeka7Z7Hor,Oxeka7Z7Hor,Paper Decision,Accept (Poster),"This paper presents a deep learning method that aims to address the curse-of-dimensionality problem of conventional convolutional neural networks (CNNs) by representing data and kernels with unconstrained ‘mixtures’ of Gaussians and exploiting the analytical form of the convolution of multidimensional Gaussian mixtures. Since the number of mixture components rapidly increases from layer to layer (after convolution) and common activation functions such as ReLU do not preserve the Gaussian Mixtures (GM), the paper proposes a fitting stage that fits a GM to the output of the transfer function and uses a heuristic to reduce the number of mixture components. Experiments are presented on MNIST (2d) and ModelNet10 (3D), which show competitive performance compared to other approaches such as classic CNNs, PointNet and PontNet++ methods. + +There is somewhat an overall consensus on the novelty of the proposed approach and its potential to pave the way for further research. 
There were, however, several issues raised by the reviewers in terms of clarity, memory footprint and computational cost that limit the applicability of the method to more complex datasets. While the authors expanded on the dense fitting in their comments and in the revised version of the paper, the role of the negative weights still remains unclear, as the dense fitting stage seems to constrain all the weights to be positive. In terms of memory footprint, the authors refer to the theoretical footprint, but their implementation does not match this. Finally, it is acknowledged by the authors that the computational cost is a limitation that hinders the method from achieving competitive performance in more complex tasks.",ICLR2022, +1cjSIaigOJ,1576800000000.0,1576800000000.0,1,BylUMxSFwS,BylUMxSFwS,Paper Decision,Reject,"The authors propose a method to first learn policies for intrinsically generated goal-based tasks, and then leverage the learned representations to improve the learning of a new task in a generalized policy iteration framework. The reviewers had significant concerns about the clarity of the writing that were largely addressed in the rebuttal. However, there were also concerns about the magnitude of the contribution (especially whether it added anything significant to the existing literature on GPI, successor features, etc.), and the simplicity (and small number) of test domains. These concerns persisted after the rebuttal and discussion. Thus, I recommend rejection at this time.",ICLR2020, +2wdl3eKpFE,1576800000000.0,1576800000000.0,1,rJxG3pVKPB,rJxG3pVKPB,Paper Decision,Reject,"The paper considers the task of sequence to sequence modelling with multivariate, real-valued time series. +The authors propose an encoder-decoder based architecture that operates on fixed windows of the original signals. + +The reviewers unanimously criticise the lack of novelty in this paper and the lack of comparison to existing baselines. +While Rev #1 positively highlights the human evaluation contained in the experiments, they nevertheless do not think this paper is good enough for publication as is. +The authors did not submit a rebuttal. + +I therefore recommend rejecting the paper.",ICLR2020, +4F0GG70Lz6X,1642700000000.0,1642700000000.0,1,z-5BjnU3-OQ,z-5BjnU3-OQ,Paper Decision,Reject,"After the author response, multiple reviewers remained concerned over the degree to which the current manuscript makes the case for the proposed hyper-network approach to text-to-image generation. It was felt that this was mainly an empirical paper for which the reviewers remained unconvinced that the proposed hyper-network based modulation was better than simple channel-wise StyleGAN2 style modulation. While the authors have shown that their approach beats a StyleGAN2 baseline with sentence conditioning on CLIP-R, the reviewers felt that the comparisons with the StyleGAN2 baseline needed fairer word conditioning. Only one reviewer recommended accepting this paper. + +The AC recommends rejection.",ICLR2022, +V75gI5Fh01b,1642700000000.0,1642700000000.0,1,Fia60I79-4B,Fia60I79-4B,Paper Decision,Reject,"The paper proposes a method for predicting stock market crises using a deep learning approach which combines time series stock market data with text from news articles. Their experiments show that the proposed method works better than the same model using only news or only stock price data, and a couple of deep learning baselines. All the reviewers pointed out that this paper lacks novelty and significant technical contributions. 
The experiments are performed on a single dataset with incomplete baselines, and are hence insufficient to support the claimed advantages of the proposed method. The writing quality is not up to the standard of an ICLR paper, with too many grammatical mistakes, typos, and unjustified arguments/claims. The clarity of the writing is poor. + +The authors did not provide their rebuttal.",ICLR2022, +SJxPSP9NlV,1545020000000.0,1545350000000.0,1,SyMDXnCcF7,SyMDXnCcF7,Interesting and surprising findings with a mean-field-theory analysis of batch normalization,Accept (Poster),"This paper provides a mean-field-theory analysis of batch normalization. First, there is a negative result as to the necessity of gradient explosion when using batch normalization in a fully connected network. They then provide further insights as to what can be done about this, along with experiments to confirm their theoretical predictions. + +The reviewers (and random commenters) found this paper very interesting. The reviewers were unanimous in their vote to accept.",ICLR2019,4: The area chair is confident but not absolutely certain +SJO1azLOg,1486400000000.0,1486400000000.0,1,r1Chut9xl,r1Chut9xl,ICLR committee final decision,Reject,"The paper introduces a number of ideas / heuristics for learning and interpreting deep generative models of text (tf-idf weighting, a combination of using an inference network with direct optimization of the variational parameters, a method for inducing context-sensitive word embeddings). Generally, the last bit is the most novel, interesting and promising one; however, I agree with the reviewers that the empirical evaluation of this technique does not seem sufficient. + + Positive: + -- the ideas are sensible + -- the paper is reasonably well written and clear + + Negative: + -- most ideas are not so novel + -- the word embedding method requires extra analysis / evaluation, comparison to other methods for producing context-sensitive embeddings, etc.",ICLR2017, +5yya-prEFws,1642700000000.0,1642700000000.0,1,SN2bkl9f69,SN2bkl9f69,Paper Decision,Reject,"The reviewers' evaluations of this paper are borderline/negative. The AC considered the reviews, rebuttal, and the paper itself, and concurs with the reviewers. The AC found that the paper is an extension of the previous work DM-GAN (DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis, CVPR 2019, https://arxiv.org/pdf/1904.01310.pdf). This work uses the word features in addition to sentence features at the first stage of generation, while DM-GAN and other previous works don't use word features in the first stage but use them in the later stages, when the feature resolution is higher. The authors improve the dynamic memory in DM-GAN into spatial dynamic memory, and also change the image refinement process in DM-GAN into an iterative refinement. The proposed multi-tailed word-level initial generation, spatial dynamic memory, and iterative refinement are incremental changes to DM-GAN. Moreover, the proposed structure almost doubles the parameter size of DM-GAN (shown in Table 2), yet the evaluation results on COCO are similar to DM-GAN with only minor improvements. It is not clear whether the performance improvement comes from the increased number of parameters or the architecture design. Especially on the CUB dataset, with its limited number of images, the model can easily overfit with a larger number of parameters. The proposed method shares a similar network structure and dynamic memory blocks with DM-GAN, except for a few changes. 
Overall, the AC finds this paper not suitable for acceptance at ICLR in its present form.",ICLR2022, +0yA7bAgTIz8,1642700000000.0,1642700000000.0,1,DFYtZFo_1u,DFYtZFo_1u,Paper Decision,Reject,"The paper proposes to compute local representations on device, which are then shared between clients using an alignment mechanism. Reviewers did appreciate the value of the topic and several contributions, but unfortunately the consensus is that it remains below the bar, even after the discussion phase. Concerns remained about privacy, the motivational positioning with respect to FL, and the lack of simpler baselines, even after the author feedback. + +We hope the detailed feedback helps to strengthen the paper for a future occasion.",ICLR2022, +1A8QyElo2dH,1642700000000.0,1642700000000.0,1,Rivn22SJjg9,Rivn22SJjg9,Paper Decision,Reject,"All reviewers unanimously recommended rejecting this submission, and I concur with that recommendation. However, many reviewers were quite pleased with the premise and basic concept of the submission and would have liked to see a clearer version with a bit more in terms of experiments. + +I agree with the submission that the most interesting architecture search research is about the search space, not the search algorithm. +The submission uses measurements of the data Jacobian matrix at different points to construct an extended data Jacobian matrix that then is projected and serves as input to a contrastive embedding learning algorithm. The resulting architecture embeddings can be used for many different things, including architecture search. + +Ultimately, I am recommending rejecting this submission not because of one single overriding weakness, but because the totality of issues the reviewers raised makes it clear the submission is not strong enough to publish in its current form. I encourage the authors to continue this line of work and produce a stronger submission in the future to ICLR or another venue.",ICLR2022, +B0hVAU-4p7R,1642700000000.0,1642700000000.0,1,vrW3tvDfOJQ,vrW3tvDfOJQ,Paper Decision,Accept (Spotlight),The reviewers unanimously appreciated the clarity of the work as well as the framing of the proposed method. Congratulations.,ICLR2022, +ByMbVJ6Sf,1517250000000.0,1517260000000.0,288,HJvvRoe0W,HJvvRoe0W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper addresses an important application in genomics, i.e. the prediction of chromatin structure from nucleotide sequences. The authors develop a novel method for converting the nucleotide sequences to a 2D structure that allows a CNN to detect interactions between distant parts of the sequence. The reviewers found the paper innovative, interesting and convincing. Two reviewers gave a 7 and there was one 6. The 6, however, indicated during a rather lengthy discussion that they were willing to raise their score if their comments were addressed. Hopefully the authors will address these comments in the camera-ready version. Overall, a solid application paper with novel insights and technical innovation. + +",ICLR2018, +BygW9T-XlN,1544920000000.0,1545350000000.0,1,Hyg1Ls0cKQ,Hyg1Ls0cKQ,Concerns about clarity and writing of the paper,Reject,"Reviewers have expressed concerns about clarity/writing of the paper and technical novelty, which the authors haven't responded to. 
The paper is not suitable for publication at ICLR.",ICLR2019,5: The area chair is absolutely certain +rkldXLHexV,1544730000000.0,1545350000000.0,1,H1lPUiRcYQ,H1lPUiRcYQ,Borderline paper,Reject,"This paper proposes a neural network based method for computing committor functions, which are used to understand transitions between stable states in complex systems. +The authors improve over the techniques of Khoo et al. with a method to approximately satisfy boundary conditions and an importance sampling method to deal with rare events. +This is a good application paper, introducing a new application to the ML audience, but the technical novelty is a bit limited. The reviewers see value in the paper, however scaling w.r.t. dimensionality appears to be an issue with this approach.",ICLR2019,3: The area chair is somewhat confident +S7eA-_XVQGY,1642700000000.0,1642700000000.0,1,O476oWmiNNp,O476oWmiNNp,Paper Decision,Accept (Poster),"The paper analyses the frequency filtering properties of self-attention in vision architectures, shows that it mainly acts as a low-pass filter, and proposes fixes that allow to better preserve the higher frequencies. These fixes yield moderate classification accuracy gains (~0.5-1%) for several existing attention-based architectures. + +The reviewers are quite borderline about the paper, but after considering the authors' responses lean towards acceptance. Pros include interesting and novel analysis and sound model improvements leading to non-trivial empirical gains. The main con is that the experimental results are fine, but not outstanding. + +Overall, I recommend acceptance. Empirical results are indeed good but not outstanding, but the theoretical analysis is interesting and it is good to see that it leads to actionable insights on the model design side that actually help in practice - even is not by a huge amount. One part that in my opinion is confusing (and might have been confusing to the reviewers too) is that the title seems to suggest the paper will present very deep vision transformers while it does not. Adding deeper models or adjusting the title would help here.",ICLR2022, +ttCtX4RK-cA,1610040000000.0,1610470000000.0,1,fhcMwjavKEZ,fhcMwjavKEZ,Final Decision,Reject,"This work proposes a modification of a GNN architecture by feeding random node features to bootstrap the message propagation. This enables the discriminability of automorphic node pairs with a lightweight, simple change. Experiments are reported showing improvements over baselines. +Reviewers had mixed impressions of this work. On one hand, they found the proposed model principled and with strong empirical performance. On the other hand, they perceived a general lack of novelty and a somewhat misleading theoretical analysis. After careful review, the AC ultimately believes that this work does require an extra iteration that further solidifies the contributions and aligns the theoretical analysis with the empirical performance. In particular, the use of random initialization is folklore in the GNN literature, especially with regards to spectral methods (e.g. power iterations are typically initialized using a random vector, and these constitute the simplest forms of linear GNNs). The authors are encouraged to address these comparisons with further detail, as well as the excellent feedback given by the reviewers. 
",ICLR2021, +PaUeEgYYFp,1576800000000.0,1576800000000.0,1,H1lBj2VFPS,H1lBj2VFPS,Paper Decision,Accept (Poster),"This paper considers the question of how to quantize deep neural networks, for processors operating on low-precision integers. The authors propose a methodology and have evaluated it thoroughly. The reviewers all agree that this question is important in practice, though there was disagreement about how novel a contribution this paper is specifically, and on its clarity. The clarity questions were resolved on rebuttal, so I lean to accepting the paper.",ICLR2020, +S13tnzLde,1486400000000.0,1486400000000.0,1,HJ1kmv9xx,HJ1kmv9xx,ICLR committee final decision,Accept (Poster),"The paper proposes a layered approach to image generation, ie starting by generating the background first, followed by generating the foreground objects. All three reviewers are positive, although not enthusiastic. The idea is nice, and the results are reasonable. Accept as poster. For the camera ready, the AC suggests making the generated images in the results larger, to allow the readers to fully appreciate their quality.",ICLR2017, +R4etEXlox4B,1610040000000.0,1610470000000.0,1,QubpWYfdNry,QubpWYfdNry,Final Decision,Accept (Poster),"The reviewers raised a number of concerns about the novelty of the paper and comparisons. The authors were able to address the concerns regarding the comparisons in the response, and the reviewers unanimously agree that the paper should be published. I do think however that this paper is quite borderline. I agree with the reviewers that the updated experiments are convincing in terms of the provided comparisons. However, the reservations I have about the work can perhaps best be stated as follows: There is quite a bit of work in the area of imitation from observations, which makes a range of different assumptions and utilizes a variety of different domain adaptation techniques. Much of this work is in the robotics domain (which is cited in the paper), and much of it demonstrates results in fairly realistic settings, often with real humans and real robots. In comparison, the experiments in this paper are quite simplistic, using toy domains and ""demonstrations"" obtained from a computational oracle (i.e., another policy). Given the maturity of this field and the current state of the art, I am skeptical of this evaluation, and I think TPIL is a very weak baseline. That said, I would defer to the reviewers in this case -- I do think the particular technical contributions that the paper makes are a valuable addition to the literature, though somewhat incremental. I am also sympathetic to the authors in that much of the more successful prior work in this area that does evaluate under realistic conditions makes subtly different assumptions, or utilizes different techniques for which it is difficult to provide an apples-to-apples comparison. + +One thing I would request of the authors for the camera ready though is: Please tone down the claims. ""Human-like 7 DOF Striker"" is not human-like, it's a crudely simulated robotic arm that was recolored. 
It would of course be better to have a realistic evaluation (as many prior papers in this field indeed have), but in the absence of that, it is best not to overclaim and be upfront that the evaluation is on relatively simple simulated tasks under conditions that are not necessarily realistic (and have nothing to do with actual humans), but meant rather to evaluate in an apples-to-apples manner the particular algorithmic innovations in the method.",ICLR2021, +45SZRyirvS,1576800000000.0,1576800000000.0,1,Syxwsp4KDB,Syxwsp4KDB,Paper Decision,Reject,"This paper proposes an abstractive text summarization model that takes advantage of lead bias for pretraining on unlabeled corpora and a combination of reconstruction and theme modeling loss for finetuning. Experiments on NYT, CNN/DM, and Gigaword datasets demonstrate the benefit of the proposed approach. + +I think this is an interesting paper and the results are reasonably convincing. My only concern is regarding a parallel submission that contains a significant overlap in terms contributions, as originally pointed out by R2 (https://openreview.net/forum?id=ryxAY34YwB). All of us had an internal discussion regarding this submission and agree that if the lead bias is considered a contribution of another paper this paper is not strong enough. + +Due to space constraint and the above concern, along with the issue that the two submissions contain a significant overlap in terms of authors as well, I recommend to reject this paper.",ICLR2020, +izj0ZRIyCDA,1610040000000.0,1610470000000.0,1,om1guSP_ray,om1guSP_ray,Final Decision,Reject,"This paper proposes a graph pooling mechanism based on adaptive edge scores that are then fed into a min-cut procedure. +Reviewers acknowledged that this is an important topic of study, but all agreed that the current manuscript does not provide enough evidence about the significance and novelty of their proposed approach. +The AC recommends rejection at this time, and encourages the authors to build from the reviewers feedback to improve upon their current work. ",ICLR2021, +RY6KFk3nBC,1576800000000.0,1576800000000.0,1,Hkla1eHFvS,Hkla1eHFvS,Paper Decision,Reject,"The paper provides a nice approach to optimizing marginals to improve exploration for RL agents. The reviewers agree that its improvements w.r.t. the state of the art do not merit a publication at ICLR. Furthermore, additional experimentation is needed for the paper to be complete.",ICLR2020, +rkFxLJTSM,1517250000000.0,1517260000000.0,712,r1Kr3TyAb,r1Kr3TyAb,ICLR 2018 Conference Acceptance Decision,Reject,"Two of the reviewers liked the intent of the paper -- to analyze gradient flow in residual networks and understand the tradeoffs between width and depth in such networks. However, all reviewers flagged a number of problems in the paper, and the authors did not participate in the discussion period. + +Pros: ++ Interesting analysis suggests wider, shallower ResNets should outperform narrower, deeper ResNets, and empirical results support the analysis. + +Cons: +- Independence assumption on weights is not valid after any weight updates. +- The notation is not as clear as it should be. +- Empirical results would be more convincing if obtained on several tasks. +- The architecture analyzed in the paper is not standard, so it isn't clear how relevant it is for other practitioners. +- Analysis and paper should take into account other work in this area, e.g. Veit et al., 2016 and Schoenholz et al., 2017. 
+",ICLR2018, +SkwiHyaHf,1517250000000.0,1517260000000.0,642,SksY3deAW,SksY3deAW,ICLR 2018 Conference Acceptance Decision,Reject,"All three reviewers felt that the paper was just below the acceptance threshold, with scores of 5,4,5. R1 felt there were problems in the proofs, but the authors rebuttal satisfactorily addressed this. R3 and the authors had an extended discussion with the authors, but did not revise their score from its initial value (5). R4 had concerns about the experimental evaluation, that wasn't fully addressed in the rebuttal. With no reviewers advocating acceptance, the paper will have to rejected unfortunately. ",ICLR2018, +LFTe5E3kwi,1576800000000.0,1576800000000.0,1,HJgC60EtwB,HJgC60EtwB,Paper Decision,Accept (Poster),"The authors provide a framework for improving robustness (if the model of the dynamics is perturbed) into the RL methods, and provide nice experimental results, especially in the updated version. I am happy to see that the discussion for this paper went in a totally positive and constructive way which lead to a) constructive criticism of the reviewers b) significant changes in the paper c) corresponding better scores by the reviewer. Good work and obvious accept.",ICLR2020, +_SR8Jk5Rh,1576800000000.0,1576800000000.0,1,rygG7AEtvB,rygG7AEtvB,Paper Decision,Reject,"The paper presents an algorithm to compute mixed-strategy Nash equilibria for continuous action space games. While the paper has some novelty, reviewers are generally unimpressed with the assumptions made, and the quality of the writing. Reviewers were also not swayed by the responses from the authors. Additionally, it could be argued that the paper is somewhat peripheral to the topic of the conference.¨ + +On balance, I would recommend reject for now; the paper needs more work.",ICLR2020, +5KIwNomVcC,1576800000000.0,1576800000000.0,1,H1eJAANtvr,H1eJAANtvr,Paper Decision,Reject,"This paper proposes an approach to handle the problem of unsmoothness while modeling spatio-temporal urban data. However all reviewers have pointed major issues with the presentation of the work, and whether the method's complexity is justified. ",ICLR2020, +Hyg_T-vggV,1544740000000.0,1545350000000.0,1,S1lTg3RqYQ,S1lTg3RqYQ,New technique for semantic consistency for transfer across heterogenous domains with preliminary empirical evidence,Accept (Poster),"This paper proposes an image to image translation technique which decomposes into style and content transfer using a semantic consistency loss to encourage corresponding semantics (using feature masks) before and after translation. Performance is evaluated on a set of MNIST variants as well as from simulated to real world driving imagery. + +All reviewers found this paper well written with clear contribution compared to related work by focusing on the problem when one-to-one mappings are not available across two domains which also have multimodal content or sub-style. + +The main weakness as discussed by the reviewers relates to the experiments and whether or not the set provided does effectively validate the proposed approach. The authors argue their use of MNIST as a toy problem but with full control to clearly validate their approach. Their semantic segmentation experiment shows modest performance improvement. 
Based on the experiments as is and the relative novelty of the proposed approach, the AC recommends poster and encourages the authors to extend their analysis of the current results in a final version.",ICLR2019,4: The area chair is confident but not absolutely certain +IJk775vK3vF,1610040000000.0,1610470000000.0,1,pBDwTjmdDo,pBDwTjmdDo,Final Decision,Reject,"The paper proposes Fourier temporal state embedding, a new technique to embed dynamic graphs. However, the paper needs to be improved in writing, computational complexity analysis, and more thorough baseline comparisons.",ICLR2021, +aqoNwZSQh75,1610040000000.0,1610470000000.0,1,x6x7FWFNZpg,x6x7FWFNZpg,Final Decision,Reject,"The reviews were a bit mixed: on one hand, by combining and adapting existing techniques the authors obtained some interesting new results that seem to complement existing ones; on the other hand, there is some concern on the novelty and on the interpretation of the obtained results. Upon independent reading, the AC agrees with the reviewers that this paper's presentation can use some polishing. (The revision that the authors prepared has addressed some concerns and improved a lot compared to the original submission.) Overall, the analysis is interesting but the significance and novelty of this work require further elaboration. In the end, the PCs and AC agreed that this work is not ready for publication at ICLR yet. Please do not take this decision as an under-appreciation of your work. Rather, please use this opportunity to consider further polishing your draft according to the reviews. It is our belief that with proper revision this work can certainly be a useful addition to the field. + +Some of the critical reviews are recalled below to assist the authors' revision: + +(a) The result in Theorem 4.1 needs to be contrasted with a single machine setting: do we improve the convergence rate in terms of T here? do we improve the constants in terms of L and M here? What is the advantage one can read off from Theorem 4.1, compared to a single machine implementation? How should we interpret the dependence of (optimal) H on r and lambda_2? + +(b) The justification for $T \geq n^4$ is a bit weak and requires more thoughts: one applies distributed SGD because n is large. What happens if T does not satisfy this condition in practice, as in the experiments? + +(c) Extension 1 perhaps should be more detailed as its setting is much more realistic than Theorem 1. One could use Theorem 1 to motivate and explain some high level ideas but the focus should be on Extension 1-3. In extension 2, the final bound seems to be exactly the same as in Theorem 1, except a new condition on T. Any explanations? Why asynchronous updates only require a larger number of interactions but retain the same bound? These explanations would make the obtained theoretical results more accessible and easier to interpret.",ICLR2021, +SyV6VJ6HM,1517250000000.0,1517260000000.0,451,ryacTMZRZ,ryacTMZRZ,ICLR 2018 Conference Acceptance Decision,Reject,"R1 was neutral on the paper: they liked the problem, simplicity of the approach, and thought the custom pooling layer was novel, but raised issues with the motivation and design of experiments. R1 makes a reasonable point that training a CNN to classify time series, then throw away the output layer and use the internal representation in 1-NN classification is hard to justify in practice. 
+Results of the reproducibility report were good, though pointed out some issues around robustness to initialization and hyper-parameters. R2 gave a very strong score, though the review didn’t really expound on the paper’s merits. R3 thought the paper was well written but also sided with R1 on novelty. Overall, I side with R1 and R3. Particularly with respect to the practicality of the approach (as pointed out by both these reviewers). I would feel differently if the metric was used in another application beyond classification.",ICLR2018, +wNojVQwS2y,1610040000000.0,1610470000000.0,1,Oos98K9Lv-k,Oos98K9Lv-k,Final Decision,Accept (Spotlight),"The reviewers unanimously agreed that this is an interesting paper that belongs at ICLR. The use of optimal transport in neural topic models is novel and the paper is well-written. + +A common theme among the reviewers was that they would like to see more intuition and justification. I suggest you bear this in mind while editing the final version of the paper. I also believe that R3 brings up valid points about evaluating perplexity -- I don't think the lack of perplexity results are a reason to reject the paper, but I believe they can be calculated here (see eg the reference R3 provided) and they would give a clearer view of the model's performance. +",ICLR2021, +sQd4-vB4HB,1610040000000.0,1610470000000.0,1,FZ1oTwcXchK,FZ1oTwcXchK,Final Decision,Accept (Poster),"The work tackles the task to convert an artificial neural networks (ANN) to a spiking neural network (SNN). The topic is potentially important for energy-efficient hardware implementations of neural networks. There is already quite some literature available on this topic. +Compared to these, the manuscript exhibits a number of strong contributions: It presents a theoretical analysis of the conversion error and consequently arrives at a principled way to reduce the conversion error. The authors test the performance of the conversion on a number of challenging data sets. Their method achieves excellent performances with reduced simulation time / latency (usually, in order to achieve comparable performance to ANNs, one needs to run the SNN for many simulated time steps- this simulation time is reduced by their model). +One reviewer criticized that the article was hard to read, but this opinion was not shared by other reviewers and the authors have improved the readability in a revision. + +In summary, I believe that this manuscript presents a very good contribution to the field.",ICLR2021, +0kq4NU5cg4K,1642700000000.0,1642700000000.0,1,K-hiHQXEQog,K-hiHQXEQog,Paper Decision,Reject,"The submission proposes to learn a causal transformer model over a pretrained VQ-GAN representation to generate videos. While the paper is well written and clear, proposing a simple idea, the novelty of this method is not well explained compared to pre-existing publications see ([reviews JjT4](https://openreview.net/forum?id=K-hiHQXEQog¬eId=22EyPvpodh), [3gJ8](https://openreview.net/forum?id=K-hiHQXEQog¬eId=9ikn_nBC_Sf), [6x6m](https://openreview.net/forum?id=K-hiHQXEQog¬eId=zCX9VP8I5uL), and [pKCo](https://openreview.net/forum?id=K-hiHQXEQog¬eId=uRgHWX4C5yX)) especially since it's lacking [ablation](https://openreview.net/forum?id=K-hiHQXEQog¬eId=uRgHWX4C5yX) or comparative (see [reviews 6x6m](https://openreview.net/forum?id=K-hiHQXEQog¬eId=zCX9VP8I5uL), and[pKCo](https://openreview.net/forum?id=K-hiHQXEQog¬eId=uRgHWX4C5yX)) experiments. 
+The authors have expressed their consideration of reviewers requests but will not satisfy them in time for this conference. Therefore I am currently recommending this submission for rejection.",ICLR2022, +QclL6iS1Qgq,1642700000000.0,1642700000000.0,1,6g4VoBTaq6I,6g4VoBTaq6I,Paper Decision,Reject,"This paper proposes a new class of divergences that are also sensitive to the variance of the estimator. The proposed additional variance penalty term introduces a bias term and acts directly on each component of the statistical estimator. By choosing the penalty parameter one can trade bias versus variance. + +The results on synthetic examples look promising and suggest that with this technique, it is feasible to decrease the estimation error relative to the baseline statistical estimator. This is demonstrated to be particularly pronounced for certain Renyi divergences in the large order parameter alpha regime. Two applications (detection of subpopulations and disentangled representation learning in speech) are provided. + +The opinions about the work were fairly divided. Both positive reviews have lower confidence and are rather short and do not fully justify the high rating. Two high confidence reviews are negative and raise several critical points. In a nutshell, reviewer JtE9 complains mainly about the insufficient experimental evaluation while 3SLV raises several concerns regarding readability, mathematical notation, lacking details of the proofs, as well as technicalities regarding the consistency. The authors have partially answered the concerns. + +While there seems to be a consensus that the paper is interesting and makes a valid contribution, the introduction of the VP term defines a new estimation problem and both the choice and interpretation of lambda becomes critical. In particular, the key question is understanding the effect of lambda for various tasks where divergence estimation is crucial and I am not fully convinced if the chosen applications are the best for convincingly demonstrating the utility as these require somewhat application specific motivation. I would rather see result of standard benchmark datasets (such as estimating the divergence between two subsets of MNIST images to detect subtle distribution shifts). + +The synthetic experiments are good but this section could be improved as well to get the message accross. Rather than delving directly to the findings, this section could first justify what needs to be measured and what are the control variables (number of samples, Renyi order etc) + +In light of the comments raised by the reviewers, I feel that this paper can benefit from a further iteration and clarification of the experimental section before being accepted to a venue like ICLR.",ICLR2022, +jCnKEzykEE,1576800000000.0,1576800000000.0,1,HklCk1BtwS,HklCk1BtwS,Paper Decision,Reject,"The paper studies word embeddings using the matrix factorization framework introduced by Levy et al 2015. The authors provide a theoretical explanation for how the hyperparameter alpha controls the distance between words in the embedding and a method to estimate the optimal alpha. The authors also provide experiments showing the alpha found using their method is close to the alpha that gives the highest performance on the word-similarity task on several datasets. + +The paper received 2 weak rejects and 1 weak accept. The reviews were unchanged after the rebuttal, with even the review for weak accept (R2) indicating that they felt the submission to be of low quality. 
Initially, reviewers commented that while the work seemed solid and provided insights into the problem of learning word embeddings, the paper needed to improve their positioning with respect to prior work on word embeddings and add missing citations. In the revision, the authors improved the related work, but removed the conclusion. + +The current version of the paper is still low quality and has the following issues +1. The paper exposition still needs improvement and it would benefit from another review pass +Following R3's suggestions, the authors have made various improvements to the paper, including modifying the terminology and contextualizing the work. However, as R3 suggests, the paper still needs more rewriting to clearly articulate the contribution and how it relates to prior work throughout the paper. In addition, the conclusion was removed and the paper still needs an editing pass as there are still many language/grammar issues. + +Page 5: ""inherites"" -> ""inherits"" +Page 5: ""top knn"" -> ""top k"" + +2. More experimental evaluation is needed. +For instance, R1 suggested that the authors perform additional experiments on other tasks (e.g. NER, POS Tagging). The authors indicated that this was not a focus of their work as other works have already looked at the impact of alpha on other task. While prior works has looked at the correlation of alpha vs performance on the task, they have not looked at whether alpha estimated the method proposed by the author will give good performance on these tasks as well. Including such analysis will make this a stronger paper. + +Overall, there are some promising elements in the paper but the quality of the paper needs to be improved. The authors are encouraged to improve the paper by adding more experimental evaluation on other tasks, improving the writing, as well as incorporating other reviewer comments and resubmit to an appropriate venue. +",ICLR2020, +r1Gv8kTBf,1517250000000.0,1517260000000.0,799,H1wt9x-RW,H1wt9x-RW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes iterative training strategies for learning teacher and student models. They show how iterative training can lead to interpretable strategies over joint training on multiple datasets. All the reviewers felt the idea was interesting, although, one of the reviewers had concerns about the experimentation. + +However, there is a BIG problem with this submission. The author names appear in the manuscript thus disregarding anonymity.",ICLR2018, +NJVhTfSZP7-,1642700000000.0,1642700000000.0,1,30nbp1eV0dJ,30nbp1eV0dJ,Paper Decision,Reject,"This paper gives sample complexity lower bounds for differentially private empirical risk minimization (ERM). While the reviewers agreed that the results are non-trivial, the general consensus was that the proofs are tweaks of previously developed techniques and that the main result is actually new in a rather narrow setting (specifically, for unconstrained ERM and sub-constant error parameter). Another concern was that one of the proofs (the one on pure differential privacy) was incorrect in the submission; a different proof was provided subsequently (which also closely follows prior work). Finally, the reviewers pointed out several issues with the clarity of the presentation and comparison to prior work. 
Given the above, this work is below the acceptance threshold.",ICLR2022, +CCgLjsgCJV,1610040000000.0,1610470000000.0,1,Qr0aRliE_Hb,Qr0aRliE_Hb,Final Decision,Accept (Poster),This paper proposes a simple yet effective approach for determining weight quantization bit lengths using RL. All the reviewers agree that the simplicity and performance improvements are a strong plus point. There are some concerns on applicability which have been sufficiently handled by rebuttal. AC recommends accepting the paper.,ICLR2021, +XddFBEQEs0,1576800000000.0,1576800000000.0,1,Sylgsn4Fvr,Sylgsn4Fvr,Paper Decision,Accept (Poster),"The paper proposes a black box algorithm for MRF training, utilizing a novel approach based on variational approximations of both the positive and negative phase terms of the log likelihood gradient (as R2 puts it, ""a fairly creative combination of existing approaches""). + +Several technical and rhetorical points were raised by the reviewers, most of which seem to have been satisfactorily addressed, but all reviewers agreed that this was a good direction. The main weakness of the work is that the empirical work is very small scale, mainly due to the bottleneck imposed by an inner loop optimization of the variational distribution q(v, h). I believe it's important to note that most truly large scale results in the literature revolve around purely feedforward models that don't require expensive to compute approximations; that said, MNIST experiments would have been nice. + +Nevertheless, this work seems like a promising step on a difficult problem, and it seems that the ideas herein are worth disseminating, hopefully stimulating future work on rendering this procedure less expensive and more scalable.",ICLR2020, +YK-TlboJ1Y,1642700000000.0,1642700000000.0,1,FpnQMmnsE8Y,FpnQMmnsE8Y,Paper Decision,Reject,"Meta Review for Recurrent Parameter Generators + +This work investigates a method for reducing the parameters of a deep CNN by having a recurrent parameter generator (RPG) produce the weights, in effect achieving this compression via parameter sharing across layers (similar to earlier works, such as the 2016 Hypernetworks paper, as discussed in between xUeP and the author during the review period). But unlike previous work, this work conducts extensive empirical experiments on classification and even pose estimate tasks, and proposes an additional method, such as the use of pseudo-random seed to perform element-wise random sign reflection in the weight sharing. The novelty and experimental results are clearly displayed in this work, and shows a lot of promise, but after much discussion, I currently cannot recommend acceptance for ICLR 2022. + +In my assessment, and also looking at reviewers and discussion, I believe this work is a great workshop paper at present, but there are a few items that would make it much stronger. There are outstanding issues in the paper that need to be improved. In particular, during discussions, reviewers noted that the paper has a problem with the design and presentation of the experiments. It somehow shifts the reader’s focus to the compression task (3 of the 4 reviewers raised concerns about the compression performance and questioned the baselines). In their rebuttal, the authors emphasized that their contribution is not limited to compression but is rather more fundamental, and the authors propose an approach for understanding the relationship between the model DoF and the network performance. 
But if that's the main narrative of the paper, rather than the compression aspects, the authors need to clearly articulate why decoupling the DoF from the underlying architecture is advantageous (and also make the narrative more clear in the writing). While there are novel innovations in the method proposed, the authors also need to explain clearly why their method works well, why the even weight assignment and random sign flipping are so effective? + +There is discussion between the authors and reviewers about what constitutes vector quantization, and I believe the author has clarified their position effectively (with regard to cgCS's review), and I believe this will be explained in great clarity in future revisions. But even with that disagreement out of the way, we still believe that this work needs improvement to meet the bar of ICLR 2022. Reviewers, including myself, do acknowledge the novelty and are excited about the method proposed, and we look forward to seeing an updated version of this work published or presented at a future journal or conference. Good luck!",ICLR2022, +DDHtXG-FBeB,1610040000000.0,1610470000000.0,1,HHSEKOnPvaO,HHSEKOnPvaO,Final Decision,Accept (Spotlight),"This paper presents an interesting idea for task-free continual learning, which makes use of random graphs to represent relational structures among contextual and target samples. The reviewers agreed that the technical idea is novel, the experiments are extensive and the presentation is good. The authors addressed the reviewers' concerns in the rebuttal. I recommend to accept.",ICLR2021, +FQBI6-dLOz,1610040000000.0,1610470000000.0,1,eQe8DEWNN2W,eQe8DEWNN2W,Final Decision,Accept (Poster),"Many papers have been written on calibrating neural networks recently. This paper presents a definition of calibration that is more robust than the popular ECE measure while also being more discerning than the Brier score. Then it proposes a practical spline-based method of post-editing the output softmax scores to make them more calibrated. The method is shown to be better than existing methods both on their measure and established measure (thanks to reviewer's questions on that.). +The paper should be of much interest to the community.",ICLR2021, +u1tyvsYD0pN,1610040000000.0,1610470000000.0,1,RVANVvSi8MZ,RVANVvSi8MZ,Final Decision,Reject,"The paper proposed a GNN model based on a weighted line graph (dual of the input graph), where information is simultaneously propagated on both graphs, coupling the two propagations at each step. + +Overall, the reviewers were lukewarm about the paper, with some raised criticism including +- limited novelty in light of Monti et al. 2018 +- limited theoretical justification +- unconvincing and incomplete experiments, not offering significant improvement compared to other alternatives + +While the presented approach is interesting, we believe the paper is below the bar and recommend Rejection. +",ICLR2021, +WvBRIdbi07i,1610040000000.0,1610470000000.0,1,1Jv6b0Zq3qi,1Jv6b0Zq3qi,Final Decision,Accept (Poster),"The authors design a framework to estimate the uncertainties in the predictions of gradient boosting models, for both classification and regression. The framework contains several methods, some that use sub-sampling on data to calculate the estimation, and some that use sub-sampling on the trees within one single gradient boosting model (i.e. virtual ensemble) to calculate the estimation. The different methods reveal the trade-off between faster calculation and good uncertainty estimation. 
The authors conduct extensive empirical study to demonstrate the validity of the designed framework. + +The reviewers agree that the paper is well-written on a very important topic of machine learning in practice. The authors have done a great job addressing the comments from the reviewers, including the comparison to random forest, and adding more motivating examples. The reviewers believe that the work marks a good starting point for addressing this important topic. Nevertheless, the reviewers have some concerns that the results are promising but not impressive yet, and the performance of the virtual ensemble is a bit discouraging. +",ICLR2021, +aXqnbSq4nCN,1610040000000.0,1610470000000.0,1,R7aFOrR0b2,R7aFOrR0b2,Final Decision,Reject,"The authors empirically analyse the properties of datasets which lead to poor calibration. In particular, they show that high class imbalance, high degree of label noise, and small dataset size are all likely to lead to poor overall calibration or poor per-class calibration. While there are some interesting insights in this work, the reviewers argued that the contribution is not substantial enough for ICLR. To improve the manuscript the authors should consider accuracy and calibration jointly and extend the results pertaining to label noise which were appreciated by the reviewers. For the former, the same conclusions hold for accuracy, instead of calibration, which raises the question of their relationship -- is there a tradeoff? For the latter, the reviewers pointed to a concrete extension with structured label noise. Finally, the theoretical analysis is a step in the right direction, but the assumption on the width of the network required to fit the training set is too restrictive in practice. Therefore, I will recommend rejection.",ICLR2021, +9GF0diZZaVd,1610040000000.0,1610470000000.0,1,9Y7_c5ZAd5i,9Y7_c5ZAd5i,Final Decision,Reject,"The reviewers, AC, and PCs participated in a very thorough discussion. AC ultimately felt that the work was unfinished, and in particular that details in the proofs still needed work before publication. + +",ICLR2021, +k3K0KJkT-j,1576800000000.0,1576800000000.0,1,rkxZyaNtwB,rkxZyaNtwB,Paper Decision,Accept (Spotlight),"This is a mostly theoretical paper concerning online and stochastic optimization for convex loss functions that are not Lipschitz continuous. The authors propose a method for replacing the Lipschitz continuity condition with a more general Riemann-Lipschitz continuity condition, under which they are able to provide regret bounds for the online mirror descent algorithm, as well as extending to the stochastic setting. They follow up by evaluating their algorithm on Poisson inverse problems. + +The reviewers all agree that this is a well-written paper that makes a clear contribution. To the best of our knowledge, the theory and derivations are correct, and the authors were highly responsive to reviewers’ (minor) comments. I’m therefore happy to recommend acceptance.",ICLR2020, +rygo3IdWx4,1544810000000.0,1545350000000.0,1,r1xwS3RqKQ,r1xwS3RqKQ,"Rejection, reviewer concerns not addressed",Reject,"The reviewers unanimously agreed the paper did not meet the bar of acceptance for ICLR. They raised questions around the technical correctness of the paper, as well as the experimental setup. The authors did not address any reviewer concerns, or provide any response. 
Therefore, I recommend rejection.",ICLR2019,5: The area chair is absolutely certain +3RxM_EGaLz,1576800000000.0,1576800000000.0,1,H1gz_nNYDS,H1gz_nNYDS,Paper Decision,Reject,"The paper presents a simple one-shot approach on searching the number of channels for deep convolutional neural networks. It trains a single slimmable network and then iteratively slim and evaluate the model to ensure a minimal accuracy drop. The method is simple and the results are promising. + +The main concern for this paper is the limited novelty. This work is based on slimmable network and the iterative slimming process is new, but in some sense similar to DropPath. The rebuttal that PathNet ""has not demonstrated results on searching number of channels, and we are among the first few one-shot approaches on architectural search for number of channels"" seem weak.",ICLR2020, +KCp_X0kZfN,1642700000000.0,1642700000000.0,1,k7efTb0un9z,k7efTb0un9z,Paper Decision,Accept (Poster),"### Description +The paper develops a new automatic scheduler to schedule the learning rate during the training. The scheduler has access to the current training state summarized by certain statistics of weights and gradients in all layers and the loss history. It is trained by reinforcement learning with the reward derived from the progress with respect to the performance measure, such as validation accuracy. The key innovations are the design of the state vector using graph convolutional neural networks and empirical improvements to the reward function. The main claim is that GCNs allow to take into account the architecture of the network to be trained and the state of all layers, which, authors hypothesize and demonstrate experimentally, improves performance and transferability of the scheduler across networks and tasks. + +### Decision +The reviewers recommendations after the rebuttal settled on 4 x ""marginally above"" and one ""accept"". Respectively, I recommend to accept. I recommend a poster based on the reception by reviewers: the novelty was assessed as limited because the idea of an automatic schedulers based on RL with a similar learning strategy belongs to the prior work. On the strong side, the paper satisfied all requests by reviewers for the experiments, regarding alternative methods, large datasets and ablation studies demonstrating that it is indeed the new architecture that allows to achieve a significant improvement, making it a solid improvement step. Amongst alternative methods the paper considers all viable alternatives: a function-based schedulers, a hyper-gradient method, and a the RL based scheduler, optimized in hyperparameters. + +### Discussion +There was no significant non-public discussion. As an additional feedback, let me just share my observations. + +What is somewhat unclear in that the paper starts by discussing the directed graph of a feed-forward network, then it proposes to run GCN on it, which is undirected. Then the hierarchical method is proposed, which runs GCN on each block sequentially while taking the aggregated input from the preceding block. This makes it a directed processing method on the level of block. I wonder whether the directed processing is desired or not desired here? Can a sequential processing summarize the network state efficiently on its own, similar to feedforward propagation, without the global averaging proposed? + +It was not very clear to me from the paper what is the meaning of the batch size for training the GSN, and respectively what the batch normalization is doing there. 
+ +From some 100 mile perspective, it seems to me that whatever efficient optimization can be performed on the validation set, it helps. It does not matter so much what is varied: the learning rate schedule, other hyperparameters or even the network architecture. So in a sense it is not surprising that one can improve. What is more interesting is that the learned schedulers are generalizable / transferable, as demonstrated in Section 5.4 (changing the architecture or going from CIFAR to ImageNet while keeping the scheduler). + +The work has done quite a lot on the experimental side with the baseline GCN model that they proposed. It seems to have still lots of potential via different possible enhancements. For example, what authors mentioned, including exponentially weighted running averages of gradients and squared gradients into the features. They already tried GAT instead of GCN as proposed by reviewers. There may be hyperparameters other than those controlling the learning rate. Some such hyperparameters, e.g. momentum, are apparently tightly coupled with the learning rate. The paper does not discuss how to tune them together with GNS. It could be a difficulty. On the other hand, they can be potentially scheduled with the same GNS. + +When considering SGD with momentum (which is not used in the paper), please note that the common use of a momentum parameter $\mu$ actually mixes the learning rate together with the smoothing parameter controlling the exponentially weighted averaging. So if one wants to control the learning rate alone, it is better to implement the gradient smoothing is done in .e.g. Adam, with its hyperparameter $\beta_1$.",ICLR2022, +H1li3Y4-gV,1544800000000.0,1545350000000.0,1,HJE6X305Fm,HJE6X305Fm,Good practical approach to stabilise GAN training,Accept (Poster),The paper provides a simple method for regularising and robustifying GAN training. Always appreciated contribution to GANs. :-),ICLR2019,4: The area chair is confident but not absolutely certain +aCS4TLW_csZ,1610040000000.0,1610470000000.0,1,7ehDLD1yoE0,7ehDLD1yoE0,Final Decision,Reject,"The paper gives a gradient-free method for generating adversarial examples for the code2seq model of source code. + +While the reviewers found the high-level objectives interesting, the experimental evaluation leaves quite a bit to be desired. (Please see the reviews for more details.) As a result, the paper cannot be accepted in the current form. We urge the authors to improve the paper along the lines that the reviews suggest and resubmit to a different venue.",ICLR2021, +S1gyUkpBG,1517250000000.0,1517260000000.0,691,SJ60SbW0b,SJ60SbW0b,ICLR 2018 Conference Acceptance Decision,Reject,"The proposed LAN provides a visualization of the selectivity of networks to its inputs. It takes a trained network as golden target and estimates an LAN to predict masks that can be applied on inputs to generate the same outputs. +But the significance of the proposed method is unclear, ""what is the potential usage of the model?"". Empirical justification of that would make it stronger. ",ICLR2018, +sdCEWQY5b,1576800000000.0,1576800000000.0,1,Hkxvl0EtDH,Hkxvl0EtDH,Paper Decision,Reject,"This paper attempts to present a causal view of robustness in classifiers, which is a very important area of research. +However, the connection to causality with the presented model is very thin and, in fact, mathematically unnecessary. 
Interventions are only applied to root nodes (as pointed out by R4) so they just amount to standard conditioning on the variable ""M"". The experimental results could be obtained without any mention to causal interventions.",ICLR2020, +CUaVYR84b7,1576800000000.0,1576800000000.0,1,SJx4O34YvS,SJx4O34YvS,Paper Decision,Reject,"This paper describes a method for generating adversarial examples from images and text such that they maintain the semantics of the input. + +The reviewers saw a lot of value in this work, but also some flaws. The review process seemed to help answer many questions, but a few remain: there are some questions about the strength of the empirical results on text after the author's updates. Wether the adversarial images stay on the manifold is questioned (are blurry or otherwise noisy images ""on manifold""?). One reviewer raises good questions about the soundness of the comparison to the Song paper. + +I think this review process has been very productive, and I hope the authors will agree. I hope this feedback helps them to improve their paper.",ICLR2020, +HkeDl48bxE,1544800000000.0,1545350000000.0,1,HJfxbhR9KQ,HJfxbhR9KQ,"Nice work with potential, but contributions need to be strengthened",Reject,"The paper proposes an interesting idea for more effective imitation learning. The idea is to include short actions sequences as labels (in addition to the basic actions) in imitation learning. Results on a few Atari games demonstrate the potential of this approach. + +Reviewers generally like the idea, think it is simple, and are encouraged by its empirical support. That said, the work still appears somewhat preliminary in the current stage: (1) some reviewer is still in doubt about the chosen baseline; (2) empirical evidence is all in the similar set of Atari games --- how broadly is this approach applicable?",ICLR2019,4: The area chair is confident but not absolutely certain +IZpKdERNvF,1610040000000.0,1610470000000.0,1,BvrKnFq_454,BvrKnFq_454,Final Decision,Reject,"The paper proposes a new adaptive optimization algorithm which is claimed to have better convergence properties and lower susceptibility to gradient variance. Reviewers found the idea of normalizing on the fly to be interesting, but raised some important concerns. Although similar to AdaGrad, Expectigrad has a very important differentiation due to division by $n_t$. Assuming $\beta=0$ in my opinion is also ok and many papers assume this for analysis. Even after accounting for these two facts, during discussions the reviewers considered the work to be incremental and a more thorough evaluation is needed to determine the benefits of algorithm. Specifically, please compare to important and relevant baselines (like AdamNC and Yogi), because sometimes it felt like baselines were picked and dropped randomly. The empirical improvement provided by Expectigrad compared to SOTA is not clear (both on synthetic problems from Reddi et al and real problems). Thus, unfortunately, I cannot recommend an acceptance of the paper in the current form. However, I would strongly encourage authors to resubmit after improving according to reviewer suggestions. + +Some other minor points that came up during discussion are: +1. choice of hyperparameters was not clear to reviewers, e.g. different optimizer may behave very differently for same set of hyperparameters, so it would not be fair to compare them as is. +2. 
Gradients would never be exactly zero in deep networks, so is the current definition of $n_t$ good enough?",ICLR2021, +XsT-XMrh43p,1610040000000.0,1610470000000.0,1,nQxCYIFk7Rz,nQxCYIFk7Rz,Final Decision,Reject,"While there was some interest in the analysis, the consensus view was that the original treatment was not sufficiently well-motivated, and the revision was too dissimilar from the original submission for it to be evaluated for publication in this year's ICLR.",ICLR2021, +stCt_NRtjHo,1610040000000.0,1610470000000.0,1,tY38nwwdCDa,tY38nwwdCDa,Final Decision,Reject,"The paper introduces an augmentation technique that, given an image with a detected object, keeps the object and removes the background. + +The reviewers expressed numerous valid concerns about the paper's novelty, the setting (the assumption that there is a single object), the scalability of the approach and the experimental setup, including the baselines used. + +The authors have not addressed these concerns.",ICLR2021, +elXOHw25cAt,1610040000000.0,1610470000000.0,1,jYkO_0z2TAr,jYkO_0z2TAr,Final Decision,Reject,"The submission proposes to leverage a commonsense knowledge graph and an attention-GNN-based model to aggregate the node features on the graph for the problem of zero-shot learning. It received three reviews, two of them recommending rejection; the third was initially borderline but moved to acceptance after the rebuttal. The meta-reviewer finds that the paper is not yet ready for publication and recommends rejection based on the following observations. + +Although the model is interesting, as agreed by the reviewers, the initial version of the paper fell short of convincingly evaluating the method, e.g. in the generalized zero-shot learning (GZSL) setting as pointed out by R5, and on ImageNet and small-scale dataset results as pointed out by R1. Similarly, the main paper (without the annexes) has been found to fall short of providing enough details of the model, as pointed out by R1. + +The authors ran additional experiments during the rebuttal phase which showed some promise; however, one more review round may be necessary to carefully validate these results. As R1 pointed out, the paper only reports results on two small datasets, i.e., AWA2 and aPY, which contain classes similar to ImageNet. It would be interesting to observe the behaviour of the model in more challenging scenarios on other publicly available benchmark datasets of fine-grained nature whose distributions are far from ImageNet. This would indicate the generalisation ability of the model. + +Furthermore, as pointed out by R1, moving the implementation and architecture details from the appendix and from python scripts to the main paper may be beneficial. However, this would end up significantly extending the paper. Hence, one more review round may be necessary for this paper.",ICLR2021, +HTVO7y8qHSu,1642700000000.0,1642700000000.0,1,KBuOP5HrVQ0,KBuOP5HrVQ0,Paper Decision,Reject,"The topic of this paper is timely and important. However, ultimately the reviewers remained unconvinced that this paper provides a sufficiently clear and sufficiently significant advance in lifelong RL. + +As an additional note, the setting under investigation here is not the full lifelong learning setting. E.g., several of the challenges outlined by Schaul et al. [1] are not treated, and this work is, instead, situated in a somewhat typical multi-task setting with substantial structure. 
That is not bad, but it would be good if this were reflected clearly in all the statements, and, e.g., in the title of the work. + +The authors are encouraged to carefully take the provided feedback and see how they can use it to improve their work. This is an important research direction. It was just felt that the current submission was not quite ready for publication yet. + +[1] https://arxiv.org/abs/1811.07004",ICLR2022, +sDpGJ-9iAKP,1642700000000.0,1642700000000.0,1,gf9buGzMCa,gf9buGzMCa,Paper Decision,Reject,"*Summary:* Study expressive power of narrow networks. + +*Strengths:* +- Studies the narrow setting, which is not as well studied as the wide setting. +- Some reviewers found the paper well written. + +*Weaknesses:* +- Restricted class of targets and activations. +- Similar results have appeared in previous works. + +*Discussion:* + +99iL asked about the possibility to remove certain assumptions and the extension to other activations. The authors answered negatively to both. 99iL acknowledges the response and concludes that the so-called maximum principle is the most interesting result, but also points out that similar results appear in previous work and that it would have been good to see some extensions. qHTG indicates that the paper is well written and has interesting contributions, but that some of the theoretical results only apply in settings that are more restrictive than in other recent related works. The authors agree that generalizations deserve to be investigated in the context of the presented results, but point out that their principle does not apply in that case, and hence that such generalizations are out of scope. Although qHTG identifies several good aspects in this work, they maintain the overall assessment of just marginally above the threshold. PCRn finds the work very interesting but is concerned about the novelty, and points out that although the work is technical, the main message is not very strong and the extraction of insights to solve tasks is not as clear. PCRn concludes that the paper presents various relatively weak results but not a sufficiently significant message. The authors remark that some of their results constitute a mathematical tool for future works. + +*Conclusion:* + +One reviewer rated this work marginally below the acceptance threshold and three others marginally above. Considering the reviews and the discussion, I conclude that this paper obtains a few interesting results but leaves much for future work. Further development of the current results would make the article significantly stronger. In view of the very high quality of other submissions to the conference, I find that this article narrowly misses the bar for acceptance. Therefore I recommend rejecting this article. I encourage the authors to revise and resubmit.",ICLR2022, +49yTTaXsOl,1610040000000.0,1610470000000.0,1,14nC8HNd4Ts,14nC8HNd4Ts,Final Decision,Reject,"The approach proposed here has raised major concerns from multiple reviewers, especially concerning the novelty and the experimental validation procedure. The authors did not succeed in convincing the reviewers of the value of their work for ML or for calcium imaging processing.",ICLR2021, +8u_UWmR8ulz_,1642700000000.0,1642700000000.0,1,8eb12UQYxrG,8eb12UQYxrG,Paper Decision,Accept (Poster),"It can be prohibitively expensive to train a reinforcement learner from scratch — particularly in cases where experience is expensive to obtain, such as with a physical robot. 
So, we might hope to speed up RL in a couple of ways: first, by pre-training a representation that makes subsequent RL need less data; and second by running our RL on a cheaper proxy environment such as a simulator. For pre-training, we hope to be able to take advantage of available pre-collected data, and we hope to be able to use supervised learning or reconstruction tasks since they can be cheaper than RL. For either pre-training or a proxy environment, we have to deal with distribution shifts: the properties of the environment may change between pre-training and RL, and between RL and testing the learned policy. + +The paper presents an empirical study of how different pre-trained representations and different distribution shifts affect RL performance. It evaluates a number of representations trained by different VAEs (differing in aspects such as loss and hyperparameter settings) under various scenarios of distribution shift. It also asks whether we can predict the performance of the learned policies from properties of the representations, before going to the expense of training and evaluating our reinforcement learner. + +The paper concludes that it is possible to significantly reduce RL data requirements using pre-trained representations, even in the presence of significant distribution shifts — including demonstrating zero-shot sim2real transfer. And, the paper concludes that inexpensive measurements of OOD performance on supervised tasks can at least partially predict success in generalization. + +The reviewers praised the extensive experimental evaluation, including a large number of experiments on a physical robot, as well as the investigation of less-expensive ways to predict generalization. + +Some reviewers were concerned that the choice of environments was limiting — e.g., that the distributional distance between in-distribution and out-of-distribution tests was limited, or that the results might not generalize to other related robotic environments. However, in the end there was support for the conclusion that the experiments cover a sufficiently general and interesting question.",ICLR2022, +8v-mHt-y726,1610040000000.0,1610470000000.0,1,eZllW0F5aM_,eZllW0F5aM_,Final Decision,Reject,"This paper proposes to use randomly wired architectures [1] in the context of GNNs and introduces a method for sampling random architectures based on the Erdős–Rényi model. The authors further include a theoretical analysis and two methodological contributions: sequential path embeddings and DropPath, a regularizer. Results are reported on two graph datasets (ZINC and CLUSTER) and on GNN-based CIFAR10 image classification. + +The reviewers agree that the empirical results presented in the paper are compelling. The value of the contribution largely lies in this aspect, namely the empirical analysis of an existing technique (randomly wired architectures) in the context of GNNs, in addition to several smaller empirical methodological contributions. I agree with the reviewers in that the nature of the contribution and the otherwise limited novelty calls for a more extensive and detailed empirical evaluation (ideally incl. e.g. FLOPS, wall-clock time, memory usage) across a wide range of datasets and careful ablation studies, and I encourage the authors to improve on this aspect in a future version of the paper. 
The theoretical analysis is interesting but, as pointed out by the reviewers both during the reviews and the later discussion period, does not add sufficient value to the main empirical contribution of the paper to push the paper beyond the acceptance threshold, and does not satisfactorily address the question of how the method addresses the oversmoothing problem in GNNs. + +[1] Xie et al., Exploring randomly wired neural networks for image recognition (ICCV 2019) +",ICLR2021, +pKT6QAHqbv,1576800000000.0,1576800000000.0,1,BJgRsyBtPB,BJgRsyBtPB,Paper Decision,Reject,"The paper proposes a variant of the max-sliced Wasserstein distance, where instead of sorting, a greedy assignment is performed. As no theory is provided, the paper is purely experimental in nature. + +Unfortunately the work is too preliminary to warrant publication at this time, and would need further experimental or theoretical strengthening to be of general interest to the ICLR community.",ICLR2020, +B1g2PEMKeV,1545310000000.0,1545350000000.0,1,Hyfg5o0qtm,Hyfg5o0qtm,metareview,Reject,"The reviewers raised a number of major concerns, including a lack of explanations, a lack of baseline comparisons, and a lack of discussion of the pros and cons of the main contribution of this work -- the presented Temporal Gaussian Mixture (TGM) layer. The authors' rebuttal addressed some of the reviewers' comments but failed to address all concerns (especially when it comes to the success of TGMs; it remains unclear whether this could be attributed solely to the way TGMs are applied rather than to their fundamental methodological advantage). Having said that, I cannot suggest this paper for presentation at ICLR.",ICLR2019,5: The area chair is absolutely certain +1yx6UQQaFDx,1642700000000.0,1642700000000.0,1,GesLOTU_r23,GesLOTU_r23,Paper Decision,Reject,"This work performs a mean field analysis of a certain class of fully connected networks with and without layer normalization. Theory is provided which successfully predicts when some networks will exhibit either exploding gradients, or ""representation shrinkage"", which is similar to the extreme ordered phase discussed in prior works on signal propagation. The primary concerns raised by reviewers included a large overlap with prior works on signal propagation, a bug in the proof of the main theorem, a lack of clarity, and many assumptions made in the theory which significantly limit the space of architectures to which the theory can be applied. Some of these concerns were addressed in the rebuttal period; notably, a major flaw in the main theorem was resolved and some concerns on clarity were addressed. However, with the remaining issues (notably the overlap with prior work and the overly restrictive assumptions made), a majority of reviewers did not recommend acceptance in the end. The AC agrees with this final decision and recommends the authors look to further expand upon the contributions relative to prior work.",ICLR2022, +_qUqDg4lEp6,1642700000000.0,1642700000000.0,1,Czsdv-S4-w9,Czsdv-S4-w9,Paper Decision,Accept (Poster),"This work tackles video generation using implicit representations, and demonstrates that using these representations enables improvements to long-term coherence of the generated videos. + +Reviewers praised the writing, the thorough experimental evaluation, and the strong quantitative results. 
Some concerns were raised about a lack of discussion of relevant related work, novelty/significance, the model architecture, and a lack of qualitative examples, many of which the authors have tried to address during the discussion phase. Several reviewers raised their ratings as a result. + +Personally I certainly believe that exploring implicit representations for video is important, and I know of no published prior work in this direction, which amplifies the potential significance of this work. Even if results are qualitatively worse than previous work in some ways, this exploration is still valuable and worth publishing. + +While the paper ultimately received one reject rating, another reviewer chose to champion this work and award it the highest possible rating. Combined with the other positive reviews, this provides plenty of convincing evidence for me to recommend acceptance. That said, given the rating spread, I would like to encourage the authors to consider the reviewers' comments further as they prepare the final version of the manuscript. Especially providing more qualitative results would be a welcome addition.",ICLR2022, +8v1lef4nl,1576800000000.0,1576800000000.0,1,H1eArT4tPH,H1eArT4tPH,Paper Decision,Reject,"This paper studies the statistics of activation norms and Jacobian norms for randomly-initialized ReLU networks in the presence (and absence) of various types of residual connections. Whereas the variance of the gradient norm grows with depth for vanilla networks, it can be depth-independent for residual networks when using the proper initialization. + +Reviewers were positive about the setup, but also pointed out important shortcomings of the current manuscript, especially related to the lack of significance of the measured gradient norm statistics with regard to generalisation, and to some technical aspects of the derivations. For these reasons, the AC believes this paper will strongly benefit from an extra iteration. ",ICLR2020, +IuKBEK8Ytif,1610040000000.0,1610470000000.0,1,m2ZxDprKYlO,m2ZxDprKYlO,Final Decision,Reject,"This paper sits at the borderline: the reviewers agree it is a well-written and interesting paper, but have concerns about efficiency as well as a comparison with the neural process (the authors did include a revision with this comparison, though the numbers they report are worse than in the original neural processes paper on the same experiment). Ultimately, this paper probably requires another round of reviews before it is ready for publication.",ICLR2021, +pVWVoDI1_6,1642700000000.0,1642700000000.0,1,Vvb-eicR8N,Vvb-eicR8N,Paper Decision,Reject,"This paper proposes a new contribution in the recent literature on learning distributions of sketches. While all reviewers have recognized the overall good quality of the presentation, two factors seem to weigh heavily on a negative decision: the need for clarifications on the contribution's scope (presented as a tool for general Hessians in the introduction, but ultimately only applied to least-squares errors of linear predictors, to recover an explicit factorization of the Hessian matrix) and on its links with existing literature; and the weakness of the experiments, whose small scale does not justify using sketches in the first place. 
Since this is a ""learning"" approach, I am particularly sensitive to the latter point, and therefore am inclined to reject, but I encourage the authors to address these two issues in the current draft.",ICLR2022, +rkxo9bEkgE,1544660000000.0,1545350000000.0,1,r1lohoCqY7,r1lohoCqY7,Metareview,Accept (Poster),"The paper conveys interesting ideas, but reviewers are concerned about the incremental nature of the results, the choice of comparators, and, in general, the empirical and analytical novelty.",ICLR2019,5: The area chair is absolutely certain +Jc7rLAHzzZD,1610040000000.0,1610470000000.0,1,4CxsUBDQJqv,4CxsUBDQJqv,Final Decision,Reject,"This paper proposes an algorithm to learn symbolic intrinsic rewards via a symbolic function generator. The policy optimizes this reward function and an evolutionary algorithm selects between a set of such policies. The core idea is that learning with such a symbolic reward function is useful in sparse reward environments and also enables better interpretability. + +${\bf Pros}$: +1. The learnt reward function has a relatively simple form and is therefore interpretable. +2. The experimental section is quite extensive, ranging over diverse tasks, control systems and agent systems. However, there are some issues with showing a clear need for the proposed method. + +${\bf Cons}$: +1. There was a consensus among reviewers that the paper does not make a strong case for the symbolic reward generator. In the rebuttal the authors argued that as RL scales to real-world problems, it will become necessary to use such a method. I can understand how it would be useful in the context of inverse RL or imitation learning. However, as R3 points out, in the cases considered in this paper, the rewards are fairly intuitive and explainable. The paper might become stronger by directly tackling problems with such constraints. +2. There is confusion about the details and scope in the current version of the paper. The paper would become stronger by incorporating all the feedback received during the review period. ",ICLR2021, +Kb089OOeb,1576800000000.0,1576800000000.0,1,HyeKcgHFvS,HyeKcgHFvS,Paper Decision,Reject,"The paper presents an SGD-based learning of a Gaussian mixture model, designed to match a data streaming setting. + +The reviews state that the paper contains some quite good points, such as +* the simplicity and scalability of the method, and its robustness w.r.t. the initialization of the approach; +* the SOM-like approach used to avoid degenerate solutions. + +Among the weaknesses are +* an insufficient discussion w.r.t. the state of the art, e.g. for online EM; +* the description of the approach, which does not yet seem mature (e.g., the constraint enforcement boils down to considering that the $\pi_k$ are obtained using softmax; the discussion about the diagonal covariance matrix vs the use of local principal directions is not crystal clear); +* the fact that the experiments need to be strengthened. + +I thus encourage the authors to rewrite and polish the paper, simplifying the description of the approach and better positioning it w.r.t. the state of the art (in particular, mentioning the data streaming motivation from the start). Also, more evidence, and a more thorough analysis thereof, must be provided to back up the approach and understand its limitations.",ICLR2020, +VfCNrPDP5H,1576800000000.0,1576800000000.0,1,rJxBa1HFvS,rJxBa1HFvS,Paper Decision,Reject,"This paper studies the problem of estimating the value function in an RL setting by learning a representation of the value function. 
While this topic is one of general interest to the ICLR community, the paper would benefit from a more careful revision and reorganization following the suggestions of the reviewers.",ICLR2020, +21pSwzmJJwc,1642700000000.0,1642700000000.0,1,EIm_pvFJx5k,EIm_pvFJx5k,Paper Decision,Reject,"This paper proposes an autoregressive framework that combines an RNN and a local linear component for the problem of meta-forecasting of time series. The linear model can domain-adapt to different time series, while the RNN component is shared across series. Reviewers thought the problem was important, the paper generally clear and the experiments extensive. However, they found the significance to be limited, and all took issue with some of the ways that the comparisons were done. FZbR also raised the issue of the complexity of the matrix inversion component of the method. I believe this paper does fall on the rejection side of the fence due to the issues of complexity, significance and evaluations. With some development, the paper could certainly be ready for acceptance.",ICLR2022, +VUF0NwqVts,1642700000000.0,1642700000000.0,1,7oyVOECcrt,7oyVOECcrt,Paper Decision,Reject,"This paper presents a graph neural network (GNN) architecture that adopts locally permutation-equivariant constructs, which has better scalability compared to globally permutation-equivariant GNNs, and the paper claims this change also does not lose expressivity of the network. All reviewers unanimously recommended rejection, and the main issues are the clarity and writing, to the point where it becomes hard for a reader to follow the precise implementation of the proposed approach and how it compares to prior work. Therefore, in its current form this paper is not yet ready for publication at ICLR. 
",ICLR2021, +Bkdl6zI_g,1486400000000.0,1486400000000.0,1,S19eAF9ee,S19eAF9ee,ICLR committee final decision,Reject,"While graph structures are an interesting problem, as the reviewers observed, the paper extends previous work incrementally and the results are not very moving. + + pros + - interesting problem space that has not been thoroughly explored + cons + - experimental evaluation was not convincing enough with the results. + - the method itself is a small incremental improvement over prior papers.",ICLR2017, +Ht8HD_-vRUj,1610040000000.0,1610470000000.0,1,ry8_g12nVD,ry8_g12nVD,Final Decision,Reject,"Does not seem to be a complete submission (only one page), all reviewers agree on rejecting.",ICLR2021, +BJxVqfGrlE,1545050000000.0,1545350000000.0,1,rkgd0iA9FQ,rkgd0iA9FQ,meta-review,Reject,"The reviewers and ACs acknowledge that the paper has a solid theoretical contribution because it give a convergence (to critical points) of the ADAM and RMSprop algorithms, and also shows that NAG can be tuned to match or outperform SGD in test errors. However, reviewers and the AC also note that potential improvements for the paper a) the exposition/notations can be improved; b) better comparison to the prior work could be made; c) the theoretical and empirical parts of the paper are somewhat disconnected; d) the proof has an error (that is fixed by the authors with additional assumptions.) Therefore, the paper is not quite ready for publications right now but the AC encourages the authors to submit revisions to other top ML venues. +",ICLR2019,4: The area chair is confident but not absolutely certain +4nHjf-733z,1576800000000.0,1576800000000.0,1,HyxQbaEYPr,HyxQbaEYPr,Paper Decision,Reject,The authors propose a sample reweighting scheme that helps to learn a simple model with similar performance as a more complex one. The authors contained critical errors in their original submission and the paper seems to lack in terms of originality and novelty of the proposed method.,ICLR2020, +aT4Q6MTFP4HB,1642700000000.0,1642700000000.0,1,wYqLTy4wkor,wYqLTy4wkor,Paper Decision,Reject,"The paper tackles the problem of covariate shift in adaptive curriculum learning. Unfortunately, the paper lacks clarity and the experiments are insufficient. The author response clarified the notation and corrected many typos, however, the paper remains conceptually unclear as pointed out by the reviewers. Hence this work is not ready for publication.",ICLR2022, +H1gTF2byeV,1544650000000.0,1545350000000.0,1,HJgXsjA5tQ,HJgXsjA5tQ,Meta-review,Accept (Poster),"This paper introduces a class of deep neural nets that provably have no bad local valleys. By constructing a new class of network this paper avoids having to rely on unrealistic assumptions and manages to provide a relatively concise proof that the network family has no strict local minima. Furthermore, it is demonstrated that this type of network yields reasonable experimental results on some benchmarks. The reviewers identified issues such as missing measurements of the training loss, which is the actual quantity studied in the theoretical results, as well as some issues with the presentation of the results. After revisions the reviewers are satisfied that their comments have been addressed. 
This paper continues an interesting line of theoretical research and brings it closer to practice and so it should be of interest to the ICLR community.",ICLR2019,4: The area chair is confident but not absolutely certain +GF4_KfLKNf,1576800000000.0,1576800000000.0,1,rJgUfTEYvH,rJgUfTEYvH,Paper Decision,Accept (Poster),"The authors explore the use of flow-based models for video prediction. The idea is interesting. The paper is well-written. It is a good paper worthwhile presenting in ICLR. + +For final version, we suggest that the authors can significantly improve the experiments: (1) report results on human motion datasets; (2) include the results by the FVD metric. +",ICLR2020, +eud2hY4OJBr,1642700000000.0,1642700000000.0,1,Jep2ykGUdS,Jep2ykGUdS,Paper Decision,Reject,"The manuscript develops a novel method for uncertainty prediction that can be used in the context of active or reinforcement learning problems. They consider experiments such as an OOD Detection task wherein a ResNet is trained on CIFAR10 and predictions must subsequently be made for in- versus out-of-distribution (SVHN) data. +The work develops an approach based on directly estimating epistemic (as opposed to a aleatoric) uncertainty by learning to predict generalization error and then subtracting estimated aleatoric uncertainty. +Reviewers found the essential approach to be novel and creative. However, there were several issues raised by reviewers that are not well addressed by responses by the authors. For example, Zaec worries about the dependence of the approach on an oracle for estimating aleatoric uncertainty. Multiple reviewers were concerned that this would make the approach unsuitable for many situations and thus limit the applicability of the ideas. +Multiple reviewers also found the manuscript to be difficult to understand. I agree with the sentiment. While there may indeed be an interesting and important idea here, the text and explication of the algorithm and approach are complicated and leave the reader unsure about the contribution. I would recommend that the authors invest time and effort in simplifying and streamlining the narrative and presenting the technical innovation so that it is easier to judge. In it's current form, the manuscript is premature for publication.",ICLR2022, +SImJo69hX1v,1642700000000.0,1642700000000.0,1,d5SCUJ5t1k,d5SCUJ5t1k,Paper Decision,Accept (Poster),"This paper presents work on open-world object detection. The main idea is to use fixed per-category semantic anchors. These can be incrementally added to when new data appear. The reviewers engaged in significant discussion around the paper with many iterations of improvements to the paper. Initial concerns regarding zero-shot learning were addressed, as were remarks on presentation and claims. + +In the end the reviewers were split on this paper. I recommend to accept the paper on the basis of the semantic topology ideas and the thorough experimental results. + +The remaining concern centered around the evaluation protocol used in the paper, which follows that in the literature (e.g. Joseph et al. CVPR 21). While this is not a fatal flaw, it is an issue with how this genre of methods is evaluated. It would be good to add discussion to the final paper to highlight this as an opportunity for future work in the field to address. 
Specifically, as a reviewer noted ""after detecting ""unknown"" objects in T1, the (hypothetical) annotation process provides boxes for ALL objects of some new classes instead of only for those that have been correctly detected (localized and marked ""unknown"").""",ICLR2022, +1bBbkIq5cr,1610040000000.0,1610470000000.0,1,s9788-pPB2,s9788-pPB2,Final Decision,Reject,"Though the method suggested in this paper is interesting, theoretically motivated, and resulted in some practical improvement, the reviewers ultimately had low scores. The reasons for this are: +1) The improvements obtained by this method were rather small, especially on the standard datasets (CIFAR, Imagenet). +2) In the main results presented in the paper, it seems that a proper validation/test split was not done (which seems quite important for demonstrating the validity of this method). In some of the results, presented in supplementary, such a split was done, but this seems to decrease the performance of the method even more. +3) The method requires that features in the last hidden layer approximately span a low dimensional manifold. This seems like a major limitation for the accuracy of this method, which becomes approximate in datasets where the number of datapoints is larger than the size of the last hidden layer (which is the common case). + +Therefore, I suggest the authors try to improve all of the above issues and re-submit. For example, one simple way to address issue 3 and potentially improve the results (issue 1) is to use the same method on all the features in all the layers, instead of just the last layer. In other words, concatenate all the features and all the layers, and then add a linear layer from this concatenated feature vector directly to the network output, in a direction that is orthogonal to the data. + + ",ICLR2021, +ryqoifUux,1486400000000.0,1486400000000.0,1,SyCSsUDee,SyCSsUDee,ICLR committee final decision,Reject,"The reviewers all expressed concerns with the technical quality of this work. In particular, the reviewers are concerned that ignoring certain entropy terms in the objective is problematic and would require significantly more justification theoretically and empirically. The reviewers believe that the authors had to resort to unjustified tricks such as adding noise in order to compensate for the missing terms in the objective. Some of the reviewers also had concerns with the choice of experiments, expressing that the authors did not choose the right baseline comparisons to compare to (e.g. convolutional networks vs. fully connected networks on MNIST). Hopefully the thorough feedback and lengthly discussion, along with the authors' responses (both in the text and additions to the paper and appendix), will lead to a stronger submission to a future conference.",ICLR2017, +8VS3NxKUPj,1576800000000.0,1576800000000.0,1,B1x6BTEKwr,B1x6BTEKwr,Paper Decision,Accept (Poster),"Quoting R3: ""This paper studies the theoretical property of neural network's loss surface. The main contribution is to prove that the loss surface of every neural network (with arbitrary depth) with piecewise linear activations has infinite spurious local minima."" + +There were split reviews, with two reviewers recommending acceptance and one recommending rejection. During a robust rebuttal and discussion phase, both R2 and R3's appreciation for the work was strengthened. 
The authors also provided a robust response to R1, whose main concerns included (i) that the paper's analysis is limited to piecewise linear activation functions, (ii) technical questions about the difficulty of proving theorem 2, which appear to have been answered in the discussion, and (iii) concerns about the strength of the language employed. + +On the balance, the reviewers were positively impressed with the relevance of the theoretical study and its contributions. Genuine shortcomings and misunderstandings were systematically resolved during the rebuttal process.",ICLR2020, +gKvyFq-TkEk,1610040000000.0,1610470000000.0,1,D3TNqCspFpM,D3TNqCspFpM,Final Decision,Reject,"The reviewers noted that this is an important, interesting but difficult topic. They appreciated that the authors clarified their assumptions in the theorem statements. Nevertheless, they recommend the authors to detail in depth when the method work better than the method where only the covariates are adjusted. They still think that the paper would require major modifications to be considered for publication hence the decision is rejection the paper. +",ICLR2021, +uQ67MdQiX6,1576800000000.0,1576800000000.0,1,rygGQyrFvH,rygGQyrFvH,Paper Decision,Accept (Poster),"This paper presents nucleus sampling, a sampling method that truncates the tail of a probability distribution and samples from a dynamic nucleus containing the majority of the probability mass. Likelihood and human evaluations show that the proposed method is a better alternative to a standard sampling method and top-k sampling. + +This is a well-written paper and I think the proposed sampling method will be useful in language modeling. All reviewers agree that the paper addresses an important problem. + +Two reviewers have concerns regarding the technical contribution of the paper (i.e., nucleus sampling is a straightforward extension of top-k sampling), and whether it is enough for publications at a venue such as ICLR. R2 suggests to have a better theoretical framework for nucleus sampling. I think these are valid concerns. However, given the potential widespread application of the proposed method and the strong empirical results, I recommend to accept the paper. + +Also, a minor comment, I think there is something wrong with your style file (e.g., the bottom margin appears too large compared to other submissions).",ICLR2020, +ysilVJ-KD8r,1610040000000.0,1610470000000.0,1,RDpTZpubOh7,RDpTZpubOh7,Final Decision,Reject,"Based on the paper, reviewers' comments and discussions, and the responses, the meta-reviewer would like to suggest the authors to improve the paper and resubmit.",ICLR2021, +SktXUJpSM,1517250000000.0,1517260000000.0,749,SyGT_6yCZ,SyGT_6yCZ,ICLR 2018 Conference Acceptance Decision,Reject,"The paper addresses the training time of CNNs, in the common setting where a CNN is trained on one domain and then used to extract features for another domain. The paper proposes to speed up the CNN training step via a particular proposed training schedule with a reduced number of epochs. Training time of the pre-trained CNN is not a huge concern, since this is only done once, but optimizing training schedules is a valid and interesting topic of study. However, the approach here does not seem novel; it is typical to adjust training schedules according to the desired tradeoff between training time and performance. 
The experimental validation is also thin, and the writing needs improvement.",ICLR2018, +-8ySLz3b-3R,1642700000000.0,1642700000000.0,1,z3Tf4kdOE5D,z3Tf4kdOE5D,Paper Decision,Reject,"This manuscript proposes a quantization approach to improve adversarial robustness. Reviewers agree that the problem studied is timely and the approach is interesting. However, note concerns about the novelty compared to closely related work, the quality of the presentation, the strength of the evaluated attacks compared to the state of the art, among other concerns. There is no rebuttal.",ICLR2022, +r0NXVFoB6E,1610040000000.0,1610470000000.0,1,ZpS34ymonwE,ZpS34ymonwE,Final Decision,Reject,"I thank the authors and reviewers for the discussions. Reviewers agreed the work is interesting but there are some aspects of the paper that need improvements. In particular, authors need to better address concerns raised by R5. Given all, I think the work still needs a bit more work before being accepted. ",ICLR2021, +vRUo-CsLzi,1642700000000.0,1642700000000.0,1,GVDwiINkMR,GVDwiINkMR,Paper Decision,Reject,"The reviewers were not convinced by the authors' responses to their concerns, and this paper generated little followup discussion. Some primary concerns include the privacy analysis, limited technical contribution and scope (e.g., only being applicable to iid data), and lacking comparison to suggested baselines. The authors are suggested to take the reviewer comments into account for further investigation.",ICLR2022, +7raYo5Dw4tM,1642700000000.0,1642700000000.0,1,KLaDXLAzzFT,KLaDXLAzzFT,Paper Decision,Accept (Poster),"In this paper, the authors motivate the paper well by the gap between the upper bound of the popular offline RL algorithm and the lower bound of the offline RL. By exploiting the special linear structure, the authors designed a variance-aware pessimistic value iteration, in which the variance estimation is used for reweighting the Bellman loss. Finally, the upper bound of the proposed algorithm in terms of the algorithm quantity is proposed, which is more refined to reflexing the problem-dependency. These results are interesting to the offline RL theoretical community. + +As the reviewers suggested, several improvements can be made to further refinement, e.g., + +- The intuition about the self-normalization in the algorithm exploited to improve the upper bound should be introduced. +- The discussion in Sec 3.3t about the insight of the improvement of the upper bound is not sufficient. +- The extra computational cost about the variance should be discussed.",ICLR2022, +Dri1hBxcWN0-,1642700000000.0,1642700000000.0,1,K9KiBYAthi9,K9KiBYAthi9,Paper Decision,Reject,"This paper is proposed to improve base CNN models by dual multi-scale attention module. To achieve a better feature representational ability, authors consider the multi-scale mechanism from both channel-dimension and spatial-dimension. The proposed method has been verified on several benchmarks, including ImageNet and MS COCO. However, all reviewers consider rejecting this paper because this work lacks novelty, the results are suspicious, and the writing is poor. No responses are submitted by authors to address the reviewers' concerns.",ICLR2022, +FBFKp8Z0RfR7,1642700000000.0,1642700000000.0,1,IEKL-OihqX0,IEKL-OihqX0,Paper Decision,Reject,"The paper introduces a way of making Ratio Matching (RM) scale better to high-dimensional data when training energy-based models (EBMs). 
The main idea is to estimate the sum over the datapoint dimensions in the RM objective with importance sampling (IS), achieving computational savings by using fewer samples than dimensions. A key part of the method is a proposal that uses gradient information w.r.t. discrete variables to efficiently approximate the optimal (minimum variance) proposal, resulting in much better performance compared to uniform sampling. The authors also introduce a biased version of the estimator that samples from the same proposal but drops the importance weights when averaging over the samples, which, somewhat surprisingly, outperforms the unbiased version. + +The idea of using Monte Carlo estimation based on importance sampling to speed up Ratio Matching is novel and sound. The use of gradient information to approximate the optimal IS proposal is also novel in this context, though the idea of using gradients this way to reduce the number of EBM energy function evaluations comes from Grathwohl et al. (2021), where it was used to speed up Gibbs sampling. + +While the method is well described, the paper is insufficiently rigorous in several places, most importantly in claiming that Eq. 4 corresponds to Ratio Matching, which is not true. Eq. 4 is instead equivalent to the objective for Generalized Score Matching (GSM) given by Eq. 17 in (Lyu, 2009). Crucially, while both GSM and RM recover the true model if the model class is nonparametric/unconstrained, as is stated in (Lyu, 2009) (and thus agree with each other as well as with maximum likelihood estimation), they do not yield the same solution for constrained model classes such as neural networks. This means that the method in the paper implements GSM and not RM. Unlike RM, which has been used widely in the literature, GSM is essentially empirically unproven and thus is a less interesting choice. The main difference between the GSM and RM objectives is the presence of the squashing function g(u) = 1/(1+u) around the probability ratios in RM to avoid division by zero when the probability in the denominator is vanishingly small (as is explained above Eq. 12 in (Hyvarinen, 2007)). This means GSM is likely to be prone to stability issues due to using probability ratios directly. This is one possible explanation for the puzzling empirical results in the paper, where the proposed sampling-based methods outperform the exact method they are supposed to approximate, with the biased method clearly performing best. The intuition-based arguments made in the paper to explain these results are not convincing and need to be improved upon. While, as the authors pointed out in their response, it is possible to apply the strategy in the paper to RM by applying IS to Eq. 3 instead of Eq. 4, that would be essentially a different paper. + +One example of puzzling experimental results is Figure 3, which shows that the base method (""Ratio Matching"") does not find the correct solution while the proposed approximate methods do. This suggests that there is something wrong either with the method (e.g. with the objective, as mentioned above) or with the experimental setup. In either case, the cause needs to be thoroughly investigated. + +Currently the empirical evaluation is primarily MMD-based, relying on sampling from the model using MCMC. Ensuring that MCMC chains mix sufficiently well to sample from the true distribution by visiting all of its modes is difficult, and it is important to provide some evidence that this was done. 
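To make the squashing point concrete, here is a schematic sketch only; the exact constants and conventions should be checked against (Hyvarinen, 2007) and (Lyu, 2009). Writing $\bar{x}_i$ for the datapoint $x$ with bit $i$ flipped and $r_i(x) = p_\theta(x)/p_\theta(\bar{x}_i)$, the RM objective has roughly the form
$$J_{\mathrm{RM}}(\theta) \approx \mathbb{E}_{x \sim p_{\mathrm{data}}} \sum_{i=1}^{D} g\big(r_i(x)\big)^2, \qquad g(u) = \frac{1}{1+u},$$
where $g(r_i(x)) = p_\theta(\bar{x}_i) / \big(p_\theta(x) + p_\theta(\bar{x}_i)\big)$ is a bounded conditional probability, so no division blow-up can occur; a GSM-style objective such as Eq. 4 instead involves the raw ratios $p_\theta(\bar{x}_i)/p_\theta(x)$, which can explode whenever $p_\theta(x)$ is vanishingly small. 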
As suggested by a reviewer, the results would be substantially strengthened by reporting the log-likelihoods for the models, estimated e.g. using AIS, even if that requires including scaled-down versions of some of the experiments. + +The title of the paper is misleading and should be changed because the proposed method is specific to EBMs for binary data, even if the intent is to extend it to other types of discrete data in the future. + +The clarification and additional results provided by the authors to the reviewers and the AC were appreciated, but unfortunately the outstanding issues with the paper are too major to allow acceptance at this point. The main idea of the paper has substantial promise however, and the authors are encouraged to develop it to its full potential by addressing the points from this meta-review as well as the additional ones from the reviewers. + +Bibliography correction: Hyvarinen is the solo author of ""Estimation of Non-Normalized Statistical Models by Score Matching"". Peter Dayan was the editor of that paper and not a co-author. Please correct your bibliography.",ICLR2022, +2ey2o7vH6oR,1642700000000.0,1642700000000.0,1,oDFvtxzPOx,oDFvtxzPOx,Paper Decision,Accept (Spotlight),"This paper proposes a feature selection method to identify features for downstream supervised tasks, focused on addressing challenges with sample scarcity and feature correlations. The proposed approach is highly motivating in biological and medical applications. Reviewers pointed out various strengths including potential high impacts in biomedical applications, technical novelty and significance, and comprehensive and illustrative experiments. The authors adequately addressed major concerns raised by reviewers.",ICLR2022, +c0MncHzArlN,1642700000000.0,1642700000000.0,1,zbZL1s-pBF,zbZL1s-pBF,Paper Decision,Reject,"Three experts reviewed the paper and gave mixed reviews. Reviewer BBZL raised their score to 6 in the discussion phase. Reviewer dv5k was not fully convinced by the rebuttal and remained negative. Reviewer oUrr also remained negative. The reviewers were not excited by the proposed method in general and raised questions about both experiments and theoretical results. AC found clear merits in the paper, but the reviewers' comments suggested the work could be strengthened in both experiments and presentation. Hence, the decision is *not* to recommend acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2022, +VUF0NwqVts,1642700000000.0,1642700000000.0,1,7oyVOECcrt,7oyVOECcrt,Paper Decision,Reject,"This paper presents a graph neural network (GNN) architecture that adopts locally permutation-equivariant constructs, which has better scalability compared to globally permutation-equivariant GNNs, and the paper claims this change also does not lose expressivity of the network. All reviewers unanimously recommended rejection, and the main issues are the clarity and writing, to the point where it becomes hard for a reader to follow the precise implementation of the proposed approach and how that compares to prior work. Therefore in its current form this paper is not yet ready for publication at ICLR. 
When the authors work toward the next revision I’d suggest clarifying a little more about the precise algorithmic implementation of the proposed ideas, with a bit of additional intuition from a higher-level, rather than staying at the current level of technicality.",ICLR2022, +qKHGrewZlXV,1610040000000.0,1610470000000.0,1,SncSswKUse,SncSswKUse,Final Decision,Reject,"The paper introduces a linear projection method, inspired by ANOVA, +for finding a supervised low-dimensional embedding. + +A positive aspect is that the method is straightforward, and it is +even slightly surprising that in the family of linear models, there +still was an uncovered ""niche"". + +The paper was considered useful for the purpose studied in the +paper, single-cell RNA-seq data analysis. But to claim broader +usefulness, more evidence should be presented. + +One particular detail which was brought up by all reviewers was the +PCA preprocessing. For ICA it is a sensible choice, as linear ICA is +essentially ""just"" a rotation of the PCA components. But the +justification is not as good for a supervised method. PCA may be +necessary in practice, but may lose important category-relevant +information. + +The paper still needs a significant revision before publication. Even +though the method is straightforward method, a lot of time and +discussion was required for expert reviewers to understand it. +",ICLR2021, +SCnB-y3hh9U,1642700000000.0,1642700000000.0,1,Rty5g9imm7H,Rty5g9imm7H,Paper Decision,Accept (Poster),"The paper builds upon previous work on neural temporal point processes. It mainly proposes the replacement of the LSTMs with Transformers as transformers are widely considered as a more powerful sequence modeling tool and the three advantages listed in the end of section 1 in this paper. + +However, on the modeling side, it is not straight-forward how to apply the transformer (the attention architecture) on to the continuous time-sequence problem using NDTT. I think because I read a revised version of this paper, it is actually more understandable to me as compared to the reviewers who read the first draft of the paper. I think A-NDTT is a natural and principled way of introducing the attention mechanism into the continuous time neural symbolic framework, although I agree it unfortunately does not leading to a significant improvement in every experimental setting. + +To summarize the discussions, I think the authors did a good job in resolving the concerns on the related work and made the paper easier to understand. I appreciate these efforts from the authors even though I also understand there are concerns left still from the reviewers. + +In summary, I am leaning to accept this paper because I think it is an interesting contribution. However, I do agree with the reviewers that the writing of the paper needs to be improved and the experiment section is relatively weak in this paper.",ICLR2022, +rk2WB16Hf,1517250000000.0,1517260000000.0,512,Bys_NzbC-,Bys_NzbC-,ICLR 2018 Conference Acceptance Decision,Reject,"The submission is motivated by an empirical observation of a phase transition when a sufficiently high L1 or L2 penalty on the weights is applied. The proposed solution is to optimize for several epochs without the penalty followed by introduction of the penalty. Although empirical results seem to moderately support this approach, there does not seem to be sufficient theoretical justification, and comparisons are missing. Furthermore, the author response to reviewer concerns contain unclear statements e.g. 
+""The reason is that, to reach the level of L1 norm that is low enough, the model needs to go through the strong regularization for the first few epochs, and the neurons already lose its learning ability during this period like the baseline method."" +It is not at all clear what ""neurons already lose its learning ability"" is supposed to mean.",ICLR2018, +RyOltrjCHx,1576800000000.0,1576800000000.0,1,ByggpyrFPS,ByggpyrFPS,Paper Decision,Reject,"This paper tackles the problem of detection out-of-distribution (OoD) samples. The proposed solution is based on a Bayesian variational autoencoder. The authors show that information-theoretic measures applied on the posterior distribution over the decoder parameters can be used to detect OoD samples. The resulting approach is shown to outperform baselines in experiments conducted on three benchmarks (CIFAR-10 vs SVNH and two based on FashionMNIST). + +Following the rebuttal, major concerns remained regarding the justification of the approach. The reason why relying on active learning principles should allow for OoD detection would need to be clarified, and the use of the effective sample size (ESS) would require a stronger motivation. Overall, although a theoretically-informed OoD strategy is indeed interesting and relevant, reviewers were not convinced by the provided theoretical justifications. I therefore recommend to reject this paper.",ICLR2020, +SylMd7nMlV,1544890000000.0,1546870000000.0,1,r1GAsjC5Fm,r1GAsjC5Fm,meta-review,Accept (Poster),"The authors have described a navigation method that uses co-grounding between language and vision as well as an explicit self-assessment of progress. The method is used for room 2 room navigation and is tested in unseen environments. On the positive side, the approach is well-analyzed, with multiple ablations and baseline comparisons. The method is interesting and could be a good starting point for a more ambitious grounded language-vision agent. The approach seems to work well and achieves a high score using the metric of successful goal acquisition. On the negative side, the method relies on beam search, which is certainly unrealistic for real-world navigation, the evaluation metric is very simple and may be misleading, and the architecture is quite complex, may not scale or survive the test of time, and has little relevance for the greater ML community. There was a long discussion between the authors and the reviewers and other members of the public that resolved many of these points, with the authors being extremely responsive in giving additional results and details, and the reviewers' conclusion is that the paper should be accepted. ",ICLR2019,4: The area chair is confident but not absolutely certain +rk4ONJTrG,1517250000000.0,1517260000000.0,383,HkpYwMZRb,HkpYwMZRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper sets out to analyze the problem of exploding gradients in deep nets which is of fundamental importance. Reviewers largely acknowledge the novelty of the main ideas in the paper towards this goal, however it is also strongly felt that the writing/presentation of the paper needs significant improvement to make it into a coherent and clean story before it can be published. There are also some concerns on networks used in the experiments not being close to practice. + +I recommend invitation to the workshop track as it has novel ideas and will likely generate interesting discussion. 
",ICLR2018, +p0-I3n6w6XX,1642700000000.0,1642700000000.0,1,v6s3HVjPerv,v6s3HVjPerv,Paper Decision,Accept (Poster),"This work presents a novel and clever experiment for interpretable vision. Reviewers all agreed that it tackles an important and interesting research question via a user study design. There are some concerns around the generalization and transfer to large-scale real-world settings, as well as dataset construction. With the authors’ responses and discussion, I think the pros seem to outweigh the cons of this work a bit.",ICLR2022, +Y2t4zHDtLTq,1642900000000.0,1642900000000.0,1,Mspk_WYKoEH,Mspk_WYKoEH,Paper Decision,Accept (Poster),"In standard message-passing GNNs (MPNNs), one step at any node u involves receiving state/embedding information from all of u’s neighbors, and then updating u’s state as a function of these messages and of u’s own current state. Thus, the communication pattern at every step is that of a star topology (a graph with u at its “center”, and with u connected to all its neighbors, and with no other edges). However, it is well-known that the expressive power here is bounded by that of the 1st order Weisfeiler-Leman isomorphism test (1-WL). This paper then takes the natural step of generalizing the star topology to more general ones (e.g., k-hop egonets, the subgraph induced by the nodes of distance at most k from u). It is shown that this framework is strictly more powerful than 1-WL and 2-WL (however, as pointed out by a referee, this is actually a weaker version of 2-WL that is equivalent to 1-WL), and is at least as powerful as 3-WL. Subgraph-sampling approaches that improve efficiency are also introduced. It is shown that this method beats the SOTA for some number of well-known graph-ML problems. + +It looks like this paper has a very strong overlap with the NeurIPS 2021 paper ""Nested Graph Neural Networks"" (https://openreview.net/forum?id=7_eLEvFjCi3). Both papers use rooted subgraphs (k-hop ego-nets) to replace the k-hop rooted subtrees in traditional GNNs, and both use a base GNN over the rooted subgraph to compute a subgraph representation as the node representation while pooling the node representations into a graph representation. Both papers claim to outperform 1-WL in expressive power; both use distance to center node in order to enhance subgraph node features. The authors are urged to compare and contrast the two papers in the camera-ready version.",ICLR2022, +gBeIkOjk0UN,1642700000000.0,1642700000000.0,1,oC12z8lkbrU,oC12z8lkbrU,Paper Decision,Reject,"To overcome the challenge of lacking task-specific unlabeled data in semi-supervised learning (SSL) or knowledge-distillation (KD) tasks, this paper presents a new framework called ""generate, annotate, and learn (GAL)"" that uses unconditional language models to synthesize in-domain unlabeled data to advance SSL and KD. Extensive experiments on both NLP and tabular tasks demonstrate positive results of the proposed method. + +Reviewers generally agree on several key strengths of the paper, e.g., the paper is well-written, literature review is comprehensive, experimental results are generally positive (the improvements over the standard baselines on GLUE benchmark looks solid despite not very significant). On the negative side, some reviewers did raise some major concerns about the novelty of the proposed framework and the lack of strong baselines for comparison. 
For example, the proposed GAL framework doesn’t seem particularly novel as neither of the proposed components is new, and the key value of the work seems on the contribution of evaluating the large LM's ability to generate good in-domain unlabeled data (as agreed by authors). Therefore, it is very important to compare with other existing data augmentation baselines in the empirical studies. While authors did try to add one round-trip-translation (RT) data augmentation baseline for comparison during the rebuttal, more stronger SOTA data augmentation baselines should be compared. + +Overall, this is a good paper which is worthy of publication in near future but it still needs some more work on more extensive comparison of more baselines and improvements on the writing of novelty and contribution claims.",ICLR2022, +SJSXTMUOe,1486400000000.0,1486400000000.0,1,B1oK8aoxe,B1oK8aoxe,ICLR committee final decision,Accept (Poster),"The authors provide an approach to policy learning skills via a stochastic neural network representation. The overall idea seems novel, and the reviewers are in agreement that the method provides an interesting take on skill learning. The additional visualization presented in response to the reviewer comments were also quite useful. + + Pros: + + Conceptually nice approach to skill learning in RL + + Strong results on benchmark tasks + + Cons: + - The overall clarity of the paper could still be improved: the actual algorithmic section seems to present the methods still at a very high level, though code release may also help with this (it would not be at all trivial to re-implement the method just given the paper). +We hope the authors can address these outstanding issues in the camera ready version of their paper.",ICLR2017, +H1L_X1aSz,1517250000000.0,1517260000000.0,171,BkLhaGZRW,BkLhaGZRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster)," + Original regularizer that encourages discriminator representation entropy is shown to improve GAN training. + + good supporting empirical validation + - While intuitively reasonable, no compelling theory is given to justify the approach + - The regularizer used in practice is a heap of heuristic approximations (continuous relaxation of a rough approximate measure of the joint entropy of a binarized activation vector) + - The writing and the mathematical exposition could be clearer and more precise",ICLR2018, +lHZInGhifAX,1610040000000.0,1610470000000.0,1,JAlqRs9duhz,JAlqRs9duhz,Final Decision,Reject,"This paper proposes ScaleGrad, a simple technique to encourage generating non-repetitive tokens for text generation tasks. The key idea is to modify a language model's token-level distributions by rescaling the softmax probability for certain words (in the novel set) by a factor of $\gamma$. Experiments show that ScaleGrad outperforms MLE and Unlikelihood Training (UT). + +This paper receives 2 reject and 2 accept recommendations. Most of the reviewers have provided very detailed comments, and the authors have also provided very long and detailed responses. On one hand, all the reviewers agree that the experiments are comprehensive, and the motivation of the proposed method is clear. + +On the other hand, several concerns still exist after the rebuttal, namely, hand-wavy arguments and inconsistent experimental protocol. (i) The empirical evidence in the experiments is not convincing enough. It makes reviewers more reluctant about the approach after seeing more experimental results during the discussion. 
That is, for different tasks, different $\gamma$s are used, while the hyper-parameters used for other methods seem to be default values. This makes reviewers hesitant that scaling novel tokens and renormalizing the model's output distribution is really significant. (ii) Another minor concern is that the discussion on the potential issue of UL (sec. 5.4) does not look convincing. (iii) After reading the paper, the AC also feels that the novelty of the proposed method might be a little bit limited. + +The rebuttal unfortunately did not fully address the reviewers' main concerns. On balance, the AC regrets that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2021, +2_9RqYISfX7,1610040000000.0,1610470000000.0,1,Bpw_O132lWT,Bpw_O132lWT,Final Decision,Reject,"The average review rating is 5.5 which means it’s somewhat borderline. One of the reviewers planned to increase the score but apparently didn’t do so formally. A subset of the main pros and cons the reviewers pointed out are: + +Pros: +“Some empirical support is provided for the theory.” +“ It is particularly interesting that the authors show that the second order effect of the SGD noise in the Hessian induces a power law distribution over the iterates.” + +Cons: +“The escaping efficiency of the power-law dynamic is only analyzed in low-dimension case. ...” The author responded that Theorem 7 proves the multi-dimensional case. But the AC noted that it’s very likely that escaping time is exponential in dimension (because kappa needs to be larger than d as the author noted and the det() might also be exponential in d. The author did say in the revision that the dimension should be considered as the effective dimension of the hessian, but the AC couldn’t find a formal argument about it.) +“The assumptions made are somewhat strong and may not hold in some cases...” + +The reviewers also had a few clarity questions which the author addressed in revisions with re-organized writing. The AC weighed the pros and cons and found that the unclarity and potential exponential escaping time in the multi-dimensional case outweigh the pros. + +",ICLR2021, +uXOURAHBqrM,1642700000000.0,1642700000000.0,1,nWprF5r2spe,nWprF5r2spe,Paper Decision,Reject,"The main motivation of this work is to introduce robustness in federated learning, through a Wasserstein uncertainty set. The end result, however, leaves a mixed feeling: As the reviewers pointed out, the authors, perhaps for computational convenience, forgo strong duality and treat the important variable gamma as a hyperparameter, which renders large part of the work follow immediately from existing work: essentially, we simply use a different loss function in FedAvg. While there may be advantages to choose one loss over another in any specific application, this itself is not a significant contribution. The comparison against existing FL algorithms is also a bit weak: Despite of the reviewer's request, the authors did not compare to other robust FL algorithms (e.g., AFL), thus it is not clear what is the real advantage of the proposed algorithm. As a result, we believe the current draft is not ready for publication.",ICLR2022, +_q2U7RjHeuX,1642700000000.0,1642700000000.0,1,_ixHFNR-FZ,_ixHFNR-FZ,Paper Decision,Reject,"The paper tries to analyze the relationship between regularization, adversarial robustness, and transferability. + +Pros: +- An interesting problem was tackled. 
+ +Cons: +- The main claim (Prop.3.1) is almost trivial. Prop. 3.1 shows that ""relative"" transferability is smaller for stronger regulariation, which is just a slight generalization of the triangler inequality ||YT = YS|| <= ||YT - Y|| + ||YS - Y|| for any Y in Fig.2. +- Experiments show negative correaltion between the relative transferability and accuracy, which is trivial. Large regularization degrades the accuracy which increases the ""relative transferability"". ""Absolute"" transferability in Appendix doesn't show clear negative correlations. +- Salmann et al. claimed that adversarially ""trained"" models transfer better, and did not claim that there are positive correlations between the transferability and robustness for general classifiers without adversarial training. So the finding in this paper is not surprising nor against Salmann et al. + +To prove that adversarial robustness is just a subproduct of regularization, the authors should show that the ""absolute"" transferability by adversarially trained classifier can be achieved by other regularization. Defining relative transferability is fine if it is just a decomposition to conduct an analysis of the absolute transferability. But no conclusion on the performance should be made from its analysis, because a trivial correlation will appear, i.e., (A-B) and B should be negatively correlated unless A strongly correlates to B. Also, this is highly misleading so that some reviewers seem to have misunderstood that the authors would have claimed that negative correlations between regularization and absolute transferability were observed in the original submission. + +Overall, the paper requires major revision.",ICLR2022, +Mefm7a0BcrT,1610040000000.0,1610470000000.0,1,8W7LTo_zxdE,8W7LTo_zxdE,Final Decision,Reject,The reviewers all agreed that the paper represent thorough work but also is closely related to existing literature. (All referees point to other non-overlapping literature so it is a crowded field the authors have entered.) The amount of novelty (needed) can always be discussed but given the referees unanimous opinion and knowledgable input it is better for this work to be rejected for this conference. Using this input can make this work a good paper for submission elsewhere. ,ICLR2021, +SJgGCBrZx4,1544800000000.0,1545350000000.0,1,Hygxb2CqKm,Hygxb2CqKm,"Important topic, favorable reviews but are the stated implications general?",Accept (Poster),The paper presents both theoretical analysis (based upon lambda-stability) and experimental evidence on stability of recurrent neural networks. The results are convincing but is concerns with a restricted definition of stability. Even with this restriction acceptance is recommended. ,ICLR2019,5: The area chair is absolutely certain +#NAME?,1642700000000.0,1642700000000.0,1,2DT7DptUiXv,2DT7DptUiXv,Paper Decision,Reject,"The paper studies the problem of metric learning for handling catastrophic forgetting. All the reviewers recommended clear reject because of writing issues and lack of experimental investigation to support the ideas. The authors did not provide a rebuttal. Hence, the reviewers' opinion still remains the same. AC agrees with the reviewers and believes that the paper is not yet ready.",ICLR2022, +ByAzrJ6Hz,1517250000000.0,1517260000000.0,526,S1tWRJ-R-,S1tWRJ-R-,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting you paper to ICLR. ICLR. The consensus from the reviewers is that this is not quite ready for publication. 
In particular, the experimental results are promising, but further work is required to fully demonstrate the efficacy of the approach.",ICLR2018, +K7h6M2h9eT,1576800000000.0,1576800000000.0,1,ByeL1R4FvS,ByeL1R4FvS,Paper Decision,Reject,"The paper shows that data augmentation methods work well for consistency training on unlabeled data in semi-supervised learning. + +Reviewers and AC think that the reported experimental scores are interesting/strong, but scientific reasoning for convincing why the proposed method is valuable is limited. In particular, the authors are encouraged to justify novelty and hyper-parameters used in the paper. This is because I also think that it is not too surprising that more data augmentations in supervised learning are also effective in semi-supervised learning. It can be valuable if more scientific reasoning/justification is provided. + +Hence, I recommend rejection.",ICLR2020, +BkxnM1TrM,1517250000000.0,1517260000000.0,12,B1QRgziT-,B1QRgziT-,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"This paper presents impressive results on scaling GANs to ILSVRC2012 dataset containing a large number of classes. To achieve this, the authors propose ""spectral normalization"" to normalize weights and stabilize training which turns out to help in overcoming mode collapse issues. The presented methodology is principled and well written. The authors did a good job in addressing reviewer's comments and added more comparative results on related approaches to demonstrate the superiority of the proposed methodology. The reviewers agree that this is a great step towards improving the training of GANs. I recommend acceptance.",ICLR2018, +0-WvuIrcvch,1610040000000.0,1610470000000.0,1,8qsqXlyn-Lp,8qsqXlyn-Lp,Final Decision,Reject,"A method is proposed for removing prior knowledge, presented as a +distance matrix, from low-dimensional embeddings, to focus them on +what is new. + +The task of visualizing novely in data is interesting and good +solutions would potentially be highly useful. + +The proposed method essentially substracts a distance matrix from +another. While this is sensible, it is not completely clear in what +sense this is _the_ right solution for what the embeddings will be +used for. + +In final discussions among the reviews, the main remaining concerns +were considered severe: comparisons to other methods being limited, +and possible problems in one of the experiments. +",ICLR2021, +Dh6Ma45Bhnc,1642700000000.0,1642700000000.0,1,QkfMWTl520U,QkfMWTl520U,Paper Decision,Reject,"This work presents a simple method for early stopping that is based on layer statistics. The reviewers have all commented on the work's relative lack of novelty, poor writing and insufficiently general empirical evidence for the method working. There are very few baselines being compared and little in terms of ablation studies. All the reviewers have provide extensive constructive comments about how this work can be improved and while there was no rebuttal or discussion, I feel that there is sufficient feedback in the process for the authors to improve this work further. + +In conclusion: while the idea of using stability of layer statistics has merit, at this point this work is not ready to be published at ICLR.",ICLR2022, +w1e9xMTmlT,1576800000000.0,1576800000000.0,1,r1xGP6VYwH,r1xGP6VYwH,Paper Decision,Accept (Poster),"The paper propose a scheme to enable optimistic initialization in the deep RL setting, and shows that it's helpful. 
+ +The reviewers agreed that the paper is well-motivated and executed, but had some minor reservations (e.g. about the proposal scaling in practice). In an example of a successful rebuttal two of the reviewers raised their scores after the authors clarified the paper and added an experiment on Montezuma's revenge. + +The paper proposes a useful, simple and practical idea on the bridge between tabular and deep RL, and I gladly recommend acceptance.",ICLR2020, +B1loIzEmxE,1544930000000.0,1545350000000.0,1,Hye9lnCct7,Hye9lnCct7,good idea; general consensus,Accept (Poster),"To borrow the succinct summary from R1, ""the paper suggests a method for generating representations that are linked to goals in reinforcement learning. More precisely, it wishes to learn a representation so that two states are similar if the +policies leading to them are similar."" The reviewers and AC agree that this is a novel and worthy idea. + +Concerns about the paper are primarily about the following. +(i) the method already requires good solutions as input, i.e., in the form of goal-conditioned policies, (GCPs) +and the paper claims that these are easy to learn in any case. +As R3 notes, this then begs the question as to why the actionable representations are needed. +(ii) reviewers had questions regarding the evaluations, i.e., fairness of baselines, additional comparisons, and +additional detail. + +After much discussion, there is now a fair degree of consensus. While R1 (the low score) still has a remaining issue with evaluation, particularly hyperparameter evaluation, they are also ok with acceptance. The AC is of the opinion that hyperparameter tuning is of course an important issue, but does not see it as the key issue for this particular paper. +The AC is of the opinion that the key issue is issue (i), raised by R3. In the discussion, the authors reconcile the inherent contradiction in (i) based on the need of additional downstream tasks that can then benefit from the actionable representation, and as demonstrated in a number of the evaluation examples (at least in the revised version). The AC believes in this logic, but believes that this should be stated more clearly in the final paper. And it should be explained +the extent to which training for auxiliary tasks implicitly solve this problem in any case. + +The AC also suggests nominating R3 for a best-reviewer award.",ICLR2019,4: The area chair is confident but not absolutely certain +KOz0f3fkCdY,1642700000000.0,1642700000000.0,1,06Wy2BtxXrz,06Wy2BtxXrz,Paper Decision,Accept (Poster),"This paper presents a conditional variational autoencoder (CVAE) approach to solve an instance of stochastic integer program (SIP) using graph convolutional networks. Experiments show that their method achieves high quality solutions with high performance. + +It holds merit as an interesting novel application of CVAEs to the ML for combinatorial optimization literature, as well as for the nice empirical results which show a very nice improvement. Two reviewers had a concern that the contribution is a bit narrowly focused toward MILP-focused journal rather than a general-purpose ML conference since the core contribution is the novel application. On the other hand, they believe that combinatorial optimization has received growing interest from the ML community in recent years. + +All three reviewers vote for borderline accept of this paper. 
The authors have addressed some of reviewers' concerns, hence some reviewers increased their scores throughout the discussion phase.",ICLR2022, +L8_m6hIcIkhU,1642700000000.0,1642700000000.0,1,AcrlgZ9BKed,AcrlgZ9BKed,Paper Decision,Accept (Poster),"Summary: The paper studies RL and bandits in the conservative setting where the performance of the new, learnt policy should never be significantly worse than that of a baseline. + +Discussions: The main concern of the reviewers was about novelty, and specifically what new techniques and ideas were brought in this work compared to (Wu et al. 2016) and (Garcelon et al 2020). The authors have addressed these concerns and updated their draft accordingly. The reviewers have now all reached a consensus and recommend to accept this work. + +Recommendation: Accept",ICLR2022, +tVq1bBESvl,1576800000000.0,1576800000000.0,1,SylWNC4FPH,SylWNC4FPH,Paper Decision,Reject,"The paper introduces an interesting application of GNNs, but the reviewers find that the contribution is too limited and the motivation is too weak.",ICLR2020, +HJeXTf8de,1486400000000.0,1486400000000.0,1,BJK3Xasel,BJK3Xasel,ICLR committee final decision,Accept (Poster),"The paper presents a clean framework for optimizing for the network size during the training cycle. While the complexity of each iteration is increased, they argue that overall, the cost is significantly reduced since we do not need to train networks of varying sizes and cross-validate across them. The reviewers recommend acceptance of the paper and I am in agreement with them.",ICLR2017, +g25cclM2IIB,1642700000000.0,1642700000000.0,1,SHnXjI3vTJ,SHnXjI3vTJ,Paper Decision,Reject,"This work proposes a so-called self-supervised approach for few-shot learning. The self-supervision doesn't refer to the lack of use of any labels as in regular self-supervised embedding learning methods (here support sets are labelled), but refer to the fact that the query set's labels aren't used in their proposed objective. Instead the query labels are predicted by a primary network, which then uses these predicted labels to predict the ground truth labels on the support set. The support set label predictions can thus be used to derive a learning signal for the model. Some results are presented that suggest the method is competitive with respect to the state of the art. + +Reviewers are quite split on this work. Even the reviewers who are technically leaning toward accepting this work (rating of 6) mention concerns that are worrying, e.g. reviewer Gmcb and wTnW both mention concerns related to the fairness of the evaluation. + +I too share similar concerns. First, the method in question is effectively a transductive method (as opposed to inductive, as are many of the baselines this work compares to), a distinction that the paper does not make explicit or address directly. This distinction is important, as it is well known that transductive methods have an advantage over inductive methods. I did try to look for some published transductive baselines. One is the method of Zhang et al. (2021a), which the authors do beat on mini-ImageNet, but not on CUB (and in fact, the paper only reports the results from Zhang et al. on mini-ImageNet, even though the original paper actually reports results on CUB, which I find odd). On CUB, for 1-shot, Zhang et al. (2021a) outperforms SPDN, while for 5-shot SPDN does only very slightly better. The paper is not clear as to whether the compared baselines are transductive or not in the cross-domain experiments either. 
+ +Second, by introducing a dual network that is separate from the primal network, the proposed model effectively is increasing the capacity of their model, relative to using only a primal network. This capacity is mostly used when performing the self-supervised optimization of the query labels, which would explain why this aspect of the proposed method is what yields the largest improvements. Given that capacity has a large effect on the performance of methods on few-shot learning benchmarks, I'm quite concerned that this is the more likely explanation for the (sometimes surprisingly large) improved performance. + +That said, I don't find the paper entirely without merit. The label optimization procedure is neat, and is probably the most interesting innovation of the paper. The use of a primal-dual architecture on the other hand is more incremental, e.g. relative to architectures used in semi-supervised few-shot learning. + +Overall, at this point, given the lukewarm evaluation by reviewers and the lingering concerns (or at a minimum, lack of clarity) on the fairness of the evaluation, I'm afraid I'm not comfortable to recommend accepting this work as it currently stands.",ICLR2022, +eV9ov5k4WI,1576800000000.0,1576800000000.0,1,SkgtbaVYvH,SkgtbaVYvH,Paper Decision,Reject,"The authors propose a method for automatic tuning of learning rates. The reviewers liked the idea but felt that there are much more extensive experiments to be done especially better baselines. Also, clarifying what aspect is automated is important, because no method can be truly automatic: they all have some hyperparameters. ",ICLR2020, +M5gUjAi2R6R,1642700000000.0,1642700000000.0,1,RftryyYyjiG,RftryyYyjiG,Paper Decision,Accept (Poster),"This paper reviews a number of parameter decomposition methods for BERT style contextual embedding models. The authors argue for the application of Tucker decomposition to the attention and feedforward layers of such models. Evaluation is performed for a range of models on the GLUE benchmark. Further ablation studies indicate that the distillation procedure employed is crucial for obtaining competitive results and the raw decomposition approaches are ineffective at directly approximating the original pre-trained model. + +Strengths: The reviewers generally agree that the methods explored and results presented in this paper are interesting and could be of use to those deploying large embedding models. The authors review a range of possible decomposition methods and use this to motivate their approach. The resulting levels compression are high while maintaining good performance, while the ablation study clearly shows the contribution of the various steps of the training pipeline. + +Weaknesses: The main weakness identified by the reviewers is the incremental nature of this work in comparison to previous works applying various decomposition and compression techniques to neural networks. They also highlight that many of the techniques discussed early in the paper are not compared in the evaluation. The authors have effectively responded to this issue by providing further comparisons and justification for their modelling choices (e.g. not compressing the embedding layers). + +Overall, despite the incremental nature of this work, I believe that there are enough though provoking ideas and results presented to warrant publication. 
Interestingly, as the authors emphasise in their response, the ablation study highlights that this work is not really about approximating the original models weights, as all of the work appears to be being done by the distillation procedure in concert with the choice weight decomposition. In general I wonder whether this paper would be better presented as exploring a structured distillation procedure rather than weight compression.",ICLR2022, +L2g-O09rgS,1610040000000.0,1610470000000.0,1,w5bNwUzj33,w5bNwUzj33,Final Decision,Reject,"The paper deals with cross-domain few-shot learning in the case of large source-target domain shifts. + +The paper received mostly below-threshold reviews, with one exception (R3) whose review is addressing more general aspects, but still with some concern, especially in relation to the experimental part (to which authors did not answer). R1's review is not of much help. + +Clarity of the presentation and missing details seem to be recurrent issues all over the reviewers, together with remarks concerning the experimental validation, which would have required a deep revision and improvement, in particular regarding the use of more backbones, better ablation (Hebbian learner contribution, unclear initialization), processing times/computational complexity, significant comparative analysis re robust baselines. + +The rebuttal clarifies some of the raised remarks but there are still issues, especially regarding Hebbian learning rule and ensemble learning strategies, and about results too, so not all reviewers were convinced to raise their ratings. + +Overall, given the above issues, I consider the paper not yet ready for publication in ICLR 2021. +",ICLR2021, +zKmocj4bdN,1576800000000.0,1576800000000.0,1,BylA_C4tPr,BylA_C4tPr,Paper Decision,Accept (Poster),"This paper proposes and evaluates a formulation of graph convolutional networks for multi-relation graphs. The paper was reviewed by three experts working in this area and received three Weak Accept decisions. The reviewers identified some concerns, including novelty with respect to existing work and specific details of the experimental setup and results that were not clear. The authors have addressed most of these concerns in their response, including adding a table that explicitly explains the contribution with respect to existing work and clarifying the missing details. Given the unanimous Weak Accept decision, the ACs also recommend Accept as a poster.",ICLR2020, +ryOIhGUul,1486400000000.0,1486400000000.0,1,rJJ3YU5ge,rJJ3YU5ge,ICLR committee final decision,Reject,"Three knowledgable reviewers recommend rejection. While the application is interesting and of commercial value, the technical contribution falls below the ICLR's bar. I encourage the authors to improve the paper and submit it to a future conference.",ICLR2017, +YmH37XDtXLS,1642700000000.0,1642700000000.0,1,iw-ms2znSS2,iw-ms2znSS2,Paper Decision,Reject,"The paper proposes a mechanism for A* planning with learned policy and value functions. The experiments (restricted to the Sokoban domain) show that the runtime of guided search follows a heavy-tailed distribution, suggesting that in many cases, the problem is either solved quickly or takes a long time. An abstract model is proposed to explain this distribution, and a number of mechanisms are proposed to overcome its challenges. + +The reviewers thought the paper had some interesting ideas but found the experimental section to be especially weak. 
While the paper starts out with quite general claims, the experiments only consider a single domain. Also, key details about the experiments were missing. Finally, the writing feels rushed -- the original submission had many typos and lacked proofs for two theorems. + +I agree with these objections and recommend rejection. Please revise the paper following the reviews and resubmit to a different deadline.",ICLR2022, +4E7IKMGHpu5,1610040000000.0,1610470000000.0,1,Lvb2BKqL49a,Lvb2BKqL49a,Final Decision,Reject,"This paper is a study in optimizing the Donsker-Varadhan lower bound on mutual information focusing on a ""drift"" problem. The bound is a difference between terms which appears to have an extra degree of freedom where the two terms increase or decrease together. They propose a fix for this problem. The authors state that the DV bound is of practical value but in most cases it is replaced by discriminative lower bounds as in contrastive predictive coding (CPC) which are biased but have lower variance. The paper does not address the variance (convergence) issues with the DV bound. + +We have a well informed reviewer who feels that the paper is not sufficiently novel and has other issues supporting rejection. Other reviews are not very enthusiastic. I will side with rejection.",ICLR2021, +DsrJqMCeO3,1576800000000.0,1576800000000.0,1,HyxnMyBKwB,HyxnMyBKwB,Paper Decision,Accept (Poster),"This paper studies the optimal value function for the gambler's problem, and presents some interesting characterizations thereof. The paper is well written and should be accepted.",ICLR2020, +SNEudislZCX,1610040000000.0,1610470000000.0,1,uie1cYdC2B,uie1cYdC2B,Final Decision,Reject,"Although all reviewers agree that the work is interesting and has potential, several issues in the presentation and the experimental section (especially regarding the ablation) need to be worked on before granting acceptance to the paper. ",ICLR2021, +dHVP-NrtadT,1610040000000.0,1610470000000.0,1,mhEd8uOyNTI,mhEd8uOyNTI,Final Decision,Reject,"The paper presents a significant body of seemingly solid work, but its contribution nevertheless feels limited: It evaluates a single MLM on a single dataset, and results are largely unsurprising. Note: The authors added experiments on other LMs in the rebuttal. The idea of using perturbations is related in spirit to many interpretability methods and adversarial techniques, and using higher-order correlations for interpreting neural networks is, for example, at the heart of relational similarity analysis. A few suggestions to make the work more relevant to a wider audience: Compare with several probing techniques - e.g., in a tree-decoding set-up - or contrast results across domains (using OntoNotes), or across languages (using OntoNotes and other PTB-style treebanks). Also: While results were added for multiple LMs, differences were not analysed in detail. ",ICLR2021, +rJgoXzIZxE,1544800000000.0,1545350000000.0,1,HJgZrsC5t7,HJgZrsC5t7,"Nice empirical results, but ad hoc approach",Reject,"The paper proposes an interesting idea for efficient exploration of on-policy learning in sparse reward RL problems. The empirical results are promising, which is the main strength of the paper. On the other hand, reviewers generally feel that the proposed algorithm is rather ad hoc, sometimes with not-so-transparent algorithmic choices. As a result, it is really unclear whether the idea works only on the test problems, or applies to a broader set of problems. 
The author responses and new results are helpful and appreciated by all reviewers, but do not change the reviewers' concerns.",ICLR2019,5: The area chair is absolutely certain +o5LVZEw5XrS,1610040000000.0,1610470000000.0,1,uQfOy7LrlTR,uQfOy7LrlTR,Final Decision,Accept (Poster),"Dear authors, + +as you have noticed this paper was not easy to review. I have hence invited 2 additional reviewers which I strongly respect and are very knowledgeable. After carefully reading the paper myself, I have to agree with one of the reviewers who said ""... it [your paper] makes a good contribution to the literature ...."". To be honest, we were working in my group on a very similar approach but did not manage to finish it (and I know how hard it is). + +To conclude, when preparing to the final version, please try to go over the reviews, I am sure they can make your paper even stronger :) +",ICLR2021, +Hyl1ee2Ay4,1544630000000.0,1545350000000.0,1,ryGpEiAcFQ,ryGpEiAcFQ,early work with possible merit,Reject,"In this paper, neural networks are taken a step further by increasing their biological likeliness. In particular, a model of the membranes of biological cells are used computationally to train a neural network. The results are validated on MNIST. + +The paper argumentation is not easy to follow, and all reviewers agree that the text needs to be improved. ˜The neuroscience sources that the models are based on are possibly outdated. Finally, the results are too meagre and, in the end, not well compared with competing approaches. + +All in all, the merit of this approach is not fully demonstrated, and further work seems to be needed to clarify this.",ICLR2019,5: The area chair is absolutely certain +S1xIRD66yN,1544570000000.0,1545350000000.0,1,ByxZX20qFQ,ByxZX20qFQ,clear consensus to accept,Accept (Poster),"There is a clear consensus among the reviews to accept this submission thus I am recommending acceptance. The paper makes a clear, if modest, contribution to language modeling that is likely to be valuable to many other researchers.",ICLR2019,5: The area chair is absolutely certain +djFI6ZPFJrp,1610040000000.0,1610470000000.0,1,o2ko2D_uvXJ,o2ko2D_uvXJ,Final Decision,Reject,"The paper proposes an MLP based approach for data without known structure (such as tabular data). At first, the data are partitioned into K blocks in a differentiable way, then the standard MLP is applied to each block. The results are then aggregated recursively to produce the final output. + +Pros: +1. Handling less structured data is surely an important problem in machine learning and is much less explored. +2. The paper is well written, easily understandable even with a fast browsing. +3. The experimental results show some improvement. + +Cons: +1. The approach is somewhat trivial, and the framework could be improved, see, e.g. Reviewers #3. +2. By the structure of the approach and the type of target data, a more reasonable comparison is with random forest (echoing Reviewer #1), which the authors added during rebuttal, rather than MLP etc. Maybe should even compare with deep random forest. Although the comparison with MLP etc. is quite favorable, the advantage over random forest is somewhat marginal (except on HAPT, which is a imagery data set and random forest may not be good at; also echoing Reviewers #1's comment on why using imagery data, which do not fit the theme of the paper). Reviewers #3 also had some concerns with the experiments. 
Reviewer #4 confirmed in the confidential comment that the performance improvement is incremental. + +Although the rebuttal seemed to be successful, thus both Reviewers #1 and #4 raised their scores, the average score is still at the borderline. Due to the limited acceptance rate, the area chair has to reject the paper.",ICLR2021, +kEhRefl2hBJ,1610040000000.0,1610470000000.0,1,Whq-nTgCbNR,Whq-nTgCbNR,Final Decision,Reject,The paper focuses on anomaly detection in dynamical systems from time series measurement. The originality of the contribution is to detect anomalies not based on the detection of OOD observations but from identified parameters or statistics of the dynamical system. They are using “polynomial neural networks. All the reviewers agree that the paper is not yet mature both in the form and in the technical content. The authors did not provide a rebuttal.,ICLR2021, +H1eotByYy4,1544250000000.0,1545350000000.0,1,Hygv0sC5F7,Hygv0sC5F7,meta-review,Reject,The reviewers and AC note the following potential weaknesses: 1) the proof techniques largley follow from previous work on linear models 2) it’s not clear how signficant it is to analyze a one-neuron ReLU model for linearly separable data. ,ICLR2019,4: The area chair is confident but not absolutely certain +3mUv9BsMQK,1576800000000.0,1576800000000.0,1,S1l8oANFDH,S1l8oANFDH,Paper Decision,Accept (Poster),"The authors consider control tasks that require ""inductive generalization"", ie +the ability to repeat certain primitive behaviors. +They propose state-machine machine policies, which switch between low-level +policies based on learned transition criteria. +The approach is tested on multiple continuous control environments and compared to +RL baselines as well as an ablation. + +The reviewers appreciated the general idea of the paper. +During the rebuttal, the authors addressed most of the issues raised in the +reviews and hence reviewers increased their score. + +The paper is marginally above acceptance. +On the positive side: Learning structured policies is clearly desirable but +difficult and the paper proposes an interesting set of ideas to tackle this +challenge. +My main concern about this work is: +The approach uses the true environment simulator, as the +training relies on gradients of the reward function. +This makes the tasks into planning and not an RL problems; this needs to be +highlighted, as it severly limits its applicability of the proposed approach. +Furthermore, this also means that the comparison to the model-free PPO baselines +is less meaningful. +The authors should clear mention this. +Overall however, I think there are enough good ideas presented here to warrant +acceptance. ",ICLR2020, +EhFs4dgxm,1576800000000.0,1576800000000.0,1,rJgVwTVtvS,rJgVwTVtvS,Paper Decision,Reject,"In this paper, the authors showed that for differentially private convex optimization, the utility guarantee of both DP-GD and DP-SGD is determined by the expected curvature rather than the worst-case minimum curvature. Based on this motivation, the authors justified the advantage of gradient perturbation over other perturbation methods. This is a borderline paper, and has been discussed after author response. 
The main concerns of this paper include (1) the authors failed to show any loss function that can satisfy the expected curvature inequality; (2) the contribution of this paper is limited, since all the proofs in the paper are just small tweak of existing proofs; (3) this paper does not really improve any existing gradient perturbation based differentially private methods. Due to the above concerns, I have to recommend reject.",ICLR2020, +2aqzQ-03qO,1576800000000.0,1576800000000.0,1,SkeATxrKwH,SkeATxrKwH,Paper Decision,Reject,"The authors proposes a generative model with a hierarchy of latent variables corresponding to a scene, objects, and object parts. + +The submission initially received low scores with 2 rejects and 1 weak reject. After the rebuttal, the paper was revised and improved, with significant portions of the paper completely rewritten (the description of the model was rewritten and a new experiment comparing the proposed model to SPAIR was added). While the reviewers acknowledged the improvement in the paper and accordingly adjusted their score upward, the paper is still not sufficiently strong enough to be accepted (it currently has 3 weak rejects). + +The reviewer expressed the following concerns: +1. The experiments uses only a toy dataset that does not convincingly demonstrate the generalizability of the method to more realistic/varied scenarios. In particular, the reviewers voiced concern that the dataset is tailored to the proposed method + +2. Lack of comparisons with baseline methods such as AIR/SPAIR and other work on hierarchical generative models such as SPIRAL. +In the revision, the author added an experiment comparing to SPAIR, so this is partially addressed. As a whole, the paper is still weak in experimental rigor. The authors argue that as their main contribution is the design and successful learning of a probabilistic scene graph representation, there is no need for ablation studies or to compare against baselines because their method ""can bring better compositionality, interpretability, transferability, and generalization"". This argument is unconvincing as in a scientific endeavor, the validity of such claims needs to be shown via empirical comparisons with prior work and ablation studies. + +3. Limited novelty +The method is a fairly straightforward extension of SPAIR with another hierarchy layer. This would not be a concern if the experimental aspects of the work was stronger. + +The AC agrees with the issues pointed by the reviewers. In addition, the initial presentation of the paper was very poor. While the paper has been improved, the changes are substantial (with the description of the method and intro almost entirely rewritten). Regardless, despite the improvements in writing, the paper is still not strong enough to be accepted. I would recommend the authors improve the evaluation and resubmit. ",ICLR2020, +zgu5b9nPlio,1642700000000.0,1642700000000.0,1,PiDkqc9saaL,PiDkqc9saaL,Paper Decision,Reject,"Thank you for your submission to ICLR. The reviewers were split on this paper, with more favoring acceptance but with relatively low confidence. After reading through the paper and reviews, I tend to agree slightly more with the more critical comments. 
The paper is very much on the borderline, but ultimately 1) the rather incremental nature of the work compared to [Bhagoji, 2021], and 2) the rather small-scale evaluations in the current version, which the field has largely moved on from as they often give overly-optimistic impressions of certified robustness. Ultimately a lot of the extensions (which at this point are fairly standard in most methods for deep network verification), seem like they should really be taken into account in the current paper. For these reasons I lean slightly towards not accepting the paper in its current state.",ICLR2022, +eRrEaqwyNTE,1610040000000.0,1610470000000.0,1,J7bUsLCb0zf,J7bUsLCb0zf,Final Decision,Reject," +This paper presents approach to improve compute and memory efficiency by freezing layers and storing latent features. The approach is simple and provide efficiency. However, there are concerns as well. One big concern is that the experiments are not on realistic settings for example real world images and the current CNN is too simple. Overall, the reviewers are split. The AC agrees with some of the reviewers that for a paper like this experiments on more realistic setting will make it significantly stronger. +",ICLR2021, +4qqiCFUOgu,1576800000000.0,1576800000000.0,1,Byg5ZANtvH,Byg5ZANtvH,Paper Decision,Accept (Poster),"The work considers sparse and short blind deconvolution problem, which is to inverse a convolution of a sparse source (such as spikes at cell locations in microscopy) with a short (of limited spatial size) kernel or point spread function, not known in advance. This is posed as a bilinear lasso optimization problem. The work applies a non-linear optimization method with some practical improvements (such as data-driven initialization, momentum, homotopy continuation). + +The paper extends the work by Kuo et al. (2019) by providing a practical algorithm for solving those inverse problems. A focus of the paper is to solve the bilinear lasso instead of the approximate bilinear lasso, because this approximation is poor for coherent problems. Having read the rebuttal and the paper, I believe the authors addressed the issues raised by Reviewer #2 in a sufficient way. + +small things: +- it would be good to define $\iota$ (zero-padding operator) in (1) +- it would be good to define $p, p_0$ just below (3). They seem to be appearing out of the blue without any direct relation to anything mentioned prior in section 2. +- it would be good to cite some older/historic references for various optimization methods , e.g. [1] below. 
+ + +[1] Richter & deCarlo +Continuation methods: Theory and applications +IEEE Transactions on Systems, Man, and Cybernetics, 1983 +https://ieeexplore.ieee.org/abstract/document/6313131",ICLR2020, +0lQCtKMQOp5,1642700000000.0,1642700000000.0,1,gKWxifgJVP,gKWxifgJVP,Paper Decision,Reject,"Strengths: +* Well-written paper +* Strong empirical results on three benchmarks +* Interesting approach of producing semantically augmented LMs using dependency parses to extract svo triples, and finding coreferences between them across multiple sentences + +Weaknesses: +* None of the reviewers seem particularly excited about the paper +* Stronger baseline comparisons would have improved the paper +* Authors re-define a lot of terminology, but the novelty of the method is more from the type of graph used to initialize their method, which seems to be a function of OpenIE triplets",ICLR2022, +6-Byr8esOmR,1610040000000.0,1610470000000.0,1,G70Z8ds32C9,G70Z8ds32C9,Final Decision,Reject,"This paper received borderline scores, which makes for a difficult recommendation. Unfortunately, two of the reviews were too short and thus were of limited use in forming a recommendation. That includes the high-scoring one, which did not adequately substantiate its score. + +There is much to admire in this submission. Reviewers appreciated the originality of this research, linking rate reduction optimization to deep network architectures: +* R1: ""The paper proposes a novel perspective"" +* R4: ""The novelty of the paper is in that formulation of the feature optimisation is baked-in into a deep architecture"" +* R5: "" I think the construction seems interesting and the rate reduction metric seems like a reasonable thing to optimize. I found the relationship of coding rate maximization to ReduNet to be quite clever"" +* R3 (short): ""The innovative method allows the inclusion of a new layer structure named ReduNet"" + +Reviewers also applauded the paper's clarity, including R4 who raised their score to 6 based on satisfying clarity revisions from the authors: +* R1: ""The writing is good and easy to follow"" +* R4 post-discussion: ""Clarity is not an issues anymore - additional explanations provided by the authors and one more careful reading of the paper helped in understanding of all the aspects of the model"" +* R2 (short): ""The paper is well-structured."" + +However, there were some core questions around how well the main significance claims of the paper are supported. The most in-depth discussion on these topics is in the detailed thread with R5. In that thread there are many points discussed, but the two issues seem to be: +1. whether the connection between ReduNet and standard neural net architectures is sufficiently substantiated so as to constitute an explanation for behaviors of those standard architectures, like CNNs; and +2. whether the emergence of ReduNet's group invariance/equivariance is surprising or qualitatively new. + +The first is much more central. On the first issue, R5 writes in summary: +""Fundamentally I think the authors propose a hypothesis: that ReduNets explain DL models. However, the authors do not take meaningful steps towards validating this hypothesis. [...] 
I would contrast this with, for example, the scattering networks paper (https://arxiv.org/abs/1203.1513) which did an exceptional job of arguing for an ab initio explanation of convolutional networks."" + +I find R5's perspective on this point to be compelling, in that the paper currently doesn't do enough to justify these main claims, either through drawing precise nontrivial mathematical connections or through experimental validation. (The thread has a much more detailed and nuanced discussion.) + +The second issue is not quite as central to the significance of the paper, but it was noted by multiple reviewers: +* R5: ""I may be missing something, but given the construction of ReduNet, I feel as though the emergence of a convolutional structure subject to translation invariance is not terribly surprising."" +* R4: ""Finally, I am not sure if the result of obtaining a convnet architecture in ReduNet when translation invariance constraint is added the embedding is all that surprising."" +* R4 post-discussion: ""Reading the exchange between the authors and R5 I am still not fully convinced that translation invariance property is all that surprising, but for me that's not a reason to reject."" + +At the least, the paper as written hasn't yet convinced some readers (myself included) on these claims. + +As I mentioned at the start, this paper is borderline, but because I am largely aligned with R5's perspectives, I think this paper does not quite pass the bar for acceptance. I recommend a rejection, but I look forward to seeing a strengthened version of this work in the future. I hope the feedback here has been useful to bringing about that stronger version.",ICLR2021, +Sk3kDyTrf,1517250000000.0,1517260000000.0,916,BJgVaG-Ab,BJgVaG-Ab,ICLR 2018 Conference Acceptance Decision,Reject,"The authors make an argument for constructing an MDP from the formal structures of temporal logic and associated finite state automata and then applying RL to learn a policy for the MDP. This does not provide a solution for low-level skill composition, because there are discontinuities between states, but does provide a means for high level skill composition. + +The reviewers agreed that the paper suffered from sloppy writing and unclear methods. They had concerns about correctness, and were not impressed by the novelty (combining TL and RL has been done previously). These concerns tip this paper to rejection.",ICLR2018, +GHWvI3qQ0X-,1642700000000.0,1642700000000.0,1,U0k7XNTiFEq,U0k7XNTiFEq,Paper Decision,Accept (Poster),"This paper seeks to find an answer to some quite interesting research question: can deep vanilla networks without skip connections or normalization layers be trained as fast and accurately as ResNets? In this regard, the authors extend Deep Kernel Shaping and show that a vanilla network with leaky RELU-family activations can match the performance of a deep residual network. + +Four reviewers unanimously suggested acceptance of the paper. There were concerns about the clarity or marginal performance improvement. 
However, they all including myself agree: achieving the competitive performance with the vanilla deep model itself can be seen as a big contribution and the clarity has been improved to some extent through revision.",ICLR2022, +PXinSegCOe,1576800000000.0,1576800000000.0,1,SJeLO34KwS,SJeLO34KwS,Paper Decision,Reject,"As Reviewer 2 pointed out in his/her response to the authors' rebuttal, this paper (at least in current state) has significant shortcomings that need to be addressed before this paper merits acceptance.",ICLR2020, +ryegAVM3kE,1544460000000.0,1545350000000.0,1,B1lx42A9Ym,B1lx42A9Ym,Very interesting direction but requiring major revision for readability,Reject,"This paper introduced a Neural Rendering Model, whose inference calculation corresponded to those in a CNN. It derived losses for both supervised and unsupervised learning settings. Furthermore, the paper introduced Max-Min network derived from the proposed loss, and showed strong performance on semi-supervised learning tasks. + +All reviewers agreed this paper introduces a highly interesting research direction and could be very useful for probabilistic inference. However, all reviewers found this paper hard to follow. It was written in an overly condensed way and tried to explain several concepts within the page limit such as NRM, rendering path, max-min network. In the end, it was not able to explain key concepts sufficiently. + +I suggest the authors take a major revision on the paper writing and give a better explanation about main components of the proposed method. The reviewer also suggested splitting the paper into two conference submissions in order to explain the main ideas sufficiently under a conference page limit.",ICLR2019,4: The area chair is confident but not absolutely certain +MLIS1TmYJK,1642700000000.0,1642700000000.0,1,67T66kchK_7,67T66kchK_7,Paper Decision,Reject,"The paper proposes a novel meta-algorithm, called Self-Imitation Policy Learning through Iterative Distillation (SPLID) , which relies on the concept of -distilled policy to iteratively level up the quality of the target data and agent mimics from the relabeled target data. +Several aspects of the paper can be improved. The reviewers are concerned in particular about the experimental section which might not exhaust the core set of tasks, where the method should be compared with baselines. Furthermore the presentation can be significantly improved (lots of grammatical errors). Another major point is the novelty of the presented algorithm. + +In the rebuttal the authors tried to address some of the remarks, in particular by adding additional experiments to the empirical section of the paper. Those experiments still do not convince some of the reviewers. Furthermore, one of the biggest concerns is still a limited novelty of the approach. The presentation of the paper still needs to be substantially improved. Thus the paper still requires nontrivial work.",ICLR2022, +BkgGeBwSxE,1545070000000.0,1545350000000.0,1,HklKWhC5F7,HklKWhC5F7,reject,Reject,The reviewers conclude the paper does not bring an important contribution compared to existing work. The experimental study can also be improved. ,ICLR2019,5: The area chair is absolutely certain +1W7K2rTi4FI,1642700000000.0,1642700000000.0,1,qvUJV2-t_c,qvUJV2-t_c,Paper Decision,Reject,"This work proposes a method for automatic adaptation of the learning rate via a estimating quadratic approximation of the full batch during training. 
The method motivated by two observed properties of the loss landscape, first the full batch loss along the gradient direction is well approximated by a quadratic polynomial, and second the optimal full batch step size does not change quickly during training. Two primary criticisms raised by reviewers is the weak experimental evidence provided to validate the method and similarities with other approaches for adapting the learning rate. Ultimately reviewers remain unconvinced by the rebuttal and maintained their scores. The AC further stresses the difficulty of properly (and fairly) comparing optimization methods in deep learning. As is consistently shown in the literature, optimizer performance is typically dominated by hyperparameter tuning, this is particularly problematic when submissions tune their own baselines as authors naturally are incentivized to tweak their own methods until the method looks favorable relative to others. Comparing directly against prior published results tuned by other researchers would help alleviate reviewer concerns regarding hyperparameter tuning.",ICLR2022, +KdEQEkFhgWI,1642700000000.0,1642700000000.0,1,jbrgwbv8nD,jbrgwbv8nD,Paper Decision,Accept (Poster),"This paper does as it’s title suggests, it introduces an algorithm for constraining a CRF’s output space to correspond to a pre-specified regular language. The authors build upon a wealth of prior work aiming to enable CRFs to capture particular non-local dependencies and output constraints and present a coherent general algorithm to specify such constraints with a regular language. This is a clearly presented and well motivated contribution. + +The reviewers predominantly agree that this work is clearly and rigorously presented and that the formalisation of constraints for CRFs through regular languages is a useful contribution for practitioners. One reviewer questioned the utility of constraining the output distribution at training time. In response the authors convincingly argue that unconstrained models will fail to learn the data generating distribution when non-local constraints exist in the data and have included a clear synthetic example of this in the paper. + +The most significant weakness identified of this paper is the limited experimentation, consisting of one synthetic experiment and an application to semantic role labelling. The key motivation for formalising constraints on CRFs with regular languages is the argument that this allows model builders to use a familiar formalism across disparate tasks rather than producing bespoke solutions for each. As such it would be informative when assessing the contribution of this work to see a number of practical examples of task output spaces formalised as regular languages such that we can form an intuition for how natural this representation is for more than one task, while also shedding light on the ease, or otherwise, of the crucial processing of minimising the representation to maximise efficiency. + +While the application to a broader range of tasks would definitely strengthen this paper, in its current form it provides a useful formalism that will be of interest to those working in structured learning and as such is a contribution worthy of publication.",ICLR2022, +Xyf6uAqUjuw,1642700000000.0,1642700000000.0,1,f9JwVXMJ1Up,f9JwVXMJ1Up,Paper Decision,Reject,"This paper proposes a method for self-training in an open-world setting where a significant portion of unlabeled data might include examples that are not task related. 
The proposed method (ODST) uses a more accurate OOD detection technique, which allows improved sample selection and leads to higher accuracy.

Strong Points:
- This paper studies a very important and impactful problem.
- The paper is well-written.
- The empirical results show that the proposed method improves over prior work.
- To better understand the iterative scheme, the authors provide a theoretical analysis using Bayesian decision theory.

Weak Points:
- Novelty: Given prior work on different variants of noisy students, this work has limited novelty.
- Dataset diversity: The main results are provided for the CIFAR-10 and CIFAR-100 datasets, which are very similar to each other. During the discussion period, the authors added results on the SVHN dataset, but the accuracy gap between the proposed method and FixMatch is insignificant (the FPR gap is higher, but since the main goal is improving performance, I think accuracy is the more important measure here).
- Connecting theoretical results to the rest of the paper: The paper can be improved significantly if the theoretical results are more connected to the rest of the paper, and in particular to the proposed algorithm.

While 4 out of 5 reviewers are recommending rejection, I think this was a very close decision. Most reviewers were concerned with novelty, which I think is a valid point. Given that, and the fact that the theoretical results are very limited, showing strong empirical results is required to accept this paper. Even though the provided results on the CIFAR datasets are strong, the result on SVHN does not show a significant improvement. I understand that running experiments on ImageNet might not be budget-friendly. However, it is possible to run similar experiments on other datasets to show the robustness of the proposed method to the choice of dataset. Consequently, I recommend rejecting the paper and propose that the authors resubmit after adding more datasets to their evaluation.",ICLR2022,
+NMzDAf6POZ,1576800000000.0,1576800000000.0,1,rygePJHYPH,rygePJHYPH,Paper Decision,Reject,"This paper proposes an algorithm to produce well-calibrated uncertainty estimates. The work accomplishes this by introducing two loss terms: an entropy-encouraging loss and an adversarial calibration loss to encourage predictive smoothness in response to adversarial input perturbations.

All reviewers recommended weak reject for this work, with a major issue being the presentation. Each reviewer provided specific examples of areas in which the paper text, figures, equations, etc. were unclear or missing details. Though the authors have put significant effort into responding to the specific reviewer mentions, the reviewers have determined that the manuscript would benefit from further revision for clarity.

Therefore, we do not recommend acceptance of this work at this time and instead encourage the authors to further iterate on the manuscript and consider resubmission to a future venue.
",ICLR2020,
+MX_4iyk1e5,1610040000000.0,1610470000000.0,1,l-LGlk4Yl6G,l-LGlk4Yl6G,Final Decision,Accept (Poster),"The paper considers a new linear-algebraic problem, motivated by applications such as metagenomics, which requires the algorithm to partition the coordinates of a long noisy vector according to a few known subspaces. A number of theoretical questions were asked (e.g., identifiability; efficient algorithms and their error bounds; etc.).

The reviewers generally liked the paper for what it does. 
Specific suggestions were raised by the reviewers, including that the paper discussed the motivating applications at length but did not end up evaluating the proposed algorithms on any of them, and that the main theoretical results were neither technically challenging nor surprising (although the authors provided a fair justification in their rebuttal).

The AC finds the paper an outlier in terms of topic among papers typically received by ICLR, but likes the paper precisely because it is different. The authors are encouraged to discuss the connections of the specific problem to the context of representation learning and machine learning in general.

Overall, I believe the paper is a solid borderline accept.

",ICLR2021,
+2NtVuXih4H,1610040000000.0,1610470000000.0,1,BfayGoTV4iQ,BfayGoTV4iQ,Final Decision,Reject,"
Description:
The paper presents a generative model, SketchEmbedNet, for class-agnostic generation of sketch drawings from images. They leverage sequential data in hand-drawn sketches. Results show this outperforms the state of the art (SOTA) on few-shot classification tasks, and the model can generate sketches from new classes after one shot.

Strengths:
- Detailed, technically sound presentation
- Shows that enforcing the decoder to output sequential data leads to a more informative internal representation, and thus generates better-quality sketches
- Improves over unsupervised SOTA methods

Weaknesses:
- Experiments are done against methods that do not use the sequential aspect of sketches. Because the ground truth in this case contains much more data, the comparison is not quite fair.
- It would have been useful to see results against a baseline that uses it.
- Quality of sketches generated from natural images is low",ICLR2021,
+kAXnI136ETF,1610040000000.0,1610470000000.0,1,InGI-IMDL18,InGI-IMDL18,Final Decision,Reject,"In this paper, the authors propose to adapt the recent paper by Yu et al. (ICML 2020), namely FedAwS. In that paper, the authors solved a potential failure mode in federated learning, when all the users only have access to one class on their devices. In this paper, the authors extend FedAwS to a setting in which federated learning is used for User Verification (UV), namely FedUV. The authors argue that the previous paper could not be the solution to learning UV because FedAwS shares the embedding vectors with the server.

The authors then show a procedure in which they can learn a classifier for which the embedding vectors do not need to be shared with the server. They use error-correcting codes to make the mapping sufficiently different, which allows the training to succeed without sharing the embedding. The proposed change is only marginally worse than FedAwS and centralized learning. This is the part of the paper that has attracted positive comments and is praised by all the reviewers.

The authors take as given that by not sharing the embedding vectors and by using randomly generated error-correcting codes, the whole procedure is privacy-preserving and secure. Reviewer 4 indicates that these guarantees need to be proven and points out several references that hint at flaws in the authors' argument. Reviewer 4 does say that not sharing the embeddings might not be enough, and that self-evident arguments are not enough.

This paper provides a significant improvement for a federated machine learning algorithm that deserves publication, but the rationale of the paper is flawed from a privacy and security viewpoint. 
I think if the paper is published as is, especially with the proposed title, it will create a negative reaction from the security and privacy community for not adhering to their standards. We cannot lower those standards.

I suggest that the authors follow one of two potential paths for publishing this work:

1. Change the scope of their algorithm. For example, I can imagine that by not sharing the embedding, the communication load with the server might be significantly reduced, or that adding new users with new classes can be easier.

2. Follow the recommendation from Reviewer 4 and show that the proposed method is robust against the different attacks.

Minor comments:

For a paper that is trying to solve the UV problem, I would expect a discussion about why learning is better than a private algorithm. In a way, learning is sharing, and that increases the risk of mischief by malicious users.

The discussion about error-correcting codes and the minimum distance is quite old-fashioned. In high dimensions, the minimum distance is not the whole story. LDPC codes make sense when we stop focusing on minimum distance codes and minimum distance decoding. I would recommend having a look at Berlekamp’s bat discussion in David MacKay’s book (Chapter 13).",ICLR2021,
+G7ogdRwKQPe,1642700000000.0,1642700000000.0,1,7b4zxUnrO2N,7b4zxUnrO2N,Paper Decision,Accept (Spotlight),"This paper proposes a hierarchical reinforcement learning approach that exploits affordances to better explore/prune the subtasks, thus making the overall learning more efficient.

The idea of the paper is novel and interesting.

After the rebuttal, all the reviewers agree that the paper is a solid contribution.
Therefore, I recommend acceptance of this paper.",ICLR2022,
+2NjCDZRM7o,1576800000000.0,1576800000000.0,1,H1eWGREFvB,H1eWGREFvB,Paper Decision,Reject,"This paper proposes a new sampling mechanism which uses a self-repulsive term to increase the diversity of the samples.

The reviewers had concerns, most of which were addressed in the rebuttal. Unfortunately, none of the reviewers genuinely championed the paper. Since there were a lot of good submissions this year, we had to make decisions on the borderline papers, and this lack of full support means that this submission will be rejected.

I highly encourage you to keep updating the manuscript and to resubmit it to a later conference.",ICLR2020,
+YI-wHKwWEE,1642700000000.0,1642700000000.0,1,rI0LYgGeYaw,rI0LYgGeYaw,Paper Decision,Accept (Poster),"The paper proposes an unrolled algorithm to solve the l1-norm-formulated dictionary learning problem, and focuses on the number of unrolling steps. It shows that it is better to limit the number of unrolling steps, and this leads to favorable performance over the alternating minimization baseline. The method can also be adapted to scale to very large datasets.

Most reviewers were positive or became positive after the rebuttals. Reviewer njnY was still concerned about some issues, such as constraints and the choice of the l1 model over the l0 model; there also may have been confusion about unit-sphere vs. unit-ball constraints. 
However, given the recommendations of the other reviewers and my own opinion, I think the paper is a worthy contribution, and the point about not unrolling too deeply is an important one that is worth highlighting.",ICLR2022, +SJAnG1TSf,1517250000000.0,1517260000000.0,24,rJWechg0Z,rJWechg0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents a nice approach to domain adaptation that improves empirically upon previous work, while also simplifying tuning and learning. +",ICLR2018, +_CMgZHYoG-e,1642700000000.0,1642700000000.0,1,rSI-tyrv-ni,rSI-tyrv-ni,Paper Decision,Reject,"This is a clearly written paper about integration of entity abstraction to the transformer based language modeling methods for language processing tasks that require reasoning (this is clarified by the authors later as tasks that require linger chains of reasoning) and have shown results on CLUTTR, HotpotQA, and CoQA. The reviewers seem to agree on two issues: First, it is not clear why the proposed idea does not result in a lot of improvement, except the synthetic CLUTTR. Authors provided additional experimental results on yet another dataset. Second, the paper would benefit from a detailed analysis of the experimental results, for example, why don't abstractions help on all datasets.",ICLR2022, +VQSBEcGWti,1576800000000.0,1576800000000.0,1,HkxSOAEFDB,HkxSOAEFDB,Paper Decision,Reject,"Two reviewers are negative on this paper while the other one is slightly positive. Overall, the paper does not make the bar of ICLR and thus a reject is recommended.",ICLR2020, +T4ZImSk5CAO,1642700000000.0,1642700000000.0,1,9zcjXdavnX,9zcjXdavnX,Paper Decision,Reject,"This paper proposes a new approximate sampling approach called Quasi Rejection Sampling (QRS) to exploit global proposal distributions without requiring to know a bound on the associated importance ratio, and providing a trade-off between the approximation quality of the +sampler and its efficiency. QRS is demonstrated on EBM-based text generation tasks. The reviews acknowledge the simplicity of the approach which when combined with advances in learning proposal distributions opens up many potential applications. At the same time, the reviews indicate that more work could be done to make the empirical demonstrations more compelling, with a more thorough coverage of comparisons with MCMC and other alternatives. The authors are encouraged to revise their submission and clarify significance and novelty.",ICLR2022, +p72TsB7FtJ,1576800000000.0,1576800000000.0,1,S1et8gBKwH,S1et8gBKwH,Paper Decision,Reject,"This paper addresses the problem of rotation estimation in 2D images. The method attempted to reduce the labeling need by learning in a semi-supervised fashion. The approach learns a VAE where the latent code is be factored into the latent vector and the object rotation. + +All reviewers agreed that this paper is not ready for acceptance. The reviewers did express promise in the direction of this work. However, there were a few main concerns. First, the focus on 2D instead of 3D orientation. The general consensus was that 3D would be more pertinent use case and that extension of the proposed approach from 2D to 3D is likely non-trivial. The second issue is that minimal technical novelty. The reviewers argue that the proposed solution is a combination of existing techniques to a new problem area. 
+ +Since the work does not have sufficient technical novelty to compare against other disentanglement works and is being applied to a less relevant experimental setting, the AC does not recommend acceptance. +",ICLR2020, +93c40IG7erB,1642700000000.0,1642700000000.0,1,5MbRzxoCAql,5MbRzxoCAql,Paper Decision,Reject,"The reviewers unisono do not accept the paper, because it is (a) not well-written; (b) experimentally not convincing, but addresses a nice problem. I suggest that the authors address the issues in a subsequent paper, and resubmit to one of the main conferences.",ICLR2022, +_SCtyp4YNeS,1642700000000.0,1642700000000.0,1,7QfLW-XZTl,7QfLW-XZTl,Paper Decision,Accept (Poster),"All reviewers except one agreed that this paper should be accepted because of the strong author response during the rebuttal phase. Specifically the reviewers appreciated the new ablation study showing that improvements are not due to minor architectural changes, the new experiment on the number of time steps required for experiments, the agreement to change language around ""neural energy minimization"", the improvements to the related work, the novelty of the unrolled optimization approach, and the nice experimental results. Given this, I vote to accept. Authors: please carefully revise the manuscript based on the suggestions by the reviewers: they made many careful suggestions to improve the work and stressed that the paper should only be accepted once these changes are implemented. Once these are done the paper will be a nice addition to the conference!",ICLR2022, +aWzpYeDJzk0,1610040000000.0,1610470000000.0,1,padYzanQNbg,padYzanQNbg,Final Decision,Reject,"The reviewers agree that this paper has some interesting ideas. However, they believe it needs more work before it is ready for publication, especially so with regards to presentation (SDEs as GANs) and the experiments (backpropagating through the solver rather than using the adjoint dynamics). These would significantly strengthen the paper, but would probably require another round of reviews.",ICLR2021, +G_bpX0yhQY,1642700000000.0,1642700000000.0,1,9jsZiUgkCZP,9jsZiUgkCZP,Paper Decision,Accept (Poster),This paper receives positive reviews. The authors provide additional results and justifications during the rebuttal phase. All reviewers find this paper interesting and the contributions are sufficient for this conference. The area chair agrees with the reviewers and recommends it be accepted for presentation.,ICLR2022, +eyPEgYbl2ex,1610040000000.0,1610470000000.0,1,PeT5p3ocagr,PeT5p3ocagr,Final Decision,Reject,"This paper proposes a hybrid algorithm that combines RL and population-based search. The work is interesting and well-written. But, the contribution of the work is very limited, in comparison with the state-of-the-art. ",ICLR2021, +3o2n87GV-0z,1642700000000.0,1642700000000.0,1,h4EOymDV3vV,h4EOymDV3vV,Paper Decision,Reject,"The paper proposes extracting multiple-scale features using denoising score matching. Reviewers pointed out the limited novelty in the work and that it does not cite various previous work and how it connects to them. 
The paper needs some further polishing on the writing, and in making the use of lambda divergences more rigorous and principled as explained in the comment of Reviewer VdM1 .",ICLR2022, +4LA0O9iDZyv,1642700000000.0,1642700000000.0,1,BAtutOziapg,BAtutOziapg,Paper Decision,Reject,This paper shows that SLGD can be non-private (in the sense of differential privacy) even when a single step satisfies DP and also when sampling from the true posterior distribution is DP. I believe that it is useful to understand the behavior of SLGD in the intermediate regime. At the same time the primary question is whether SLGD is DP when the parameters are chosen so as to achieve some meaningful approximation guarantees after some fixed number of steps T and the algorithm achieves them while satisfying DP (but at the same does not satisfy DP for some number of step T' >T). Otherwise the setting is somewhat artificial and I find the result to be less interesting and surprising. So while I think the overall direction of this work is interesting I believe it needs to be strengthened to be sufficiently compelling.,ICLR2022, +SkCq4kaBf,1517250000000.0,1517260000000.0,418,rkEfPeZRb,rkEfPeZRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The reviewers find the gradient compression approach novel and interesting, but they find the empirical evaluation not fully satisfactory. Some aspects of the paper have improved with the feedback from the reviewers, but because of the domain of the paper, experimental evaluation is very important. I recommend improving the experiments by incorporating the reviewers' comments.",ICLR2018, +pkanhS4rLC,1576800000000.0,1576800000000.0,1,rkgPnhNFPB,rkgPnhNFPB,Paper Decision,Reject,"The paper theoretically shows that the data (embedded by representations learned by GANs) are essentially the same as a high dimensional Gaussian mixture. The result is based on a recent result from random matrix theory on the covariance matrix of data, which the authors extend to a theorem on the Gram matrix of the data. The authors also provide a small experiment comparing the spectrum and principle 2D subspace of BigGAN and Gaussian mixtures, demonstrating that their theorem applies in practice. + +Two of the reviews (with confident reviewers) were quite negative about the contributions of the paper, and the reviewers unfortunately did not participate in the discussion period. + +Overall, the paper seems solid, but the reviews indicate that improvements are needed in the structure and presentation of the theoretical results. Given the large number of submissions at ICLR this year, the paper in its current form does not pass the quality threshold for acceptance.",ICLR2020, +Skg2l7ngeV,1544760000000.0,1545350000000.0,1,SyerAiCqt7,SyerAiCqt7,Meta-Review for Group Profiling paper,Reject,"All reviewers agree to reject. While there were many positive points to this work, reviewers believed that it was not yet ready for acceptance.",ICLR2019,5: The area chair is absolutely certain +SJvcsfL_e,1486400000000.0,1486400000000.0,1,BkCPyXm1l,BkCPyXm1l,ICLR committee final decision,Reject,The reviewers unanimously recommend rejection.,ICLR2017, +KMA5TmQXjS_,1642700000000.0,1642700000000.0,1,XEW8CQgArno,XEW8CQgArno,Paper Decision,Accept (Spotlight),"*Summary:* Low-rank bias in nonlinear architectures. + +*Strengths:* +- Significant theoretical contribution. +- Well written; detailed sketch of proofs. + +*Weaknesses:* +- More intuitions desired. +- Restrictive assumptions. 
+ +*Discussion:* + +Authors made efforts to improve the discussion in response to 6P7z. Authors agree with eeoo about Assumption 2 being relatively restrictive but point out that main results do not need it. They discuss Assumption 1 and revised it formulation. Reviewer eeoo was satisfied with this. Following the discussion udhX raised their score (after authors acknowledged an early problems and improved them) and found the paper well written with novel and significant results. + +*Conclusion:* + +Three reviewers consider this a good paper that should be accepted. A fourth reviewer rated it marginally above the acceptance threshold but following the discussion period explicitly recommended acceptance. I find the topic interesting, timely, relevant. In view of unanimously favorable feedback from four reviewers I am recommending accept.",ICLR2022, +0KZ0PJlGFgJ,1642700000000.0,1642700000000.0,1,_DqUHcsQfaE,_DqUHcsQfaE,Paper Decision,Reject,"This paper proposes a personalized federated learning method using a hyper-network to encode unlabeled data from new clients. At inference time, new clients can use unlabeled data as input to this hyper-network in order to obtain a personalized version of the model. The key strength of the paper is that the idea is interesting and timely. Personalization has been studied for clients that participate from the beginning of training, but personalization of models for new clients that join later on has not been considered in most previous works. The experimental results also show a reasonable improvement over the baselines. However, the following concerns remain: +1) Novelty in comparison with reference [1]. Please add a detailed comparison when you revise the paper. +2) Explanation of the experimental results and comparison with baselines was deemed insufficient by some of the reviewers. +3) The generalization bound and the DP results seem standard extensions of existing works and do not add much novelty to the paper. + +There wasn't much post-rebuttal discussion and the reviewers decided to stick to their original scores. Therefore, I recommend rejection of the paper. I hope that the authors will take the reviewers' constructive comments into account when revising the paper for a future resubmission.",ICLR2022, +HJY-NJ6Hf,1517250000000.0,1517260000000.0,295,B1l8BtlCb,B1l8BtlCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper proposes a novel method for training a non-autoregressive machine translation model based on a pre-trained auto-regressive model. The method is interesting and the evaluation is carried out well. It should be noted, however, that the relative complexity of the training procedure (which involves multiple stages and external supervision) might limit the practical applicability and impact of the technique.",ICLR2018, +SJglkvRglE,1544770000000.0,1545350000000.0,1,rkgoyn09KQ,rkgoyn09KQ,"While not groundbreaking, the proposed model is new and the empirical results are strong. ",Accept (Poster),"This paper presents an extension of an existing topic model, DocNADE. Compared to DocNADE and other existing bag-of-word topic models, the primary contribution of this work is to integrate neural language models into the topic model in order to address two limitations of the bag-of-word topic models: expressiveness and interpretability. 
In addition, the paper presents an approach to integrate external knowledge into the neural topic models to address the empirical challenges of application scenarios where there might be only a small training corpus or limited context available.

Pros:
The paper presents strong and extensive empirical results. The authors went above and beyond to strengthen their paper during the rebuttal and address all the reviewers' questions and suggestions (e.g., the submitted version had 7 baselines, and the revised version has 6 additional baselines per reviewers' requests).

Cons:
The paper builds on an earlier paper that introduced the DocNADE model. Thus, the modeling contribution is relatively marginal. On the other hand, the extended model, albeit based on a relatively simple idea, is still new and demonstrates strong empirical results.

Verdict:
Probably accept. While not groundbreaking, the proposed model is new and the empirical results are strong. ",ICLR2019,3: The area chair is somewhat confident
+BJgHIF64eN,1545030000000.0,1545350000000.0,1,r1xRW3A9YX,r1xRW3A9YX,Insufficient evidence,Reject,"This paper proposes a generalization of the translation-style embedding approaches for link prediction to Riemannian manifolds. The reviewers feel this is an important contribution to the recent work on embedding graphs into non-Euclidean spaces, especially since this work focuses on multi-relational links, thus supporting knowledge graph completion. The results on WN11 and FB13 are also promising.

The reviewers and AC note the following potential weaknesses: (1) the primary concern is the low performance on the benchmarks, especially WN18 and FB15k, and not using the appropriate versions (WN18-RR and FB15k-237), (2) use of a hyperbolic embedding for an entity shared across all relations, and (3) lack of discussion/visualization of the learned geometry.

During the discussion phase, the authors clarified reviewer 1's concern regarding the difference in performance between HolE and ComplEx, along with providing a revision that addressed some of the clarity issues raised by reviewer 3. The authors also justified the lower performance by noting that (1) they focus on the low-dimensionality setting, and (2) not all datasets will fit the space of the proposed model (like FB15k). However, reviewers 2 and 3 still maintain that the results provide insufficient evidence for the need for Riemannian spaces over Euclidean ones, especially for larger, and more realistic, knowledge graphs.

The reviewers and the AC agree that the paper should not be accepted in the current state.
",ICLR2019,4: The area chair is confident but not absolutely certain
+90U9go4_jgp,1642700000000.0,1642700000000.0,1,qO-PN1zjmi_,qO-PN1zjmi_,Paper Decision,Reject,"The authors propose a semi-supervised novelty detection method which tries to identify out-of-distribution samples in the unlabeled data (consisting of in- and out-distribution samples) using a disagreement score of an ensemble. The ensemble is generated by fine-tuning the trained classifier on the labeled training data plus the unlabeled data, which all get a fixed label (and this is repeated several times to generate the ensemble). The main idea is that one uses early stopping based on an in-distribution validation set in order to avoid overfitting on the unlabeled points, which then allows identification of the out-distribution points via the disagreement score.

The reviewers appreciated the simplicity of the approach and the extensive experimental results. 
The authors did a good job in trying to answer all questions and concerns of the reviewers. + +However, some concerns remained: +- the setting assumes that the OOD data is fixed which was considered as partially unrealistic and thus evaluation of the OOD detection performance on unseen OOD distributions was requested in order to understand the limitations of the method (this was only partially done by the authors). +- the theoretical result is for a two-layer network and completely based on previous work. As the authors use much deeper networks later on in the experiments, this result cannot be used to theoretically justify the approach. +- there remained concerns about the necessary diversity of the ensemble and the early stopping procedure + +While I think that the paper has its merits, it is not yet ready for publication. I encourage the authors to to take into account the above points and other remaining concerns of the reviewers in a revised version.",ICLR2022, +vAPRKbxUK9,1576800000000.0,1576800000000.0,1,SyxJU64twr,SyxJU64twr,Paper Decision,Reject,"This paper considers the challenge of sparse reward reinforcement learning through intrinsic reward generation based on the deviation in predictions of an ensemble of dynamics models. This is combined with PPO and evaluated in some Mujoco domains. + +The main issue here was with the way the sparse rewards were provided in the experiments, which was artificial and could lead to a number of problems with the reward structure and partial observability. The work was also considered incremental in its novelty. These concerns were not adequately rebutted, and so as it stands this paper should be rejected.",ICLR2020, +d6_sdLCIL,1576800000000.0,1576800000000.0,1,Hye4WaVYwr,Hye4WaVYwr,Paper Decision,Reject,"The paper provides some insight why model-based RL might be more efficient than model-free methods. It provides an example that even though the dynamics is simple, the value function is quite complicated (it is in a fractal). Even though the particular example might be novel and the construction interesting, this relation between dynamics and value function is not surprising, and perhaps part of the folklore. The paper also suggests a model-based RL methods and provides some empirical results. + +The reviewers find the paper interesting, but they expressed several concerns about the relevance of the particular example, the relation of the theory to empirical results, etc. The authors provided a rebuttal, but the reviewers were not convinced. Given that we have two Weak Rejects and the reviewer who is Weak Accept is not completely convinced, unfortunately I can only recommend rejection of this paper at this stage.",ICLR2020, +ibBdxWQiaxcK,1642700000000.0,1642700000000.0,1,Ee2ugKwgvyy,Ee2ugKwgvyy,Paper Decision,Reject,"In this paper, in order to theoretically investigate the relationship between graph structure and labels in GNNs, interaction probabity and frequency indicators are introduced and analyzed, and a new family of GNNs with multiple filters is proposed based on the insights from the theoretical analysis, +In the discussion, there was an opinion that the theoretical analysis is interesting, but its novelty and clarity are limited. 
Although certain contributions are acknowledged, the impact is marginal and the audience for which this paper will matter is rather limited.",ICLR2022, +pSDtJKpoiTV,1642700000000.0,1642700000000.0,1,G33_uTwQiL,G33_uTwQiL,Paper Decision,Reject,"The paper proposes a symmetry-informed neural network for modelling many-body systems. The network is empirically evaluated in the tasks of predicting Newtonian trajectories and molecular conformations. + +All four reviewers are critical of the paper and recommend rejection (one weak, three strong). The reviews have flagged weaknesses and quality issues with several aspects of the submission, including the proposed methodology, the novelty of the contribution, and the clarity of the presentation. Although detailed clarifications were provided by the authors, most of the reviewers' concerns remain, and the consensus among reviewers remains to reject the paper. + +Consequently, the current version of the paper does not appear to meet the quality standards for acceptance to ICLR.",ICLR2022, +8P7NpCs8z5,1576800000000.0,1576800000000.0,1,rkliHyrFDB,rkliHyrFDB,Paper Decision,Reject,"The authors develop a novel connection between information theoretic MPC and entropy regularized RL. Using this connection, they develop Q learning algorithm that can work with biased models. They evaluate their proposed algorithm on several control tasks and demonstrate performance over the baseline methods. + +Unfortunately, reviewers were not convinced that the technical contribution of this work was sufficient. They felt that this was a fairly straightforward extension of MPPI. Furthermore, I would have expected a comparison to POLO. As the authors note, their approach is more theoretically principled, so it would be nice to see them outperforming POLO as a validation of their framework. + +Given the large number of high-quality submissions this year, I recommend rejection at this time.",ICLR2020, +EzhVlfecRj,1576800000000.0,1576800000000.0,1,Skgeip4FPr,Skgeip4FPr,Paper Decision,Reject,"This article studies the inductive bias in a simple binary perceptron without bias, showing that if the weight vector has a symmetric distribution, then the cardinality of the support of the represented function is uniform on 0,...,2^n-1. Since the number of possible functions with support of extreme cardinality values is smaller, the result is interpreted as a bias towards such functions. Further results and experiments are presented. The reviewers found this work interesting and mentioned that it contributes to the understanding of neural networks. However, they also expressed concerns about the contribution relying crucially on 0/1 variables, and that for example with -1/1 the effect would disappear, implying that the result might not be capturing a significant aspect of neural networks. Another concern was whether the results could be generalised to other architectures. The authors agreed that this is indeed a crucial part of the analysis, and for the moment pointed at empirical evidence for the appearance of this effect in other cases. The reviewers also mentioned that the motivation was not very clear, that some of the derivations were difficult to follow (with many results presented in the appendix), and that the interpretation and implications were not sufficiently discussed (in particular, in relation to generalization, missing a more detailed discussion of training). 
This is a good contribution, and the revision made important improvements on the points mentioned above, but it does not quite reach the bar. ",ICLR2020,
+aX7CipBXv7,1576800000000.0,1576800000000.0,1,Hke3gyHYwH,Hke3gyHYwH,Paper Decision,Accept (Poster),"This paper studies the effect of various regularization techniques for dealing with noisy labels. In particular, the authors study various regularization techniques, such as distance from initialization, to mitigate this effect. The authors also provide theory in the NTK regime. All reviewers have a positive assessment of the paper and think it is clearly written with nice contributions, but do raise some questions about novelty given that it mostly follows the normal NTK regime. I agree that the paper is nicely written and well-motivated. I do not think the theory developed here fully captures all the nuances of practical observations in this problem. In particular, with label noise this theory suggests that test performance is not dramatically affected by label noise when using regularization or early stopping, whereas in practice what has been observed (and even proven in some cases) is that the performance is completely unaffected by small label noise. I think this paper is a good addition to ICLR and therefore recommend acceptance, but I recommend that the authors more clearly articulate the above nuances and limitations of their theory in the final manuscript. ",ICLR2020,
+jZBh9cesdt,1576800000000.0,1576800000000.0,1,SygRikHtvS,SygRikHtvS,Paper Decision,Reject,"This paper investigates the practical and theoretical consequences of speeding up training using incremental gradient methods (such as stochastic gradient descent) by calculating the gradients with respect to a specifically chosen sparse subset of data.

The reviewers were quite split on the paper.

On the one hand, there was a general excitement about the direction of the paper. The idea of speeding up gradient descent is of course hugely relevant to the current machine learning landscape. The approach was also considered novel, and the paper well-written.

However, the reviewers also pointed out multiple shortcomings. The experimental section was deemed to lack clarity and baselines. The results on standard datasets were very different from expected, causing worry about the reliability, although this has partially been addressed in additional experiments. The applicability to deep learning and large datasets, as well as the significance of the time saved by using this method, were other worries.

Unfortunately, I have to agree with the majority of the reviewers that the idea is fascinating, but that more work is required for acceptance to ICLR. ",ICLR2020,
+G8dIIqBdj,1576800000000.0,1576800000000.0,1,BkgZxpVFvH,BkgZxpVFvH,Paper Decision,Reject,"The paper proposes a deep learning architecture for forecasting Origin-Destination (OD) flow. The model integrates several existing modules, including spatiotemporal graph convolution and a periodically shifted attention mechanism.

The reviewers agree that the paper is not written well, and the experiments are also not executed well. Overall, we recommend rejection.",ICLR2020,
+H1hSEJpSf,1517250000000.0,1517260000000.0,348,ryZ283gAZ,ryZ283gAZ,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The reviewers agree that the proposed architecture is novel. However, there are issues in terms of the motivation. It would be helpful in future drafts to strengthen the argument about why the architecture is expected to be better than others. 
Most importantly, the gains at this stage are still incremental. A larger improvement from the new architecture would motivate more researchers to focus on this architecture.",ICLR2018,
+B1l6OfrZxE,1544800000000.0,1545350000000.0,1,SylPMnR9Ym,SylPMnR9Ym,borderline - but leaning to accept,Accept (Poster),The reviewers had some concerns regarding clarity and evaluation but in general liked various aspects of the paper. The authors did a good job of addressing the reviewers' concerns, so acceptance is recommended.,ICLR2019,4: The area chair is confident but not absolutely certain
+hl-r2zcfr,1576800000000.0,1576800000000.0,1,r1lQQeHYPr,r1lQQeHYPr,Paper Decision,Reject,This paper offers a new approach to cross-modal embodied learning that aims to overcome limited vocabulary and other issues. Reviews are mixed. I concur with the two reviewers who say the work is interesting but the contribution is not sufficiently clear for acceptance at this time.,ICLR2020,
+Bky1nfU_g,1486400000000.0,1486400000000.0,1,rJPcZ3txx,rJPcZ3txx,ICLR committee final decision,Accept (Poster),"While the core ideas explored in this paper are quite limited in algorithmic novelty (e.g., the direct sparse convolutions), the reviewers largely feel that the paper is well written, experiments are carefully done on multiple architectures, and system issues are discussed in depth. Given the interest in the ICLR community around performance characterization and acceleration of CNNs in particular, this paper offers an interesting perspective.",ICLR2017,
+LYHKrnBNSdJ,1610040000000.0,1610470000000.0,1,a4E6SL1rG3F,a4E6SL1rG3F,Final Decision,Reject,"This paper studies the problem of how data should be balanced among a set of tasks within meta-learning. This problem is interesting, and largely hasn't been studied before. However, the reviewers raised several shortcomings of the current version of the paper, including the significance of the problem setting, the limited experimental study (i.e., the only experiment with real data is CIFAR-FS), the depth of the related work section, and the clarity/impreciseness of the writing. Further, the paper has not been revised to address any of these shortcomings. As such, the paper is not ready for publication at ICLR.",ICLR2021,
+HJlnJzRxeN,1544770000000.0,1545350000000.0,1,rJxXDsCqYX,rJxXDsCqYX,Relatively incremental ideas with inconclusive empirical results,Reject,"This paper presents two extensions of Relation Networks (RNs) to represent a sentence as a set of relations between words: (1) dependency-based constraints to control the influence of different relations within a sentence and (2) a recurrent extension of RNs to propagate information through the tree structure of relations.

Pros:
The notion of relation networks for sentence representation is potentially interesting.

Cons:
The significance of the proposed methods compared to existing variants of TreeRNNs is not clear (R1). R1 requested empirical comparisons against TreeRNNs (since the proposed methods are also of tree shape), but the authors argued back that such experiments are unnecessary beyond BiLSTM baselines.

Verdict:
Reject. 
The proposed methods build on relatively incremental ideas and the empirical results are rather inconclusive.",ICLR2019,5: The area chair is absolutely certain +yDQYOxjhoR,1576800000000.0,1576800000000.0,1,HyenUkrtDB,HyenUkrtDB,Paper Decision,Reject,"The paper proposes a new, stable metric, called Area Under Loss curve (AUL) to recognize mislabeled samples in a dataset due to the different behavior of their loss function over time. The paper build on earlier observations (e.g. by Shen & Sanghavi) to propose this new metric as a concrete solution to the mislabeling problem. + +Although the reviewers remarked that this is an interesting approach for a relevant problems, they expressed several concerns regarding this paper. Two of them are whether the hardness of a sample would also result in high AUL scores, and another whether the results hold up under realistic mislabelings, rather than artificial label swapping / replacing. The authors did anecdotally suggest that neither of these effects has a major impact on the results. Still, I think a precise analysis of these effects would be critically important to have in the paper. Especially since there might be a complex interaction between the 'hardness' of samples and mislabelings (an MNIST 1 that looks like a 7 might be sooner mislabeled than a 1 that doesn't look like a 7). The authors show some examples of 'real' mislabeled sentences recognized by the model but it is still unclear whether downweighting these helped final test set performance in this case. + +Because of these issues, I cannot recommend acceptance of the paper in its current state. However, based on the identified relevance of the problem tackled and the identified potential for significant impact I do think this could be a great paper in a next iteration. ",ICLR2020, +X3s3OCXYrzO,1642700000000.0,1642700000000.0,1,7YDLgf9_zgm,7YDLgf9_zgm,Paper Decision,Accept (Spotlight),"This paper proposes an innovative method for continual learning that modifies the direction of gradients on a new task to minimise forgetting on previous tasks without data replay. The method is mathematically rigorous with a strong theoretical analysis and excellent empirical results across multiple continual learning benchmarks. It is a clear accept. There was good discussion between the reviewers and authors that addressed a number of minor issues, including clarifying that the method has the same computational complexity as backpropagation. The authors are encouraged to make sure that these points are addressed in the final version of the paper.",ICLR2022, +fZRy-vhh8a,1576800000000.0,1576800000000.0,1,rJlYsn4YwS,rJlYsn4YwS,Paper Decision,Reject,"The paper proposes a new learning algorithm for deep neural networks that first reformulates the problem as a multi-convex and then uses an alternating update to solve. The reviewers are concerned about the closeness to previous work, comparisons with related work like dlADMM, and the difficulty of the dataset. While the authors proposed the possibility of addressing some of these issues, the reviewers feel that without actually addressing them, the paper is not yet ready for publication. ",ICLR2020, +Bk6FNyaHG,1517250000000.0,1517260000000.0,404,r1YUtYx0-,r1YUtYx0-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,The paper proposes a new way to understand why neural networks generalize well. They introduce the concept of ensemble robustness and try to explain DNN generalization based on this concept. 
The reviewers feel the paper is a bit premature for publication in a top conference although this new way of explaining generalization is quite interesting.,ICLR2018, +yMcJHiSRy6p,1610040000000.0,1610470000000.0,1,uQnJqzkhrmj,uQnJqzkhrmj,Final Decision,Reject,"The paper studies an interesting problem motivated by VLSI design. The reviewers agree that there are interesting aspects of the RC algorithm. Nevertheless, the paper could be improved by a clearer characterization/apples-to-apples comparison to baselines, particularly regarding computation cost, use of parallelism, as well as a more thorough contrast to state of the art in general. Given the contribution is experimental, and this is a well studied problem, it is important to establish whether the solution is indeed best-in-class; cost due to training should be taken into account, and minimized to the extend possible. Going beyond the baselines considered here, as well as reviewing possible theoretical connections to other problems and guarantees, would also strengthen the paper.",ICLR2021, +m0-Di_Aq0DY,1610040000000.0,1610470000000.0,1,6HlaJSlQFEj,6HlaJSlQFEj,Final Decision,Reject,"The paper proposes a defense against black-box adversarial example attacks based on adding small Gaussian noise to the inputs. Its evaluation is carried out empirically using CIFAR-10 and ImageNet datasets. + +Despite a somewhat complete experimental evaluation (on two datasets) the lack of theoretical justification strongly affects the significance of the proposed method. It can be clearly seen from the experimental results that the proposed level of noise is a trade-off between clean accuracy and attack effectiveness. However, this tradeoff neither implies a substantial degree of security (the attack success rate is roughly halved but this does imply robustness against attacks since the initial success rate is rather high) nor is the impact to clean accuracy negligible. Furthermore, the robustness of the proposed method against an attack which is aware of such defense (similar to the Kerckhoff's principle in cryptography) is not evaluated. The authors mentioned several directions for addressing this issue in their response but implementation of such improvements is impossible within the level of revisions acceptable in a post-review process. + +A major revision of the paper taking into account the feedback provided by the current reviews would certainly improve its acceptance chances. ",ICLR2021, +SJsohfUOl,1486400000000.0,1486400000000.0,1,S1Bb3D5gg,S1Bb3D5gg,ICLR committee final decision,Accept (Oral),"Most dialog systems are based on chit-chat models. This paper explores goal-directed conversations, such as those that arise in booking a restaurant. While the methodology is rather thin, this is not the main focus of the paper. The authors provide creative evaluation protocols, and datasets. The reviewers liked the paper, and so does the AC. The paper will make a nice oral, on a topic that is largely explored but opening a direction that is quite novel.",ICLR2017, +xNL2UqDGOpx,1642700000000.0,1642700000000.0,1,saNgDizIODl,saNgDizIODl,Paper Decision,Reject,"The paper proposes a simple approach to quantify uncertainty in ""deterministic"" neural networks, not unlike the works of SNGP, DDU, and DUE, where one only performs one forward pass rather than in an ensemble or Monte Carlo sample. 
In particular, they propose a kernel-based method on a network's logits to estimate uncertainty, obtaining data and model uncertainty estimates separately using a bound on Bayes risk. + +While I agree with the relevance of the problem, there's a shared concern among reviewers across both technical novelty and experimental validation---particularly compared to prior work that can be difficult to understand the key distinguishing factor. I recommend the authors use the reviewers' feedback to enhance their preprint should they aim to submit to a later venue.",ICLR2022, +nbd4ifViwmA,1610040000000.0,1610470000000.0,1,rJA5Pz7lHKb,rJA5Pz7lHKb,Final Decision,Accept (Oral),"All reviewers recommend acceptance. Some concerns were raised about the precision of theorem 2 (now renamed to proposition 1), as well as the analysis of hyperparameter choices and quantitative evaluation, which I believe the authors have adequately addressed. Based on a suggestion of reviewer 1, experiments with flow-based models were also added, which demonstrates that the method is not strictly tied to autoregressive models. Personally, I was also curious about the connection between noise injection and quantisation, which the authors responded to by adding a paragraph discussing this connection in the manuscript. + +I would recommend that the authors also add the kernel inception distance (KID) results reported in the comments to the manuscript. + +This work stands out to me in that it combines a relatively simple, easy to understand idea with nice results, which is a trait of many impactful papers. I will therefore join the reviewers in recommending acceptance.",ICLR2021, +b5YtuBv16WH,1642700000000.0,1642700000000.0,1,ar92oEosBIg,ar92oEosBIg,Paper Decision,Accept (Poster),"In this paper, a novel machine learning-based method for solving TSP is presented; this method uses guided local search in conjunction with a graph neural network, which is trained to predict regret. Reviewers disagree rather sharply on the merits of the paper. Three reviewers think that the paper is novel, interesting, and has good empirical results. Two reviewers think that the fact the results are not competitive with the best non-learning-based (""classic"") solvers mean that the paper should be rejected. +This area chair believes that research is fundamentally not about beating benchmarks, but about new, interesting, and sound ideas. The conceptual novelty of this method, together with the good results compared with other learning-based methods, is sufficient for accepting the paper.",ICLR2022, +VUyTleF2WLK,1642700000000.0,1642700000000.0,1,JyI9lc8WxW,JyI9lc8WxW,Paper Decision,Reject,"The reviewers were in general lukewarm about the paper, not convinced by why realistic augmentation mean more robust features in SSL, had concerns over the szie of the datasets (up to ~100k), and the success depends on the relevance of color for classification. The AC agrees with the reviewers. While the paper sounds interesting, there are many questions remain unanswered -- it's unclear that the rebuttal addressed the concerns shared by the reviewers. + +In addition to the comments by the reviewers, the AC also feels that the overall design is adhoc and it's unclear that the proposed augmentation can generalize to larger, more practical problems.",ICLR2022, +Gu5CMac20,1576800000000.0,1576800000000.0,1,BygZARVFDH,BygZARVFDH,Paper Decision,Reject,"This submission proposes an image generation technique for composing concepts by combining their associated distributions. 
+ +Strengths: +-The approach is interesting and novel. + +Weaknesses: +-Several reviewers were not convinced about the correctness of the formulations for negation and disjunction. +-The experimental validation of the disjunction and negation approaches is insufficient. +-The paper clarity and exposition could be improved. The authors addressed this in the discussion but concerns remain. + +Given the weaknesses, AC shares R3’s recommendation to reject.",ICLR2020, +r1e9w1OfgE,1544880000000.0,1545350000000.0,1,BJx0sjC5FX,BJx0sjC5FX,Interesting work however it lacks connection with modern tensor models.,Accept (Poster),"AR1 seeks the paper to be more standalone and easier to read. As this comment comes from the reviewer who is very experienced in tensor models, it is highly recommended that the authors make further efforts to make the paper easier to follow. AR2 is concerned about the manually crafted role schemes and alignment discrepancy of results between these schemes and RNNs. To this end, the authors hypothesized further reasons as to why this discrepancy occurs. AC encourages authors to make further efforts to clarify this point without overstating the ability of tensors to model RNNs (it would be interesting to see where these schemes and RNN differ). Lastly, AR3 seeks more clarifications on contributions. + +While the paper is not ground breaking, it offers some starting point on relating tensors and RNNs. Thus, AC recommends an accept. Kindly note that tensor outer products have been used heavily in computer vision, i.e.: +- Higher-Order Occurrence Pooling for Bags-of-Words: Visual Concept Detection by Koniusz et al. (e.g. section 3 considers bi-modal outer tensor product for combining multiple sources: one source can be considered a filter, another as role (similar to Smolensky at al. 1990), e.g. a spatial grid number refining local role of a visual word. This further is extended to multi-modal cases (multiple filter or role modes etc.) ) +- Multilinear image analysis for facial recognition (e.g. so called tensor-faces) by Vasilescu et al. +- Multilinear independent components analysis by Vasilescu et al. +- Tensor decompositions for learning latent variable models by Anandkumar et al. + +Kindly make connections to these works in your final draft (and to more prior works). + ",ICLR2019,5: The area chair is absolutely certain +R0YU3dRqD0Y,1610040000000.0,1610470000000.0,1,OqtLIabPTit,OqtLIabPTit,Final Decision,Accept (Poster)," + This paper studies the difference between cross-entropy and contrastive learning losses in the feature representations that they learn, specifically looking at class-imbalanced datasets. The authors show that contrastive losses result in a more ""balanced"" representation, as measured by the balance of accuracy across the classes when a linear classifier is learned mapping from the feature representation to the class labels. They also show that empirically this tends to result in better generalization to downstream tasks. Inspired by this, they devise a simple modification of the prior supervised contrastive loss method and show that it can improve performance on ImageNet-LT and even generalization performance when trained on balanced datasets and applied to downstream tasks. + + The reviewers identified several weaknesses, including some clarity issues (R1), limitations of how balancedness is measured and lack of theoretical/statistical rigor in terms of the resulting claims (R2), and differences with respect to concurrent work (R4). 
A lengthy discussion occurred between reviewers and authors, as well as input from a co-author of the concurrent work. In the end, the reviewers were not fully satisfied both in terms of the balancedness measure and relationship to the concurrent work. + + Overall, despite this and the valid limitations of the work, I recommend accepting this paper as I believe the contributions outweigh the limitations, and that the findings would be interesting to the community. First, the paper provides some interesting analysis of balancedness and differences across these two loss functions, as well as connections to generalization, which even the concurrent work does not provide. The resulting method, while being a simple modification of the supervised contrastive loss work, is effective both for long-tailed datasets and generalization to downstream tasks (even when trained in a balanced manner) which is nice. In the end, we should not use [3] to reject this paper since it was accepted right before the ICLR deadline. + + However, I **strongly** recommend that the paper address the valid limitations mentioned in the discussions. Specifically: + 1) While I agree that [3] is concurrent work, this paper should none-the-less tone down its claims of being the first in exploring balance for the camera-ready version and clearly address differences between this paper and that one (even if mentioned as concurrent work). It is important to give credit when it is due, and while I think [3] is a different perspective it should be mentioned. Further, the claim that their methodology is not correct is highly arguable, so this should not be mentioned; rather the differences in perspectives and what each paper shows should be emphasized. Even without [3], self-supervised pre-training (initialization) should arguably be included as a baseline given that it is the logical first choice for incorporating self-supervised learning. + + 2) Like R2, I do not believe the balancedness metric shows uniformity of the feature space. This would have to be shown through methods such as t-SNE or in some other way. Being linearly separable in a balanced way across classes (which is what you showed) is not sufficient to show that feature space ""uniformity"". One can draw many feature space distributions that do not have the intuitive meaning of this (which isn't precisely defined by the authors) but still be linearly separable. I recommend authors remove this type of characterization (unless they can define/show it) and instead include a discussion of the limitations of the current methodology for measuring balancedness. Figure 1 should also emphasize that it is notional (not from real data). ",ICLR2021, +2kMBaEAcr6,1610040000000.0,1610470000000.0,1,jcN7a3yZeQc,jcN7a3yZeQc,Final Decision,Reject,"This paper investigates some variants of the double Q-learning algorithm and develops theoretical guarantees. In particular, it focuses on how to reduce the correlation between the two trajectories employed in the double Q-learning strategy, in the hope of rigorously addressing the overestimation bias issue that arises due to the max operator in Q-learning. However, the reviewers point out that the proofs are hard to parse (and often hand-waving with important details omitted). The experimental results are also not convincing enough.     
",ICLR2021, +D2W-j_mZuP,1610040000000.0,1610470000000.0,1,hu2aMLzOxC,hu2aMLzOxC,Final Decision,Reject,"The paper extends previous work on asymmetric self-play by introducing a novel behavior cloning loss (referred to as ABC). The zero-shot results are impressive and demonstrate that the proposed curriculum learning approach pushes the state-of-the-art. The reviewers acknowledge these contributions. The pros of the paper are well summarized by R2, + +- The experimental results are very encouraging. +- The analysis of the method helps to understand which components are important. +- The evaluation on the hold-out tasks is very impressive and pushes the state of the art. +- The paper is well written and very easy to follow, the illustrations are informative and appealing. +- Although this approach is based on previous work on asymmetric self-play, the authors clearly describe the contributions of this work (training clearly from self-play, zero-shot generalization). + +R1, R2, R5 recommend accepting the paper with scores of 7, 7, 6. R1 expressed that he is not confident about the paper. R4 recommends acceptance with a score of 6. However, R4 also expresses the concern for real-world applicability, ""the sim-real gap with respect to applicability to robotics is still a major concern IMO. I have updated my score accordingly."" The sim-to-real gap is a concern due to knowledge of perfect state information and the assumption of resets. + +Based on confident reviews of R2, R4, and R5, and the impressive zero-shot results, ordinarily, I would recommend the paper to be accepted. However, unfortunately, both the authors and reviewers missed a comparison to prior work, which I detail below. While the current paper makes a good case for zero-shot generalization, it does not compare to previous approaches that also exhibits zero-shot generalization. For instance, Li et al. ICRA 2020 (https://arxiv.org/abs/1912.11032) show that using a simple curriculum that depends on the number of manipulated objects + Graph Neural Networks can generalize very well to unseen tasks. E.g., the results reported in the paper demonstrate that a policy trained on 2/3 blocks generalizes and can stack many more blocks. Their policy learns to stack 6-7 blocks, whereas the paper under review can only stack up to 3 blocks (Figure 8). + +The authors mention in Section 5.2 of their paper, ""The curriculum:distribution baseline, which slowly increases the proportion of pick-and-place and stacking goals in the training distribution, fails to acquire any skill. The curriculum:full baseline incorporates all hand-designed curricula yet still cannot learn how to pick up or stack blocks. We have spent a decent amount of time iterating and improving these baselines but found it especially difficult to develop a scheme good enough to compete with asymmetric self-play."" +This is at odds with results in Li et al. + +This reason for dissonance is that good generalization can be achieved by improving two separate components -- the neural network architecture or the learning curriculum. This paper shows good generalization with weak neural net architectures + a good curriculum learning method. It is unclear to me how critical the self-play method would be with a stronger architecture such as a graph network which is arguable more appropriate for the set of tasks presented in the paper. 
I would like to see if the curriculum is necessary (i.e., complements a stronger architecture) or whether it is just a substitute for an alternative neural network architecture. Without such a study, this paper should not be accepted, because it will add more noise rather than advancing the field of robotic manipulation. ",ICLR2021,
VpbCFmAVlB,1642700000000.0,1642700000000.0,1,oVfIKuhqfC,oVfIKuhqfC,Paper Decision,Reject,"This paper introduces a new method for diffusion-based generative modeling through a Brownian bridge formulation, where the data and latent variable can be coupled. They extend their method to mixtures of diffusion bridges and spatially correlated processes that go beyond the factorial diffusion processes used in prior work.

We thank the authors for engaging with the reviewers and addressing many of their detailed concerns. While reviewers agreed that the proposed theory and methodology were novel and interesting, there are no small or large scale experiments or empirical comparisons to the relevant prior work. In the absence of theoretical justification (bound or proof) as to why the proposed diffusion bridge mixture transport method would result in better performance, more empirical comparisons and evaluations are needed. Additionally, several reviewers found the presentation confusing and overly complex, including the notation, writing, and figures. Given the lack of experimental results and concerns over presentation, I’m inclined to reject this paper.",ICLR2022,
swpkmu9nm,1576800000000.0,1576800000000.0,1,Bke8UR4FPB,Bke8UR4FPB,Paper Decision,Accept (Poster),"This paper leverages the piecewise linearity of predictions in ReLU neural networks to encode and learn piecewise constant predictors akin to oblique decision trees. The reviewers think the paper is interesting, and the idea is clever. The paper can be further improved in experiments. This includes comparison to ensembles of traditional trees or (in some cases) simple ReLU networks. The tradeoffs other than accuracy between the method and the baselines are also interesting.
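To make the encoding above concrete: the sign pattern of a ReLU layer's pre-activations carves the input space along oblique hyperplanes, so inputs sharing a pattern fall in the same cell, and a constant prediction per cell yields an oblique-tree-like predictor. A minimal sketch with made-up weights (not the paper's construction):

    import numpy as np

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(4, 2)), rng.normal(size=4)  # one ReLU layer, 2-D inputs

    def cell_id(x):
        # Each sign pattern of W @ x + b names one cell of the oblique partition.
        return tuple((W @ x + b > 0).astype(int))

    print(cell_id(np.array([0.5, -1.0])), cell_id(np.array([2.0, 2.0])))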
",ICLR2020, +nmsavO3KX1,1576800000000.0,1576800000000.0,1,Hke0oa4KwS,Hke0oa4KwS,Paper Decision,Reject,"The paper proposes to model uncertainty using expected Bayes factors, and empirically show that the proposed measure correlates well with the probability that the classification is correct. + +All the reviewers agreed that the idea of using Bayes factors for uncertainty estimation is an interesting approach. However, the reviewers also found the presentation a bit hard to follow. While the rebuttal addressed some of these concerns, there were still some remaining concerns (see R3's comments). + +I think this is a really promising direction of research and I appreciate the authors' efforts to revise the draft during the rebuttal (which led to some reviewers increasing the score). This is a borderline paper right now but I feel that the paper has the potential to turn into a great paper with another round of revision. I encourage the authors to revise the draft and resubmit to a different venue.",ICLR2020, +HuBduR4ZmmX,1610040000000.0,1610470000000.0,1,6zaTwpNSsQ2,6zaTwpNSsQ2,Final Decision,Accept (Poster),"This paper proposes a new approach to training networks with low precision called Block Minifloat. The reviewers found the paper well written and found that the empirical results were sufficient. In particular, they found the hardware implementation was a strong contribution. Furthermore, the rebuttal properly addressed the comments of the reviewer.",ICLR2021, +Gq5dOFNkuF1,1610040000000.0,1610470000000.0,1,vOchfRdvPy7,vOchfRdvPy7,Final Decision,Reject,"This paper examines adversarially trained robust models, and finds that accuracy disparity is higher than for standard models. The authors introduce a method they call Fair Robust Learning using Lagrange multipliers to minimize overall robust error while constraining the accuracy discrepancy between classes. + +In discussion, consensus was reached that the observations and approach are interesting but the paper is not yet ready for publication. The main concern is that it is not clear if the class accuracy disparity is due to adversarial training, or simply due to lower accuracy in general. Please see reviews and public discussion for further details.",ICLR2021, +NqZUHT_8Wj1,1610040000000.0,1610470000000.0,1,BbNIbVPJ-42,BbNIbVPJ-42,Final Decision,Accept (Poster),"The authors have made significant efforts to thoroughly address all the concerns. Due to the amount of discussions, I had to go through the paper myself and agree with the authors on many of the points. In my opinion, this is a solid theoretical work on the pitfalls of IRM. ",ICLR2021, +ixm7xiyoaH0,1610040000000.0,1610470000000.0,1,vCEhC7nOb6,vCEhC7nOb6,Final Decision,Reject,"The main concern is that the results in this paper are based on strong asymptotic assumptions. (At least) more empirical results are needed. +",ICLR2021, +ZKtIF9TsKC,1576800000000.0,1576800000000.0,1,BkxUvnEYDH,BkxUvnEYDH,Paper Decision,Accept (Spotlight),"This paper provides a fascinating hybridization approach to incorporating programs as priors over policies which are then refined using deep RL. The reviewers were, at the end of the discussion, all in favour of acceptance (with the majority strongly in favour). 
An excellent paper I hope to see included in the conference.",ICLR2020, +SJlVmS0llV,1544770000000.0,1545350000000.0,1,rJeXCo0cYX,rJeXCo0cYX,a new platform that supports interactive and grounded language learning,Accept (Poster),"This paper presents ""BabyAI"", a research platform to support grounded language learning. The platform supports a suite of 19 levels, based on *synthetic* natural language of increasing difficulties. The platform uniquely supports simulated ""human-in-the-loop"" learning, where a human teacher is simulated as a heuristic expert agent speaking in synthetic language. + +Pros: +A new platform to support grounded natural language learning with 19 levels of increasing difficulties. The platform also supports a heuristic expert agent to simulate a human teacher, which aims to mimic ""human-in-the-loop"" learning. The platform seems to be the result of a substantial amount of engineering, thus nontrivial to develop. While not representing the real communication or true natural language, the platform is likely to be useful for DL/RL researchers to perform prototype research on interactive and grounded language learning. + +Cons: +Everything in the presented platform is based on synthetic natural language. While the use of synthetic language is not entirely satisfactory, such limit is relatively common among the simulation environments available today, and lifting that limitation is not straightforward. The primary contribution of the paper is a new platform (resource). There are no insights or methods. + +Verdict: +Potential weak accept. The potential impact of this work is that the platform will likely be useful for DL/RL research on interactive and grounded language learning.",ICLR2019,3: The area chair is somewhat confident +f3lzF5O0Hh,1576800000000.0,1576800000000.0,1,SygBIxSFDS,SygBIxSFDS,Paper Decision,Reject,"There is insufficient support to recommend accepting this paper. The authors provided detailed responses to the reviewer comments, but the reviewers did not raise their evaluation of the significance and novelty of the contributions as a result. The feedback provided should help the authors improve their paper.",ICLR2020, +V8wFgZ2UUKU,1642700000000.0,1642700000000.0,1,AwgtcUAhBq,AwgtcUAhBq,Paper Decision,Accept (Poster),"Summary (from reviewer uzT5): This paper analyzes adversarial domain learning (DAL) from a game-theoretical perspective, where the optimal condition is defined as obtaining the local Nash equilibrium. From this view, the authors show that the standard optimization method in DAL can violate the asymptotic guarantees of the gradient-play dynamics, thus requiring careful tuning and small learning rates. Based on these analyses, this paper proposed to replace the existing optimization method with higher-order ordinary differential equation solvers. Both theoretical and experimental results show that the latter ODE method is more stable and allows for higher learning rates, leading to noticeable improvements in transfer performance and the number of training iterations. + +All reviewers appreciated the contributions of this paper and recommended acceptance. While the methods themselves are not novel, the game perspective applied to this problem appears to be and the use of higher-order solves yield interesting theoretical and empirical improvements. + +== Additional comments == + +1) For the comparison vs. 
game optimization algorithms (Figure 3), it would be nice to normalize the x-axis so that one ""epoch"" yields comparable computational cost among the different methods (as RK4 and RK2 are much more expensive than EG or GD per mini-batch). Given that EG had such bad performance there, it would not change the conclusions; but the current scaling is still quite misleading. Same comments for Figure 2.

2) Note that modern approaches for stochastic extragradient recommend using different step-sizes for the extrapolation step and the update step (see e.g. Hsieh et al. NeurIPS 2020 ""Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling""). I suspect that much bigger step-sizes could be used in this case while maintaining convergence, and this version should be added to Figure 3.

3) In ""Related Work | Two-Player Zero-Sum Games"" -> note that Gidel et al. 2019a provided all their convergence theory and methods for stochastic variational inequalities, and thus they also apply to three-player games, unlike what seems to be implied by this paragraph. In particular, all the algorithms they investigated (Extra-Adam amongst others) could also be applied to DAL. While I can see that the specifics of the objective in DAL might be different than for GAN optimization, it would be worthwhile to acknowledge these alternative approaches more clearly, and I encourage the DAL community to investigate their performance more exhaustively for DAL than what was done in this paper.",ICLR2022,
rkeEOA9LgE,1545150000000.0,1545350000000.0,1,SJGyFiRqK7,SJGyFiRqK7,meta-review,Reject,The reviewers reached a consensus that the paper is not ready for publication in ICLR. (see more details in the reviews below.),ICLR2019,4: The area chair is confident but not absolutely certain
3W09AQUtXU,1642700000000.0,1642700000000.0,1,Z7VhFVRVqeU,Z7VhFVRVqeU,Paper Decision,Reject,"Overall, the work is borderline with no reviewer feeling strongly for or against the paper.

The paper is well-written and proposes a simple approach, along with code for reproducibility. Criticism stems primarily from the work's limited technical novelty: it is an incremental improvement on ideas from ANP and BANP, and on related work like Neural Bootstrapper. In addition, the experimental validation involves regression on 1-to-2D functions, Bayesopt on synthetic functions, and contextual bandits on the synthetic wheel bandit problem. This is fairly toy, and multiple reviewers raise unaddressed concerns on the regression experiments. Setting aside originality in and of itself (which is overvalued in conferences), the work does not yet provide a sufficiently convincing demonstration of its practical importance.

I recommend the authors use the reviewers' feedback to enhance their preprint should they aim to submit to a later venue.",ICLR2022,
oJkWWID8dp,1576800000000.0,1576800000000.0,1,SJeLopEYDH,SJeLopEYDH,Paper Decision,Accept (Poster),"This paper proposes video-level 4D CNNs and the corresponding training and inference methods for improved video representation learning. The proposed model achieves state-of-the-art performance on three action recognition tasks.
Reviewers agree that the idea is well motivated and interesting, but were initially concerned with positioning with respect to the related work, novelty, and computational tractability. As these issues were mostly resolved during the discussion phase, I will recommend the acceptance of this paper.
We ask the authors to address the points raised during the discussion in the manuscript, with a focus on the tradeoff between the improved performance and computational cost.",ICLR2020,
jgHnavHtUA,1576800000000.0,1576800000000.0,1,SJl5Np4tPr,SJl5Np4tPr,Paper Decision,Accept (Spotlight),"This submission addresses the problem of few-shot classification. The proposed solution centers around metric-based models with a core argument that prior work may lead to learned embeddings which are overfit to the few labeled examples available during learning. Thus, when measuring cross-domain performance, the specialization of the original classifier to the initial domain will be apparent through degraded test time (new domain) performance. The authors therefore study the problem of domain generalization in the few-shot learning scenario. The main algorithmic contribution is the introduction of a feature-wise transformation layer.

All reviewers suggest accepting this paper. Reviewer 3 says this problem statement is especially novel. Reviewers 1 and 2 had concerns over lack of comparisons with recent state-of-the-art methods. The authors responded with some additional results during the rebuttal phase, which should be included in the final draft.

Overall the AC recommends acceptance, based on the positive comments and the fact that this paper addresses a sufficiently new problem statement.
",ICLR2020,
6n_Oms4iui,1576800000000.0,1576800000000.0,1,ryxtCpNtDS,ryxtCpNtDS,Paper Decision,Reject,"This paper presents a synthetic oversampling method for sequence-to-sequence classification problems based on autoencoders and generative adversarial networks.

All reviewers reject the paper for two main reasons:
1. The novelty of the paper is not enough for ICLR, as the idea of utilizing GANs for data sampling is common now.
2. The experimental evaluation is not convincing, as the authors did not compare with other leading oversampling methods.

The rebuttal did not adequately answer these two questions; thus I choose to reject the paper.
",ICLR2020,
MJrXsTtDywG,1642700000000.0,1642700000000.0,1,41e9o6cQPj,41e9o6cQPj,Paper Decision,Accept (Spotlight),"The paper presents a novel method of fusion of information from two modalities: text (context and question) and a Knowledge Base, for the task of question answering.
The proposed method looks quite simple and clear, while the results show strong gains against baseline methods on 3 different datasets. Ablation studies show that the model achieves good performance on more complex questions. While the reviewers raise some concerns, e.g., on the sensitivity of the proposed method, the technical novelty against prior works, they see values in this paper in general. And the authors did a good job in their rebuttal. After several rounds of interactions, some reviewers were convinced to raise their scores by a little bit. As a result, we think the paper is in a good shape and ICLR audience should be interested in it.",ICLR2022, +K4WTQ9-MPr,1576800000000.0,1576800000000.0,1,r1xH5xHYwH,r1xH5xHYwH,Paper Decision,Reject,"This paper explores training CNNs with labels of differing granularity, and finds that the types of information learned by the method depends intimately on the structure of the labels provided. + +Thought the reviewers found value in the paper, they felt there were some issues with clarity, and didn't think the analyses were as thorough as they could be. I thank the authors for making changes to their paper in light of the reviews, and hope that they feel their paper is stronger because of the review process.",ICLR2020, +4D3wIMKf5xy,1610040000000.0,1610470000000.0,1,Cz3dbFm5u-,Cz3dbFm5u-,Final Decision,Accept (Poster),"The authors did a nice job of responding to the concerns of reviewers during the discussion phase which increased reviewer scores. Because of this I will vote to accept. + +The authors should carefully edit the paper for typos, grammatical errors, and style errors. Some examples: +- Abstract: Make this one paragraph without a line break +- End of 1st paragraph in Intro: ""So there is an urge"" -> ""So there is an urgent"" +- Start of 3rd paragraph in Intro: ""State-of-the-art cryptographic"" -> ""The state-of-the-art cryptographic"" +- Last paragraph of 2.1: ""To solve above"" -> ""To solve the above"" +- End of 2.3: ""Compared to the light-weight InstaHide and TextHide, MPC and HE are of advantages in the security guarantees so far."" -> ""Compared to the light-weight methods InstaHide and TextHide, MPC and HE provide much stronger security guarantees."" + +I also urge the authors to please double check the reviewer comments when preparing a newer version to ensure all concerns are taken into account.",ICLR2021, +HkEzI1TSf,1517250000000.0,1517260000000.0,734,r15kjpHa-,r15kjpHa-,ICLR 2018 Conference Acceptance Decision,Reject,"All reviewers are unanimous that the paper is below threshold for acceptance. The authors have not provided rebuttals, but merely perfunctory generic responses. + +I think the most important criticism is that the approach is ""very ad-hoc."" I would encourage the authors to consider more principled ways of automatically designing reward functions, like for example, Inverse Reinforcement Learning, in which you start with a good agent behavior policy, and then estimate a reward function for which the behavior policy maximizes the reward function.",ICLR2018, +vpVHZttBm,1576800000000.0,1576800000000.0,1,SJx-j64FDr,SJx-j64FDr,Paper Decision,Accept (Poster),"This paper studies how the architecture and training procedure of binarized neural networks can be changed in order to make it easier for SAT solvers to verify certain properties of them. + +All of the reviewers were positive about the paper, and their questions were addressed to their satisfaction, so all reviewers are in favor of accepting the paper. 
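To give a flavour of the verification queries involved — purely illustrative, not the paper's encoding — one can ask whether flipping a few input bits changes a single binarized unit's sign; a real verifier would pose this question to a SAT solver over the whole network rather than enumerating, and the weights below are invented:

    from itertools import combinations
    import numpy as np

    w = np.array([1, -1, 1, 1, -1])  # hypothetical {-1, +1} weights of one unit
    x = np.array([1, 1, -1, 1, -1])  # nominal {-1, +1} input
    out = np.sign(w @ x)

    def robust(x, k):
        # Brute-force stand-in for the SAT query: can flipping at most k bits
        # of x change the unit's output sign?
        for r in range(1, k + 1):
            for idx in combinations(range(len(x)), r):
                x2 = x.copy()
                x2[list(idx)] *= -1
                if np.sign(w @ x2) != out:
                    return False
        return True

    print(robust(x, k=1))  # False here: a single flip suffices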
I therefore recommend acceptance.",ICLR2020, +LvuHkKg9xM7,1642700000000.0,1642700000000.0,1,MRGFutr0p5e,MRGFutr0p5e,Paper Decision,Reject,"The paper proposes to use the recently introduced ""Barlow-twins"" contrastive learning objective, to the case of graph networks. The main concern raised by reviewers was the limited novelty of this work, which they argued mostly combines existing lines of work, and does not introduce sufficiently new concepts. This was also discussed between the authors and the reviewers. +Having read the paper and the reviews, I tend to agree with the reviewers that this paper is more of a combination of existing works, and their relatively straightforward application to the graph network domain. Thus, although the empirical results are encouraging, I agree the paper has limited novelty, and falls below the ICLR acceptance bar.",ICLR2022, +H1xn5CiC14,1544630000000.0,1545350000000.0,1,HygtHnR5tQ,HygtHnR5tQ,Limited novelty,Reject,"This paper proposes a GAN-based framework for image compression. + +The reviewers and AC note a critical limitation on novelty of the paper i.e., such a conditional GAN framework is now standard. The authors mentioned that they apply GAN for extreme compression for the first time in the literature, but this is not enough to justify the novelty issue. + +AC thinks the proposed method has potential and is interesting, but decided that the authors need new ideas to publish the work. +",ICLR2019,4: The area chair is confident but not absolutely certain +5gsoFiIFqUV,1642700000000.0,1642700000000.0,1,uc8UsmcInvB,uc8UsmcInvB,Paper Decision,Reject,"This is an extremely interesting and timely paper regarding the approximation ability, with statistical consequences, of circuits and (computation-bounded) Turing machines by feedforward networks and transformers. The paper has an interesting and valuable setting, and also many unusual ideas, together which can inspire a lot of future work. Unfortunately, the reviewers had significant difficulties with the presentation and setting; the Transformer material in particular lacks clarity. As such, the paper could use more time and polish. + +Separately, I will recommend in the future that authors consider making use of the rebuttal and revision phase. While it is not strictly required, it seems that in ICLR, scores shift quite a lot in those phase, and it has (for better or worse) become standard to have a thorough involvement in this phase. It was difficult to cause score changes after the initial phase due to the lack of review responses. That said, I sincerely hope the authors continue with this valuable line of work.",ICLR2022, +Sy9s3z8Og,1486400000000.0,1486400000000.0,1,SJGCiw5gl,SJGCiw5gl,ICLR committee final decision,Accept (Poster),The paper presents a method for pruning filters from convolutional neural networks based on the first order Taylor expansion of the loss change. The method is novel and well justified with extensive empirical evaluation.,ICLR2017, +C_2WBjFa89h,1610040000000.0,1610470000000.0,1,wpSWuz_hyqA,wpSWuz_hyqA,Final Decision,Accept (Spotlight),"The paper investigates the capacity for neural language models to perform fast-mapping word acquisition using a proposed multimodal external memory architecture. Much work exists that shows that neural models are capable of following instructions whose meaning persists across episodes (i.e., slow-learning), however much less attention has been paid to instruction-following in a one-shot learning context. 
Using a simulated 3D navigation/manipulation domain, the paper shows that the proposed multimodal memory network is capable of both slow and one-shot word learning when trained via standard RL. + +The submission was reviewed by four knowledgable referees, who read the author feedback and engaged in discussion with the authors. The paper is topical---one-shot language learning for instruction-following using neural models is of significant interest of-late. The reviewers agree that the proposed multimodal memory architecture is both interesting and technically solid. The reviewers raised concerns about the experimental evaluation and the role of embodiment. The author feedback together with discussion with reviewers were helpful in resolving some of these issues. However, the authors are encouraged to ensure that the paper clearly motivates the importance of embodiment to slow learning and fast-mapping, particularly given the large body of work in language acquisition in robotics, a truly embodied domain, which is notably missing from the related work discussion.",ICLR2021, +Lwturqal5,1576800000000.0,1576800000000.0,1,Byx55pVKDB,Byx55pVKDB,Paper Decision,Reject," +The paper investigates how the softmax activation hinders the detection of out-of-distribution examples. + +All the reviewers felt that the paper requires more work before it can be accepted. In particular, the reviewers raised several concerns about theoretical justification, comparison to other existing methods, discussion of connection to existing methods and scalability to larger number of classes. + +I encourage the authors to revise the draft based on the reviewers’ feedback and resubmit to a different venue. + +",ICLR2020, +LAFMVdCA_U-,1642700000000.0,1642700000000.0,1,Q76Y7wkiji,Q76Y7wkiji,Paper Decision,Accept (Poster),"This paper is a follow-up paper of Zhang et al. (2021), that proposed a new network architecture for adversarial robustness, l_\infty distance net. Although the l_\infty network is provably 1-Lipschitz w.r.t. the l_\infty distance, its training procedure exploits the l_p relaxation to overcome the non-smoothness of the model but suffers from an unexpected large Lipschitz constant at the early training stage, an issue to be solved. This paper resolves this issue by a new loss design of scaled cross entropy loss and clipped hinge loss. Without using MLP on top of the l_\infty distance net backbone, the proposed new training method empirically outperforms the original one in Zhang et al. (2021) and improves over the state-of-the-art by more than 5% for 8/255 and other radiuses. Moreover, the paper shows the theoretical expressive power of l_\infty distance net for well-separated data. + +There are some concerns about the moderate novelty and reproducibility of the results. Since the empirical results are indeed impressive, the paper could be accepted conditional on that the authors release their reproducible codes to the public.",ICLR2022, +I-3H8Eii5Q,1576800000000.0,1576800000000.0,1,HylthC4twr,HylthC4twr,Paper Decision,Reject,"This paper studies two-layer graph convolutional networks and two-layer multi-layer perceptions and develops quantitative results of their effect in signal processing settings. The paper received 3 reviews by experts working in this area. R1 recommends Weak Accept, indicating that the paper provides some useful insight (e.g. into when graph neural networks are or are not appropriate for particular problems) and poses some specific technical questions. 
In follow up discussions after the author response, R1 and authors agree that there are some over claims in the paper but that these could be addressed with some toning down of claims and additional discussion. R2 recommends Weak Accept but raises several concerns about the technical contribution of the paper, indicating that some of the conclusions were already known or are unsurprising. R2 concludes ""I vote for weak accept, but I am fine if it is rejected."" R3 recommends Reject, also questioning the significance of the technical contribution and whether some of the conclusions are well-supported by experiments, as well as some minor concerns about clarity of writing. In their thoughtful responses, authors acknowledge these concerns. Given the split decision, the AC also read the paper. While it is clear it has significant merit, the concerns about significance of the contribution and support for conclusions (as acknowledged by authors) are important, and the AC feels a revision of the paper and another round of peer review is really needed to flesh these issues out. ",ICLR2020, +h3JRl_YLaO,1576800000000.0,1576800000000.0,1,rkl_Ch4YwS,rkl_Ch4YwS,Paper Decision,Reject,"One reviewer is positive, while the others recommend rejection. The authors did not submit a rebuttal, thus the reviewers kept their original assessment.",ICLR2020, +rkhVBkTHz,1517250000000.0,1517260000000.0,550,Byht0GbRZ,Byht0GbRZ,ICLR 2018 Conference Acceptance Decision,Reject,"This work introduces a new type of structured attention network that learn latent structured alignments between sentences in a fully differentiable manner, which allows the network to learn not only the target task, but also the latent relationships. Reviewers seem partial to the idea of the work, and it's originality, but have issues with the contributions. In particular: + +- The reviewers note that the gains in performance from using this approach are quite small and do not outperform previous structured approaches.",ICLR2018, +r1hDryaSG,1517250000000.0,1517260000000.0,589,rJe7FW-Cb,rJe7FW-Cb,ICLR 2018 Conference Acceptance Decision,Reject,"This paper received borderline reviews. Initially, all reviewers raised a number of concerns (clarity, small improvements, etc). Even after some back and forth discussion, concerns remain, and it's clear that while the idea has potential, another round of reviewing is needed before a decision can be reached. This would be a major revision in a journal. Unfortunately, that is not possible in a conference setting and we must recommend rejection. We recommend the authors to use the feedback to make the manuscript stronger and submit to a future venue. ",ICLR2018, +aR5hnzfg8o,1576800000000.0,1576800000000.0,1,HJeYalBKvr,HJeYalBKvr,Paper Decision,Reject,"This paper incorporates phrases within the transformer architecture. + +The underlying idea is interesting, but the reviewers have raised serious concerns with both clarity and the trustworthiness of the experimental evaluation, and thus I cannot recommend acceptance at this time.",ICLR2020, +eWkLLI5xLC,1576800000000.0,1576800000000.0,1,HJxK5pEYvr,HJxK5pEYvr,Paper Decision,Accept (Poster),This paper incorporates tree-structured information about a sentence into how transformers process it. Results are improved. The paper is clear. Reviewers liked it. 
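One plausible way to realize such structure — an illustrative sketch only, not necessarily the paper's mechanism, with hypothetical constituent spans — is to mask self-attention so tokens attend only within their subtree:

    import numpy as np

    spans = [(0, 2), (2, 5)]      # hypothetical constituents of a 5-token sentence
    mask = np.full((5, 5), -np.inf)
    for lo, hi in spans:
        mask[lo:hi, lo:hi] = 0.0  # attention permitted inside each span

    scores = np.random.default_rng(0).normal(size=(5, 5)) + mask
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # each row is a distribution over its subtree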
Clear accept.,ICLR2020, +9yfQzdAVy,1576800000000.0,1576800000000.0,1,BJgEd6NYPH,BJgEd6NYPH,Paper Decision,Reject,"This paper interprets adaptive gradient methods as trust region methods, and then extends the trust regions to axis-aligned ellipsoids determined by the approximate curvature. It's fairly natural to try to extend the algorithms in this way, but the paper doesn't show much evidence that this is actually effective. (The experiments show an improvement only in terms of iterations, which doesn't account for the computational cost or the increased batch size; there doesn't seem to be an improvement in terms of epochs.) I suspect the second-order version might also lose some of the online convex optimization guarantees of the original methods, raising the question of whether the trust-region interpretation really captures the benefits of the original methods. The reviewers recommend rejection (even after discussion) because they are unsatisfied with the experiments; I agree with their assessment. +",ICLR2020, +BJlyS95eeN,1544760000000.0,1545350000000.0,1,BkgYIiAcFQ,BkgYIiAcFQ,further work needed,Reject,"there is a disagreement among the reviewers, and i am siding with the two reviewers (r1 and r3) and agree with r3 that it is rather unconventional to pick learning-to-learn to experiment with modelling variable-length sequences (it's not like there's no other task that has this characteristics, e.g., language modelling, translation, ...) ",ICLR2019,3: The area chair is somewhat confident +BJlNkgC-g4,1544840000000.0,1545350000000.0,1,BJl65sA9tm,BJl65sA9tm,meta-review,Reject,"This paper proposes a variant of GAIL that can learn from both expert and non-expert demonstrations. The paper is generally well-written, and the general topic is of interest to the ICLR community. Further, the empirical comparisons provide some interesting insights. However, the reviewers are concerned that the conceptual contribution is quite small, and that the relatively small conceptual contribution also does not lead to large empirical gains. As such, the paper does not meet the bar for publication at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +mG8yUNB1AS,1576800000000.0,1576800000000.0,1,HkgAJxrYwr,HkgAJxrYwr,Paper Decision,Reject,"The paper proposes an aggregation algorithm for federated learning that is robust against label-flipping, backdoor, and Gaussian noise attacks. The reviewers agree that the paper presents an interesting and novel method, however the reviewers also agree that the theory was difficult to understand and that the success of the methodology may be highly dependent on design choices and difficult-to-tune hyperparameters. ",ICLR2020, +xUFYIlcPDL,1576800000000.0,1576800000000.0,1,SkeIyaVtwB,SkeIyaVtwB,Paper Decision,Accept (Poster),"This paper considers options discovery in hierarchical reinforcement learning. It extends the idea of covering options, using the Laplacian of the state space discover a set of options that reduce the upper bound of the environment's cover time, to continuous and large state spaces. An online method is also included, and evaluated on several domains. + +The reviewers had major questions on a number of aspects of the paper, including around the novelty of the work which seemed limited, the quantitative results in the ATARI environments, and problems with comparisons to other exploration methods. 
These were all appropriately dealt with in the rebuttals, leaving this paper worthy of acceptance.",ICLR2020, +48rp5AlXRs,1642700000000.0,1642700000000.0,1,KVhvw16pvi,KVhvw16pvi,Paper Decision,Reject,"The authors develop a memory-based method for continual learning that stores gradient information from past tasks. This memory is then used by a proposed task-aware optimizer that, based on the task relatedness, aims at preserving knowledge learned in previous tasks. + +The initial reviews were reasonable but indicated that this paper was not yet ready to be published. In particular, the reviewers seemed to agree on the somewhat limited methodological novelty of the paper given prior work (such as LA-MAML and OGD in terms of method and GEM in terms of task similarity comparison). + +In their response, the authors do seem to agree to a certain extent with some of the criticisms, but also point to clear differences with respect to previous work (and other distinguishing aspects such as a smaller memory footprint than OGD). The authors also carefully responded to reviewer comments and provided additional results when possible. + +In the end, the main criticism from the reviewers remained (Reviewer 95tf also suggests that the authors should compare their method to others in terms of memory consumption (which the authors partly did) and compare to replay-based methods) and this paper was a borderline one. Three, out of the four, reviewers suggest that it is not ready to be published. One reviewer did give it a high score (8) but also understood the limitations raised by the other reviewers. As a result, my recommendation is that this paper falls below the acceptance threshold. + +I am sorry that for this recommendation and I strongly suggest the authors consider the reviewer's suggestions in preparing the next version of this work. In particular, it seems like providing a full study of the memory usage of your approach vs. others as well as providing more insights about the ""trajectory"" (see the comment from ZR5n) might go a long way toward improving the paper.",ICLR2022, +XtL_nqo1gQ,1610040000000.0,1610470000000.0,1,ok4MWWSeOJ1,ok4MWWSeOJ1,Final Decision,Reject,"The paper proposes a unification of three popular baseline regularizers in continual learning. The unification is realized through a claim that they all regularize (surprisingly) related objectives. + +The key strengths of the paper highlighted by the reviewers were: +1. The established connection is valuable and interesting, even if weaker than suggested originally +2. Good motivation (unifying different regularization methods is useful for the community) +3. Clear writing + +The key weakness of the paper is a weak empirical validation of the claim that these three regularizers work *because* they regularize the norm of the gradient (as mentioned in the discussion by R3). Rather, the key claims are correlational. The authors correctly say that (1) the three regularizers all regularize related objects (namely different norms of the gradient) and (2) they reduce forgetting. However, it is not sufficiently well demonstrated that (1) => (2). Relatedly, given that the paper does not have a very clear theoretical contribution, it would be really helpful to demonstrate utility. It would be useful to extend experiments that apply these insights to developing novel regularizers or improving/simplifying hyperparameter tuning. + +Additionally, in the review process, the link was discovered to be weaker than originally suggested. 
The paper casts the relation in terms of the Fisher Information Matrix, which suggests it is theoretical and sound. After the discussion, it seems that viewing this relationship in terms of the Fisher Information Matrix is somewhat misleading. The three different regularization methods all regularize different norms of the gradient (L1 or L2), which are empirically, and under some assumption theoretically, related. More precisely, EWC regularizes the trace of the *Empirical* Fisher, which is equivalent to the L2 norm of the gradient of the loss function. SI regularizes a term similar to the L1 norm of the gradient. These effects were seen by the reviewers to be somewhat loosely related to the Fisher Information Matrix. + +Based on the above, I have to recommend rejection. I would like to thank the Authors for submitting the work for consideration to ICLR. I hope the feedback will be useful for improving the work.",ICLR2021, +p9AEGay0zQp,1642700000000.0,1642700000000.0,1,-9ffJ9NQmal,-9ffJ9NQmal,Paper Decision,Reject,"The authors propose Variational Inference for Concept Embeddings (VICE), a method to learn representations such that an odd object can be detected given a triplet (i.e. the odd-one-out task). The authors build on Sparse Positive object Similarity Embedding (SPoSE) which learns sparse, non-negative embeddings for images by placing a zero-mean Laplace prior. Claimed contributions include replacing it with a spike-and-slab Gaussian mixture prior, and a principled approach to choosing the subset of the dimensions of the learned embeddings. The empirical results show improvements over the SPoSE baseline. + +The reviewers appreciated the empirical improvements over SPoSE and accept that a more informative prior might lead to improved results. However, the **motivation, novelty and significance** of the proposed method doesn’t meet the acceptance criteria for ICLR. After the rebuttal and the discussion phase the reviewers felt that the work necessitates a major revision (notwithstanding the remaining issue with limited novelty), and raised the following as the main improvement points: +- Clarifying the motivation and significance. +- Stronger empirical validation and generalization beyond the THINGS dataset. +- Address the discrepancy with analyzing GMM priors, but using unimodal Gaussians in the implementation. +- Comparing the chosen prior to other prior distributions and justifying the design choices.",ICLR2022, +OEQShZpSNA,1610040000000.0,1610470000000.0,1,PEcNk5Bad7z,PEcNk5Bad7z,Final Decision,Reject,"This work investigates an algorithm to learn representations of Lie groups. It first learns a representation of the Lie algebra by enforcing the Jacobi identity using known structure coefficients. Then obtains the group representation via matrix exponentiation. +The paper also proposes a Poincaré-equivariant neural network, and applies this model to an object-tracking task. +The paper is well-motivated, the derivations could be more clearly presented but are otherwise sound. The experimental results are promising but rather limited in scope at the time.",ICLR2021, +a-g7yh5eJ8N,1610040000000.0,1610470000000.0,1,asLT0W1w7Li,asLT0W1w7Li,Final Decision,Reject,"This paper presents a model-based posterior sampling algorithm in continuous state-action spaces theoretically and empirically. The work is interesting and the authors provide numerical evaluations of the proposed method. But the reviewers find the contribution of the work limited. 
+",ICLR2021, +r4-_B70Y10,1642700000000.0,1642700000000.0,1,D78Go4hVcxO,D78Go4hVcxO,Paper Decision,Accept (Spotlight),"The paper presents an empirical analysis of Vision Transformers - and in particular multi-headed self-attention - and ConvNets, with a focus on optimization-related properties (loss landscape, Hessian eigenvalues). The paper shows that both classes of models have their strengths and weaknesses and proposes a hybrid model that takes the best of both worlds and demonstrates good empirical performance. + +Reviewers are mostly very positive about the paper. Main pro is that analysis is important and this paper does a thorough job at it and draws some useful insights. There are several smaller issues with the presentation and the details of the content, but the authors did a good job addressing these in their responses. + +Overall, it's a good paper on an important topic and I recommend acceptance.",ICLR2022, +0_NyWNP98E8,1642700000000.0,1642700000000.0,1,r9cpyzP-DQ,r9cpyzP-DQ,Paper Decision,Reject,"This submission proposes a new manner to learn ordinary differential equations, aiming to improve their efficiency. While judging it interesting, the reviewers are quite split on this work. Overall there was no strong consensus to accept, nor anyone willing to champion this work. + +The main stated weaknesses are + +- The reliance on the existence of a diffeomorphism (and its choice in the method) +- The choice of the base and its expressiveness +- A somewhat limited experimental section, not indicating strongly how amenable this would be to more complex problems.",ICLR2022, +1ZVi5KKT4YJ,1610040000000.0,1610470000000.0,1,c1zLYtHYyQG,c1zLYtHYyQG,Final Decision,Reject,"This work proposes to uses an energy-based objective combined with generative adversarial networks for imitation learning. While most reviewers find the work easy to follow and come with theoretical justifications, albeit mostly followed from previous works, and good coverage of experimental results, all of them raised questions regarding the limited novelty and added contribution of the work, and missing more recent baselines. Please consider address these feedback in your future submissions.",ICLR2021, +H1Q1pGLOl,1486400000000.0,1486400000000.0,1,r1PRvK9el,r1PRvK9el,ICLR committee final decision,Reject,"This paper develops a new shared memory based model for doing inference in knowledge bases. The work shows strong empirical results, and potentially could be impactful. However the reviewers felt that the work was not completely convincing without more analysis into the mechanisms of the system itself. + + Pros: + - Quality: The reviewers like the experimental results of this work, praising them as ""strikingly good"", but giving the caveat the dataset used is now a bit old for this task. + + Mixed: + - Clarity: Some reviewers found the work to be fairly well-written although there was mixed opinions about the exposition. Details of the model could be better explained, as could the development of the model + + Cons: + - Quality: The main criticism is not feeling that the methodology is motivated for this task. Multiple reviewers claim there is ""little analysis about how it works"". Or that was ""hard to see"" how this would help. 
All reviewers are in agreement, that the paper should explore more deeply what shared memory is adding to this task, and introduce the approach better in this regard.",ICLR2017, +qdWxcrb1o2p,1642700000000.0,1642700000000.0,1,FLA55mBee6Q,FLA55mBee6Q,Paper Decision,Accept (Spotlight),"This paper presents a new technique for constrained offline RL. The proposed method is based on reducing a nested constrained optimization problem to a single unconstrained optimization problem that can be efficiently represented with a neural network. The proposed algorithm is tested against several baselines on both random grid-worlds and continuous environments. Results clearly show that the proposed algorithm outperforms baselines while keep the provided constraints satisfied. + +The reviewers agree that the paper is well-written, the proposed algorithm is novel and technically sound, and the empirical evaluation clearly supports the claims of the paper. There were some concerns regarding the novelty of this idea, but these concerns were properly addressed by the authors in the discussion.",ICLR2022, +AxUfguy4mC,1576800000000.0,1576800000000.0,1,BkeqATVYwr,BkeqATVYwr,Paper Decision,Reject,All three reviewers are consistently negative on this paper. Thus a reject is recommended.,ICLR2020, +JsqmPEE5C2B,1610040000000.0,1610470000000.0,1,otuxSY_QDZ9,otuxSY_QDZ9,Final Decision,Reject,"All reviewers recommended rejection after considering the rebuttal from the authors. The main weaknesses of the submission include poorly motivated claims and designs, and insufficient experimental comparisons. The AC did not find sufficient grounds to overturn the reviewers' consensus recommendation. + +",ICLR2021, +S1eO_VKblV,1544820000000.0,1545350000000.0,1,H1edIiA9KQ,H1edIiA9KQ,metareview,Accept (Poster),"The submission proposes a model to generate images where one can control the fine-grained locations of objects. This is achieved by adding an ""object pathway"" to the GAN architecture. Experiments on a number of baselines are performed, including a number of reviewer-suggested metrics that were added post-rebuttal. + +The method needs bounding boxes of the objects to be placed (and labels). The proposed method is simple and likely novel and I like the evaluating done with Yolov3 to get a sense of the object detection performance on the generated images. I find the results (qual & quant) and write-up compelling and I think that the method will be of practical relevance, especially in creative applications. + +Because of this, I recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +WGPogWeuQU,1642700000000.0,1642700000000.0,1,IPy3URgH47U,IPy3URgH47U,Paper Decision,Reject,"The authors propose WARM, a novel method that actively queries a small set of true labels to improve the label function in weak supervision. In particular, the authors propose a methodology that converts the label function to ""soft"" versions that are differentiable, which are in term learnable with true labels using proper updates of parameters. Empirical results on several real-world data sets demonstrate that the method yields a pretty strong performance. + +The reviewers generally agree that the idea of making the labeling functions differentiable is conceptually interesting. They are also positive about the simplicity and the promising performance. They share joint concerns on whether the idea has been sufficiently studied in terms of the design choices and completeness of the experiments. 
For instance, the authors can conduct deeper exploration of the trade-off for differentiable LFs. They can also study active learning strategies that are beyond basic uncertainty sampling. While the authors have provided more studies about those exploration and ablation studies during the rebuttal, generally the results are not sufficient to convince most of the reviewers. In future revisions, the authors are encouraged to clarify its position with respect to existing works that combine active learning and weakly-supervised learning. + +The authors position the paper as more empirical than theoretical. So the suggestion from some reviewers about more theoretical study is viewed as nice-to-have but not a must.",ICLR2022, +SkluXjwElV,1545010000000.0,1545350000000.0,1,B1x33sC9KQ,B1x33sC9KQ,incremental work,Reject,The paper describes a clipping method to improve the performance of quantization. The reviewers have a consensus on rejection due to the contribution is not significant. ,ICLR2019,5: The area chair is absolutely certain +KzPOC9wM_,1576800000000.0,1576800000000.0,1,ryg48p4tPH,ryg48p4tPH,Paper Decision,Accept (Poster),"The authors address the challenge of sample-efficient learning in multi-agent systems. They propose a model that distinguishes actions in terms of their semantics, specifically in terms of whether they influence the acting agent and environment or whether they influence other agents. This additional structure is shown to substantially benefit learning speed when composed with a range of state of the art multi-agent RL algorithms. During the rebuttal, technical questions were well addressed and the overall quality of the paper improved. The paper provides interesting novel insights on how the proposed structure improves learning.",ICLR2020, +#NAME?,1610040000000.0,1610470000000.0,1,yRP4_BOxdu,yRP4_BOxdu,Final Decision,Reject,"The paper presents hierarchical Bayesian methods for modelling the +full covariance structure in cases where noise dimensions cannot be +assumed independent. + +This is an important problem with potential practical importance. The +work is solid. + +Conceptual novelty in the work is somewhat limited. + +The method is applied in the paper on hierarchical linear +regression. It is claimed to be applicable to other methods as well, +and the claim is plausible, but to be fully convincing, results and +comparisons would need to be shown. The new extended discussion does +help somewhat. + +There was also discussion about whether ICLR is the best match for +this work. This is not a strereotypical ICLR paper though is relevant. + +Authors are encouraged to continue this line of work. +",ICLR2021, +JfcLSXtoGM0,1642700000000.0,1642700000000.0,1,B7O85qTDgU4,B7O85qTDgU4,Paper Decision,Reject,"The paper addresses the problem of domain generalization for learning spatio-temporal dynamics. It proposes a solution where an encoder captures some characteristics of a given environment, and a forecaster autoregressively predicts future dynamics conditioned on the characteristics learned by the encoder. Said otherwise, the forecaster learns the general form of dynamics parameterized by an environment representation extracted by the encoder. The conditioning is implemented via an adaptive instance normalization mechanism. A form of padding is also introduced in order to take into account boundary conditions. The two components encoder and forecaster are trained sequentially. This approach is casted in a meta-learning framework. 
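For readers unfamiliar with the conditioning mechanism just mentioned, a minimal adaptive instance normalization sketch is given below; the shapes and the mapping from the environment embedding to scales and shifts are assumptions for illustration, not the paper's implementation:

    import numpy as np

    def adain(h, gamma, beta, eps=1e-5):
        # Normalize forecaster features per channel, then rescale and shift with
        # statistics predicted from the learned environment characterization.
        mu = h.mean(axis=(-2, -1), keepdims=True)
        sd = h.std(axis=(-2, -1), keepdims=True)
        return gamma[:, None, None] * (h - mu) / (sd + eps) + beta[:, None, None]

    rng = np.random.default_rng(0)
    h = rng.normal(size=(8, 16, 16))   # C x H x W forecaster features
    env = rng.normal(size=(8,))        # hypothetical per-channel environment code
    out = adain(h, gamma=1.0 + 0.1 * env, beta=0.1 * env)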
Theoretical results inspired by multi-task learning and domain adaptation are also demonstrated. The model is evaluated and compared to different baselines on three problems, and for two different settings: varying initial conditions with a given dynamics, and dynamics with varying parameters. + +This is a borderline paper. It targets a timely and important problem of domain generalization for dynamic environments. The proposed solution is original and compares well experimentally to several baselines. It allows for better generalization performance for the two test settings considered. In the current version, the paper however suffers from different weaknesses. First there is the imprecision of the arguments and the description of the experiments. Some of the arguments and claims are vague and sometimes abusive, not backed up by evidence. For example, a central claim is that the encoder learns time invariant quantities characterizing the environment when the learned representations indeed change with a time shift in the input for any environment. The same goes for the argument developed for the padding construction. It is claimed to model boundary conditions, but this is not supported by any theoretical or empirical evidence. +As noted by the reviewers, the theoretical analysis is disconnected from the algorithmic and experimental developments and does not bring much additional value to the paper. What is more embarrassing is that some of the claims in this section are overstated and induce incorrect conclusions. From Theorem 3.1 and proposition 3.3, the authors suggest that multitask learning leads to better generalization than learning independently, while this is not formally guaranteed by the results (this is acknowledged by the authors in a later comment). Besides, the conditions of validity are not discussed while they seem to only cover situations for which the train and the test distributions are the same. The same holds for the second theoretical results (theorem 3.4). It is claimed that this result supports the authors’ idea of training encoder and forecaster sequentially, while it does not. Besides, the bounds in this result cannot be controlled as noted by the reviewers and are not useful in practice. + +Overall, the paper addresses an important topic and proposes new solutions. The results are promising and it is indeed an interesting contribution. However, inaccuracies and incorrect or exaggerated claims make it difficult to accept the current version of the article. The article would make a strong and innovative contribution if it were written as a purely experimental article with a detailed description of the experiments and comparisons.",ICLR2022, +SygVniYWgV,1544820000000.0,1545350000000.0,1,HJxqMhC5YQ,HJxqMhC5YQ,Limited novelty,Reject,"The authors present a system for end-to-end multi-lingual and multi-speaker speech recognition. The presented method is based on multiple prior works that propose end-to-end models for multi-lingual ASR and multi-speaker ASR; the work combines these techniques and shows that a single system can do both with minimal changes. + +The main critique from the reviewers is that the paper lacks novelty. It builds heavily on existing work, and does not make any enough contributions to be accepted at ICLR. Furthermore, training and evaluations are all on simulated test sets that are not very realistic. So it is unclear how well the techniques would generalize to real use-cases. 
For these reasons, the recommendation is to reject the paper.",ICLR2019,5: The area chair is absolutely certain +eEx-yC4jiT,1576800000000.0,1576800000000.0,1,rkgvXlrKwH,rkgvXlrKwH,Paper Decision,Accept (Talk),"The paper presents a framework for scalable Deep-RL on very large-scale architectures, which addresses several problems in the multi-machine training of such systems with many actors and learners running. Large-scale experiments and improvements over IMPALA are presented, leading to new SOTA results. The reviewers are very positive about this work, and I think this is an important contribution to the overall learning / RL community.",ICLR2020, +B1l0Y7BklN,1544670000000.0,1545350000000.0,1,BJxRVnC5Fm,BJxRVnC5Fm,"Good writing and experiments, but limited novelty and applicability",Reject,"This paper proposes an approach to pruning units in a deep neural network while training is in progress. The idea is to (1) use a specific ""scoring function"" (the absolute-valued Taylor expansion of the loss) to identify the best units to prune, (2) compute the mean activations of the units to be pruned on a small sample of training data, (3) add the mean activations multiplied by the outgoing weights into the biases of the next layer's units, and (4) remove the pruned units from the network. Extensive experiments show that this approach to pruning does less immediate damage than the more common zero-replacement approach, that this advantage remains (but is much smaller) after fine-tuning, and that the importance of units tends not to change much during training. The reviewers liked the quality of the writing and the extensive experimentation, but even after discussion and revision had concerns about the limited novelty of the approach and about its incompatibility with batch normalization (which severely limits the range of architectures to which the method may be applied), and were concerned that the proposed method has limited impact after fine-tuning.",ICLR2019,5: The area chair is absolutely certain +44OeAPnrPd,1610040000000.0,1610470000000.0,1,yOkmUBv9ed,yOkmUBv9ed,Final Decision,Reject,"The paper presents a model for question answering where blocks of text can be skipped and only relevant blocks are further processed for extracting the answer span. +
The reviewers mostly praised the general idea. +
R3 raised concerns about the generalizability of the presented approach. +R4 raised several issues regarding presentation and clarity. +R2 and R4 have concerns regarding execution and find some of the results unconvincing. +While I don't necessarily share R2's concern about small improvements (the improvements are still statistically significant), and despite the approach being very interesting, there are several issues that the reviewers pointed out that weren't resolved after the discussions. +",ICLR2021, +rydMryTHM,1517250000000.0,1517260000000.0,521,B1nLkl-0Z,B1nLkl-0Z,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting your paper to ICLR. Two of the reviewers are concerned that the paper's contributions are not significant enough, either in terms of the theoretical or the experimental contribution, to warrant publication. The authors have improved the experimental aspect to include a more comprehensive comparison, but this has not moved the reviewers. + +Summary: The approach is very promising, but more experimental work is still required to demonstrate significance. ",ICLR2018, +mNvt_xyu-,1576800000000.0,1576800000000.0,1,SJlJegHFvH,SJlJegHFvH,Paper Decision,Reject,The paper proposes to analyze bitcoin addresses using graph embeddings. The reviewers found that the paper was too incomplete for publication. Important information such as a description of datasets and metrics was omitted.,ICLR2020, +CTh1jZMO_Xx,1610040000000.0,1610470000000.0,1,Uqu9yHvqlRf,Uqu9yHvqlRf,Final Decision,Reject,"The authors present a study on what maintains the stability of emergent communication protocols. To study this question the authors design experiments in bargaining communities of agents in 3 setups: a) no punishment or restriction of liar agents, b) allowing individual agents to refuse bargaining with liar agents, and c) introducing a global punishment system for liar agents. + +Overall the reviewers agree that the design of the study is interesting, but also point out that the motivation and take-home messages of this study are unclear. Having read the paper, I share the same opinion. The authors discuss at a very abstract level the implications of this study for the field of AI, but this study is quite specific and clearly does not capture all the complexities of real societies. Given the scale of the study and its results, I think it would be more valuable to draw some concrete proposals/implications about, for example, multi-agent modelling or environment design in general. + +All in all, this is an interesting study but some more work needs to be done around research framing. +",ICLR2021, +gkBmFSZS1e,1610040000000.0,1610470000000.0,1,uR9LaO_QxF,uR9LaO_QxF,Final Decision,Accept (Poster),"I thank the authors for their submission and participation in the author response period. The reviewers unanimously agree that the paper proposes an interesting and original approach to using a costly model on a learner node, while distilling to a cheaper model run on actor nodes to gather experiences in a distributed RL framework. During discussion, R1 and I emphasized the concern that the experiments in this paper leave open the question of whether the approach will work beyond toy environments. However, I side with R2 and R3 in that the paper presents a valuable contribution to the community as it stands, and that the experiments prove the concept to the point that the paper should be accepted. 
I therefore recommend acceptance.",ICLR2021, +H1Am8JpSM,1517250000000.0,1517260000000.0,753,rk1FQA0pW,rk1FQA0pW,ICLR 2018 Conference Acceptance Decision,Reject,"The authors present an evaluation of end-to-end training connecting a reconstruction network with a detection network for lung nodules. + +Pros: +- Optimizing a mapping jointly with the task may preserve more information that is relevant to the task. + +Cons: +- The reconstruction network is not ""needed"" to generate an image -- other algorithms exist for reconstructing images from raw data. Therefore, adding the reconstruction network serves to essentially add more parameters to the neural network. As a baseline, the authors should compare to a detection-only framework with a comparable number of parameters to the end-to-end system. Since this is not provided, the true benefit of end-to-end training cannot be assessed. + +- The performance improvement presented is negligible + +- Novelty is not clear / significant",ICLR2018, +kyjFIGpEEPv,1642700000000.0,1642700000000.0,1,mKsMcL8FfsV,mKsMcL8FfsV,Paper Decision,Reject,"The paper proposes a method to perform self-supervised model ensembling by learning representations directly through gradient descent at inference. The effectiveness is evaluated by k-nearest neighbors accuracy. + +The reviewers agreed that the paper studies an important and interesting problem of leveraging model ensembling for self-supervised learning, which could improve both the performance and robustness of the learned representations. However, the reviewers also agreed that there were issues with the soundness of the empirical evaluation, which was a key reason for rejection.",ICLR2022, +B1lSYgAGeN,1544900000000.0,1545350000000.0,1,BJluy2RcFm,BJluy2RcFm,"Interesting take on permutation invariances.",Accept (Poster),"AR1 is concerned about whether higher-order interactions are modeled explicitly and whether the pi-SGD convergence conditions can be easily satisfied. AR2 is concerned that basic JP has been conceptually discussed in the literature and that \pi-SGD is not novel because it was realized by Hamilton et al. (2017) and Moore & Neville (2017). However, the authors provide some theoretical analysis for this setting in contrast to prior works. AR1 is also concerned that the effect of higher-order information has not been 'disentangled' experimentally from order invariance. AR4 is concerned about the poor performance of higher-order Janossy pooling compared to the k = 1 case and asks about the number of hyper-parameters. The authors showed a harder task of computing the variance of a sequence of numbers in response. + +On balance, despite the justified concerns of AR2 about novelty and AR1 about experimental verification, the work appears to tackle an interesting topic. Reviewers find the problem interesting and see some hope in the proposed solutions. Accordingly, the AC recommends that this paper be accepted at ICLR. The authors are asked to update the manuscript to honestly reflect the weaknesses expressed by the reviewers, e.g. the difficulty of disentangling the effects of 'higher-order information' from order invariance.",ICLR2019,4: The area chair is confident but not absolutely certain +ryFo816BM,1517250000000.0,1517260000000.0,859,SyxCqGbRZ,SyxCqGbRZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper brings recent innovations in reinforcement learning to bear on a tremendously important application, treating sepsis. The reviewers were all compelled by the application domain but thought that the technical innovation in the work was low. 
While ICLR welcomes application papers, in this instance the reviewers felt that the technical contribution was not justified well enough. Two of the reviewers asked for a clearer discussion of the underlying assumptions of the approach (i.e. offline policy evaluation and not missing at random). Unfortunately, the lack of significant revisions to the manuscript over the discussion period seems to have precluded changes to the reviewer scores. Overall, this could be a strong submission to a conference that is more closely tied to the application domain. + +Pros: +- Very compelling application that is well motivated +- Impressive (possibly impactful) results +- Thorough empirical comparison + +Cons: +- Lack of technical innovation +- Questions about the underlying assumptions and choice of methodology",ICLR2018, +zyRf-ZB4RPF,1642700000000.0,1642700000000.0,1,kiNEOCSEzt,kiNEOCSEzt,Paper Decision,Reject,"This paper studies the influence of recommender systems on users' preferences. The authors propose a method for estimating preference shifts, evaluating their desirability, and avoiding such shifts (when needed). + +After the initial review and discussion period, a fourth reviewer with significant recsys experience and a very good knowledge of this subarea was invited to provide an additional review of the paper. This is reviewer vNt7. Their review was positive overall but did highlight some limitations and potential ways to improve the paper's grounding in the recsys literature. + +Overall, the main strength of this paper is that it studies an interesting and practically motivated question. The reviewers also found the proposed solution reasonable. + +The main limitations are twofold. One, the results use a single set of simulation assumptions. Showing similar results under different simulation assumptions would be helpful to better understand the robustness and potential limitations of the approach. Two, there is a certain disconnect with the simulation literature. See comments from reviewers vNt7 and kWQ2 (although I found your reply to Virtual-Taobao convincing). + +Overall, and given the final reviewer recommendations (three marginally above and one marginally below), this is a very borderline paper. However, the consensus view of the committee is that it would benefit from additional work before publication. + +I am sorry that I cannot recommend acceptance at this stage. I do believe that some of the suggestions from the reviewers highlighted above (more diverse simulation, better grounding in current recsys simulation literature and in the field) will be useful in preparing the next version of this work.",ICLR2022, +e8fjNf8K9_,1576800000000.0,1576800000000.0,1,BygNAa4YPH,BygNAa4YPH,Paper Decision,Reject,"This paper presents a method for out-of-distribution detection with access to only a few positively labeled samples. The main contribution, as summarized by the reviewers and authors, is the newly proposed benchmark and problem statement. + +All reviewers are in agreement that this paper is not ready for publication in its current form. The main concern is the validity of the problem statement. 
The reviewers seek more clarity motivating the proposed scenario. Though the authors argue that few-shot recognition is very difficult and may benefit from strategies like active learning, it is not directly clear why out-of-distribution detection is the best approach. In addition, R3 seeks clarification on the similarity to existing work. + +Considering the unanimous opinions of the reviewers and all author rebuttal text, the AC does not recommend acceptance of this work. We encourage the authors to focus their revisions on the explanation and motivation of this new benchmark and submit to a future venue. +",ICLR2020, +r1ef54VegN,1544730000000.0,1545350000000.0,1,BJxgz2R9t7,BJxgz2R9t7,"Good paper, comparison with traditional SAT solvers would be helpful",Accept (Poster),"This paper introduces a new graph neural network architecture designed to learn to solve Circuit SAT problems, a fundamental problem in computer science. The key innovation is the ability to use the DAG structure as an input, as opposed to the typical undirected (factor graph style) representations of SAT problems. The reviewers appreciated the novelty of the approach as well as the empirical results provided that demonstrate the effectiveness of the approach. The writing is clear. While the comparison with NeuroSAT is interesting and useful, there is no comparison with existing SAT solvers that are not based on learning methods. So it is not clear how big the gap with the state of the art is. Overall, I recommend acceptance, as the results are promising and this could inspire other researchers working on neural-symbolic approaches to search and optimization problems.",ICLR2019,4: The area chair is confident but not absolutely certain +WaYrGlwZBIu,1610040000000.0,1610470000000.0,1,butEPeLARP_,butEPeLARP_,Final Decision,Reject,"This paper studies the following broad question: How can we predict model performance when the data comes from different sources? The reviewers agreed that the direction studied is very interesting. While the results presented in this work are promising, several reviewers pointed out some weaknesses in the paper, including a confusion between absolute loss and excess loss, and the limited scope of the experiments. Overall, this paper does not appear to be ready for publication in its current form. In my personal opinion, if the concerns raised by the reviewers are appropriately addressed, this work could be publishable in a high-quality venue.",ICLR2021, +s400TTStoY,1610040000000.0,1610470000000.0,1,wUUKCAmBx6q,wUUKCAmBx6q,Final Decision,Reject,"The revised paper is a solid improvement. However, all reviewers and I find that there are still a number of issues that prevent the paper from being acceptable at the current stage. For example, some important parts are still unclear, especially the definition of the STI effect. The observation of the STI effect requires more theoretical or empirical investigation, in addition to a toy example.",ICLR2021, +L5BS8isuUhz,1610040000000.0,1610470000000.0,1,3YdNZD5dMxI,3YdNZD5dMxI,Final Decision,Reject,"The paper receives a mixed rating, with R3 rating the paper above the bar, R1 and R2 rating it marginally above the bar, and R4 recommending rejection. The cited positive points include 1) decomposing image generation into first synthesizing segmentation masks and then converting segmentation masks to images, and 2) good results compared to Progressive GAN and BigGAN. R4 raises several concerns, including the novelty concern and unconvincing experimental validation. After analyzing the paper, the reviews, and the rebuttal, the AC finds the arguments made by R4 more convincing. Decomposing image generation into a two-step approach has been illustrated in prior work [Wang & Gupta ECCV 2016, Hong, Yang, Choi, Lee CVPR 2018]. The proposed method does not provide additional insights. The provided experimental results are not very convincing, either. 
As the proposed setting assuming the availability of segmentation masks, it is not surprising that it outperforms the unconditional baselines. Overall, the AC believes the paper does not have enough novelty to justify its acceptance and would recommend rejection of the paper.",ICLR2021, +0Jl3t5ayC5,1576800000000.0,1576800000000.0,1,HJxNAnVtDS,HJxNAnVtDS,Paper Decision,Accept (Talk),"This manuscript analyzes the convergence of federated learning wit hstragellers, and provides convergence rates. The proof techniques involve bounding the effects of the non-identical distribution due to stragglers and related issues. The manuscript also includes a thorough empirical evaluation. Overall, the reviewers were quite positive about the manuscript, with a few details that should be improved. ",ICLR2020, +r1gDEO5NxV,1545020000000.0,1545350000000.0,1,SygHGnRqK7,SygHGnRqK7,Borderline paper,Reject,"While there was some support for the ideas presented, unfortunately this paper was on the borderline. Significant concerns were raised as to whether the setting studied was realistic, among others.",ICLR2019,4: The area chair is confident but not absolutely certain +rynOLyTHz,1517250000000.0,1517260000000.0,821,r1ayG7WRZ,r1ayG7WRZ,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers highlight a lack of technical content and poor writing. +They all agree on rejection. +There was no author rebuttal or pointer to a new version. ",ICLR2018, +BpPn_s8xbfd,1610040000000.0,1610470000000.0,1,6DOZ8XNNfGN,6DOZ8XNNfGN,Final Decision,Accept (Poster),"Summary: The authors propose to approximate operations on graphs, roughly +speaking by approximating the graph locally around a collection of +vertices by a collection of trees. The method is presented as a +meta-algorithm that can be applied to a range of problems in the +context of learning graph representations. + +Discussion: The reviews are overall positive, though they point out a +number of weaknesses. One was unconvincing experimental +validation. Another, more conceptual one was that this is a 'unifying +framework' rather than a novel method. Additionally, there were a number of +minor points that were not clear. However, the authors have provided +additional experiments that the reviewers consider convincing, and +were able to provide sufficient clarification. + +Recommendation: +The reviewer's verdict post-discussion favors publication, and I +agree. The authors have convincingly addressed the main concerns in discussion, and novelty is not a necessity: Unifying frameworks often seem an end in themselves, but this one is +potentially useful and compellingly simple. +",ICLR2021, +bqlxhX1Y2E,1576800000000.0,1576800000000.0,1,S1e4jkSKvB,S1e4jkSKvB,Paper Decision,Accept (Spotlight),"The paper analyses the importance of different DNN modules for generalization performance, explaining why certain architectures may be much better performing than others. All reviewers agree that this is an interesting paper with a novel and important contribution. ",ICLR2020, +l5L_FQdEwO,1576800000000.0,1576800000000.0,1,Byl-264tvr,Byl-264tvr,Paper Decision,Reject,"The authors propose an end-to-end object tracker by exploiting the attention mechanism. Two reviewers recommend rejection, while the last reviewer is more positive. The concerns brought up are novelty (last reviewer), and experiments (second reviewer). Furthermore, the authors seem to overclaim their contribution. There indeed are end-to-end multi-object trackers, see Frossard & Urtasun's work for example. 
This work needs to be cited, and possibly a comparison is needed. Since the paper did not receive favourable reviews and there are additional citations missing, this paper cannot be accepted in current form. The authors are encouraged to strengthen their work and resubmit to a future venue.",ICLR2020, +fjp11UwcgkI,1610040000000.0,1610470000000.0,1,cjk5mri_aOm,cjk5mri_aOm,Final Decision,Reject,"The paper proposes a self-supervised method to predict the gist features of image frames during navigation of an agent supervised by depth and egomotion. The features are retargeted to train navigation policies and outperform previous methods or other pretraining schemes. The idea is related to self-supervised by feature prediction but is employed in a zone level as opposed to isolated image level. Though reasonable, in the context of the recent abundance of self-supervised prediction papers in various level of spatial visual granularity, the paper may not be of sufficient novelty to present a sizable contribution for ICLR acceptance. +",ICLR2021, +ifDo7DDFWdb,1610040000000.0,1610470000000.0,1,D5Wt3FtvCF,D5Wt3FtvCF,Final Decision,Reject,"This paper received borderline negative scores. The reviewers all agree that the proposed approach is interesting. However, there are also common concerns around the clarity of the paper, as well as lacking sufficient empirical evaluation. One reviewer also argues that technical contribution is relatively limited. The author responses were taken into account but it didn't manage to swing the reviews. Therefore, I recommend reject and wish the authors can incorporate the feedback in the revision. ",ICLR2021, +rk2enGUdg,1486400000000.0,1486400000000.0,1,Skvgqgqxe,Skvgqgqxe,ICLR committee final decision,Accept (Poster),"All the reviewers agreed that the research direction is very interesting, and generally find the results promising. We could quibble a bit about the results not being really state-of-the-art and the choice of baselines, but I think the main claims are well supported by the experiments (i.e. the induce grammar appears to be useful for the problem in question, within the specific class of models). There are still clearly many issues unresolved, and we are yet to see if this class of methods (RL / implicit structure-based) can lead to state-of-the-art results on any important NLP problem. But it is too much to ask from a submission. I see the paper as a strong contribution to the conference.",ICLR2017, +2t1eQ2FDk,1576800000000.0,1576800000000.0,1,rJe5_CNtPB,rJe5_CNtPB,Paper Decision,Reject,"The paper proposed an attention-forcing algorithm that guides the sequence-to-sequence model training to make it more stable. But as pointed out by the reviewers, the proposed method requires alignment which is normally unavailable. The solution to address that is using another teacher-forcing model, which can be expensive. + +The major concern about this paper is the experimental justification is not sufficient: +* lack of evaluations of the proposed method on different tasks; +* lack of experiments on understanding how it interact with existing techniques such as scheduled sampling etc; +* lack of comparisons to related existing supervised attention mechanisms. +",ICLR2020, +Jh_yrAivj-m,1642700000000.0,1642700000000.0,1,INO8hGXD2M,INO8hGXD2M,Paper Decision,Reject,"This paper studies the general problem of out-of-distribution (OOD) detection, where the goal is to detect outliers (i.e., points not in the distribution of training data) in the sample. 
The paper introduces a methodology for measuring robustness by using adversarial search/distributions. Experimental evaluation indicates that traditional metrics fail to fully capture OOD detection. The reviewers' evaluations of this work were mixed. Overall, there was consensus about the importance of the problem. Moreover, some of the reviewers argued that the submission contains some interesting new ideas. On the other hand, concerns were raised regarding lacking comparison to prior work, potential overselling of the contributions, and several aspects of the experimental evaluation. At the end, there was not sufficient support for acceptance. In its current form, the work appears to be slightly below the acceptance threshold.",ICLR2022, +xFzwRvbzlAS,1642700000000.0,1642700000000.0,1,AFH3FnBksHT,AFH3FnBksHT,Paper Decision,Reject,"While fusing multiple heterogeneous neural networks into a single network looks like an interesting exploration, there are many major concerns raised by the reviewers: +1) The motivation why the proposed method works is not convincing. In other words, under what conditions the proposed would work or would not work is not clear. +2) The authors failed to provide either theoretical analysis or convincing empirical studies of the proposed method. In the rebuttal, the authors did not address the critical issues raised by the reviewers. +3) There are many other detailed problems about the proposed method as well as the experimental setup. + +Therefore, by considering the above concerns, this submission does not meet the standard of publication at ICLR.",ICLR2022, +B1l3fb8d14,1544210000000.0,1545350000000.0,1,ByEtPiAcY7,ByEtPiAcY7,Method converts neural networks into logical rules of varying complexity. Issues around evaluation not addressed.,Reject,"The presented paper introduces a method to represent neural networks as logical rules of varying complexity, and demonstrate a tradeoff between complexity and error. Reviews yield unanimous reject, with insufficient responses by authors. + +Pros: ++ Paper well written + +Cons: +- R1 states inadequacy of baselines, which authors do not address. +- R3&4 raise issues about the novelty of the idea. +- R2&4 raise issues on limited scope of evaluation, and asked for additional experiments on at least 2 datasets which authors did not provide. + +Area chair notes the similarity of this work to other works on network compression, i.e. compression of bits to represent weights and activations. By converting neurons to logical clauses, this is essentially a similar method. Authors should familiarize themselves with this field and use it as a baseline comparison. i.e.: https://arxiv.org/pdf/1609.07061.pdf ",ICLR2019,5: The area chair is absolutely certain +H1HuHJTrf,1517250000000.0,1517260000000.0,597,SkaPsfZ0W,SkaPsfZ0W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes a multiscale variant of Graph Convolutional Networks (GCN) , obtained by combining separate GCN modules using powers of normalized adjacency as generators. The model is tested on several node classification semi-supervised tasks obtaining excellent numerical performance. + +Reviewers acknowledged the good empirical performance of the model, but all raised the issue of limited novelty, relative to the growing body of literature on graph neural networks. In particular, they missed an analysis that compares random walks powers to other multiscale approaches and justifies its performance in the context of semi-supervised learning. 
Overall, the AC believes this is a good paper, but it can be significantly stronger with an extra iteration that addresses these limitations. ",ICLR2018, +BJg4IgUVlV,1545000000000.0,1545350000000.0,1,HJeABnCqKQ,HJeABnCqKQ,combination of self-imitation and GAIL - needs more thorough development of conceptual insights,Reject,"The paper proposes an extension to reinforcement learning with self-imitation (SIL)[Oh et al. 2018]. It is based on the idea of leveraging previously encountered high-reward trajectories for reward shaping. This shaping is learned automatically using an adversarial setup, similar to GAIL [Ho & Ermon, 2016]. The paper clearly presents the proposed approach and relation to previous work. Empirical evaluation shows strong performance on a 2D point mass problem designed to examine the algorithms behavior. Of particular note are the insightful visualizations in Figure 2 and 3 which shed light on the algorithm's learning behavior. Empirical results on the Mujoco domain show that the proposed approach is particularly strong under delayed-reward (20 steps) and noisy-observation settings. + +The reviewers and AC note the following potential weaknesses: The paper presents an empirical validation showing improvements over PPO, in particular in Mujoco tasks with delayed rewards and with noisy observations. However, given the close relation to SIL, a direct comparison with the latter algorithm seems more appropriate. Reviewers 2 and 3 pointed out that the empirical validation of SIL was more extensive, including results on a wide range of Atari games. The authors provided results on several hard-exploration Atari games in the rebuttal period, but the results of the comparison to SIL were inconclusive. Given that the main contribution of the paper is empirical, the reviewers and the AC consider the contribution incremental. + +The reviewers noted that the proposed method was presented with little theoretical justification, which limits the contribution of the paper. During the rebuttal phase, the authors sketched a theoretical argument in their rebuttal, but noted that they are not able to provide a guarantee that trajectories in the replay buffer constitute an unbiased sample from the optimal policy, and that policy gradient methods in general are not guaranteed to converge to a globally optimal policy. The AC notes that conceptual insights can also be provided by motivating algorithmic or modeling choices, or through detailed analysis of the obtained results with the goal to further understanding of the observed behavior. Any such form of developing further insights would strengthen the contribution of the submission.",ICLR2019,4: The area chair is confident but not absolutely certain +laVLKkd8r__,1610040000000.0,1610470000000.0,1,N6JECD-PI5w,N6JECD-PI5w,Final Decision,Accept (Poster),"The paper presents a fair filter network to mitigating bias in sentence encoders by constructive learning. The approach reduces the bias in the embedding while preserves the semantic information of the original sentences. + +Overall, all the reviewers agree that the paper is interesting and the experiment is convincing. Especially the proposed approach is conceptually simple and effective. 
+ +One suggestion is that the model only considers fairness metric based on the similarity between sentence embedding; however, it would be better to investigate how the ""debiased embedding"" helps to reduce the bias in more advanced downstream NLP applications such as coreference resolution, in which researchers demonstrate that the bias in underlying representation causing bias in the downstream model predictions. ",ICLR2021, +rJxIHJ1mlN,1544900000000.0,1545350000000.0,1,HkGTwjCctm,HkGTwjCctm,"Interesting ideas, but insufficient insight and novelty",Reject,"This paper studies change-point detection in time series using a multiscale neural network architecture which contains recurrent connections across different time scales. + +Reviewers were mixed in this submission. They found the paper generally clear and well-written, and the idea of adding a multiscale component to the model interesting. However, they also pointed out weaknesses in the related work section and found the experimental setup somewhat limited. In particular, the paper provides little to no analysis of the learnt features. Taking these assessments into consideration, the AC concludes this submission cannot be accepted at this time. ",ICLR2019,4: The area chair is confident but not absolutely certain +ZpmZDwwsq2r,1642700000000.0,1642700000000.0,1,_dXmN3FV--0,_dXmN3FV--0,Paper Decision,Reject,"### Summary + +The paper demonstrates the applicability of pruning to tabular datasets, which aren't typically explored in the literature on pruning. The work identifies that yes, pruning can indeed be applied to this domain with some success. + + +### Discussion + +#### Strengths + +An unconventional domain that, nonetheless, should be studied. + +#### Weaknesses + +The empirical setup does not include comparisons to baselines or ablations (e.g., different importance metrics). + +### Decision + +I recommend Reject. Reviewer k3Jq provides a precise and constructive set of criticisms that if solved would make for an interesting and significant piece of work.",ICLR2022, +TQZihR6z0x,1576800000000.0,1576800000000.0,1,H1l-02VKPB,H1l-02VKPB,Paper Decision,Reject,"This paper proposes to incorporate graph topology into pooling operations on graphs, to better define the notion of locality necessary for pooling. While the paper tackles an important problems, and seems to be also well-written, the reviewers agree that there are several issues regarding the contribution and empirical results that need to be addressed before this paper is ready for publication.",ICLR2020, +ryhBQk6Hf,1517250000000.0,1517260000000.0,135,H1UOm4gA-,H1UOm4gA-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This manuscript was reviewed by 3 expert reviewers and their evaluation is generally positive. The authors have responded to the questions asked and the reviewers are satisfied with the responses. Although the 2D environments are underwhelming (compared to 3D environments such as SUNCG, Doom, Thor, etc), one thing that distinguishes this paper from other concurrent submissions on the similar topics is the demonstration that ""words learned only from a VQA-style supervision condition can be successfully interpreted in an instruction-following setting."" ",ICLR2018, +PEeIsf7mqYM,1610040000000.0,1610470000000.0,1,bEoxzW_EXsa,bEoxzW_EXsa,Final Decision,Accept (Poster)," +The reviewers have different views on the papers but agreed that the paper can be accepted. 
However, they suggested +some points of improvements including the writing (clarity and style) and experiments showing strong improvements +compared to WGAN.",ICLR2021, +BkLB2M8dx,1486400000000.0,1486400000000.0,1,ByqiJIqxg,ByqiJIqxg,ICLR committee final decision,Accept (Poster),"Though the method does not seem to really break new ground in transfer learning (see Reviewer 1), the reviewers do not question the validity of the approach. The online aspect of the approach as well as an application of Bayesian moment matching to HMMs with GM emissions seem novel. Though the topic may seem a bit peripheral within ICLR, I agree that it can be considered as a representation learning method and the divergence from the mainstream (within ICLR) may be as much a pro as a con. Also, on the positive side, the applications are interesting and real, and the methods seem well suited to the task. The paper is well written. + + + well-written + + technically solid + + interesting application + + - innovation is moderate + + +/- not a typical ICLR paper",ICLR2017, +m-kP9EEhVRw,1642700000000.0,1642700000000.0,1,5SgoJKayTvs,5SgoJKayTvs,Paper Decision,Reject,"This paper aims at improving AAEs with an intervention loss. While the topic is important, the reviewers agree that + +- The paper has poor clarity, +- The related work is not adequately put into perspective, +- There are concerns with technical correctness, +- Experimental evidence is lacking, + +As the authors have not addressed any of these concerns, the paper can not be accepted in its current form.",ICLR2022, +yJbHFikafvY,1642700000000.0,1642700000000.0,1,Dzpe9C1mpiv,Dzpe9C1mpiv,Paper Decision,Accept (Poster),"The paper extends the previously established connection between adversarial training (AT) and Wasserstein distributional robustness (WDR) to other adversarial defense methods such as PGD-AT, TRADES and MART, and connects them to WDR. While this connection itself is not surprising given earlier works connecting AT and WDR, the paper makes contributions in establishing it formally and proposing algorithmic variations (eg, softball projection) that show clear empirical gains on standard benchmarks of MNIST/CIFAR10/CIFAR100 over point-wise adversarial defense methods.",ICLR2022, +AuyFGF4L7a,1576800000000.0,1576800000000.0,1,rkgqN1SYvr,rkgqN1SYvr,Paper Decision,Accept (Poster),"The paper shows that initializing the parameters of a deep linear network from the orthogonal group speeds up learning, whereas sampling the parameters from a Gaussian may be harmful. + +The result of this paper can be interesting to the deep learning community. The main concern the reviewers raised is the huge overlap with the paper by Du & Hu (2019). It would have been nice to actually see whether the results for linear networks empirically also hold for nonlinear networks. ",ICLR2020, +niMDW9ftDg0,1642700000000.0,1642700000000.0,1,SLz5sZjacp,SLz5sZjacp,Paper Decision,Accept (Poster),"This paper presents a new metric for disentanglement of learned representations, extending a prominent framework (DCI) to support object-centric structured representations. + +The reviewers agree on the importance of the question and find the metric a valuable contribution for addressing this problem. In the discussion, the reviewers identified some clarity issues that the authors have improved, leading to an overall much better writeup, as well as some deeper evaluation of learned matching agreements. 
The main remaining points that could be improved are + - making the results more robust with thorough hyperparam tuning + - connecting to other methods for inducing soft / probabilistic matchings, such as Sinkhorn or smooth&sparse optimal transport. + +Please consider switching to the Times font as recommended by the ICLR style guide.",ICLR2022, +SYGE_gUN2BT,1610040000000.0,1610470000000.0,1,Qe_de8HpWK,Qe_de8HpWK,Final Decision,Reject,"This paper proposes a new quantum machine learning framework which is evaluated on the MNIST dataset. While the paper was relatively well written, reviewers noted that most of the ideas are already well established and used in quantum machine learning community. Thus it was not clear what novelty is provided relative to related work.",ICLR2021, +J-Jq5dWiWe,1576800000000.0,1576800000000.0,1,BJlJVCEYDB,BJlJVCEYDB,Paper Decision,Reject,"This paper proposes a deep RL framework that incorporates motivation as input features, and is tested on 3 simplified domains, including one which is presented to rodents. + +While R2 found the paper well-written and interesting to read, a common theme among reviewer comments is that it’s not clear what the main contribution is, as it seems to simultaneously be claiming a ML contribution (motivation as a feature input helps with certain tasks) as well as a neuroscientific contribution (their agent exhibited representations that clustered similarly to those in animals). In trying to do both, it’s perhaps doing both a disservice. + +I think it’s commendable to try to bridge the fields of deep RL and neuroscience, and this is indeed an intriguing paper. However any such paper still needs to have a clear contribution. It seems that the ML contributions are too slight to be of general practical use, while the neuroscientific contributions are muddled somewhat. The authors several times mentioned the space constraints limiting their explanations. Perhaps this is an indication that they are trying to cover too much within one paper. I urge the authors to consider splitting it up into two separate works in order to give both the needed focus. + +I also have some concerns about the results themselves. R1 and R3 both mentioned that the comparison between the non-motivated agent and the motivated agent wasn’t quite fair, since one is essentially only given partial information. It’s therefore not clear how we should be interpreting the performance difference. Second, why was the non-motivated agent not analyzed in the same way as the motivated agent for the Pavlovian task? Isn’t this a crucial comparison to make, if one wanted to argue that the motivational salience is key to reproducing the representational similarities of the animals? (The new experiment with the random fixed weights is interesting, I would have liked to see those results.) For these reasons and the ones laid out in the extensive comments of the reviewers, I’m afraid I have to recommend reject. +",ICLR2020, +0zwkC8dDzxV,1642700000000.0,1642700000000.0,1,xRK8xgFuiu,xRK8xgFuiu,Paper Decision,Reject,"This paper proposes an algorithm for learning linear SEMs via the Cholesky factorization and provides a detailed theoretical analysis of the algorithm. After an extensive discussion and clarification from the authors, there was a consensus that the theoretical results are incremental compared to existing work and many of the claims need additional context in light of existing work. 
In particular, I recommend the authors pay careful attention to the presentation of the sample complexity bounds, which were revealed to be substantially weaker than initially claimed, and to validate these bounds with careful experiments.",ICLR2022, +6CPpyjfi7r,1642700000000.0,1642700000000.0,1,T8BnDXDTcFZ,T8BnDXDTcFZ,Paper Decision,Reject,"The paper derives a new parameter initialization for deep spiking neural networks to overcome the vanishing gradient problem. + +During the review, concerns were expressed about how well the method would scale to larger neural networks. It was also questioned how this parameter initialization technique compares with a recently proposed batch normalization technique, especially when training larger neural network on more challenging datasets. There were also concerns raised about the readability of the paper. + +I commend the authors for improving the readability of their paper in their revision. I also commend them for taking the time to implement the comparisons requested by the reviewers. These new comparisons revealed that batch normalization and its recently proposed variant were superior to the initialization method on its own, and that the initialization proposed in the paper did not significantly improve performance when paired with batch norm [[1](https://openreview.net/forum?id=T8BnDXDTcFZ¬eId=yIAPcSbUAQ0)]. The authors also acknowledged based on the new results, that their proposed parameter initialization scheme appears to fail to scale to more complex datasets and networks, especially relative to competing methods, which invalidates a key claim that their approach can ""accelerate training and get better accuracy compared with existing methods"" [[2](https://openreview.net/forum?id=T8BnDXDTcFZ¬eId=j12fwayWEb)]. + +The recommendation is to reject the paper in its current form.",ICLR2022, +BJgSkv5fgV,1544890000000.0,1545350000000.0,1,B1e0KsRcYQ,B1e0KsRcYQ,Incomplete work.,Reject,"AR1 is is concerned that the only contribution of this work is combining second-order pooling with with a codebook style assignments. After discussions, AR1 still maintains that that the proposed factorization is a marginal contribution. AR2 feels that the proposed paper is highly related to numerous current works (e.g. mostly a mixture of existing contributions) and that evaluations have not been improved. AR3 also points that this paper lacks important comparisons for fairly evaluating the effectiveness of the proposed formulation and it lacks detailed description and discussion for the methods. + +AC has also pointed several works to the authors which are highly related (but by no means this is not an exhaustive list and authors need to explore google scholar to retrieve more relevant papers than the listed ones): + +[1] MoNet: Moments Embedding Network by Gou et al. (e.g. Stanford Cars via Tensor Sketching: 90.8 vs. 90.4 in this submission, Airplane: 88.1 vs. 87.3% in this submission, 85.7 vs. 84.3% in this submission) +[2] Second-order Democratic Aggregation by Lin et al. (e.g. Stanford Cars: 90.8 vs. 90.4 in this submission) +[3] Statistically-motivated Second-order Pooling by Yu et al (CUB: 85%) +[4] DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition by Engin et al. +[5] Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks by Q. Wang et al. (512D representations) +[6] Low-rank Bilinear Pooling for Fine-Grained Classification' by S. Kong et al. (CVPR I believe). 
They get some reduction of size of 10x less than tensor sketch, higher results than here by some >2% (CUB), and all this obtained in somewhat more sophisticated way. + +The authors brushed under the carpet some comparisons. Some methods above are simply better performing even if cited, e.g. MoNet [1] uses sketching and seems a better performer on several datasets, see [2] that uses sketching (Section 4.4), see [5] which also generates compact representation (8K). [4] may be not compact but the whole point is to compare compact methods with non-compact second-order ones too (e.g. small performance loss for compact methods is OK but big loss warrants a question whether they are still useful). Approach [6] seems to also obtain better results on some sets (common testbed comparisons are essentially encouraged). + +At this point, AC will also point authors to sparse coding methods on matrices (bilinear) and tensors (higher-order) from years 2013-2018 (TPAMI, CVPR, ECCV, ICCV, etc.). These all methods can produce compact representations (512 to 10K or so) of bilinear or higher-order descriptors for classification. This manuscript fails to mention this family of methods. + +For a paper to be improved for the future, the authors should consider the following: +- make a thorough comparison with existing second-order/bilinear methods in the common testbed (most of the codes are out there on-line) +- the authors should vary the size of representation (from 512 to 8K or more) and plot this against accuracy +- the authors should provide theoretical discussion and guarantees on the quality of their low-rank approximations (e.g. sketching has clear bounds on its approximation quality, rates, computational cost). The authors should provide some bounds on the loss of information in the proposed method. +- authors should discuss the theoretical complexity of proposed method (and other methods in the literature) + +Additionally, the authors should improve their references and the story line. Citing (Lin et al. (2015)) in Eq. 1 and 2 as if they are the father of bilinear pooling is misleading. Citing (Gao et al. (2016)) in the context of polynomial kernel approximation in Eq. 3 to obtain bilinear pooling should be preceded with earlier works that expand polynomial kernel to obtain bilieanr pooling. AC can think of at least two papers from 2012/2013 which do derive bilinear pooling and could be cited here instead. AC encourages the authors to revise their references and story behind bilinear pooling to give unsuspected readers a full/honest story of bilinear representations and compact methods (whether they are branded as compact or just use sketching etc., whether they use dictionaries or low-rank representations). + +In conclusion, it feels this manuscript is not ready for publication with ICLR and requires a major revision. However, there is some merit in the proposed direction and authors are encouraged to explore further.",ICLR2019,5: The area chair is absolutely certain +HJg8RdU4gV,1545000000000.0,1545350000000.0,1,SJf_XhCqKm,SJf_XhCqKm,"A well written, interesting paper on designing experiments for hyperparameter optimization using DPPs, but with lingering concerns over novelty and experiments. ",Reject,"This is a very clearly written, well composed paper that does a good job of placing the proposed contribution in the scope of hyperparameter optimization techniques. This paper certainly appears to have been improved over the version submitted to the previous ICLR. 
In particular, the writing is much clearer and easy to follow and the methodology and experiments have been improved. The ideas are well motivated and it's exciting to see that sampling from a k-DPP can give better low discrepancy sequences than e.g. Sobol. However, the reviewers still seem to have two major concerns, namely novelty of the approach (DPPs have been used for Bayesian optimization before) and the empirical evaluation. + +Empirical evaluation: As Reviewer1 notes, there are much more recent approaches for Bayesian optimization that have improved significantly over the TPE method, also for conditional parameters. There are also more recent approaches proposing variants of random search such as hyperband. + +Novelty: There is some work on using determinantal point processes for Bayesian optimization and related work in optimal experimental design. Optimal design has a significant amount of literature dedicated to designing a set of experiments according to the determinant of their covariance matrix - i.e. D-Optimal Design. This work may add some interesting contributions to that literature, including fast sampling from k-DPPs, etc. It would be useful, however, to add some discussion of that literature in the paper. Jegelka and Sra's tutorial at NeurIPS on negative dependence had a nice overview of some of this literature. + +Unfortunately, two of the three reviewers thought the paper was just below the borderline and none of the reviewers were willing to champion it. There are very promising and interesting ideas in the paper, however, that have a lot of potential. In the opinion of the AC, one of the most powerful aspects of DPPs over e.g. low discrepancy sequences, random search, etc. is the ability to learn a distance over a space under which samples will be diverse. This can make a search *much* more efficient since (as the authors note when discussing random search vs. grid search) the DPP can sample more densely in areas and dimensions that have higher sensitivity. It would be exciting to learn kernels specifically for hyperparameter optimization problems (e.g. a kernel specifically for learning rates that can capture e.g. logarithmic scaling). Taking the objective into account through the quality score, as proposed for future work, also seems very sensible and could significantly improve results as well. ",ICLR2019,5: The area chair is absolutely certain +HJlD0CKexE,1544750000000.0,1545350000000.0,1,SklXvs0qt7,SklXvs0qt7,"Insufficient clarity and detail, reviewer concerns not addressed.",Reject,"The manuscript describes a procedure for prioritizing the contents of an experience replay buffer in a UVFA setting based on a density model of the trajectory of the achieved goal states. A rank-based transformation of densities is used to stochastically prioritize the replay memory. + +Reducing the sample complexity of RL is a worthy goal and reviewers found the overall approach is interesting, if somewhat arbitrary in the implementation details. Concerns were raised about clarity and justification, and the restriction of experiments to fully deterministic environments. + +After personally reading the updated manuscript I found clarity to still be lacking. Statements like ""... 
uses the ranking number (starting from zero) directly as the probability for sampling"" -- this is not true (it is normalized, as confusingly laid out in equation 2 with the same symbol used for the unnormalized and normalized densities), and also implies that the least likely trajectory under the model is never sampled, which doesn't seem like a desirable property. Schaul's ""prioritized experience replay"" is cited for the choice of rank-based distribution, but the distribution employed in that work has rather different form. The related work section is also very poor given the existing body of literature on curiosity in a reinforcement learning context, and the new ""importance sampling perspective"" section serves little explicatory purpose given that an importance re-weighting is not performed. + +Overall, I concur most strongly with AnonReviewer1 that more work is needed to motivate the method and prove its robustness applicability, as well as to polish the presentation.",ICLR2019,4: The area chair is confident but not absolutely certain +Mrprk-YZZ6,1576800000000.0,1576800000000.0,1,SJgSflHKDr,SJgSflHKDr,Paper Decision,Reject,"The authors discuss how to predict generalization gaps. Reviews are mixed, putting the submission in the lower half of this year's submissions. I also would have liked to see a comparison with other divergence metrics, for example, L1, MMD, H-distance, discrepancy distance, and learned representations (e.g., BERT, Laser, etc., for language). Without this, the empirical evaluation of FD is a bit weak. Also, the obvious next step would be trying to minimize FD in the context of domain adaptation, and the question is if this shouldn't already be part of your paper? Suggestions: The Amazon reviews are time-stamped, enabling you to run experiments with drift over time. See [0] for an example. + +[0] https://www.aclweb.org/anthology/W18-6210/",ICLR2020, +H1xrg92keN,1544700000000.0,1545350000000.0,1,BJWfW2C9Y7,BJWfW2C9Y7,Issues with the analysis,Reject,"Dear authors, + +All reviewers pointed to severe issues with the analysis, making the paper unsuitable for publication to ICLR. Please take their comments into account should you decide to resubmit this work.",ICLR2019,5: The area chair is absolutely certain +Bke5fit-gN,1544820000000.0,1545350000000.0,1,rkxn7nR5KX,rkxn7nR5KX,meta review,Reject,"This paper proposes an approach for incremental learning of new classes using meta-learning. +Strengths: The framework is interesting. The reviewers agree that the paper is well-written and clear. The experiments include comparisons to prior work, and the ablation studies are useful for judging the performance of the method. +Weaknesses: The paper does not provide significant insights over Gidaris & Komodakis '18. Reviewer 1 was also concerned that the motivation for RBP is not entirely clear. +Overall, the reviewers found that the strengths did not outweigh the weaknesses. Hence, I recommend reject. +",ICLR2019,4: The area chair is confident but not absolutely certain +ss-RU_lVAM,1576800000000.0,1576800000000.0,1,HkgsUJrtDB,HkgsUJrtDB,Paper Decision,Accept (Poster),"The paper addresses the problem of fair representation learning. The authors propose to use Rényi correlation as a measure of (in)dependence between the predictor and the sensitive attribute and developed a general training framework to impose fairness with theoretical properties. 
The empirical evaluations have been performed using standard benchmarks for fairness methods and the SOTA baselines -- all this supports the main claims of this work's contributions. +All the reviewers and AC agree that this work has made a valuable contribution and recommend acceptance. Congratulations to the authors! +",ICLR2020, +46hwjGufIM1,1642700000000.0,1642700000000.0,1,XSwpJ2bonX,XSwpJ2bonX,Paper Decision,Reject,"Meta Review for Neural Circuit Architectural Priors for Embodied Control + +The motivation of this work is to address an important challenge: To understand innate contributions to neural circuits for motor control. This paper proposes both a set of reusable architectural components and design principles, and also interesting principles for producing biologically-inspired neural networks for embodied control. This work aims to be at the intersection between neuroscience and machine learning for improving the design of artificial neural networks and improving our understanding of observed biological networks. In their model, various components of biological networks are replicated (such as the balance between excitation and inhibition, sparsity, and oscillation). They show that a resulting model, inspired by C.elegans, can learn to swim more efficiently (when evaluated on the Swimmer RL environment) and requires fewer parameters while achieving similar accuracy as an MLP. + +Most reviewers, including myself, recognize (and appreciate) the ambition of this work, and are excited at the goal of looking at problems from the perspectives of both system neuroscience and machine learning. The motivations of this paper are clearly explained, and the paper is well written (also the diagrams are great). I'm very excited about this work, and hope to see it succeed, but in the paper's current state (even with the revisions), I don't think it addresses the reviewers' main concerns. + +After discussions and examining the paper and the reviews in detail, I feel reviewer GaKc best summarizes the main issues with the work at its current state: + +- This paper is interesting but does not proposes a significant improvement to the literature as the gap between the promises made in the motivation and actually delivered work is too wide. + +- From a neuroscience point of view, this work does not provide substantial evidence of the importance of the model at either modeling or simulating biological neural systems. + +- From a ""theoretical"" point of view, the model does not provide much advancement to the machine learning community either. + +So while the current work (especially in the revised state) I would consider to be an outstanding workshop paper, I cannot recommend it for acceptance at ICLR 2022. An advice I would give to the authors (as someone who publishes to ML conferences, and Sys-Neuro/Bio venues) is that for these ML conferences, it might be easier to make the narrative of the work narrower, and well-defined. If the method is supposed to demonstrate significant advantages of biologically inspired network architecture over current RL methods, the results should clearly demonstrate convincing experimental results that can persuade the (non-neuro) RL community to have interest in the method. If the method does not achieve SOTA results, then try to present the method capable of something really useful that existing RL methods simply fail at (and emphasize that as a core contribution). 
Conversely, if the narrative is to use a bio-inspired network to emulate biological behaviors, the method must have something important to offer to the community of people working on simulating biological neural systems. + +I look forward to seeing this work improved and eventually published in a journal or presented at a conference in the future. Good luck!",ICLR2022,
+ ",ICLR2021, +s400TTStoY,1610040000000.0,1610470000000.0,1,wUUKCAmBx6q,wUUKCAmBx6q,Final Decision,Reject,"The revised paper is a solid improvement. However, all reviewers and I find that there are still a number of issues that prevent the paper from being acceptable at the current stage. For example, some important parts are still unclear, especially the definition of STI effect. The observation of STI effect requires more theoretical or empirical investigation, in addition to a toy example.",ICLR2021, +Csq5l8wPh,1576800000000.0,1576800000000.0,1,rygjHxrYDB,rygjHxrYDB,Paper Decision,Accept (Poster),"This paper introduces a new convolution-like operation, called a Harmonic Convolution (weighted combination of dilated convolutions with different dilation factors/anchors), which operates on the STFT of an audio signal. Experiments are carried on audio denoising tasks and sound separation and seems convincing, but could have been more convincing: (i) with different types of noises for the denoising task (ii) comparison with more methods for sound separation. Apart those two concerns, the authors seem to have addressed most of reviewers' complaints. +",ICLR2020, +UlmecEFG1I,1576800000000.0,1576800000000.0,1,rJg9OANFwS,rJg9OANFwS,Paper Decision,Reject,"The paper proposes two approaches to topic modeling supervised by survival analysis. The reviewers find some problems in novelty, algorithm and experiments, which is not ready for publish.",ICLR2020, +OfRwYSTzXM,1576800000000.0,1576800000000.0,1,HJe-blSYvH,HJe-blSYvH,Paper Decision,Reject,"The paper focuses on learning speech representations with contrastive predictive coding (CPC). As noted by reviewers, (i) novelty is too low (mostly making the model bidirectional) for ICLR (ii) comparison with existing work is missing.",ICLR2020, +7C_BgXicy,1576800000000.0,1576800000000.0,1,BJgkbyHKDS,BJgkbyHKDS,Paper Decision,Reject,"This paper studies the empirical performance of invertible generative models for compressive sensing, denoising and in painting. One issue in using generative models in this area has been that they hit an error floor in reconstruction due to model collapse etc i.e. one can not achieve zero error in reconstruction. The reviewers raised some concerns about novelty of the approach and thoroughness of the empirical studies. The authors response suggests that they are not claiming novelty w.r.t. to the approach but rather their use in compressive techniques. My own understanding is that this error floor is a major problem and removing its effect is a good contribution even without any novelty in the techniques. However, I do agree that a more thorough empirical study would be more convincing. While I can not recommend acceptance given the scores I do think this paper has potential and recommend the authors to resubmit to a future venue after a through revision.",ICLR2020, +uQJMLaIuLK,1610040000000.0,1610470000000.0,1,eqBwg3AcIAK,eqBwg3AcIAK,Final Decision,Accept (Poster),"This paper presents an approach to domain adaptation in reinforcement learning. The main idea behind this approach, DARC, is to modify the reward function in the source domain so that the learned policy is optimal in the target domain. This is achieved by learning a classifier that learns to discriminate between the data from the source domain and those from the target domain. + +Overall, reviewers appreciated the intuitiveness of the approach as well as its formal analysis. 
They had some concerns with respect to the experiments, which were sorted out in the author response period. Given the overall positive reviews, I recommend accepting the paper. +",ICLR2021,
+ +The paper makes an interesting contribution to the literature relative to prior literature showing that one need not sacrifice universal approximation guarantees while training networks with IBP to be certifiably robust to l_inf attacks. + +Since the paper is primarily theoretical, the main concern raised by the reviewers was around novelty and the theoretical significance of ideas presented relative to prior work. While proof techniques may be novel, the extension of AUA results to alternate activation functions is not surprising and do not substantially contribute to the field's understanding of learning certifiably robust networks particular since most SOTA results for IBP-based training have been achieved with ReLU based networks. The authors' rebuttal did not providing convincing arguments for the reviewers to revise their scores. Hence I do not feel that the contributions of the paper justify acceptance at this time.",ICLR2021, +B1eSwJymgN,1544900000000.0,1545350000000.0,1,H1fU8iAqKX,H1fU8iAqKX,Consensus is accept,Accept (Poster),"The overall consensus after an extended discussion of the paper is that this work should be accepted to ICLR. The back-and-forth between reviewers and authors was very productive, and resulted in substantial clarification of the work, and modification (trending positive) of the reviewer scores.",ICLR2019,5: The area chair is absolutely certain +qtR6htZ-UaP,1610040000000.0,1610470000000.0,1,io-EI8C0q6A,io-EI8C0q6A,Final Decision,Reject,"This work mainly applies wav2vec 2.0 to multilingual speech recognition and lacks of novelty. +The various pre-training and fine-tuning mix-match are specific to the speech recognition task. As suggested by reviewers, it is recommended to resubmit to a speech conference. +Also the paper lacks comparisons to SOTA on one of the well studied task (i.e. BABEL) in the speech field. + +The main factor for the decision is lack of novelty.",ICLR2021, +B1lYJijHeN,1545090000000.0,1545350000000.0,1,B1gstsCqt7,B1gstsCqt7,Interesting work on how to train a dictionary from local information,Accept (Poster),"While there has been lots of previous work on training dictionaries for sparse coding, this work tackles the problem of doing son in a purely local way. While previous work suggests that the exact computation of gradient addressed in the paper is not necessarily critical, as noted by reviewers, all reviewers agree that the work still makes important contributions through both its theoretical analyses and presented experiments. Authors are encouraged to work on improving clarity further and delineating their contribution more precisely with respect to previous results.",ICLR2019,4: The area chair is confident but not absolutely certain +a7msfpKvM4,1576800000000.0,1576800000000.0,1,rkl_f6EFPS,rkl_f6EFPS,Paper Decision,Reject," This paper focuses on the problem of robustness in the network with random loss of neurons. However, reviewers had issues with insufficient clarity of the presentation, and lack of discussion about closely related dropout approach. + + + ",ICLR2020, +r1eVc3g4xV,1544980000000.0,1545350000000.0,1,BkG5SjR5YQ,BkG5SjR5YQ,Area chair recommendation,Accept (Poster),"The submission evaluates maximum mean discrepancy estimators for post selection inference. +It combines two contributions: (i) it proposes an incomplete u-statistic estimator for MMD, (ii) it evaluates this and existing estimators in a post selection inference setting. + +The method extends the post selection inference approach of (Lee et al. 
2016) to the current u-statistic approach for MMD. The top-k selection problem is phrased as a linear constraint, reducing it to the problem of Lee et al. The approach is illustrated on toy examples and a GAN application. + +The main criticism of the paper concerns its novelty. R1 feels that it is largely just the combination of two known approaches (although it appears that the incomplete estimator is key), while R3 was significantly more impressed. Both are senior experts in the topic. + +On balance, the reviewers were more positive than negative. R2 felt that the authors' comments helped to address their concerns, while R3 gave detailed arguments in favor of the submission and championed the paper. The paper provides an additional interesting framework for evaluation of estimators, and considers their application in the broader context of post-selection inference.",ICLR2019,5: The area chair is absolutely certain
Further, reviewers pointed out that there are several other relevant compression baselines that are not compared against, and that further analysis should be done on the timing/accuracy tradeoff of the baseline methods. Finally, reviewers felt that the contribution of the proposed method relative to other approaches that also attempt to compress transformers is not clearly outlined. Weighing these concerns, I agree with the reviewers that the paper is not ready for acceptance in its current form. ",ICLR2021,
Please also clarify the points raised in the discussion in the next iteration of the paper, and run the experiments with more seeds, as promised.",ICLR2022,
+ +For these reasons, this work should be endorsed for publication at ICLR 2022.",ICLR2022, +r82ANkYXJkT,1642700000000.0,1642700000000.0,1,5QhUE1qiVC6,5QhUE1qiVC6,Paper Decision,Accept (Poster),The papers makes progress on the important question of implicit bias in gradient based neural learning. Remarkably they derive reasonable conditions for global optimality.,ICLR2022, +2KZuNQ3AhPA,1610040000000.0,1610470000000.0,1,Tio_oO2ga3u,Tio_oO2ga3u,Final Decision,Reject,"The paper presents a DKL variant with a linear kernel. Representations from several networks is combined through concatenation, making it not quite an ensemble. It's shown that the model is a universal kernel approximator. Experiments are conducted on a large number of UCI datasets. + +Following the discussions, the paper still has the following shortcomings: +- some lack of clarity in the presentation (for instance, explaining the equivalence between a multi-output learner and M different single-output learners) +- lack of experiments on data where deep learning is typically used (images); the UCI datasets have structured data and other ensembles like XGBoost may outperform the baselines presented in this paper +- difference in performance between DKL and DEKL, especially since DKL benefits from a larger model space, theoretically. maybe DEKL has better sample complexity, but does this advantage hold in the case of the large datasets that deep learning is used for?",ICLR2021, +HyKhjfI_g,1486400000000.0,1486400000000.0,1,BJxhLAuxg,BJxhLAuxg,ICLR committee final decision,Reject,"The authors have combined two known areas of research - frame prediction and reward prediction - and combined them in a feedforward network trained on sequences from Atari games. The fact that this should train well is unsurprising for this domain, and the research yields no other interesting results. Pros - the paper is clearly written and the experiments are sound. Cons - there is very little novelty or contribution.",ICLR2017, +AA7icvy-ZA,1576800000000.0,1576800000000.0,1,B1esygHFwS,B1esygHFwS,Paper Decision,Reject,"The paper proposes ATR-CSPD, which learns a low-dimensional representation of seasonal pattern, for detecting changes with clustering-based approaches. + +While ATR-CSPD is simple and intuitive, it lacks novel contribution in methodology. It is unclear how it is different from existing approaches. The evaluation and the writing could be improved significantly. + +In short, the paper is not ready for publication. We hope the reviews can help improve the paper for a strong submission in the future. ",ICLR2020, +xFxQ-uEEu9pA,1642700000000.0,1642700000000.0,1,mfwdY3U_9ea,mfwdY3U_9ea,Paper Decision,Accept (Poster),"This paper introduces a novel approach for out of distribution detection that generates scores from a trained DNN model by using the Fisher-Rao distance between the feature distributions of a given input sample at the logit layer and the lower layers of the model and the corresponding mean feature distributions over the training data. + +The use of Fisher-Rao distance is novel in the context of OOD, and the empirical evaluations are extensive. The main concerns of the reviewers were the limitations of the Gaussianity assumption used in computing the Fisher-Rao distance and the use of the sum of the Fisher-Rao distances to the class-conditional distributions of the target classes rather than the minimum distance. These concerns were addressed satisfactorily in a revision. 
In terms of technical novelty and experimental evaluation, the paper is above the bar for acceptance.",ICLR2022,
Reviewers also remarked that this paper might be trying to do too much, without the experiments/comparisons and analyses needed to interpret the contributions of each component. + +This work is definitely promising and has the potential to make a nice contribution, given some additional care (experiments, analyses) and rewriting/polishing. As it is, it is probably a bit premature for publication at ICLR. +",ICLR2020,
+- The new 'SAC with Deep Coherent Exploration' only partially addresses the concerns of R2 and R4, especially in terms of performance. + +While the paper has improved drastically during the reviewing process, there are still a few too many doubts.",ICLR2021,
The paper focuses in particular on understanding: (a) what fraction of the neurons are devoted to shape-vs-texture (roughly speaking), and (b) per-pixel results using a convolutional readout function. The reviewers were divided at the time of submission and remained so at the end of discussion, with scores ranging from 4 (R1, R4) to 7 (R2) and 8 (R3). The AC wants to thank and acknowledge the authors as well as all of the reviewers for their engagement in the discussion. + +- R2 and R3 are largely positive, driven by the extent of the experiments and the number of interesting results (e.g., how the fraction of the dimensions used for shape changes as a function of depth in the network). Both had smaller, non-critical concerns that were addressed (as far as the AC can tell) in the discussion. +- R4's most important concern, in the AC's view, is the question: could these results / different conclusions have been obtained via linear probe methods like Hermann et al.? The authors argue that analyzing the fraction of neurons used, and doing so at a per-pixel level, is more fine-grained than linear probes. This boils down to an intangible question of contribution, on which the AC is inclined to agree with the authors and R2 and R3: analyzing how the dimensions contribute provides, at least to the AC, a complementary view to the linear probe, one that will be of interest to the ICLR community (although see the final comment). R4 also had a number of smaller concerns that seem to be largely addressed (e.g., about correctness). +- R1 argues primarily that the paper does not have a clear point or methodological contribution, for instance pointing out that readout modules were used in Hermann et al., or (as an example) arguing that the readout function design is too simple. The AC is inclined to agree with the authors' response that the other reviewers seem to largely agree on the contribution (especially contributions via experiments rather than method) but disagree on how to weigh these contributions. The AC would also add that readout modules are a core idea for understanding neural representations that long predates Hermann, and they are by design (as the authors note) almost always as simple as possible. + +At the end of the day, the AC agrees with R2 and R3 on the contribution of the work and is inclined to accept. Given the other reviews, the AC does not agree with R1's arguments, but would suggest that the authors think about how to sharpen their claims further. The AC is sympathetic to the concerns of R4, and urges the authors to think about a more concise and clean argument addressing R4's concerns --- many other readers will have similar concerns, and as clean an illustration as possible will be helpful. Overall, the AC believes that the paper's methods, experiments, and analysis are of interest and value and is thus in favor of acceptance.",ICLR2021,
These were adequately addressed in the author response and revisions. The experiments were found to be well described and executed, which increases the confidence in the approach and its potential impact. I recommend acceptance.",ICLR2022, +gBXnx7jyjU6,1610040000000.0,1610470000000.0,1,xnC8YwKUE3k,xnC8YwKUE3k,Final Decision,Accept (Poster),"The paper addresses a pressing problem for applications involving clinical time series and introduce a pipeline that handle many of the issues pertaining to data preprocessing. + +An important contribution is the software that makes the processing more seamless, which will, without a doubt, be useful to the community given the need for reproducibility. + +The authors have responded suitably to reviewer comments with the main 'leftover criticism' being that such a paper may not be the best fit for ICLR. This isn't a typical paper. However, something that introduces this level of automation and flexibility in handling time series has not been presented at this conference (or other ML conferences) to the best of my knowledge. It seems it could work in conjunction (as opposed to competing) with any new time series models/techniques that may be introduced.",ICLR2021, +SkxLwipIl4,1545160000000.0,1545350000000.0,1,BJej72AqF7,BJej72AqF7,Reasonable strong theory paper,Accept (Poster),"While the reformulation of RNNs is not practical as it is missing sigmoids and tanhs that are common in LSTMs it does provide an interesting analysis of traditional RNNs and a technique that's novel for many in the ICLR community. +",ICLR2019,4: The area chair is confident but not absolutely certain +TdX0BPMmUw,1576800000000.0,1576800000000.0,1,rJxRJeStvB,rJxRJeStvB,Paper Decision,Reject,"Unfortunately, the reviewers of the paper are all not certain about their review, none of them being RL experts. Assessing the paper myself—not being an RL expert but having experience—the authors have addressed all points of the reviewers thoroughly. +",ICLR2020, +LVaHLGCoqeL,1642700000000.0,1642700000000.0,1,2M0WXSP6Qi,2M0WXSP6Qi,Paper Decision,Reject,"This paper presents a method for conditional generations for GANs. +The reviewers note the lack of novelty, or the lack of a theoretical or empirical motivation for the novel bits. They point out flaws in the correctness of the paper, and limited experimental evaluation. +The reviewers agree to reject the paper. Unfortunately the authors did not answer the reviewers. I therefore recommend to reject the paper for this conference, and I strongly suggest that the authors address the reviewers concerns if they are to submit this paper again in a future venue.",ICLR2022, +B19wQkTHG,1517250000000.0,1517260000000.0,161,SJcKhk-Ab,SJcKhk-Ab,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All the reviews like the theoretical result presented in the paper which relates the gating mechanism of LSTMS (and GRUs) to time invariance / warping. The theoretical result is great and is used to propose a heuristic for setting biases when time invariance scales are known. The experiments are not mind-boggling, but none of the reviewers seem to think that's a show stopper. ",ICLR2018, +Skn2G16Hf,1517250000000.0,1517260000000.0,22,rkRwGg-0Z,rkRwGg-0Z,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"Very solid paper exploring an interpretation of LSTMs. 
+Good reviews.",ICLR2018,
,ICLR2019,3: The area chair is somewhat confident +SylLlFSke4,1544670000000.0,1545350000000.0,1,HkxLXnAcFQ,HkxLXnAcFQ,An intriguing experimental paper on the current state of few-shot learning.,Accept (Poster),"This paper provides a number of interesting experiments for few-shot learning using the CUB and miniImagenet datasets. One of the especially intriguing experiments is the analysis of backbone depth in the architecture, as it relates to few-shot performance. The strong performance of the baseline and baseline++ are quite surprising. Overall the reviewers agree that this paper raises a number of questions about current few-shot learning approaches, especially how they relate to architecture and dataset characteristics. + +A few minor comments: +- In table 1, matching nets are mistakenly attributed to Ravi and Larochelle. Should be Vinyals et al. +- The notation for cosine similarity in section 3.2 is odd. It looks like you’re computing some cosine function of two vectors which doesn’t make sense. Please clarify this. +- There are a few results that were promised after the revision deadline, please be sure to include these in the final draft. +",ICLR2019,4: The area chair is confident but not absolutely certain +qVGN3td7V3p,1610040000000.0,1610470000000.0,1,HajQFbx_yB,HajQFbx_yB,Final Decision,Accept (Oral),"This paper proposes a technique of decomposing the nonsymmetric kernel of determinantal point processes, which enables inference and learning in time and space linear with respect to the size of the ground set. This substantially improves upon existing work. The proposed method is well supported both with theory and experiments. All of the reviewers find that the contributions are significant, and no major flaws are identified through reviews and discussion. The determinantal point process might not be one of the most popular topics in the ICLR community today but certainly is relevant.",ICLR2021, +A5rGpt4GKDI,1610280000000.0,1610470000000.0,1,le9LIliDOG,le9LIliDOG,Final Decision,Reject,"This work proposes an efficient method for modelling long-range connections in point-cloud data. Reviewers found the paper to be generally well-written. On the less positive side, reviewers felt that the novelty of the work was marginal, and that the experimentation, limited to synthetic data in one domain, was too limited. These concerns remain after the discussion phase. In addition, the authors stated during the discussion that ""Our goal is indeed to develop an efficient strategy to model LRIs in real chemical and materials systems”, which conflicts with the presentation of the work as motivated by more general point cloud modelling problems. Given these weaknesses, the final decision was to reject.",ICLR2021, +#NAME?,1642700000000.0,1642700000000.0,1,7vcKot39bsv,7vcKot39bsv,Paper Decision,Reject,"The paper is aimed at providing an explaining the perceived lack of generalization results for Adam as compared to SGD. To this end the paper decouples the effect of adaptive per parameter learning rate and the momentum aspect of Adam. The paper shows that the while adaptive rates help escape saddle points faster - they are worse when consider the flatness of minima being selected. Further momentum has no effect on the flatness of minima but again leads to better optimization by providing a drift leading to saddle point evasion. They also provide a new algorithm Adai (based on inertia) targeted at better generalization of adaptive methods. 
+ +The paper definitely provides an interesting perspective and the approach to decouple the effect of momentum and adaptive LR and study their efficacy in escaping saddle points and flatness of minima seems a very useful perspective. The primary reason for my recommendation is the presentation of the paper in terms of the rigor its assumptions to establish the results. These aspects have been highlighted by the reviewers in detail. I suggest the authors to carefully revisit the paper and improve the presentation of the assumptions, adding rigor to the presentation as well as adding justifications where appropriate especially in light of non-standardness of these assumptions in optimization literature.",ICLR2022, +mQj1Ir792m,1576800000000.0,1576800000000.0,1,Syl-xpNtwS,Syl-xpNtwS,Paper Decision,Reject,"The authors propose to use the information bottleneck to learn state representations for RL. They optimize the IB objective using Stein variational gradient descent and combine it with A2C and PPO. On a handful of Atari games, they show improved performance. + +The reviewers primary concerns were: +*Limited evaluation. The method was only evaluated on a handful of the Atari games and no comparison to other representation learning methods was done. +*Using a simple Gaussian embedding function would eliminate the need for amortized SVGD. The authors should compare to that alternative to demonstrate the necessity of their approach. + +The ideas explored in the paper are interesting, but given the issues with evaluation, the paper is not ready for acceptance at this time.",ICLR2020, +rJAhifIul,1486400000000.0,1486400000000.0,1,H1oyRlYgg,H1oyRlYgg,ICLR committee final decision,Accept (Oral),"All reviews (including the public one) were extremely positive, and this sheds light on a universal engineering issue that arises in fitting non-convex models. I think the community will benefit a lot from the insights here.",ICLR2017, +DZhgUQ0Qdwg,1642700000000.0,1642700000000.0,1,A05I5IvrdL-,A05I5IvrdL-,Paper Decision,Accept (Poster),The paper studies the problem of finding an optimal memory less policy for POMDPs. This work makes an important theoretical contribution. The reviewers are unanimous in recommending the acceptance of the paper. Well done!,ICLR2022, +XPid9nX-Rwi,1610040000000.0,1610470000000.0,1,jMPcEkJpdD,jMPcEkJpdD,Final Decision,Accept (Poster),"Reviewers liked the self-supervised learning of compressed videos, noting that it is an ""exciting topic"" and an ""important problem"", although they found the proposed methods (PMSP andCTP) less exciting. Reviewers were satisfied with the execution and the extensive experimental studies. AC felt the community may benefit from the paper's intuitive integration of self-supervised learning and the compressed video's signals (I and P frames, residuals, motion vectors, etc). ",ICLR2021, +S1gdQ_JblN,1544780000000.0,1545350000000.0,1,HklY120cYm,HklY120cYm,Well written paper with detailed experiments,Accept (Poster),"The authors discuss an improved distillation scheme for parallel WaveNet using a Gaussian inverse autoregressive flow, which can be computed in closed-form, thus simplifying training. The work received favorable comments from the reviewers, along with a number of suggestions for improvement which have improved the draft considerably. The AC agrees with the reviewers that the work is a valuable contribution, particularly in the context of end-to-end neural text-to-speech systems. 
",ICLR2019,4: The area chair is confident but not absolutely certain +zh07CJW-5JE,1642700000000.0,1642700000000.0,1,a0SRWViFYW,a0SRWViFYW,Paper Decision,Reject,"The submission considers a stochastic variant of the projective splitting algorithm, with a focus on monotone inclusion problems, and it proposes a novel separable algorithm with the ability to handle multiple constraints and non-smooth regularizers. All reviewers felt that there were merits to the submission and that the submission was borderline. Public and non-public discussion concluded that the paper would be of greater value to the community if the suggestions of the reviewers and related issues were addressed.",ICLR2022, +SJgUztnNkV,1543980000000.0,1545350000000.0,1,HyMuaiAqY7,HyMuaiAqY7,not ready for publication at ICLR,Reject,"This paper combines two recently proposed ideas for GAN training: Fisher integral probability metrics, and the Deli-GAN. As the reviewers have pointed out, the writing is somewhat haphazard, and it's hard to identify the key contributions, why the proposed method is expected to help, and so on. The experiments are rather minimal: a single experiment comparing Inception scores to previous models on CIFAR; Inception scores are not a great measure, and the experiments don't yield much insight into where the improvement comes from. No author response was given. I don't think this paper is ready for publication in ICLR. +",ICLR2019,5: The area chair is absolutely certain +bRNa35-thPs,1610040000000.0,1610470000000.0,1,lE1AB4stmX,lE1AB4stmX,Final Decision,Reject,"The authors extends the transformer to multivariate time series. The proposed extension is simple, and lacks novelty. Some design decisions of the proposed method should be better justified. Similar works that also use the transformer for timeseries are not compared. + +Experimental results are not convincing. The settings are unclear, and the selection of datasets needs more justifications. Some important experiments are missing. + +Finally, writing can also be improved.",ICLR2021, +iRPHM4q4lCI,1642700000000.0,1642700000000.0,1,vyn49BUAkoD,vyn49BUAkoD,Paper Decision,Reject,"This paper studied Bayesian active regression with Gaussian processes, and proposed two intuitive algorithms inspired by the classical disagreement-based and uncertainty sampling criteria. The reviewers appreciate the motivation and overall idea of taking a fully Bayesian approach by utilizing the joint posterior of the hyperparameters for active learning. However, there are shared concerns among the reviewers in the clarify and consistency of several key technical components, including discussion around bias-variance tradeoff and its connection to the fully Bayesian approach, as well as in the experimental details, which make the current package insufficient for publication. + +Reviewers provide very useful feedback (in particular with a very extensive review by Reviewer hDWW) for improving the current work. The authors acknowledge in their responses that these are valid concerns and they would address these issues in a further version of this work.",ICLR2022, +SJ9WXkaHM,1517250000000.0,1517260000000.0,78,rJvJXZb0W,rJvJXZb0W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Though the approach is not terribly novel, it is quite effective (as confirmed on a wide range of evaluation tasks). The approach is simple and likely to be useful in applications. The paper is well written. 
+

+ simple and efficient
+ high quality evaluation
+ strong results
- novelty is somewhat limited
",ICLR2018, +j7jSeveOse3,1610040000000.0,1610470000000.0,1,MBdafA3G9k,MBdafA3G9k,Final Decision,Reject,"The paper considers the problem of learning to imitate behaviors from visual demonstrations, without access to expert actions. Consistent with recent approaches, the proposed method uses a neural network to measure the similarity between visual demonstrations and the agent's behavior, and employs this metric as a reward in RL. The primary contribution is the use of a recurrent siamese network that is trained to measure the distance between motions, as a means of better dealing with the challenges of imitation learning from a small number of (as few as one) noisy visual demonstrations. Experiments on a variety of simulated domains show that the proposed approach achieves reasonable results.

The paper was reviewed by four knowledgeable referees, who read the author responses and engaged in extensive discussion. The reviewers agree that learning to imitate behaviors from a small number of noisy demonstrations is a challenging and important problem that is of significant interest. The proposed method nicely extends existing approaches to visual imitation learning, and the results reveal that the method performs well in a variety of continuous control domains. The reviewers raise several concerns regarding the clarity of the technical presentation and the sufficiency of the experimental evaluation. The authors have made a significant effort to address these concerns in their responses and updates to the paper, which the reviewers very much appreciate. However, some of the reviewers' primary concerns regarding clarity and the thoroughness of the experimental evaluation remain. This work has the potential to make a really nice contribution and the authors are encouraged to take this feedback into account for any future version of the manuscript.",ICLR2021, +HJgjx5y-xE,1544780000000.0,1545350000000.0,1,HyxAfnA5tm,HyxAfnA5tm,"Solid contribution, relevant to some interesting real world settings ",Accept (Poster),"The reviewers appreciated this contribution, particularly its ability to tackle nonstationary domains, which are common in real-world tasks. 

",ICLR2019,4: The area chair is confident but not absolutely certain +Ys6JJCpmw8W,1642700000000.0,1642700000000.0,1,LLHwQh9zEb,LLHwQh9zEb,Paper Decision,Reject,"While the reviewers appreciated the method's ability to replace transformer models and SMILES data augmentation, their main concerns were with (a) the experimental section, and (b) the technical innovation over prior work, which updated drafts of the paper did not fully resolve. Specifically, for (a) this work performs very similarly to prior work: for reaction outcome prediction the proposed method improves top-1/3/5 for USPTO_STEREO_mixed but is outperformed by prior work for top-1/5/10 for USPTO_460k_mixed; for retrosynthesis the model is outperformed for USPTO_full and only outperforms prior work that does not use templates/atom-mapping/augmentation for top-1 on USPTO_50k. The authors argue that their method should be preferred because their method does not require templates, atom-mapping, and data augmentation. The reviewers agree that template-free and atom-mapping-free methods are more widely applicable. 
However, the benefits of being augmentation-free are not convincingly stated by the authors, who only state that their approach is beneficial by ""simplifying data preprocessing and potentially saving training time."" The authors should have empirically verified this claim by reporting training time, because it is not obvious that their model, which requires pairwise shortest path lengths, is actually faster to train.
For (b) the reviewers believed that the paper lacked technical novelty given recent work (e.g., NERF). The authors should more clearly distinguish this work from past work (e.g., graphical depictions and a finer categorization of past work may help with this).
Given the similar performance to prior work, the lack of evidence to support the training time claims, and the limited technical novelty, I believe this work should be rejected at this time. Once these things are clarified, this paper will be improved.",ICLR2022, +HJxJD6AgeV,1544770000000.0,1545350000000.0,1,SJ4Z72Rctm,SJ4Z72Rctm, Multiple reviewers had concerns about the clarity of the presentation and the significance of the results.,Reject,"Multiple reviewers had concerns about the clarity of the presentation and the significance of the results.
",ICLR2019,3: The area chair is somewhat confident +E5nburImJZS,1642700000000.0,1642700000000.0,1,QEBHPRodWYE,QEBHPRodWYE,Paper Decision,Reject,"All reviewers agree that this is a reasonable contribution but that it is also extremely limited in scope. The authors suggest in one of their responses that their technique could apply to ""any data mixing method with “batched k-sum” structure"". Such a larger level of generality might make the paper more interesting, but at the moment it is an extremely niche result.",ICLR2022, +4Nct-2qDo7Z,1610040000000.0,1610470000000.0,1,hvdKKV2yt7T,hvdKKV2yt7T,Final Decision,Accept (Spotlight),"This paper proposed to defend against model stealing attacks by dataset inference. The paper received a unanimous rating of ""Good paper"" and ""accept"". The reviewers praise this paper as insightful and well written. There was active discussion between the reviewers and authors, which further clarified some of the issues. Given the positive reviews and overall rating, the AC recommends it as a spotlight paper.",ICLR2021, +HkFxXJaHf,1517250000000.0,1517260000000.0,63,rywDjg-RW,rywDjg-RW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The pros and cons of this paper cited by the reviewers can be summarized below:

Pros:
* The method proposed here is highly technically sophisticated and appropriate for the problem of program synthesis from examples
* The results are convincing, demonstrating that the proposed method is able to greatly speed up search in an existing synthesis system

Cons:
* The contribution in terms of machine learning or representation learning is minimal (mainly adding an LSTM to an existing system)
* The overall system itself is quite complicated, which might raise the barrier of entry to other researchers who might want to follow the work, limiting impact

In our decision, the fact that the paper significantly moves forward the state of the art in this area outweighs the concerns about lack of machine learning contribution or barrier of entry.",ICLR2018, +BJNkIJaBG,1517250000000.0,1517260000000.0,694,B1p461b0W,B1p461b0W,ICLR 2018 Conference Acceptance Decision,Reject,"The paper studies the robustness of deep learning against label noise on MNIST, CIFAR-10 and ImageNet. 
But the generalization of the claim ""deep learning is robust to massive label noise"" is still questionable due to the limited noise types investigated.
The paper presents some tricks to improve learning with high label noise (batch size and learning rate), which is not novel enough.
",ICLR2018, +#NAME?,1576800000000.0,1576800000000.0,1,Hyl9xxHYPr,Hyl9xxHYPr,Paper Decision,Accept (Poster),"This paper proposes a novel method for class-supervised disentangled representation learning. The method augments an autoencoder with asymmetric noise regularisation and is able to disentangle content (class) and style information from each other. The reviewers agree that the method achieves impressive empirical results and significantly outperforms the baselines. Furthermore, the authors were able to alleviate some of the initial concerns raised by the reviewers during the discussion stage by providing further experimental results and modifying the paper text. By the end of the discussion period some of the reviewers raised their scores and everyone agreed that the paper should be accepted. Hence, I am happy to recommend acceptance.",ICLR2020, +SkOhmJaBf,1517250000000.0,1517260000000.0,222,H196sainb,H196sainb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"There was significant discussion on this paper and high variance between reviewers: one reviewer gave the paper a low score. However, the committee feels that this paper should be accepted at the conference since it provides a better framework for reproducibility and performs more large-scale experiments than prior work. One small issue is the lack of comparison in terms of empirical results between this work and Zhang et al.'s work, but the responses provided to both the reviewers and anonymous commenters seem to be satisfactory.",ICLR2018, +H1g_1AiaJV,1544560000000.0,1545350000000.0,1,rkeSiiA5Fm,rkeSiiA5Fm,Solid technical contributions and valuable insights for spherical convolutions,Accept (Poster),"Strengths:
Well written paper on a new kind of spherical convolution for use in spherical CNNs.
Evaluated on rigid and non-rigid 3D shape recognition and retrieval problems.
Paper provides a solid strategy for efficient GPU implementation.

Weaknesses: There was some misunderstanding about the properties of the alt-az convolution detected by one of the reviewers, along with some points needing clarification. However, discussion of these issues appears to have led to a resolution of the issues.

Contention: The weaknesses above were discussed in some detail, but the procedure was not particularly contentious and the discussion unfolded well.

All reviewers rate the paper as accept; the paper clearly provides value to the community and therefore should be accepted.
",ICLR2019,4: The area chair is confident but not absolutely certain +TRj1Vyrbe6d,1642700000000.0,1642700000000.0,1,dvl241Sbrda,dvl241Sbrda,Paper Decision,Reject,"The paper proposes to learn embeddings into complex hyperbolic space. This is an extension of the popular hyperbolic-space embeddings, which have shown success on graph-like and tree-like data. Reviews and discussion mostly centered around the lack of clear motivation for the work (why complex hyperbolic spaces?) and the lack of a clear advantage over other manifold embedding methods that have varying curvature. The reviewers mentioned many questions and points that they thought the work should cover. There was also concern about the baselines against which the method was compared. 
There was not a consensus that the paper should be accepted, and no reviewer argued strongly for acceptance, even after the author response. As a result, I recommend that this paper not be accepted at this time. I expect a new version of this paper, incorporating this reviewer feedback and especially improving the explanation of the motivation, will be a good submission for a future conference.",ICLR2022, +YCvCBUAU-A,1610040000000.0,1610470000000.0,1,muppfCkU9H1,muppfCkU9H1,Final Decision,Reject,"
This paper has been reviewed by four knowledgeable referees. Two of them slightly leaned towards acceptance, whereas the other two suggested rejection. The main issues raised by the reviewers were (1) limited novelty [R1,R2], (2) missing baselines and ablations [R1,R3], (3) limited insights on the spectral analysis [R2], and (4) missing motivation behind modeling choices [R1,R3]. The rebuttal included a number of experiments requested by the reviewers (e.g. ablation with diffusion only [R1,R3], extended Diffusion GCN [R1], APPNP baseline [R3]), and adequately motivated some of the modeling choices.

The central question of the reviewers' discussion was whether the contribution of this paper was significant enough or too incremental. The discussion emphasized relevant literature which already considers multi-hop attention (e.g. https://openreview.net/forum?id=rkKvBAiiz [Cucurull et al.], https://ieeexplore.ieee.org/document/8683050 [Feng et al.], https://arxiv.org/abs/2001.07620 [Isufi et al.]), and which should have served as baselines. In particular, the experiment suggested by R3 was in line with some of these previous works, which consider ""a multi-hop adjacency matrix"" as a way to increase the GAT's receptive field. This was as opposed to preserving the 1-hop adjacency matrix used in the original GAT and stacking multiple layers to enlarge the receptive field, which, as noted by the authors, may result in over-smoothed node features. The reviewers acknowledged that there is indeed a slight difference between the formulation proposed in the paper and the one in e.g. [Cucurull et al.]. The difference consists in calculating attention and then computing the powers with a decay factor vs. increasing the receptive field first by using powers of the adjacency matrix and then computing attention. Still, the multi-hop GAT baseline of [Cucurull et al.] could be extended to use a multi-hop adjacency matrix computed with the diffusion process from [Klicpera 2019], as suggested by R3. In light of these works and the above-mentioned missing baselines, the reviewers agreed that the contribution may be viewed as rather incremental (combining multi-hop graph attention with graph diffusion). The discussion also highlighted the potential of the presented spectral analysis, which could be strengthened by developing new insights in order to become a stronger contribution (see R2's suggestions).

To sum up, this was a much-discussed paper, where the reviewers ultimately reached a consensus to reject, with no strong opposition. I agree with the reviewers' assessment and therefore must reject. I encourage the authors to follow the reviewers' suggestions and consider the multi-hop baselines as well as the hints provided by the reviewers about the spectral analysis to strengthen their work.
",ICLR2021, +cOn8m4Y4mI,1576800000000.0,1576800000000.0,1,Hye87grYDH,Hye87grYDH,Paper Decision,Reject,"The paper proposes a variant of Sparse Transformer where only top K activations are kept in the softmax. 
The resulting transformer model is applied to NMT, image caption generation and language modeling, where it outperformed a vanilla Transformer.

While the proposed idea is simple, easy to implement, and does not add additional computational or memory cost, the reviewers raised several concerns in the discussion phase, including: several baselines missing from the tables; incomplete experimental details; incorrect/misleading selection of the best-performing model in tables of results (e.g., in Table 1, the authors boldface their results on En-De (29.4) and De-En (35.6) but in fact, the best performance on these is achieved by competing models, respectively 29.7 and 35.7. The caption claims their model ""achieves the state-of-the-art performances in En-Vi and De-En"" but this is not true for De-En (albeit by 0.1). In Table 3, they boldface their result of 1.05 but the best result is 1.02; the text says their model beats the Transf-XL ""with an advantage"" (of 0.01) but does not point out that the advantage of Adaptive-span over their model is 3 times as large (0.03)).

This prevents me from recommending acceptance of this paper in its current form. I strongly encourage the authors to address these concerns in a future submission.",ICLR2020, +HJi82zUdg,1486400000000.0,1486400000000.0,1,HkyYqU9lx,HkyYqU9lx,ICLR committee final decision,Reject,"While this area chair disagrees with some reviewers about (1) the narrowness of the approach's applicability and hence lack of relevance to ICLR, and also (2) the fairness of the methodology, it is nonetheless clear that a stronger case needs to be made for novelty and applicability.",ICLR2017, +ZzkI5F0r9vA,1610040000000.0,1610470000000.0,1,NgZKCRKaY3J,NgZKCRKaY3J,Final Decision,Reject,"The paper proposes an adjustment to the ECE metric to make it less biased in the small sample case by including the assumption that the confidence output by a classifier is monotonic with the true correctness probability. The main idea is to successively make finer bins until a non-monotonicity is observed. The paper is interesting, but the magnitude of the contribution would be just enough for a short paper if such a track existed at ICLR.

Reviewers have raised concerns about the discrepancy between their revised ECE formula and the Algorithm accompanying it, although that has been fixed through the author feedback phase. Another concern is that for a paper whose core technical contribution is a revised metric for measuring calibration, a more thorough empirical study over larger datasets is required.",ICLR2021, +Hylaf9VggV,1544730000000.0,1545350000000.0,1,ryxhynC9KX,ryxhynC9KX,More experimental evidence needed,Reject,"The authors provide a convolutional neural network for predicting the satisfiability of SAT instances. The idea is interesting, and the main novelty in the paper is the use of convolutions in the architecture and a procedure to predict a witness when the formula is satisfiable. However, there are concerns about the suitability of convolutions for this problem because of the permutation invariance of SAT. Empirically, the resulting models are accurate (correctly predicting sat/unsat 90-99% of the time) while taking less time than some existing solvers. However, as pointed out by the reviewers, the empirical results are not sufficient to demonstrate the effectiveness of the approach. I want to thank the authors for the great work they did to address the concerns of the reviewers. 
The paper significantly improved over the reviewing period, and while it is not yet ready for publication, I want to encourage the authors to keep pushing the idea further and to improve the experimental results.
",ICLR2019,4: The area chair is confident but not absolutely certain +9pnD3ESmy,1576800000000.0,1576800000000.0,1,SklgTkBKDr,SklgTkBKDr,Paper Decision,Reject,"This paper presents two new architectures that model latent intermediate utilities and use non-additive utility aggregation to estimate the set utility based on the computed latent utilities. These two extensions are easy to understand and seem like a simple extension to the existing RNN model architectures, so that they can be implemented easily. However, the connection to the Choquet integral is not clear and no theory has been provided to make that connection. Hence, it is hard for the reader to understand why the integral is useful here. The reviewers have also raised objections about the evaluation, which does not seem to be fair to existing methods. These comments can be incorporated to make the paper more accessible and the results more appreciable. ",ICLR2020, +ffN8i7PkVTld,1642700000000.0,1642700000000.0,1,ek9a0qIafW,ek9a0qIafW,Paper Decision,Accept (Poster),"The paper presents a prompt learning method for few-shot learning in NLP. In particular, they proposed DART, a new soft prompt tuning method, to optimize the label representations and template.

Overall, the paper is well-written and well-motivated. The proposed approach is interesting. The experiments were well justified and sufficient experimental analyses are provided. All reviewers support the paper.

There are a few remaining criticisms of the paper.

- The major one is the positioning of the paper raised by the reviewers. I agree with the reviewers that it is a bit misleading to emphasize that the approach requires no external architecture. Although the approach can reuse the same transformer architecture (rather than an additional LSTM), so that it enjoys the beauty of simplicity, it still requires additional parameters. I would suggest better clarifying this point in the final version.

- There is also a criticism that the paper is related to ADAPET. However, the key ideas in this paper are sufficiently different from ADAPET. Also, ADAPET was published at EMNLP 21 after this paper was submitted, although it was on arXiv earlier. It's fair to say this work is concurrent with ADAPET.",ICLR2022, +H1gcAF3AkV,1544630000000.0,1545350000000.0,1,B1euhoAcKX,B1euhoAcKX,Limited applicability,Reject,"The paper addresses the complexity issue of Determinantal Point Processes via generative deep models.

The reviewers and AC note the critical limitation of the applicability of this paper to variable ground set sizes, while the authors' rebuttal is not convincing enough.

The AC thinks the proposed method has potential and is interesting, but decided that the authors need to do more work before publication.",ICLR2019,4: The area chair is confident but not absolutely certain +ahD5t0PW0yQ,1642700000000.0,1642700000000.0,1,OZ_2rF2D4Nw,OZ_2rF2D4Nw,Paper Decision,Reject,"This paper proposes that ML models might be better expressed in a way closer to their mathematical representation than to Python code. This is an attractive proposition, but the paper's development of this proposition is that models might best be expressed in LaTeX, which is not a hypothesis that the reviewers consider proven. 
+

Ultimately, this paper proposes a new language in which to express ML models, and compares that language against one baseline: PyTorch. However, this is far from a reasonable baseline. Even within Python, systems such as JAX (which the paper dismisses as a ""Program translator"") are much closer to the pure functional style; and going further afield, comparisons should be to DEX and Julia, to name but two.

The reviewers appreciate the approach to autobroadcasting, but again note that this does not require a new language, and again, systems like JAX, DEX, and Julia all have approaches to broadcasting which are not compared.

Reviewer 8MK2 is concerned that we ""still need to write other modules .. in Python"", but the authors rebut this well: Kokoyi is not expected to be applied to an entire program, just to the model components.

Even if Kokoyi were to be successful, there is a question of its wider applicability. A major strength of PyTorch/JAX is that they are used by a much larger community than just ML paper authors. It is because the authors write in these tools that their work is usable by other practitioners. The paper explicitly says it is targeted not even at ML paper authors, but at a subset of that community.

The usability analyses are very much lacking. Lines of code is a notoriously coarse tool to assess programming paradigms. I would also caution against trying to do any small-group user study - the best initial study is to release Kokoyi into the wild, get feedback from users, and then, if it proves popular, prepare a paper or monograph. This is the path of PyTorch and other frameworks.

Until then, the paper may be of interest to a workshop very focused on programming models for ML, but is not currently suitable for the wider ICLR audience.",ICLR2022, +BklwhMc7lE,1544950000000.0,1545350000000.0,1,HygcvsAcFX,HygcvsAcFX,"A new optimal margin distribution loss with some good empirical results, yet premature ",Reject,"The paper proposed an optimal margin distribution loss and applied PAC-Bayesian bounds that are from Sanov large deviation inequalities to give generalization error bounds for such a loss. Some interesting empirical results are shown to support the proposed method.

The majority of reviewers think the paper’s empirical results are encouraging, although still at a premature stage. The theoretical analysis is rather standard. After reading the authors’ response and revision, the reviewers did not change much of their opinions and think the proposal would benefit from further systematic study to achieve a substantial improvement.

Based on the current ratings, the paper is therefore a borderline lean reject.
",ICLR2019,4: The area chair is confident but not absolutely certain +05ntgCksbhL,1642700000000.0,1642700000000.0,1,WVX0NNVBBkV,WVX0NNVBBkV,Paper Decision,Accept (Poster),"In this work, the authors use proxy distributions learned by advanced generative models to improve adversarial robustness. In the discussion period, the authors did a good job of addressing the reviewers' questions and comments. All reviewers think the paper is above the accept threshold, as do I.",ICLR2022, +ZZ1a4RF7jxK,1610040000000.0,1610470000000.0,1,TJSOfuZEd1B,TJSOfuZEd1B,Final Decision,Reject,"Reviewer #2 has written a nice summary of the paper, which I quote below.

“The core idea is simple - which is a strength in my view - and does not require retraining the base language model, which could be important as language models become more expensive to train. 
However, the clarity and experiments in this paper fall short: the experimental setup has issues, the effect on perplexity is quite large but relegated to the Appendix, several claims are speculative and lacking corresponding experimental evidence, and it is unclear how the additional heuristics affect performance.

The method seems promising, but with the current experiments it is difficult to draw conclusions about how the method affects performance and which parts of it are necessary; given that this is an empirical paper, I would therefore not recommend acceptance in its current form.”

Key Strengths
+ Well-motivated problem of considerable interest
+ A relatively straightforward Bayesian solution
+ Proposed solution is computationally efficient compared to other competing approaches.

The paper has been thoroughly reviewed by the reviewers and as a result numerous questions have surfaced. While the authors addressed most of the questions adequately, there are still many unanswered questions. They include:
- Readability issues highlighted by Reviewer #1
- Reviewer #1: ""how did you measure model confidence about the toxicity label""
- Reviewer #4: The perplexity gets much worse as the GEDI training is introduced (i.e. Λ decreases), e.g. going from 25 to 45 on IMDb. This result is in the Appendix, and perplexity is never evaluated/reported in the other experiments.
- Reviewer #4: Crucially, the GEDI training does not appear to help over just re-weighting with the conditional LM ( vs. ). Could the authors comment on this result? How well does domain transfer work for less similar domains? How is perplexity affected for the models reported in Table 2?
- Reviewer #4: How small can the conditional LM be? Why was medium used instead of small? What if large was used? Does the conditional LM need to be a large-scale pretrained model (it would be nice to see a baseline of a simpler conditional LM)?
Several heuristics are used: weighting, nucleus filtering, keeping tokens over a threshold, repetition penalty, and rescaling the logits to positive (used in only one experiment).
- How does each of these affect performance? There are no ablations, and given the small differences in some of the experiments it is unclear whether performance would actually be worse if we changed one of the heuristics. One outcome may be that the method only works for a careful balance of hyperparameters, which could be fine, but we don't have a sense of the variation.
- Reviewer #2: The output in Table 6 makes me suspect that the experiments are badly controlled. The outputs from positive and negative sentiment are totally different and almost random text, meaning that the content of the generators is not controlled properly.
",ICLR2021, +r1u6UJ6BG,1517250000000.0,1517260000000.0,885,rkvDssyRb,rkvDssyRb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree this is an interesting paper with interesting ideas, but it is not ready for publication in its current form. In particular, there is a need for strong empirical results.",ICLR2018, +qrWg-hBGriT,1610040000000.0,1610470000000.0,1,JU8ceIgm5xB,JU8ceIgm5xB,Final Decision,Reject,"The authors propose a learning approach based on mutual information maximization. By considering a view x, and two subviews, x' and x'', the authors provide a bound on MI by combining two InfoNCE-like bounds on I(x''; y) and I(x'; y | x''). The authors show that optimising this (approximate) bound leads to improvements in several tasks covering NLP and vision. 
+ +This paper is aiming to address a significant problem for the ICLR community and provide a novel solution. The manuscript is well written and the main idea is clear. The reviewers appreciated the fact that the experimental setup covers both vision and NLP. On the negative side, the reviewers raised several major issues, both with the presented theory and the experimental setup. From the theoretical point of view, the approach hinges on a good approximation for p(y|x''), which could arguably be as hard as the original problem. The author's response is definitely a step in the right direction, but the changes to the original manuscript are quite substantial and there is no time for a thorough validation of the updated claims. I will hence recommend rejection and strongly suggest that the authors incorporate the reviewers’ feedback and submit the manuscript to a future venue.",ICLR2021, +OVEpOfB95M,1576800000000.0,1576800000000.0,1,B1gUn24tPr,B1gUn24tPr,Paper Decision,Reject,"The paper is interested in Chinese Name Entity Recognition, building on a BERT pre-trained model. All reviewers agree that the contribution has limited novelty. Motivation leading to the chosen architecture is also missing. In addition, the writing of the paper should be improved. +",ICLR2020, +SycMQJaHM,1517250000000.0,1517260000000.0,93,r1l4eQW0Z,r1l4eQW0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Thank you for submitting you paper to ICLR. This paper was enhanced noticeably in the rebuttal period and two of the reviewers improved their score as a result. There is a good range of experimental work on a number of different tasks. The addition of the comparison with Liu & Feng, 2016 to the appendix was sensible. Please make sure that the general conclusions drawn from this are explained in the main text and also the differences to Tran et al., 2017 (i.e. that the original model can also be implicit in this case).",ICLR2018, +KJye-gIiC5,1576800000000.0,1576800000000.0,1,rJeQoCNYDS,rJeQoCNYDS,Paper Decision,Accept (Poster),"This is an interesting paper that is concerned with single episode transfer to reinforcement learning problems with different dynamics models, assuming they are parameterised by a latent variable. Given some initial training tasks to learn about this parameter, and a new test task, they present an algorithm to probe and estimate the latent variable on the test task, whereafter the inferred latent variable is used as input to a control policy. + +There were several issues raised by the reviewers. Firstly, there were questions with the number of runs and the baseline implementations, which were all addressed in the rebuttals. Then, there were questions around the novelty and the main contribution being wall-clock time. These issues were also adequately addressed. + +In light of this, I recommend acceptance of this paper.",ICLR2020, +uwRLOCk9-B5,1610040000000.0,1610470000000.0,1,AjrRA6WYSW,AjrRA6WYSW,Final Decision,Reject,"This paper describes a procedure for estimating the number of clusters in assortative sparse networks generated from a stochastic blockmodel, where the average degree scales sublogarithmically with the number of nodes. The approach proposed by the authors is based on the spectra of the Bethe Hessian matrix. + +The article is well written. The reviewers raised a number of questions regarding the ad-hoc procedure for estimating the parameter $\zeta$ needed to estimate $K$, and the limited experiments. 
While the authors provided some additional experiments in the revised version, including on a real world dataset, the overall article stills appears to be too borderline for ICLR. Adding experiments on other real-world datasets would strengthen the paper. + +I recommend rejection.",ICLR2021, +F-DUI8zPt8z,1610040000000.0,1610470000000.0,1,gZ2qq0oPvJR,gZ2qq0oPvJR,Final Decision,Reject,"This paper describes an application of reinforcement learning to theorem proving in the connection tableau calculus. The paper does a reasonable job in the application of RL techniques and the high level issues are important. However, as the reviewers note, there is little connection to the notion of ""analogy"" outside of the very general idea that RL methods learn to generalize to novel situations. + +I did not find the methods very original as it seems a somewhat mechanical application of RL methods. That would be fine if the empirical results were convincing or surprising. However, I found the Robinson arithmetic domains not very interesting as the problems were literally arithmetic, as in 2+5 = 7, rather than theorems such as the commutativity of addition. The empirical results were not as convincing in the TPTP domains where MCTS seemed to dominate. + +Also there are related papers in the area of deep learning applied to theorem proving that I believe dominate this paper (""learning to reason in large theories"" and ""an inequality benchmark"".",ICLR2021, +S1g7nf8Ol,1486400000000.0,1486470000000.0,1,Bk2TqVcxe,Bk2TqVcxe,ICLR committee final decision,Invite to Workshop Track,"This paper proposes RNs, relational networks, for representing and reasoning about object relations. Experiments show interesting results such as the capability to disentangling scene descriptions. AR3 praises the idea and the authors for doing this nice and involved analysis. AR1 also liked the paper. Indeed, taking a step back and seeing whether we are able to learn meaningful relations is needed in order to build more complex systems. + + However, AR2 raised some important issues: 1) the paper is extremely toy; RN has a very simplistic structure, and is only shown to work on synthetic examples that to some extent fit the assumptions of the RN. 2) there is important literature that addresses relation representations that has been entirely overlooked by the authors. The reviewer implied missed citations from a field that is all about learning object relations. In its current form, the paper does not have a review of related work. The AC does not see any citations in a discussion nor in the final revision. This is a major letdown. The reviewer also mentioned the fact that showing results on real datasets would strengthen the paper, which was also brought up by AR3. This indeed would have added value to the paper, although it is not a deal breaker. + + The reviewer AR2 did not engage in discussions, which indeed is not appropriate. The AC does weigh this review less strongly. + + Give the above feedback, we recommend this paper for the workshop. The authors are advised to add a related work section with a thorough review over the relevant fields and literature.",ICLR2017, +ZAwj9Pyym,1576800000000.0,1576800000000.0,1,HkePNpVKPB,HkePNpVKPB,Paper Decision,Accept (Poster),"This paper examines the correspondence between topological similarity of languages (correlation between the message space and object space) and ability to learn quickly in a situation of emergent communication between agents. 
+ +While this paper is not without issues, it does seem to present a nice contribution that all of the reviewers appreciated to some extent. I think it will spark further discussions in this area, and thus can recommend it for acceptance.",ICLR2020, +E0rOYfSKP5N,1642700000000.0,1642700000000.0,1,7N-6ZLyFUXz,7N-6ZLyFUXz,Paper Decision,Reject,"The reviewers overall thought the problem was worth studying. However, no reviewer was particularly excited about this work. The main concern was that the new problem formulation is difficult to compare to prior work. Reviewers felt both more explanation and a deeper detailed comparison would make this a stronger paper.",ICLR2022, +Hy-O4kaSM,1517250000000.0,1517260000000.0,381,SJTB5GZCb,SJTB5GZCb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track," + interesting novel extension of equilibrium propagation, as a biologically more plausible alternative to backpropagation, with encouraging initial experimental validation. + - currently lacks theoretical guarantees regarding convergence of the algorithm to a meaningful result + - experimental study should be more extensive to support the claims",ICLR2018, +hdxPAzwwYVm,1642700000000.0,1642700000000.0,1,PyBp6nFfzuj,PyBp6nFfzuj,Paper Decision,Reject,"The paper introduces a method for uncertainty quantification for medical applications, which quantifies both aleatoric and epistemic components. + +The paper initially received three strong reject recommendations. The main limitations pointed out by reviewers relate to the limited contributions (either methodological or applicative and clinical), the lack of positioning with respect to related works, the presentation needing improvement and the lack of experimental comparison with respect to recent relevant baselines. +No rebuttal was provided. \ +The AC carefully read the submission and agrees that the paper is premature for publication in the current form. Therefore, the AC recommends rejection.",ICLR2022, +rJWTsfUOe,1486400000000.0,1486400000000.0,1,BkVsEMYel,BkVsEMYel,ICLR committee final decision,Accept (Poster),"The paper uses the notion of separation rank from tensor algebra to analyze the correlations induced through convolution and pooling operations. They show that deep networks have exponentially larger separation ranks compared to shallow ones, and thus, can induce a much richer correlation structure compared to shallow networks. It is argued that this rich inductive bias is crucial for empirical success. + + The paper is technically solid. The reviewers note this, and also make a few suggestions on how to make the paper more accessible. The authors have taken this into account. In order to bridge the gap between theory and practice, it is essential for theory papers to be accessible. + + The paper covers related work pretty well. One aspect is misses is the recent geometric analysis of deep learning. Can the algebraic analysis be connected to geometric analysis of deep learning, e.g. in the following paper? + https://arxiv.org/abs/1606.05340",ICLR2017, +Skg2zkObxE,1544810000000.0,1545350000000.0,1,SygLehCqtm,SygLehCqtm,Clear accept,Accept (Poster),"The reviewers and authors had a productive conversation, leading to an improvement in the paper quality. The strengths of the paper highlighted by reviewers are a novel learning set-up and new loss functions that seem to help in the task of protein contact prediction and protein structural similarity prediction. 
The reviewers characterize the work as constituting an advance in an exciting application space, as well as containing a new configuration of methods to address the problem. + +Overall, it is clear the paper should be accepted, based on reviewer comments, which unanimously agreed on the quality of the work.",ICLR2019,5: The area chair is absolutely certain +hpHnkX27pKu,1642700000000.0,1642700000000.0,1,I_RLPhVUfw8,I_RLPhVUfw8,Paper Decision,Reject,"This paper proposes the use of Gaussian process regression embedded into a neural network architecture for few-shot segmentation. In more detail, support and query images and support masks are fed through their encoders and their corresponding features are then used for Gaussian process regression to infer the distribution of the query mask encoding given the support set and the query images. The mean and the variance characterizing the GP predictive distribution is then fed into a CNN-based decoder to make the final prediction (segmentation). The method is evaluated on PASCAL-5^i and COCO-20^i datasets, showing the superiority of the proposed approach wrt several competitive baselines. + +Overall, the reviewers found the approach of using GPs within the proposed architecture interesting and somewhat significant and novel to the few-shot segmentation community. Technically, the proposed method does not develop a new algorithm and simply uses standard Gaussian process regression. The authors seemed to have addressed several concerns raised by the reviewers including the ablation study evaluating the influence of the GP module. However, the reviewers felt that there were quite a few changes/clarifications to the paper and new results that were not highlighted in the revised version, which made it difficult to provide a new assessment of the paper. Furthermore, the reviewers also thought that the authors did not provide convincing explanations in terms of the improvements from 1-shot to 5-shots, the not-so-good results when the model was trained with standard SGD without loss weighting and the rationale behind the success of the 5-shot setting.",ICLR2022, +HklY6xKgx4,1544750000000.0,1545350000000.0,1,S1gUsoR9YX,S1gUsoR9YX,Accept,Accept (Poster),This paper presents good empirical results on an important and interesting task (translation between several language pairs with a single model). There was solid communication between the authors and the reviewers leading to an improved updated version and consensus among the reviewers about the merits of the paper.,ICLR2019,4: The area chair is confident but not absolutely certain +k1Dhgqve6Mc,1642700000000.0,1642700000000.0,1,Bc4fwa76mRp,Bc4fwa76mRp,Paper Decision,Reject,"The paper explores the usefulness of intermediate layers for linear probing, aiming at improving out-of-distribution transfer with significantly less cost than fine-tuning. Two reviewers recommended borderline acceptance, while two others recommended borderline rejection as final rating. The main concerns raised by the reviewers were the limited novelty of the proposed method (e.g., compared to Elmo), unconvincing results in the natural and structure categories of VTAB, and lack of experiments to justify the claims, as well as the demonstration of the method in other tasks beyond image classification. The rebuttal has clarified several other questions. The AC really likes the simplicity of the approach, and also finds the problem of improving the efficiency of transfer learning very important. 
In addition, the paper is very well-written and easy to follow, as acknowledged by all reviewers. However, the AC agrees with R2 and R3 that the paper, in its current form, does not pass the bar of ICLR, unfortunately. First, the novelty is limited, as pointed out by R1, R2, and R3. In addition to the related works mentioned in the reviews like Elmo, note that the idea of selecting intermediate features, concatenating them, and running a linear classifier for OOD transfer has also been explored in [Yunhui Guo et al, A broader study of cross-domain few-shot learning, ECCV 2020]. Second, while the approach has advantages in terms of efficiency, the accuracy drop (compared to fine-tuning) for in-domain tasks limits its applicability. Finally, even though the AC agrees with the authors this is not a requirement, a more comprehensive set of experiments on more tasks would make the paper stronger, especially given that the novelty is incremental. The authors are encouraged to improve the paper for another top conference.",ICLR2022, +kC5dD21qBI,1610040000000.0,1610470000000.0,1,_b8l7rVPe8z,_b8l7rVPe8z,Final Decision,Reject,"This paper proposes a transferable adversarial attack method for object detection by using the relevance map. Four reviewers provided detailed reviews: 2 of them rated “Ok but not good enough - rejection”, 1 rated “Marginally below” and 1 rated “Marginally above”. While reviewers consider the paper well written and using relevance map novel, a number of concerns are raised, including limited novelty, the lack of theoretical results, no use of the proposed dataset, insufficient ablation, etc. During the rebuttal, the authors made efforts to response to all reviewers’ comments. However, the major concerns remain, and the rating were not changed. The ACs concur these major concerns and agree that the paper can not be accepted at its current state.",ICLR2021, +RREz8GDsfGt,1642700000000.0,1642700000000.0,1,6PTUd_zPdHL,6PTUd_zPdHL,Paper Decision,Reject,"The main consensus among the reviewers was that although the approach is interesting, this submission suffers from two main weaknesses: + +- The methodology is not very novel, and the proposed parts of the method not well justified (in particular regarding the interplay of a differentiable sorting approach and of the random choice of k) + +- The results, compared to a standard cross-entropy loss are not very convincing: there does not seem to be a statistically significant advantage.",ICLR2022, +fu6heF04gq,1576800000000.0,1576800000000.0,1,BygY4grYDr,BygY4grYDr,Paper Decision,Reject,"As the reviewers point out, the core contribution might be potentially important but the current execution of the paper makes it difficult to gauge this importance. In the light of this, this paper does not seem ready for appearance in a conference like ICLR.",ICLR2020, +r0bvJ0UsDh,1576800000000.0,1576800000000.0,1,rkx-wA4YPS,rkx-wA4YPS,Paper Decision,Reject,"This was a borderline paper, but in the end two of the reviewers remain unconvinced by this paper in its current form, and the last reviewer is not willing to argue for acceptance. The first reviewer's comments were taken seriously in making a decision on this paper. As such, it is my suggestion that the authors revise the paper in its current form, and resubmit, addressing some of the first reviewers comments, such as discussion of utility of the methodology, and to improve the exposition such that less knowledgable reviewers understand the material presented better. 
The comments that the first reviewer makes about lack of motivation for parts of the presented methodology is reflected in the other reviewers comments, and I'm convinced that the authors can address this issue and make this a really awesome submission at a future conference. + +On a different note, I think the authors should be congratulated on making their results reproducible. That is definitely something the field needs to see more of.",ICLR2020, +J3iKVKxIu,1576800000000.0,1576800000000.0,1,BJeKh3VYDH,BJeKh3VYDH,Paper Decision,Accept (Spotlight),"This paper provides an improved method for deep learning on point clouds. Reviewers are unanimous that this paper is acceptable, and the AC concurs. ",ICLR2020, +cW5BOphGT8,1610040000000.0,1610470000000.0,1,D3PcGLdMx0,D3PcGLdMx0,Final Decision,Accept (Poster),"This paper explores the effect of poorly sampled episodes in few-shot learning, and its effect on trained models. The improvements from the additional attention module (CEAM) and regularizer (CECR) are strong, and the ablations are thorough. The reviewers are not fully convinced that poor sampling is indeed the main issue. That is, it could be that CEAM and CECR improve performance for other reasons, but the hypothesis is sensible, and the reviewers believe a more thorough investigation is beyond the scope of this work. + +During discussions, one note that came up is whether CEAM works because of cross-episode attention, or if the idea of an instance-level FEAT is itself a good one. One ablation to sort this out would be to apply FEAT and an instance-level FEAT on episodes that are twice as large as those seen by CEAM so that the effective episode size is the same. This would help answer: is it the reduced noise due to effectively larger episodes, a stronger attention mechanism using instance-level information, or is the idea of crossover episodes indeed the important factor? The reviewers agree that this baseline, or an analogous baseline, should be included in the final version. +",ICLR2021, +BJhdhM8ux,1486400000000.0,1486400000000.0,1,BJ--gPcxl,BJ--gPcxl,ICLR committee final decision,Reject,"There has been prior work on semi-supervised GAN, though this paper is the first context conditional variant. The novelty of the approach was questioned by two of the reviewers, as the approach seems more incremental. Furthermore, it would have been helpful if the issues one of the reviewer had with statements in the document were addressed.",ICLR2017, +HjIt3MdOF,1576800000000.0,1576800000000.0,1,rJxYMCEFDr,rJxYMCEFDr,Paper Decision,Reject,The authors propose a method to train a neural network that is robust to visual distortions of the input image. The reviewers agree that the paper lacks justification of the proposed method and experimental evidence of its performance.,ICLR2020, +ryl70-QGeV,1544860000000.0,1545350000000.0,1,BJgRDjR9tQ,BJgRDjR9tQ,writing issues outweighed by high conceptual novelty,Accept (Poster)," +* Strengths + +This paper presents a very interesting connection between GANs and robust estimation in the presence of corrupted training data. The conceptual ideas are novel and can likely be extended in many further directions. I would not be surprised if this opens up a new line of research. + +* Weaknesses + +The paper is poorly written. Due to disagreement among the authors and my interest in the topic, I read the paper in detail myself. 
I think it would be difficult for a non-expert to understand the key ideas and I strongly encourage the authors to carefully revise the paper to reach a broader audience and highlight the key insights. Additionally, the experiments are only on toy data. + +* Discussion + +One of the reviewers was concerned about the lack of efficiency guarantees for the proposed algorithm (indeed, the algorithm requires training GANs which are currently beyond the reach of theory and finicky in practice). That reviewer points to the fact that most papers in the robustness literature are concerned with computational efficiency and is concerned that ignoring this sidesteps one of the key challenges. The reviewer is also concerned about the restriction to parametric or nearly-parametric families (e.g. Gaussians and elliptical distributions). Other reviewers were more positive and did not see these as major issues. + +* Decision + +In my opinion, the lack of efficiency guarantees is not a huge issue, as the primary contribution of the paper is pointing out a non-obvious conceptual connection between two literatures. The restriction to parametric families is more concerning, but it seems possible this could be removed with further developments. The main reason for accepting the paper (despite concerns about the writing) is the importance of the conceptual connection. I think this connection is likely to lead to a new line of research and would like to get it out there as soon as possible. + +* Comments + +Despite the accept decision, I again urge the authors to improve the quality of exposition to ensure that a large audience can appreciate the ideas.",ICLR2019,4: The area chair is confident but not absolutely certain +9SVUJqEIbG,1576800000000.0,1576800000000.0,1,S1lJv0VYDr,S1lJv0VYDr,Paper Decision,Reject,"This paper addresses challenges in offline model learning, i.e., in the setting where some trajectories are given and can be used for learning a model, which in turn serves to train an RL agent or plan action sequences in simulation. A key issue in this setting is that of compounding errors: as the simulated trajectory deviates from observed data, errors build up, leading to suboptimal performance in the target domain. The paper proposes a distribution matching approach that considers trajectory sequence information and provides theoretical guarantees as well as some promising empirical results. + +Several issues were raised by reviewers, including missing references, clarity issues, questions about limitations of the theoretical analysis, and limitations of the empirical validation. Many of the issues raised by reviewers were addressed by the authors during the rebuttal phase. + +At the same time, several issues remain. First, the authors committed to adding results for additional tasks (initially deemed too easy or too hard to show differences). Even if the tasks show little separation between methods, these would be important data points to include as they support additional comparisons with prior and future work. The AC has to assess the paper without taking promised additional results into account. Second, questions about the results for Ant are not sufficiently addressed. The plot shows no learning. The author response mentions initialization but this is not deemed a sufficient explanation. 
Given the remaining questions, my assessment is that the quality and contribution of the submission are not yet ready for publication at the current stage.",ICLR2020, +ryOZ7JTBz,1517250000000.0,1517260000000.0,76,rytstxWAW,rytstxWAW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Graph neural networks (incl. GCNs) have been shown effective on a large range of tasks. However, it has been so far hard (i.e. computationally expensive or requiring the use of heuristics) to apply them to large graphs. This paper aims to address this problem and the solution is clean and elegant. The reviewers generally find it well written and interesting. There were some concerns about the comparison to GraphSAGE (an alternative approach), but these have been addressed in a subsequent revision. + ++ an important problem ++ a simple approach ++ convincing results ++ clear and well written +",ICLR2018, +rRCRBFSTl-,1576800000000.0,1576800000000.0,1,HJlXC3EtwB,HJlXC3EtwB,Paper Decision,Reject,"The paper proposes a method to prune edges in proximity graphs for faster similarity search. The method works by making the graph edges annealable and optimizing over the weights. The paper tackles an important and practically relevant problem as also acknowledged by the reviewers. However there are some concerns about empirical results, in particular about missing comparisons with tree-structure based algorithms (perhaps with product quantization for high dimensional data), and about modest empirical improvement on two of the three datasets used in the paper, which leaves room for convincing empirical justification of the method. Authors are encouraged to take the reviewers' comments into account and resubmit to a future venue. + +",ICLR2020, +U64y_TBWkM-,1642700000000.0,1642700000000.0,1,vEIVxSN8Xhx,vEIVxSN8Xhx,Paper Decision,Reject,"This paper proposes an alternative for constructing convolution kernels: instead of uniform spatial resolution, it proposes a spatially varying resolution with higher precision at the center of the kernel. The resolution decreases logarithmically as a function of the distance to the center. All reviewers agree that the idea is interesting, but in its current form, the submission is not mature enough to be published. + +In particular, reviewers raised some concerns about computational efficiency of the method. The authors explain that their method runs slower than conventional convolution because the implementation uses of-the-shell conventional convolution modules, and they speculate that the speed can be accelerated if the method is directly implemented with CUDA or by directly adapting the underlying code of convolutions in the integrated framework. While this is a reasonable argument, it is not actually verified. This it is not clear if there would be other road blockers to achieve the promised performance. It would be great if authors could present actual performance of the method using either of their suggested solutions (CUDA or modifying code of convolutions). + +In addition, reviewers raised concerns about some aspects of the evaluation setup, where test data is used to report the best performance. Authors respond that baselines are trained in the same fashion, hence the comparison is still fair. However, the reviewers were not convinced by this response. 
In concordance, I also think the use of test data during training is misleading, even if all methods use the same strategy, because this may tell us more about which approach can better (over)fit to the data as opposed to how well the methods are able to generalize to unseen samples. + +Another concern relates to the diminishing return in the performance as networks get larger. The authors respond that this might be because only the first layer uses the proposed log-polar convolution, speculating the problem will go away if the proposed approach is used in all layers. However, this is not empirically verified again and remains unclear if this is indeed the reason. + +I suggest authors resubmit after accommodating the provided feedback.",ICLR2022, +eE8PmGd3AI,1610040000000.0,1610470000000.0,1,k9EHBqXDEOX,k9EHBqXDEOX,Final Decision,Reject,"Although the reviewers found the paper well-written that analyzes a relatively popular algorithm (TD(0) version of A3C), there are concerns regarding the novelty of the convergence results given those for A2C, the comparison of the results with those for A2C, and the sufficiency of the experiments. Although the authors addressed some of these issues/comments during the rebuttals, it seems none of the reviewers is excited about the paper and there still exist concerns regarding the novelty of the results and how they are compared with those in the literature. I would suggest that the authors take the reviewers' comments into account, have a more comprehensive discussion about the relation of their results with those in the literature (two-time scale algorithms), and prepare their work for future conferences. ",ICLR2021, +gJAnwEGaQ0fs,1642700000000.0,1642700000000.0,1,Gpp1dfvZYYH,Gpp1dfvZYYH,Paper Decision,Reject,"As pointed out by some reviewers, the proposed method basically puts progressive training in the federated context. The theoretical analysis only concerns the centralized or non-federated setting and thus give no insight or guidelines for progressive training in federated learning. The main advantage of saving communication mainly comes from the simple observation that less parameters are computed and communicated during each round before the full end-to-end stage. However, this may cause extra overhead in hyper-parameter tuning including number of stage, learning rate schedules and stage-wise warmup. Despite its potential effectiveness in practice, the current version of the paper falls short of the acceptance bar due to the weakness in novelty and relevant theory for federated learning.",ICLR2022, +rJNgTM8ul,1486400000000.0,1486400000000.0,1,r1IRctqxg,r1IRctqxg,ICLR committee final decision,Reject,"The reviewers provided detailed, confident reviews and there was significant discussion between the parties. + + Reviewer 2 and 3 felt quite strongly that the paper was a clear reject. Reviewer 1 thought the paper should be accepted. + + I was concerned with the two points raised by R3 and don't feel they were adequately addressed by the author's comments: + + - Dependence of the criteria on the learning rate (this does not make sense to me); and + - Really really poor results on CIFAR-10 (and this is not being too picky, like asking them to be state-of-the-art; they are just way off) + + I engaged R1 to see how they felt about this. 
On reflection, I side with the majority opinion that the paper needs rework to meet the ICLR acceptance bar.",ICLR2017,
Overall, the AC thinks this is a borderline paper, leaning slightly toward rejection.",ICLR2021,
It could inspire further studies on adversarial transferability.",ICLR2022, +H-ehLtvS7,1576800000000.0,1576800000000.0,1,Bke9u1HFwB,Bke9u1HFwB,Paper Decision,Reject,"The paper makes broad claims, but the depth of the experiments is very limited to a narrow combination of algorithms.",ICLR2020, +K4Rqcw9JIPA,1610040000000.0,1610470000000.0,1,ac288vnG_7U,ac288vnG_7U,Final Decision,Accept (Poster),This paper considers the problem of sequential decision making through the lens of submodular maximization. I read the paper myself and found the idea quite appealing and interesting. The authors also make a very effective rebuttal and brought a borderline paper into a clear accept. ,ICLR2021, +vUB_4r85oy,1610040000000.0,1610470000000.0,1,BIwkgTsSp_8,BIwkgTsSp_8,Final Decision,Reject,"The paper considers the problem of private data sharing under local differential privacy. + +(1) it assumes having access to a public unlabeled dataset for learning a VAE, so it reduces the dimensionality in a more meaningful way than simply running PCA. (2) the LDP guarantee is coming from the standard Laplace mechanism and Randomized Responses. (3) then the authors propose how to learn a model based on the privately released (encoded) data which exploits the knowledge of the noise distribution. + +None of these components are new as far as I know, nor were they new in the context of differential privacy. For example, the use of a publicly available data for DP was considered in: + +- Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure +vs. approximate differential privacy. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 363–378. Springer, 2013. + +(they called it Semi-Private Learning...) + +- Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I., & Talwar, K. (2017). Semi-supervised knowledge transfer for deep learning from private training data. In ICLR-17. + +The idea of integrating out the noise by leveraging the known noise structure were considered in: + +- Williams, O., & McSherry, F. (2010). Probabilistic inference and differential privacy. Advances in Neural Information Processing Systems, 23, 2451-2459. + +- Balle, B., & Wang, Y. X. (2018). Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising. In International Conference on Machine Learning (pp. 394-403). + +And many subsequent work. + +The contribution of this work is in combining these known pieces (without citing some of the earlier work) to achieve a reasonably strong set of experimental results (for LDP standard). I believe this is the first experimental study that uses VAE for the dimension reduction, however, this alone is not sufficient to carry the paper in my opinion; especially since the setting is now much easier, with access to a public dataset. + +The reviewers question the experiments are baselines are usually not using a public dataset as well as the practicality of the proposed method. Also, connections to some of the existing work on private data release (a.k.a., private synthetic data generation) were note clarified. For these reasons, there were not sufficient support among the reviewers to push the paper through. 
+ +The authors are encouraged to revise the paper according to the suggestions and resubmit to the next appropriate venue.",ICLR2021,
In the post-rebuttal discussion, the reviewers mentioned that it is not clear how SpatialNet differs from a ConvLSTM, raised concerns about the writing quality, and noted that the ""physics prior"" is really quite close to what other baselines simply call video prediction. + +",ICLR2019,4: The area chair is confident but not absolutely certain
Some analysis showing how many iterations the model uses to translate one sentence, and what factors could affect the iteration number, such as sentence length, would greatly improve the paper. A second weakness pointed out by reviewers is related to the results on WMT’15 En-De reported in Table 1, where the reported baseline numbers seem to be weaker than expected. As pointed out by one of the reviewers, pre-trained checkpoints on English->German (available at https://github.com/pytorch/fairseq/tree/master/examples/translation) exist that achieve much higher sacreBLEU scores than the reported baseline. I found the authors’ answer not very convincing regarding this point. Therefore, I recommend rejection. I suggest the authors, in future iterations of their work, address some of the issues pointed out by the reviewers and re-implement their method following the settings in (Ott et al., 2019) to get more convincing results. ",ICLR2021,
It seems like the authors made significant progress on this in the revision and engaged substantially during the discussion, but reviewers were not fully satisfied that the paper was up to ICLR standards, either as an initial submission or after revisions. This is a small detail, but it's a bad sign for the carefulness of the work that the OpenReview abstract is still unreadable, even after a request from a reviewer.",ICLR2022, +Hy61myTSM,1517250000000.0,1517260000000.0,53,S1HlA-ZAZ,S1HlA-ZAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents a distributed memory architecture based on a generative model with a VAE-like training criterion. The claim is that this approach is easier to train than other memory-based architectures. The model seems sound, and it is described clearly. The experimental validation seems a bit limited: most of the comparisons are against plain VAEs, which aren't a memory-based architecture. The discussion of ""one-shot generalization"" is confusing, since the task is modified without justification to have many categories and samples per category. The experiment of Section 4.4 seems promising, but this needs to be expanded to more tasks and baselines since it's the only experiment that really tests the Kanerva Machine as a memory architecture. Despite these concerns, I think the idea is promising and this paper contributes usefully to the discussion, so I recommend acceptance.",ICLR2018, +H1gmGW1-g4,1544770000000.0,1545350000000.0,1,HyztsoC5Y7,HyztsoC5Y7,"Promising work, should make sure final version carefully references robotics literature",Accept (Poster),"The authors consider the use of MAML with model based RL and applied this to robotics tasks with very encouraging results. There was definite interest in the paper, but also some concerns over how the results were situated, particularly with respect to the related research in the robotics community. The authors are strongly encouraged to carefully consider this feedback, as they have been doing in their responses, and address this as well as possible in the final version. +",ICLR2019,3: The area chair is somewhat confident +qr0-XhhOa,1576800000000.0,1576800000000.0,1,BJlZ5ySKPH,BJlZ5ySKPH,Paper Decision,Accept (Poster),"The paper proposes a new architecture for unsupervised image2image translation. +Following the revision/discussion, all reviewers agree that the proposed ideas are reasonable, well described, convincingly validated, and of clear though limited novelty. Accept.",ICLR2020, +HJe2o-BxxE,1544730000000.0,1545350000000.0,1,rygk9oA9Ym,rygk9oA9Ym,Reviews not strong enough to justify acceptance,Reject,"With ratings of 6, 5 & 3 the numerical scores are just not strong enough to warrant acceptance. +The author rebuttal was not able to sway opinions. +",ICLR2019,4: The area chair is confident but not absolutely certain +c43J6KITpEu,1610040000000.0,1610470000000.0,1,eyXknI5scWu,eyXknI5scWu,Final Decision,Reject,"While the reviewers found parts of the paper interesting, the main concern about this paper was lack of novelty and marginal improvements obtained by the proposed methods.",ICLR2021, +c6nLG4bXYbs,1610040000000.0,1610470000000.0,1,dzZaIeG9-fW,dzZaIeG9-fW,Final Decision,Reject,"The paper gives a way of constructing a dataset of programs aligned with invariants that the programs satisfy at runtime, and training a model to predict invariants for a given program. 
+ +While the overall idea behind the paper is reasonable, the execution (in particular, the experimental evaluation) is problematic. As a result, the paper cannot be accepted in its present form. Please see the reviews for more details.",ICLR2021, +Ugc9Nhbpd-t,1610040000000.0,1610470000000.0,1,KwgQn_Aws3_,KwgQn_Aws3_,Final Decision,Reject,"The authors introduce an RNN model, ProtoryNet, which uses trajectories of sentence protoypes to illuminate the semantics of text data. + +Good points were brought up and addressed in discussion, which have improved the paper - including a helpful suggestion from Rev 3 to fine-tune BERT sentence embeddings in ProtoryNet, which led to significant performance gains. + +Unfortunately the tone of discussion with one reviewer slipped below the respectful standards to which we aspire, but rest assured that only substantive points on the paper were considered. + +Reviewers were split but in discussion converged to leaning against acceptance, allowing the authors to reflect on, and incorporate new results carefully in an updated manuscript.",ICLR2021, +splp5c2fMX6,1642700000000.0,1642700000000.0,1,bHqI0DvSIId,bHqI0DvSIId,Paper Decision,Reject,"This work presents the Neural Simulated Annealing (NSA) approach as a heuristic for general combinatorial optimization problems. After revising the paper and reading the comments from the reviewers, here are the general comments: + +- In general, the paper is clear enough. The contributions are stated in a proper way. +- The novelty is rather limited, but the key idea of using neural networks in SA, and training it with RL, has merit. +- This approach has merit but the novelty is very limited. +- The NSA improves the vanilla SA, but the benchmark reveals that NSA is not enough competitive with other state-of-the-art methods. +- The benchmark does not reveal enough information about the NSA against the SOTA methods. +- The work needs technical improvements and validation is required before accepting the work.",ICLR2022, +SJ33Nk6BM,1517250000000.0,1517260000000.0,444,rkdU7tCaZ,rkdU7tCaZ,ICLR 2018 Conference Acceptance Decision,Reject,"The pros and cons of the paper are summarized below: + +Pros: +* The proposed tweaks to the dynamic evaluation of Mikolov et al. 2010 are somewhat effective, and when added on top of already-strong baseline models improve them substantially + +Cons: +* Novelty is limited. This is essentially a slightly better training scheme than the method proposed by Mikolov et al. 2010. +* The fair comparison against Mikolov et al. 2010 is only shown in Table 1, where a perplexity of 78.6 turns to a perplexity of 73.5. This is a decent gain, but the great majority of this is achieved by switching the optimizer from SGD to an adaptive method, which as of 2018 is a somewhat limited contribution. The remainder of the tables in the paper do not compare with the method of Mikolov et al. +* The paper title, abstract, and introduction do not mention previous work, and may give the false impression that this is the first paper to propose dynamic evaluation for neural sequence models, significantly overclaiming the paper's contribution and potentially misleading readers. + +As a result, while I think that dynamic evaluation itself is useful, given the limited novelty of the proposed method and the lack of comparison to the real baseline (the simpler strategy of Mikolov et al.) in the majority of the experiments, I think this papers till falls short of the quality bar of ICLR. 
+ +Also, independent of this decision, a final note about perplexity as an evaluation measure, to elaborate on the comments of reviewer 1. In general, perplexity is an evaluation measure that is useful for comparing language models of the same model class, but it tends not to correlate well with model performance (e.g. ASR accuracy) across very different types of models. For example, see ""Evaluation Metrics for Language Models"" by Chen et al. 1998. The method of dynamic evaluation is similar to the cache-based language models that existed in 1998 in that it reinforces the model to choose vocabulary similar to what it has seen before. As you can see from that paper, the quality of perplexity as an evaluation measure falls when cache-based models are thrown into the mix, and one reason for this is that cache models, while helping perplexity greatly, tend to reinforce previous errors when errors do occur.",ICLR2018,
+The authors should think about comparing with other linear attention mechanisms to show the applicability of the method.",ICLR2021, +QICytKmBADx,1642700000000.0,1642700000000.0,1,J8P7g_mDpno,J8P7g_mDpno,Paper Decision,Reject,"The reviewers were generally split on this paper. On the one hand, reviewers generally appreciated the clear presentation, discussion, and explanations, and the experiments. On the other hand, most reviewers commented on the lack of comparative evaluation to other works, including works that are related conceptually. While the authors have a potentially reasonable argument for omitting such comparisons, in the balance I do not believe that the reviewers were actually convinced by this. Particularly when the novelty of the contribution is not crystal clear, such comparisons are important, so I am inclined to not recommend acceptance at this point (though I acknowledge that the paper is clear borderline and could be accepted).",ICLR2022, +ryiRifUux,1486400000000.0,1486400000000.0,1,rJfMusFll,rJfMusFll,ICLR committee final decision,Accept (Poster),"This is an interesting and timely paper combining off-policy learning with seq2seq models to train a chatbot on a restaurant reservation task, using labels collected through Amazon Mechanical Turk while using the bot with a baseline maximum likelihood policy. + The paper is clear, well-written and well-executed. Although the improvements are modest and the actual novelty of the paper is limited (combining known pieces in a rather straightforward way), this is still an interesting and informative read, and will probably be of interest to many people at ICLR.",ICLR2017, +PTh1gsfegq,1576800000000.0,1576800000000.0,1,Bkgwp3NtDH,Bkgwp3NtDH,Paper Decision,Reject,"This paper proposes a general framework for constructing Trojan/Backdoor attacks on deep neural networks. The authors argue that the proposed method can support dynamic and out-of-scope target classes, which are particularly applicable to backdoor attacks in the transfer learning setting. This paper has been very carefully discussed. While the idea is interesting and could be of interest to the broader community, all reviewers agree that it lacks of experimental comparison with existing methods for backdoor attacks on benchmark problems. The paper needs to be significantly revised before publication. I encourage the authors to improve this paper and resubmit to future conference.",ICLR2020, +0hzK1f5oJF,1576800000000.0,1576800000000.0,1,B1gX8JrYPr,B1gX8JrYPr,Paper Decision,Reject,"The authors construct a weighted objective that subsumes many of the existing approaches for sequence prediction, such as MLE, RAML, and entropy regularized policy optimization. By dynamically tuning the weights in the objective, they show improved performance across several tasks. + +Although there were no major issues with the paper, reviewers generally felt that the technical contribution is fairly incremental and the empirical improvements are limited. Given the large number of high-quality submissions this year, I am recommending rejection for this submission.",ICLR2020, +Hyi43fIdx,1486400000000.0,1486400000000.0,1,ry2YOrcge,ry2YOrcge,ICLR committee final decision,Accept (Poster),"The paper applies a previously introduced method (from ICLR '16) to the challenging question answering dataset (wikitables). The results are strong and quite close to the performance obtained by a semantic parser. There reviewers generally agree that this is an interesting and promising direction / results. 
The application of the neural programmer to this dataset required model modifications which are reasonable though quite straightforward, so, in that respect, the work is incremental. Still, achieving strong results on this moderately sized dataset with an expressive model is far from trivial. Though the approach, as has been discussed, does not directly generalize to QA with large knowledge bases (as well as other end-to-end differentiable methods for the QA task proposed so far), it is an important step forward and the task is already realistic and important. + + Pros + + + interesting direction + + strong results on an interesting dataset + + Cons + - incremental, the model is largely the same as in the previous paper",ICLR2017,
While I am myself not fully convinced that the problem setting motivation truly aligns with the kind of empirical results that the work provides, on balance I think this work is interesting and has sufficient novel contributions to be accepted at ICLR.",ICLR2022,
The ACs agree.",ICLR2021, +ihmfUxggG,1576800000000.0,1576800000000.0,1,rygPm64tDH,rygPm64tDH,Paper Decision,Reject,"This work claims two primary contributions: first a new saliency method ""expected gradients"" is proposed, and second the authors propose the idea of attribution priors to improve model performance by integrating domain knowledge during training. Reviewers agreed that the expected gradients method is interesting and novel, and experiments such as Table 1 are a good starting point to demonstrate the effectiveness of the new method. However, the claimed ""novel framework, attribution priors"" has large overlap with prior work [1]. One suggestion for improving the paper is to revise the introduction and experiments to support the claim ""expected gradients improve model explainability and yield effective attribution priors"" rather than claiming to introduce attribution priors as a new framework. One possibility for strengthening this claim is to revisit experiments in [1] and related follow-up work to demonstrate that expected gradients yield improvements over existing saliency methods. Additionally, current experiments in Table 1 only consider integrated gradients as a baseline saliency method, there are many others worth considering, see for example the suite of methods explored in [2]. + +Finally, I would add that the current section on distribution shift provides an overly narrow perspective on model robustness by only considering robustness to additive Gaussian noise. It is known that it is easy to improve robustness to Gaussian noise by biasing the model towards low frequency statistics in the data, however this typically results in degraded robustness to other kinds of noise types. See for example [3], where it was observed that adversarial training degrades model robustness to low frequency noise and the fog corruption. If the authors wish to pursue using attribution priors for improving robustness to distribution shift, it is important that they evaluate on a more varied suite of corruptions/noise types [4]. Additionally, one should compare against strong baselines in this area [5]. + +1. https://arxiv.org/abs/1703.03717 +2. https://arxiv.org/abs/1810.03292 +3. https://arxiv.org/abs/1906.08988 +4. https://arxiv.org/abs/1807.01697 +5. https://arxiv.org/abs/1811.12231 +",ICLR2020, +ryl27G2exV,1544760000000.0,1545350000000.0,1,rygrBhC5tQ,rygrBhC5tQ,Well motivated problem; good solution,Accept (Poster),"Strengths: The paper tackles a novel, well-motivated problem related to options & HRL. +The problem is that of learning transition policies, and the paper proposes +a novel and simple solution to that problem, using learned proximity predictors and transition +policies that can leverage those. Solid evaluations are done on simulated locomotion and +manipulation tasks. The paper is well written. + +Weaknesses: Limitations were not originally discussed in any depth. +There is related work related to sub-goal generation in HRL. +AC: The physics of the 2D walker simulations looks to be unrealistic; +the character seems to move in a low-gravity environment, and can lean +forwards at extreme angles without falling. It would be good to see this explained. + +There is a consensus among reviewers and AC that the paper would make an excellent ICLR contribution. 
+AC: I suggest a poster presentation; it could also be considered for oral presentation based on the very positive reception by reviewers.",ICLR2019,5: The area chair is absolutely certain
The AC agrees with this assessment and recommends accepting the paper.",ICLR2022, +N34UL0LMiuz,1642700000000.0,1642700000000.0,1,ARw4igiN2Qm,ARw4igiN2Qm,Paper Decision,Reject,"This paper proposes a stepped sampler for LSTM-based video detection. However, reviewers raised a series of issues of this paper, including the weakness in novelty, experiment evaluations, and generalizability of the method. Considering the limited contribution of this paper, and limited experiment evaluations, the AC agrees with the reviewers and recommends reject for this paper.",ICLR2022, +HkwdmkpSf,1517250000000.0,1517260000000.0,172,r1NYjfbR-,r1NYjfbR-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper got mixed scores of 4 (R1), 6 (R3), 8 (R2). R1 initially gave up after a few pages of reading, due to clarity problems. But looking over the revised version was much happier, so raised their score to 7. R2, who is knowledge about the area, was very positive about the paper, feeling it is a very interesting idea. R3 was also cautiously positive. The authors have absorbed the comments by the reviewers to make significant changes to the paper. The AC feels the idea is interesting, even if the experimental results aren't that compelling, so feels the paper can be accepted. +",ICLR2018, +B1gZaHaExN,1545030000000.0,1545350000000.0,1,H1eqviAqYX,H1eqviAqYX,"Important question, but unconvincing treatment here",Reject,"This paper seeks to shed light on why seq2seq models favor generic replies. The problem is an important one, unfortunately the responses proposed in the paper are not satisfactory. Most reviewers note problems and general lack of rigorousness in the assumptions used to produce the theoretical part of the paper (e.g., strong assumption of independence of generated words). The experiments themselves are not convincing enough to warrant acceptance by themselves.",ICLR2019,5: The area chair is absolutely certain +S1xRoumtyE,1544270000000.0,1545350000000.0,1,HJM4rsRqFX,HJM4rsRqFX,Intersting work with slighlty limited originality that would benefit from a clearer motivation. ,Reject,"The paper proposes a novel variational inference framework for knowledge graphs which is evaluated on link prediction benchmark sets and is competitive to previous generative approaches. +While the idea is interstnig and technically correct, the originality of the contribution is limited, +and the paper would be clearly improved by providing a clearer motivation for using generative models instead of standard methods and a experimental demonstration of the benefits of using a generative instead of a discriminative model, especially since the standard method perform slightly better in the experiments. Overall, the work is slightly under the acceptance threshold. +",ICLR2019,4: The area chair is confident but not absolutely certain +fXxg3D9dK,1576800000000.0,1576800000000.0,1,HkxTwkrKDB,HkxTwkrKDB,Paper Decision,Accept (Poster),"This paper shows that DeepSets and PointNet, which are known to be universal for approximating functions, are also universal for approximating equivariant set functions. Reviewer are in agreement that this paper is interesting and makes important contributions. However, they feel the paper could be written to be more accessible. + +Based on the reviews and discussions following author response, I recommend accepting this paper. 
I appreciate the authors for an interesting paper and look forward to seeing it at the conference.",ICLR2020, +agCVrWiLFe,1576800000000.0,1576800000000.0,1,H1eVlgHKPr,H1eVlgHKPr,Paper Decision,Reject,The authors propose approaches to handle partial observability in reinforcement learning. The reviewers agree that the paper does not sufficiently justify the methods that are proposed and even the experimental performance shows that the proposed method is not always better than baselines.,ICLR2020, +Bkn6jGUdl,1486400000000.0,1486400000000.0,1,SkgSXUKxx,SkgSXUKxx,ICLR committee final decision,Reject,"The paper extends a regularizer on the gradients recently proposed by Hariharan and Girshick. I agree with the reviewers that while the analysis is interesting, it is unclear why this particular regularizer is especially relevant for low-shot learning. And the experimental validation is not strong enough to warrant acceptance.",ICLR2017, +sObCy3B9Bm9,1610040000000.0,1610470000000.0,1,eNdiU_DbM9,eNdiU_DbM9,Final Decision,Accept (Spotlight),"The reviewers all agreed that the paper is a solid contribution. + +Pros: +- A simple and reasonable extension to adaptive prediction sets that performs well empirically. +- The procedure presented is versatile (i.e. can be applied to general scores or be used to improve base conformal prediction methods). +- A very thorough experimental analysis, including large datasets (i.e. Imagenet) and a wide range of model architectures including ResNet-152. +- Some formal theoretical guarantees are provided for the procedure, although they appear to be straightforward. + +Cons: +- Limited technical novelty. + +Overall, I recommend a spotlight because the reviewers felt that the topic of predictive uncertainty is of interest to the broader ML and computer vision community, and the paper can have a potentially large impact in popularizing conformal methods as a viable uncertainty estimation method.",ICLR2021, +8bTrzcTzx1p,1610040000000.0,1610470000000.0,1,UoAFJMzCNM,UoAFJMzCNM,Final Decision,Reject,"This paper introduces a scalable method for FSP based on FBSDE. The method is theoretically derived then applied on two problems, one simple but with many (1000) agents, and one with only 2 agents but partial observability. + +The main strength of this paper lies in the scalability and the time complexity of the proposed method. Computing Nash equilibriums for many agents is a difficult problem and this paper is interesting in this aspect. + +However, the reviewers point out several weak points to this paper. The difference with a previous work by Han, Hu and Long needs to be highlighted. Some parts of the paper are not clear, and too much of the important results are pushed into the appendix. Maybe this work is not best fitted to a conference format, and should be submitted to a journal? Another concern raised by the reviewers is that the experimental section does not show significant enough results, and that it is surprising to see a 2-agents problem as an illustration of a method that is aiming at addressing scalability with respect to the number of agents. + +Reviewers agree on rejection for this paper, although by a small margin. I therefore recommend rejection. I think that if the authors improve this paper by following the reviewer suggestions, it can be accepted in a future venue.",ICLR2021, +Yvguv1SCw,1576800000000.0,1576800000000.0,1,HyxyIgHFvr,HyxyIgHFvr,Paper Decision,Accept (Spotlight),"The authors take a closer look at widely held beliefs about neural networks. 
Using a mix of analysis and experiment, they shed some light on the ways these assumptions break down. The paper contributes to our understanding of various phenomena and their connection to generalization, and should be a useful paper for theoreticians searching for predictive theories.",ICLR2020, +GZWfiO2bmt5,1610040000000.0,1610470000000.0,1,BM---bH_RSh,BM---bH_RSh,Final Decision,Accept (Poster),"Existing works mostly focus on model compression for the classification task. This paper aims for an efficient recommendation system that can well balance the model compression and model accuracy, which therefore brings in new challenges and opportunities. The authors propose to unify the model compression and feature embedding compression and develop an effective and reasonable solution. The concerns raised by the reviewers have been well fixed and all reviewers agree on the paper's contribution. The paper is therefore recommended for acceptance. ",ICLR2021, +AJbp1Dpm4Xv,1610040000000.0,1610470000000.0,1,wQRlSUZ5V7B,wQRlSUZ5V7B,Final Decision,Accept (Poster),"This paper proposes a novel technique to learn a disentangled latent space using VAEs and semi-supervision. The technique is based on a careful specification of the joint distribution where the labels inform a factorisation of the distribution over continuous latent factors. The technique allows for inference, generation, and intervention in a tractable way. + +The paper is well-written, the formulation is original, and the experiments convincing. There were some confusions that were mostly resolved during the discussion. + +In addition to the expert reviews attached, I would like to remark that I too find the formulation interesting and elegant. And if I may add to the discussion, oiVAE (output-interpretable VAEs) by Ainsworth et al presented at ICML18 is a related piece of work that did not occur to me earlier, but which the authors could still relate to (I'd certainly enjoy reading about the authors' views on that line of work). ",ICLR2021, +SJW6SJpHz,1517250000000.0,1517260000000.0,664,r17lFgZ0Z,r17lFgZ0Z,ICLR 2018 Conference Acceptance Decision,Reject,"This paper tackles a very important problem: evaluating natural language generation. The paper presents an overview of existing unsupervised metrics, and looks at how they correlate with human evaluation scores. This is important work and the empirical conclusions are useful to the community, but the datasets used are too limited and the authors agree it would be better to use newer bigger and more diverse datasets suggested by reviewers for drawing more general conclusions. This work would indeed be much stronger if it relied on better, more recent datasets; therefore publication as is seems premature. +",ICLR2018, +77p8wT0f7TC,1642700000000.0,1642700000000.0,1,xy_2w3J3kH,xy_2w3J3kH,Paper Decision,Accept (Poster),"This paper makes a contribution in the literature of cooperative multi-agent reinforcement learning by proposing a decentralized and communication-efficient training framework under a fully observable setting. The paper first defines the homogeneous or permutation invariant subclass of Markov games (homogeneous MG), where it is proved that sharing policy parameters does not loose optimality. The paper then proposes an actor-critic algorithm for the homogeneous MG. The proposed approach is empirically supported. The reviewers had originally raised concerns or confusions, but no major concerns remain after discussion. 
+ +Overall, the paper studies an interesting and practically relevant setting, providing new insights and a solid basis for policy sharing, which has been used in the literature without much understanding.",ICLR2022, +B1xoH6bYxV,1545310000000.0,1545350000000.0,1,r1lYRjC9F7,r1lYRjC9F7,metareview,Accept (Oral),"All reviewers agree that the presented audio data augmentation is very interesting, well presented, and clearly advancing the state of the art in the field. The authors’ rebuttal clarified the remaining questions by the reviewers. All reviewers recommend strong acceptance (oral presentation) at ICLR. I would like to recommend this paper for oral presentation due to a number of reasons including the importance of the problem addressed (data augmentation is the only way forward in cases where we do not have enough training data), the novelty and innovativeness of the model, and the clarity of the paper. The work will be of interest to the widest audience beyond ICLR.",ICLR2019,5: The area chair is absolutely certain +keGN06LZ1ii,1642700000000.0,1642700000000.0,1,LOz0xDpw4Y,LOz0xDpw4Y,Paper Decision,Reject,"This paper proposes a dynamic programming strategy for faster approximate generation in denoising diffusion probabilistic models. + +All reviewers appreciated the paper, but they are not overly excited. + +Two reviewers are focused on the log likelihood not being the objective for image quality. This AC does not really buy this argument. + +The method and the story around it are well-rounded and finished, so it is hard to think of any major modifications that would change the overall story a lot. One could therefore argue for acceptance as it stands. On the other hand, this is difficult to argue for given the below-acceptance-level scores. + +So the final recommendation is reject, with a strong encouragement to submit to the next conference after updating the paper with preemptive arguments on why the ELBO, and not FID, is the right thing to consider.",ICLR2022, +ey-_PyqTGM,1642700000000.0,1642700000000.0,1,CuV_qYkmKb3,CuV_qYkmKb3,Paper Decision,Accept (Spotlight),The paper explores self-supervised learning on tabular data and proposes a novel augmentation method via corrupting a random subset of features. The idea is simple but effective. Experiments include 69 datasets and compare with a number of methods. The results show its superiority. It is likely to inspire more work on SSL in the tabular domain.,ICLR2022, +Certguw8_p,1642700000000.0,1642700000000.0,1,CxebB5Psl1,CxebB5Psl1,Paper Decision,Reject,"Reviewers unanimously vote for rejection for several reasons. First, the draft is incomplete and difficult to read. Second, one of the proposed methods (contextual sentence encoder) appears the same as past work, while the other proposed method (graph encoding) is difficult to interpret from what is written. Third, the draft is missing comparisons with recent work, and some included comparisons may be unfair due to data conditions. No author response was provided. The reviewer consensus is that this draft is underdeveloped, and not yet ready for submission or publication.",ICLR2022, +SGZqttJ_YFm,1642700000000.0,1642700000000.0,1,G-7GlfTneYg,G-7GlfTneYg,Paper Decision,Reject,"PAPER: This paper addresses the problem of learning methods for general speech restoration that generalize across at least 4 tasks (additive noise, room reverberation, low-resolution and clipping distortion). The proposed approach is based on a two-stage process, which includes both analysis and synthesis stages. &#13;
+DISCUSSION: The reviewers wrote very detailed reviews which ask some important questions and point to some potential issues. The authors responded to all reviews, but only addressed a subset of the issues and questions mentioned by the reviewers. Novelty and comparison with previous approaches were among the issues mentioned by reviewers. +SUMMARY: While reviewers are supportive of this line of research, they were also concerned with the novelty of the proposed approach and details of the experiments. In its current form, the paper may not be ready for publication.",ICLR2022, +19hgyKP7pS,1610040000000.0,1610470000000.0,1,kmG8vRXTFv,kmG8vRXTFv,Final Decision,Accept (Oral),"The authors propose a method for modeling dynamical systems that balances theoretically derived models, which may be grounded in domain knowledge but subject to overly strict assumptions, with neural networks that can pick up the slack. All reviewers were enthusiastic about this work, appreciating its balance of mathematical rigor and experimental assessment. One concern was that this paper follows on decades of related work, which was difficult to adequately summarize. However, changes made throughout the discussion phase did address these concerns.",ICLR2021, +EuR9XRfNAvn,1642700000000.0,1642700000000.0,1,MR7XubKUFB,MR7XubKUFB,Paper Decision,Accept (Poster),"This paper introduces a new method for jointly training a dense bi-encoder retriever with a cross-encoder ranker. More precisely, the proposed method iteratively trains the retriever and the ranker, using an objective function inspired by adversarial training. In addition, the authors propose to use a distillation loss from the ranker to the retriever as a regularization term. The proposed method, called AR2, is evaluated on three retrieval benchmarks from question answering: NaturalQuestions, TriviaQA and MS-MARCO. The method obtains state-of-the-art retrieval performance on these three datasets. + +Overall, the reviewers agree that the strong performance obtained by the proposed method is a strength of the paper. Regarding novelty, some reviewers argue that the method is a combination of existing techniques, hence lacking novelty, while others believe that combining these different techniques is novel enough. Regarding the experimental section, some concerns were raised about comparisons with previous work (e.g., BERT vs ERNIE) or the fact that it was a bit hard to determine where the improvements come from. I believe that these concerns were well addressed by the authors, and I tend to believe that combining existing techniques to obtain a strong system is novel enough. I thus lean towards accepting this paper to the ICLR conference.",ICLR2022, +i3N68N6BcN5,1642700000000.0,1642700000000.0,1,liV-Re74fK,liV-Re74fK,Paper Decision,Reject,"The authors introduce a modification to CQL to use a weighting based on density estimates. In an idealized setting, they show that the estimated Q-values bound the true Q-values. Finally, they evaluate their proposed approach on a few benchmark offline RL tasks. + +Generally, all reviewers felt that the results were too incremental. The theoretical result follows with light modifications from the CQL paper and even then, the implications of the result are unclear. The experimental results showed small improvements or comparable performance while requiring training a density estimator and introducing an additional hyperparameter. Furthermore, the set of tasks evaluated was limited and no comparisons to methods other than CQL were shown. + +While I appreciate the effort the authors took to investigate this improvement, at this time, the paper falls below the bar and I recommend rejection.",ICLR2022, +mfmGtUPx9,1576800000000.0,1576800000000.0,1,ryestJBKPB,ryestJBKPB,Paper Decision,Reject,"This paper proposes and evaluates using graph convolutional networks for semi-supervised learning of probability distributions (histograms). The paper was reviewed by three experts, all of whom gave a Weak Reject rating. The reviewers acknowledged the strengths of the paper, but also had several important concerns including quality of writing and significance of the contribution, in addition to several more specific technical questions. The authors submitted a response that addressed these concerns to some extent. However, in post-rebuttal discussions, the reviewers chose not to change their ratings, feeling that the quality of writing still needed to be improved and that overall a significant revision and another round of peer review would be needed. In light of these reviews, we are not able to recommend accepting the paper, but hope the authors will find the suggestions of the reviewers helpful in preparing a revision for another venue. &#13;
+ +Since both the novelty and the significance of this paper in its current form are limited, it is premature for publication. There is consensus among all the reviewers that this paper is not up to the acceptance standard of ICLR. +",ICLR2021, +e3a9e1WpKu,1642700000000.0,1642700000000.0,1,Qu_XudmGajz,Qu_XudmGajz,Paper Decision,Reject,"In order to evaluate the evidence lower bound (ELBO), VAEs typically use a parametric distribution-based decoder $p(x|z)$. If the data is continuous, one often considers a Gaussian VAE, where the canonical setting is to assume a diagonal covariance matrix $p(x|z) = N(x; \mu(z), \sigma^2 \mathbf{I})$. In this paper, the authors suggest replacing the diagonal covariance matrix with a structured covariance matrix (low-rank + diagonal). As this only amounts to a minor change to a canonical Gaussian VAE, strong empirical results are expected to justify its acceptance. However, the image generation results presented in the paper are not comparable to the state-of-the-art VAE results (e.g., Arash Vahdat, and Jan Kautz. ""NVAE: A Deep Hierarchical Variational Autoencoder."" Neural Information Processing Systems (NeurIPS), 2020).",ICLR2022, +c3lIqdC0-2a8,1642700000000.0,1642700000000.0,1,I1dg7let3Q,I1dg7let3Q,Paper Decision,Reject,"This submission has generated sufficient debate, including some messages that, in my viewpoint, have the wrong tone. It may well be that different colleagues see the work in different ways. It is very hard to evaluate submissions in a short time and mistakes can happen. In this case, I think there were and still are misunderstandings and unclarity wrt very crucial points of the paper. This does not mean that the work overall is weak not that there is no contribution. If the content is so interesting (as discussed by authors and multiple reviewers) in some way (which it seems to be), then a better presentation and argumentation will lead to a publication elsewhere soon, but based on all the data that I have here, I recommend rejection. I see to reason to list details about the content and possible concerns, as they should be clear from the multiple messages among authors and reviewers. Best of luck.",ICLR2022, +HJXsif8_l,1486400000000.0,1486400000000.0,1,S1RP6GLle,S1RP6GLle,ICLR committee final decision,Accept (Oral),"All the reviewers agreed that the paper is original, of high quality, and worth publishing.",ICLR2017, +mNbW2d3Ki3,1576800000000.0,1576800000000.0,1,B1gHokBKwS,B1gHokBKwS,Paper Decision,Accept (Poster),"This paper develops a methodology to perform global derivative-free optimization of high dimensional functions through random search on a lower dimensional manifold that is carefully learned with a neural network. In thorough experiments on reinforcement learning tasks and a real world airfoil optimization task, the authors demonstrate the effectiveness of their method compared to strong baselines. The reviewers unanimously agreed that the paper was above the bar for acceptance and thus the recommendation is to accept. An interesting direction for future work might be to combine this methodology with REMBO. REMBO seems competitive in the experiments (but maybe doesn't work as well early on since the model needs to learn the manifold). Learning both the low dimensional manifold to do the optimization over and then performing a guided search through Bayesian optimization instead of a random strategy might get the best of both worlds? 
",ICLR2020, +cpaZXwDPR,1576800000000.0,1576800000000.0,1,HylrB04YwH,HylrB04YwH,Paper Decision,Reject,"The paper shows that overparameterized autoencoders can be trained to memorize a small number of training samples, which can be retrieved via fixed point iteration. After rounds of discussion with the authors, the reviewers agree that the idea is interesting and overall quality of writing and experiments is reasonable, but they were skeptical regarding the significance of the finding and impact to the field and thus encourage studying the phenomenon further and resubmitting in a future conference. I thus recommend rejecting this submission for now.",ICLR2020, +S1yq3fLul,1486400000000.0,1486400000000.0,1,Hy6b4Pqee,Hy6b4Pqee,ICLR committee final decision,Accept (Poster),"There was general agreement from the reviewers that this looks like an important development in the area of probabilistic programming. Some reviewers felt the impact of the work could be very significant. The quality of the work and the paper were perceived as being quite high. The main weakness highlighted by the most negative reviewer (who felt the work was marginally below threshold) is the level of empirical evaluation given within the submitted manuscript. The authors did submit a revision and they outline the reviewerÕs points that they have addressed. It appears that if accepted this manuscript would constitute the first peer-reviewed paper on the subject of this new software package (Edward). Based on both the numeric scores, the quality and potential significance of this work I recommend acceptance.",ICLR2017, +Skg1dY3keV,1544700000000.0,1545350000000.0,1,SJzvDjAcK7,SJzvDjAcK7,Interesting topic but the analysis is lacking,Reject,"Dear authors, + +The reviewers all appreciated the interest of studying properties of the latent representations rather than of the weights. The impact of the rank on the robustness to adversarial attacks is also of interest. + +There were, however, two main issues raised. Due to the lack of confidence of some reviewers, I reviewed the paper myself and found the same issues: +- Clarity could be improved. Some models are mentioned before being described (N-LR) and some important details are missing. In particular, we sometimes lose track of the goal of the experiments. For instance, there are quite a few experiments on the further reduction of the rank of the representation but it is not clear what to extract from them. +- More importantly, there are several important gaps in the analysis. In particular: a/ As many reviewers have pointed out, low-rank constraints on the weight matrices induce low-rank representations if the activation function is linear. As it is not, this might not be true but deserves a discussion. b/ You state that the rank constraint has little effect given that the actual rank is much less than the constraint. However, one would expect the resulting rank to be a smooth function of the rank of the constraint. Since there is a discrepancy between ResNet N-LR and ResNet 1-LR, this should be investigated. c/ For the robustness to black-box adversarial attacks, these attacks are constructed using the N-LR models. Is is thus not too surprising that those models do not perform as well. 
+ +Thus, despite the lack of confidence of one reviewer (the question about the N-LR models might stem from the fact that it is used before being introduced), I strongly encourage you to take their comments into account for a future submission.",ICLR2019,4: The area chair is confident but not absolutely certain +QAeaHlwsJ,1576800000000.0,1576800000000.0,1,rketraEtPr,rketraEtPr,Paper Decision,Reject,"This paper provides a data-driven approach that learns to improve the accuracy of numerical solvers. It solves an important problem and provides some promising direction. However, the presented paper is not novel in terms of ML methodology. The presentation can be significantly improved for ML audience (e.g., it would be preferred to explicitly state the problem setting in the beginning of Section 3).",ICLR2020, +JQLVzB7tgd,1576800000000.0,1576800000000.0,1,HklBjCEKvH,HklBjCEKvH,Paper Decision,Accept (Poster),"This paper proposes an idea of using a pre-trained language model on a potentially smaller set of text, and interpolating it with a k-nearest neighbor model over a large datastore. The authors provide extensive evaluation and insightful results. Two reviewers vote for accepting the paper, and one reviewer is negative. After considering the points made by reviewers, the AC decided that the paper carries value for the community and should be accepted.",ICLR2020, +8TSAkHxw10,1642700000000.0,1642700000000.0,1,d5IQ3k7ed__,d5IQ3k7ed__,Paper Decision,Reject,"The authors propose a deep multi-agent RL framework to compute equilibria in a economics problem. Several reviewers raised issues with the presentation, as well as issues with evaluating the impact of the work, partly because the novelty of the approach is made insufficiently clear. While the authors have resolved some of the confusions arising from the presentation in their rebuttal, resulting in 2 out of the 4 reviewers to increase their score, the concerns regarding novelty mostly remain. For these reasons, I don’t think this work is ready for publication at ICLR at the moment and recommend rejection.",ICLR2022, +jlAm0R8rA,1576800000000.0,1576800000000.0,1,SyxdC6NKwH,SyxdC6NKwH,Paper Decision,Reject,"The authors propose a way to produce uncertainty measures in graph neural networks. However, the reviewers find that the methods proposed lack novelty and are incremental additions to prior work.",ICLR2020, +Cca6eeMJWY2,1610040000000.0,1610470000000.0,1,tyd9yxioXgO,tyd9yxioXgO,Final Decision,Reject,"The paper focuses on the task of conditional video synthesis starting from a single image. The authors propose an *Action Graph* to model the configuration of objects, their interactions, and actions. They show promising results on two benchmark datasets (one synthetic and another realistic). + +Based on the reviewers' comments and the limited discussion that ensued, it seems that some concerns in the paper were addressed by the authors; however, a main concern persists, namely the applicability of this Action Graph representation to more complex realistic videos (*e.g.* for datasets such as Kinetics and AVA). The authors do mention a manner in which an automated extraction of Action Graphs can be done, specifically with off-the-shelf (spatial or spatiotemporal) detectors for actions, objects, and object-object interactions. However, these are complicated tasks in their own right and still open problems in the field. 
Given that the Action Graph computable from this automated pipeline will undoubtedly contain noise (compounded by the errors of each component of this pipeline), the paper could have made a stronger argument for its contributions in realistic video, if for example an ablation study was done where *noisy* action graphs were used in training. Without more evidence that this representation will be applicable to more realistic scenarios of interest, it is difficult to gauge the impact it will have on the community. Despite its merits and promising initial results, the authors are encouraged to address this persisting concern and the other reviewers' comments to produce a stronger submission in the future. ",ICLR2021, +Byxd--QMxV,1544860000000.0,1545350000000.0,1,SJekyhCctQ,SJekyhCctQ,interesting direction with extensive experiments but critical flaw,Reject,"* Strengths + +The paper proposes a novel and interesting method for detecting adversarial examples, which has the advantage of being based on general “fingerprint statistics” of a model and is not restricted to any specific threat model (in contrast to much of the work in the area which is restricted to adversarial examples in some L_p norm ball). The writing is clear and the experiments are extensive. + +* Weaknesses + +The experiments are thorough. However, they contain a subtle but important flaw. During discussion it was revealed that the attacks used to evaluate the method fail to reduce accuracy even at large values of epsilon where there are simple adversarial attacks that should reduce the accuracy to zero. This casts doubt on whether the attacks at small values of epsilon really are providing a good measure of the method’s robustness. + +* Discussion + +There was substantial disagreement about the paper, with R1 feeling that the evaluation issues were serious enough to merit rejection and R3 feeling that they were not a large issue. In discussion with me, both R1 and R3 agreed that if an attack were demonstrated to break the method, that would be grounds for rejection. They also both agreed that there probably is an attack that breaks the method. A potential key difference is that R3 thinks this might be quite difficult to find and so merits publishing the paper to motivate stronger attacks. + +I ultimately agree with R1 that the evaluation issues are indeed serious. One reason for this is that there is by now a long record of adversarial defense papers posting impressive numbers that are often invalidated within a short period (often less than a month or so) of the paper being published. The “Obfuscated Gradients” paper of Athalye, Carlini, and Wagner suggests several basic sanity checks to help avoid this. One of the sanity checks (which the present paper fails) is to test that attacks work when epsilon is large. This is not an arbitrary test but gets at a key issue---any given attack provides only an *upper bound* on the worst-case accuracy of a method. For instance, if an attack only brings the accuracy of a method down to 80% at epsilon=1 (when we know the true accuracy should be 0%), then at epsilon=0.01 we know that the measured accuracy of the attack comes 80% from the over-optimistic accuracy at epsilon=1 and at most 20% from the true accuracy at epsilon=0.01. If the measured accuracy at epsilon=1 is close to 100%, then accuracy at lower values of epsilon provides basically no information. 
This means that the experiments as currently performed give no information about the true accuracy of the method, which is a serious issue that the authors should address before the paper can be accepted.",ICLR2019,2: The area chair is not sure +aUeUtrZeVnv,1610040000000.0,1610470000000.0,1,7t1FcJUWhi3,7t1FcJUWhi3,Final Decision,Accept (Poster),"Pros: +- All reviewers agreed that the idea was particularly interesting/novel. I personally appreciated the perspective of unlearning invariances that prove inconsistent with the training data, rather than learning invariances that are demonstrated by the training data. +- The authors significantly improved clarity during the rebuttal period, and two out of three reviewers raised scores or confidence as a result. + +Cons: +- There were significant concerns raised by reviewers about clarity of presentation, and some concern around whether the specific instantiation of the high level idea was the most sensible. From a *lightweight* reading of the paper on my part, I also feel that the writing style is unnecessarily dense, though I believe the underlying ideas are solid. +- One of the reviewers (AnonReviewer4) continues to have serious concerns. I believe the authors and AnonReviewer4 may have both become more entrenched in their positions during the discussion, in a way that wasn't particularly productive. + +This paper is borderline score-wise. I believe it is particularly important to reward and encourage unusually novel work. Primarily for this reason I bias my decision upwards, and recommend acceptance. + +nit: belive --> believe",ICLR2021, +Vkaoa0A2hL,1576800000000.0,1576800000000.0,1,ryeUg0VFwr,ryeUg0VFwr,Paper Decision,Reject,All the reviewers recommend rejecting the submission. There is no basis for acceptance.,ICLR2020, +Syx_k8PHl4,1545070000000.0,1545350000000.0,1,Sk4jFoA9K7,Sk4jFoA9K7,Accept,Accept (Poster),"The paper presents a novel approach with compelling experiments. Good paper, accept. +",ICLR2019,5: The area chair is absolutely certain +CB_jIq400H_,1610040000000.0,1610470000000.0,1,GFsU8a0sGB,GFsU8a0sGB,Final Decision,Accept (Poster),The reviewers raised a number of concerns which were addressed by the authors. The paper provides an interesting/novel perspective for federated learning (as a posterior inference problem rather than an optimization problem) which can potentially allow for faster and more accurate solutions. ,ICLR2021, +XQBZZoZGJO2,1642700000000.0,1642700000000.0,1,gjNcH0hj0LM,gjNcH0hj0LM,Paper Decision,Accept (Poster),"The authors design a framework for active learning on time-series data. The framework, called Temporal Coherence-based Label Propagation (TCLP), leverages temporal coherence to propagate expert labels to nearby points via a plateau model. In addition to describing the framework clearly with simple pseudocode, several experiments are carried out with careful analysis to validate the effectiveness of the framework. + +The reviewers are mostly positive about the simple algorithm with strong empirical performance as well as the solid analysis of experimental results. They are also satisfied with most of the rebuttal feedback from the authors. However, there are shared concerns about the weaker theoretical results, especially in terms of their correctness. In particular, the unrealistic assumptions and over-simplification make it hard to connect the theoretical results with the actual algorithm. &#13;
Several reviewers suggest that the authors move the theoretical analysis section to a supplementary section as a hypothesis, and the authors are also encouraged to clearly discuss what the theory can and cannot cover.",ICLR2022, +5ifOpZ0JTA2,1642700000000.0,1642700000000.0,1,EcGGFkNTxdJ,EcGGFkNTxdJ,Paper Decision,Accept (Poster),"The submission proposes a new approach to deriving a policy gradient type algorithm for multi agent RL (MARL) where the agents are interested in a common objective but with potentially different action spaces. It extends the monotone improvement property for single agent trust-region based methods like TRPO to a multi agent update setting where the updates are performed in sequence by the agents, and uses this idea to derive new multi agent analogues of TRPO and PPO. These algorithms are shown to be competitive with existing strategies for MARL on a Starcraft environment, and superior in the case of common MuJoCo benchmarks. + +All reviewers are unanimous in their appreciation for the paper's contributions. The initial concerns about clarity of the technical results, especially the improvement guarantee of the key lemma, that some reviewers had were addressed adequately by the author responses. Hence, I gladly recommend acceptance.",ICLR2022, +U8mJDX2tlp,1610040000000.0,1610470000000.0,1,Q4EUywJIkqr,Q4EUywJIkqr,Final Decision,Accept (Poster),"Reviewers agreed that overall the two-pronged message of the submission has utility. + +1. That ObjectNet continues to be difficult for models to understand and is a challenging test platform even when objects are isolated from their backgrounds. This is significant and not obvious. Cropping objects makes the distribution shift between ObjectNet and ImageNet far smaller, but the large remaining performance gap points to the fact that detectors are limited by their ability to recognize the foregrounds of objects, not by their ability to isolate objects from their backgrounds. + +2. That segmentation could be a promising direction for robustness to adversarial perturbations which has so far been overlooked.",ICLR2021, +OPHuOtflBvT,1610040000000.0,1610470000000.0,1,punMXQEsPr0,punMXQEsPr0,Final Decision,Reject,"The paper proposes a new pre-trained language model for information extraction on documents. It consists of a new pre-training strategy with area-masking and a new graph-based decoder to capture the relationships between text blocks. Experimental results show better performance of the proposed approach. + +Pros • The paper is generally clearly written. • Experimental results show better performance on the benchmark datasets. + +Cons • Novelty of the work might not be enough. For example, the graph-based decoder is not new. The masking technique is also a natural extension of that in BERT. • Significance of the work might not be enough. For example, the improvement from the area masking is not so significant. • There are additional experiments which can be added, as pointed out by Reviewer 3. • Presentation can be further improved. Some of the issues indicated by the reviewers have been addressed in the rebuttal. We appreciate the authors' efforts. &#13;
",ICLR2020, +BCwj-yt3y2I,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Accept (Poster),"This paper proposes an ""embedding layer"" in which points on a model are mapped into a feature space, trained using a reconstruction-based pretext task. Then, the resulting embedding layer can be applied to shape data before using different learning architectures for modalities like meshes and point clouds. The work is particularly interesting in its attempt to derive a learned shape representation that is agnostic to modality. Some questions remained about experiments (e.g. baselines), but these are relatively minor and partially addressed in the rebuttal phase; also, sometimes the improvement seems to be marginal in practice. + +Two reviewers championed this work during the discussion phase. The AC tends to agree this work is an interesting direction for future work and contains insight that the vision/learning communities might be able to use in other settings.",ICLR2022, +V5p4pQNXef,1576800000000.0,1576800000000.0,1,rylmoxrFDH,rylmoxrFDH,Paper Decision,Accept (Poster),"The authors study neural networks with binary weights or activations, and the so-called ""differentiable surrogates"" used to train them. +They present an analysis that unifies previously proposed surrogates and they study critical initialization of weights to facilitate trainability. + +The reviewers agree that the main topic of the paper is important (in particular initialization heuristics of neural networks), however they found the presentation of the content lacking in clarity as well as in clearly emphasizing the main contributions. +The authors imporved the readability of the manuscript in the rebuttal. + +This paper seems to be at acceptance threshold and 2 of 3 reviewers indicated low confidence. +Not being familiar with this line of work, I recommend acceptance following the average review score.",ICLR2020, +zqwm3tx1F3,1576800000000.0,1576800000000.0,1,Ske9VANKDH,Ske9VANKDH,Paper Decision,Reject,The paper is rejected based on unanimous reviews.,ICLR2020, +KeBUhSqPVS,1642700000000.0,1642700000000.0,1,0jP2n0YFmKG,0jP2n0YFmKG,Paper Decision,Accept (Poster),"The reviewers were split about this paper: on one hand they appreciated the clarity and the experimental improvments in the paper, on the other they were concerned about the novelty of the work. After going through it and the discussion I have decided to vote to accept this paper for the following reasons: (a) the potential impact of the work, (b) the simplicity of the idea, and (c) promise of release of open source code. I think these things make the paper a strong contribution to ICLR. The only thing I would like to see added, apart from the suggestions detailed by the reviewers, is a small discussion on the carbon footprint of training such largescale graph networks. The authors motivated the work by saying it could have a beneficial impact for modelling energy which is needed to combat climate change. However, we know from recent results that such large scale models also have a non-trivial emission footprint. So I'd like to see the authors specifically calculate the carbon footprints of the models they trained. 
There are tools to help with this, such as: https://mlco2.github.io/impact/ With this addition I think this paper will not only make a large impact on graph network training but also start a discussion of how to responsibly decide on training, taking environmental impact into account.",ICLR2022, +mlSVnoJXHf,1610040000000.0,1610470000000.0,1,JbAqsfbYsJy,JbAqsfbYsJy,Final Decision,Reject,"The paper presents a KL-divergence minimisation approach to the action–perception loop, and thus presents a unifying view on concepts such as Empowerment, entropy-based RL, optimal control, etc. The paper does two things here: it serves as a survey paper, but on top of that puts these in a unifying theory. While the direct merit of that may not be obvious, it does serve as a good basis to combine the fields more formally. + +Unfortunately, the paper suffers from the length restrictions. With more than half of the paper in the appendix, it should be published at a journal or directly at arXiv. Not having a page limit would much improve the readability. ICLR may not be the best venue for review papers.",ICLR2021, +BJeEKDkIgV,1545100000000.0,1545350000000.0,1,HJePy3RcF7,HJePy3RcF7,Borderline paper,Reject,"R4 recommends acceptance while R2 is lukewarm and R1 argues for rejection so that the presentation of the paper can be revised. As we unfortunately need to reject borderline papers given the space constraints, the AC recommends ""revise and resubmit"".",ICLR2019,3: The area chair is somewhat confident +D2E4ARPRAnb,1642700000000.0,1642700000000.0,1,D8njK_Ix5dJ,D8njK_Ix5dJ,Paper Decision,Reject,"The paper addresses unsupervised domain adaptation under covariate shift and missing source and target features. Three approaches are proposed for tackling respectively covariate shift, missing data and simultaneous covariate shift and missing data. The proposed method relies on the minimization of the maximum mean discrepancy between the source and target representations in the different settings. Experiments are performed on a synthetic dataset and on two other datasets. + +All the reviewers highlighted several weaknesses: lack of formal definitions and of formal analyses, lack of connection with existing approaches for handling missing data, weak reproducibility. The authors did not provide responses. Reject.",ICLR2022, +H1eCYuINlE,1545000000000.0,1545350000000.0,1,BJe1E2R5KX,BJe1E2R5KX,Good paper,Accept (Poster),"This paper proposes model-based reinforcement learning algorithms that have theoretical guarantees. These methods are shown to achieve good results on MuJoCo benchmark tasks. All of the reviewers have given a reasonable score to the paper, and the paper can be accepted.",ICLR2019,2: The area chair is not sure +J_o14dyl6p_,1642700000000.0,1642700000000.0,1,iedYJm92o0a,iedYJm92o0a,Paper Decision,Reject,"The main remaining criticism of the paper is reproducibility, i.e., ""it is nearly impossible to verify the correctness of the result in the paper or to reproduce any of these results"" (AC). We generally agree with this statement. While the authors do provide some details in the paper, reviewers, AC, SAC, and PCs agree that this is insufficient. Further points that came up in our discussions were the simplicity of the baselines and the choice of testing to demonstrate that the approach really works. Our impression is that the work lacks a rigorous experimental evaluation. &#13;
These considerations led to the decision in the end.",ICLR2022, +r1lpfU9elE,1544750000000.0,1545350000000.0,1,SkEYojRqtm,SkEYojRqtm,limited contribution but well executed,Accept (Poster),"although I (AC) believe the contribution is fairly limited (e.g., (1) only looking at the word embedding which goes through many nonlinear layers, in which case it's not even clear whether how word vectors are distributed matters much, (2) only considering the case of tied embeddings, which is not necessarily the most common setting, ...), all the reviewers found the execution of the submission (motivation, analysis and experimentation) to be done well, and I'll go with the reviewers' opinion.",ICLR2019,3: The area chair is somewhat confident +2sH993qYbi,1576800000000.0,1576800000000.0,1,HJeIU0VYwB,HJeIU0VYwB,Paper Decision,Reject,"In this paper, the authors proposed a general framework, which uses an explicit function as an adjustment to the actual learning rate, and presented a more adaptive specific form Ada+. Based on this framework, they analyzed various behaviors brought by different types of the function. Empirical experiments on benchmarks demonstrate better performance than some baseline algorithms. The main concerns with this paper are: (1) lack of justification or interpretation for the proposed framework; (2) the performance of the proposed algorithm is on a par with Padam; (3) missing comparison with some other baselines on more benchmark datasets. Plus, the authors did not submit a response. I agree with the reviewers' evaluation.",ICLR2020, +S1kVN16SM,1517250000000.0,1517260000000.0,326,r1SnX5xCb,r1SnX5xCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper is well written, addresses an interesting problem, and provides an interesting solution.",ICLR2018, +7XST0x7XpKK,1642700000000.0,1642700000000.0,1,MOm8xik_TmO,MOm8xik_TmO,Paper Decision,Reject,"This paper develops a variational auto transformer model (VAT), a VAE based on the transformer (encoder-decoder) architecture designed to provide isotropic representations by adding a token-level loss for isotropy. All the reviewers agree that this is a novel architecture with a valid and interesting goal behind it. + +Reviewers varied somewhat in their impressions of the paper, but none were strongly positive about accepting it. I think the strongest and most aligned concerns were from reviewers ZoL1 and pcez. They both feel that the experiments do not convincingly demonstrate what is required. It would be good to better establish the success of variational sampling and the usefulness of isotropic representations. I would think that even a page of examples in the appendix, contrasting sampling by various methods, would add a lot of information to what is presented here. It would be even better to have experiments showing the relation between improved isotropy and improved task performance (suggested by j72L). Both reviewers are concerned about the small model and weak results and whether these results would extend to larger models that people actually use. While on the one hand, controlled comparisons are valuable, it is also true that people in NLP routinely like to see results on models of a reasonably competitive size. In practice, for 2019-2021, it seems that people regard having models of BERT-base size as the ""reasonable"" small size that they will accept and for which there is reasonably good performance and lots of available empirical results. Transformers directly trained with very few layers do not perform that well. &#13;
Reviewer pcez is also concerned about the change of the data set in the MiniBERT comparison, which seems valid, and reviewer 5v5U is concerned about what's fair in terms of parameter counts. + +This paper needs further work with larger and more careful experimental comparisons to meet the needed level of experimental rigor to be convincing. The authors were not able to iterate sufficiently quickly to achieve this during the ICLR reviewing period, so it seems best that the paper be rejected for now, and the authors look to subsequently submit a more developed version of this work.",ICLR2022, +ybp-YqzmmL-,1642700000000.0,1642700000000.0,1,tD7eCtaSkR,tD7eCtaSkR,Paper Decision,Accept (Spotlight),"The paper provides a procedure for certifying L2 robustness in image classification. The paper shows that the technique indeed works in practice by demonstrating its accuracy on the CIFAR-10 and CIFAR-100 datasets. &#13;
+ +",ICLR2021, +d5g3sOauYKR_,1642700000000.0,1642700000000.0,1,hbGV3vzMPzG,hbGV3vzMPzG,Paper Decision,Reject,"The paper proposes a metric to measure the difficulty of training examples. The main thesis is that hard training examples lead to bad test adversarial error. There are theoretical results on simple models establishing such claims. The paper also proposes a method to adaptively weight training examples to improve training which gives improvement for adversarial error. + +The reviewers have raised a number of questions and the rebuttal period has been useful. In particularly, I agree with the reviewers that 'model-agnostic' is misleading in this context and the authors have agreed to remove this in the future. It is felt that more experiments, comparison to adversarial training, etc. is needed and I think the paper will need to go through a proper review process again before acceptance.",ICLR2022, +r1xERX6GeE,1544900000000.0,1545350000000.0,1,SylKoo0cKm,SylKoo0cKm,Interesting work on how to measure the importance of an individual neuron in a network,Accept (Poster),"This paper proposes a new measure to quantify the contribution of an individual neuron within a deep neural network. Interpretability and better understanding of the inner workings of neural networks are important questions, and all reviewers agree that this work is contributing an interesting approach and results.",ICLR2019,5: The area chair is absolutely certain +bthB_IX6XUa,1642700000000.0,1642700000000.0,1,vPK-G5HbnWg,vPK-G5HbnWg,Paper Decision,Reject,"Thank you for your first (hopefully of many!) submissions to ICLR. +This work describes a method for allowing nodes to be processed concurrently instead of sequentially, allowing for a reduction in computation time. +The reviewers identified a number of concerns about the paper (lack of citations and baselines, an additional experiments demonstrating scale, and a number of clarifications and motivation in the text). The authors addressed the majority of these concerns due the rebuttal. I'm afraid a promise of a revised manuscript is not a sufficient substitute for the reviewers seeing a revised manuscript, and due the nature of the feedback, a revision is needed, which the reviewers have not seen to check their concerns are fully addressed. Therefore, at this stage, unfortunately, I recommend rejection.",ICLR2022, +U8mJDX2tlp,1610040000000.0,1610470000000.0,1,Q4EUywJIkqr,Q4EUywJIkqr,Final Decision,Accept (Poster),"Reviewers agreed that overall the two-pronged message of the submission has utility. + +1. That ObjectNet is continues to be difficult for models to understand and is a challenging test platform even when objects are isolated from their backgrounds. This is significant and not obvious. Cropping objects makes the distribution shift between ObjectNet and ImageNet far smaller, but the large remaining performance gap points to the fact that detectors are limited by their ability to recognize the foregrounds of objects not by their ability to isolate objects from their backgrounds. + +2. That segmentation could be a promising direction for robustness to adversarial perturbations which has so far been overlooked.",ICLR2021, +uxVpc51AxEQ,1642700000000.0,1642700000000.0,1,DesNW4-5ai9,DesNW4-5ai9,Paper Decision,Accept (Poster),"This paper proposes integrating three existing approaches to give a simple algorithm called TAIG for generating transferable adversarial examples under blackbox attacks. 
+ +In the original reviews, some strengths and weaknesses of the paper were highlighted, although some of them have not reached general agreement after the discussion period. + +Regarding the merits, it is generally felt that the experimental results are good and the idea of updating along the integrated gradients is new (despite being a simple idea) and has some theoretical justification. + +Nevertheless, even after the discussion period, some concerns still remain, including the limited technical novelty of the proposed method and its high computational requirements, among others. + +We appreciate the authors for responding to the reviews by clarifying some points and providing further experimental results. The paper would be more ready for publication if all the comments and suggestions are taken into consideration to improve the paper more thoroughly.",ICLR2022, +1hS_L-LMeNf,1610040000000.0,1610470000000.0,1,3nSU-sDEOG9,3nSU-sDEOG9,Final Decision,Reject,"This work tackles the sparse or delayed reward problem in reinforcement learning. The key idea is to build a classifier to detect states that will lead to high rewards in the future and provide a bonus to those states. All the reviewers liked the idea but had issues with the execution of the empirical results. The approach is evaluated only in a few Atari games, skipping many sparse reward games and missing comparisons to many exploration baselines. Furthermore, many reviewers found the writing confusing and hard to follow. The authors provided the rebuttal and addressed some of the concerns. However, upon discussion post rebuttal, the reviewers decided to maintain their score. Reviewers believe that the paper will immensely benefit from improved writing, evaluation on all Atari games, and comparisons to exploration baselines. Please refer to the reviews for final feedback and suggestions to strengthen the future submission.",ICLR2021, +Bk3pXJTrG,1517250000000.0,1517260000000.0,240,HyyP33gAZ,HyyP33gAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The authors investigate various class-aware GANs and provide extensive analysis of their ability to address mode collapse and sample quality issues. Based on this analysis they propose an extension called Activation Maximization-GAN which tries to push each generated sample to a specific class indicated by the Discriminator. As experiments show, this leads to better sample quality & helps with the mode collapse issue. &#13;
The authors also analyze inception score to measure sample quality and propose a new metric better suited for this task.",ICLR2018, +51OIQhQMDx6,1610040000000.0,1610470000000.0,1,CZ8Y3NzuVzO,CZ8Y3NzuVzO,Final Decision,Accept (Poster),"There was a predominantly positive feedback from the reviewers so I recommend acceptance of the paper. It is well-written and well-motivated tackling an important problem: That in self-supervised learning one might encode different invariances by default, even if some of these invariances are useful for downstream tasks (e.g. being rotation invariant may be detrimental to predicting if an image has the correct rotation on a phone). For this, they propose a simple, yet elegant approach and validate it on many downstream tasks. Given the recent interest in self-supervised learning, this appears to be a relevant and interesting paper for the ICLR community.",ICLR2021, +jnvQckEXdlN,1642700000000.0,1642700000000.0,1,roxWnqcguNq,roxWnqcguNq,Paper Decision,Reject,"The paper present results using syntactic information (primarily through constituency trees) on the task of recognizing argument discourse units. No reviewer recommends acceptance of the paper: +- The empirical results appear strong, though the reviewers raise questions about some of the experimental choices. +- The writing is unclear and reviewers point out many missing or incorrect references in the bibliography. +- There is little methodological novelty - known techniques are applied to a topic that has not been studied much. +Overall, the area chair agrees with the reviewers that this work does not yet meet the bar for ICLR.",ICLR2022, +nYIW6bIcLi,1642700000000.0,1642700000000.0,1,XuS18b_H0DW,XuS18b_H0DW,Paper Decision,Reject,"The authors develop an approach to improve upon methods for training certifiably robust models. They propose an input dependent margin-based weighting and an automatically generated curriculum schedule and demonstrate improvements on training certifiably robust models on MNIST and CIFAR-10. + +Reviewers agree that the paper makes interesting and novel contributions. However, the lack of novelty in the approach combined with the limited empirical gains make it difficult to justify acceptance. In particular, reviewers raise valid concerns on the quality of experiments comparing to prior work (in particular Crown-IBP (Zhang et al 2020) and COLT (Balunovic & Vechev 2020)) (in particular hyperparameter tuning, inability to recreate baseline results and unjustified claims that the prior art cannot run on GPU hardware). Further, even the gains demonstrated are marginal. + +Hence, I recommend rejection, but encourage the authors to revise the paper based on the feedback received.",ICLR2022, +H1JCnfUul,1486400000000.0,1486400000000.0,1,ry3iBFqgl,ry3iBFqgl,ICLR committee final decision,Reject,"The program committee appreciates the authors' response to concerns raised in the reviews. All reviewers agree that the paper is not convincingly above the acceptance threshold. The paper will be stronger, and the benefit of this dataset over SQuAD will likely be more clear once authors incorporate reviewers' comments and finish evaluation of inter-human agreement.",ICLR2017, +bVGuAAaTMv,1576800000000.0,1576800000000.0,1,HklSeREtPB,HklSeREtPB,Paper Decision,Accept (Spotlight),"This paper studies properties that emerge in an RNN trained to report head direction, showing that several properties in natural neural circuits performing that function are detected. 
+All reviewers agree that this is quite an interesting paper. While there are some reservations as to the value of letting a property of interest emerge as opposed to simply hand-coding it in, this approach is seen as powerful and valuable by many people, in that it suggests a higher plausibility that the emerging properties are actually useful when optimizing for that function -- a claim which hand-coding would not make possible. Reviewers have also provided valuable suggestions and requests for clarifications, and the authors have responded by improving the presentation and providing more insights. +Overall, this is a solid contribution that will be of interest to the part of the ICLR audience that is interested in biological systems.",ICLR2020, +xwcjxHX_ObF,1642700000000.0,1642700000000.0,1,BlyXYc4wF2-,BlyXYc4wF2-,Paper Decision,Reject,"The paper addresses safe multi-agent reinforcement learning and makes two key contributions. First is a safety-concerned multi-agent benchmark, which is an extension of MAMuJoCo. Second is the formulation of, and two solutions to, the safe MARL problem. The authors pose safe MARL, a MARL problem with safety constraints, as a constrained Markov game. + +The safety-constrained MARL problem is important, difficult, and understudied. The problem is more difficult than single-agent safe RL because of the non-stationarity in the MARL setting, which renders any theoretical guarantees conditioned on the assumptions about the behaviors of other agents. The authors are right to point out the lack of benchmarks in this space. + +That said, reflecting on the reviewers' feedback and my own reading of the paper, this paper is attempting to do too much (benchmark, problem formulation, and two methods) in too little space, and is falling short. For example, the benchmark is an important contribution, but it is barely mentioned in the main text of the paper. If this were a fully-fledged safety benchmark paper, there would be an opportunity to go beyond MAMuJoCo, which feels like a forced multi-agent problem, and construct a safety benchmark with energy constraints, cooperative and competitive tasks, etc. If this were a fully-fledged methods paper, there would be an opportunity for more in-depth analysis of the results, as the reviewers pointed out. In its current form, the paper feels like it proposes a benchmark not grounded in a real-world problem, and then a method to solve that problem. + +I would suggest that the authors either: +- submit the paper to a journal where a space constraint would not be in the way, or +- split it into two papers: a more comprehensive benchmark paper, and a methods paper evaluated on more difficult problems. + +Minor: +- Please update the literature. Some of the papers have been published, and they are cited as arXiv papers.",ICLR2022, +tHYBF3YXvdl,1610040000000.0,1610470000000.0,1,WlT94P_zuHF,WlT94P_zuHF,Final Decision,Reject,"This paper introduces Transformer-QL, a new variant of transformer networks that can process long sequences more efficiently. This is an important research problem, which has been widely studied recently. Unfortunately, this paper does not compare to such previous works (e.g., see ""Efficient transformers: A survey""), the only considered baselines being Transformer-XL and the Compressive Transformer. Moreover, the reviewers found the experimental section to be lacking, as the results are weak compared to existing work, and important ablation studies are missing. &#13;
The authors did not provide a rebuttal. For these reasons, I recommend rejecting the paper.",ICLR2021, +Skgd-0eIlN,1545110000000.0,1545350000000.0,1,SkMwpiR9Y7,SkMwpiR9Y7,Borderline accept,Accept (Poster),"This paper proposes to regularize neural networks in function space rather than in parameter space, a proposal which makes sense and is also different from the natural gradient approach. + +After discussion and considering the rebuttal, all reviewers argue for acceptance. The AC does agree that this direction of research is an important one for deep learning, and while the paper could benefit from revision and tightening the story (and stronger experiments), these do not preclude publishing in its current state. + +Side comment: the visualization of neural networks in function space was done profusely when the effect of unsupervised pre-training on neural networks was investigated (among others). See e.g. Figure 7 in Erhan et al. AISTATS 2009 ""The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training"". This literature should be cited (and it seems that tSNE might be a more appropriate visualization technique for non-linear functions than MDS).",ICLR2019,4: The area chair is confident but not absolutely certain +6MDjOtlIMdYa,1642700000000.0,1642700000000.0,1,xs-tJn58XKv,xs-tJn58XKv,Paper Decision,Reject,"The idea of learning unstable features from source tasks to help learn stable features for a target task is interesting and well-motivated. As the proposed method and its theoretical analysis of learning unstable features from tasks are an incremental extension of an existing work [Bao et al. 2021], the technical contributions lie in applying the idea of stable and unstable feature learning to the setting of transfer learning. Therefore, the evaluation of this work is focused on the effectiveness of the proposed method in the transfer learning setting. + +In transfer learning, one major goal is to make use of knowledge extracted from source tasks to help learn a precise target classifier even with few or no labeled examples of the target task. It would be more convincing if experiments were conducted to show how the performance of the proposed method changes when the size of the labeled data of the target task changes. This is to verify whether the exploitation of unstable features can help to learn a stable classifier for the target tasks more efficiently (i.e., with fewer labeled examples). In addition, as some baseline methods used for comparison do not need to access any labeled data of the target task (like unsupervised domain adaptation or domain generalization approaches), it is not fair to conduct comparison experiments in the setting where there are sufficient labeled examples of the target task, since the original designs of such baselines may fail to fully exploit label information in the target task. &#13;
Note that as some references listed by reviewers RJhJ and J8M5 are not really related to the proposed research here, the novelty of the proposed method compared with those references is NOT taken into consideration to make this recommendation.",ICLR2022, +xwcjxHX_ObF,1642700000000.0,1642700000000.0,1,BlyXYc4wF2-,BlyXYc4wF2-,Paper Decision,Reject,"The paper addresses safe multi-agent reinforcement learning and makes two key contributions. First is a safety concerned multi agent benchmark, which is an extension of MAMuJoCo. Second, is the formulation and two solution to safety MARL problem. The authors pose safe MARL, and MARL problem with safety constraints, as a constrained Markov game. + +The safety constrained MARL is an important, difficult, and understudied problem. The problem is more difficult that the single agent safe RL because of the non-stationarity in the MARL setting, which renders any theoretical guaranties conditioned on the assumptions of the behaviors of other agents. The authors are right to point out the lack of the benchmarks in the space. + +That said, reflecting on the reviewers' feedback and my own reading of the paper, this paper is attempting to do too much (benchmark, problem formulation, and two methods), in too little space, and is falling short. For example, the benchmark is an important contribution, but it is barely mentioned in the main text of the paper. If this was fully safety benchmark paper, there is an opportunity to go beyond MAMuJoCo, which feels like a forced multi-agent problem, and construct a safety benchmark with energy constraints, cooperative and competitive tasks etc... If this was fully methods paper, there would be an opportunity for more in-depth analysis of the results that the reviewers' pointed out. In it's current form, the paper feels like proposing a benchmark not grounded in a real world problem, and then a method to solve the problem. + +I would suggest the authors to either: +- submit the paper to a journal where a space constraint would not be in a way, or +- split it into two papers, a more comprehensive benchmark, and methods paper evaluated on more difficult problems. + +Minor: +- Please update the literature. Some of the papers have been published, and they are cited as Arxiv papers.",ICLR2022, +rJsB7JaBf,1517250000000.0,1517260000000.0,134,S1J2ZyZ0Z,S1J2ZyZ0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),Important problem and all reviewers recommend acceptance. I agree.,ICLR2018, +NJAtzIpLdUG,1610040000000.0,1610470000000.0,1,3ZeGLibhFo0,3ZeGLibhFo0,Final Decision,Reject,"Summary: This paper provides an approach for causal inference in observational survival dataset in which the outcome is of time-to-event type with right-censored samples. To this end, the paper adapts the balanced representation learning approach proposed in (Shalit et al, 2017) to the context of survival analysis. +The paper adapts an approach that uses flexible models to learn nuisance models, common in machine learning. + +The authors validated their approach via simulation study and a set of application datasets: a EHR-based cohort study of cardio-vascular health, an RCT dataset of HIV patients, and a semi-synthetic dataset. 
+ +The main concerns of reviewers were due to perceived lack of originality relative to the original proposal in (Shalit et al, 2017) +",ICLR2021, +oAKcm2cgGf,1576800000000.0,1576800000000.0,1,SJlbvp4YvS,SJlbvp4YvS,Paper Decision,Reject,"The authors propose to extend model-based/model-free hybrid methods (e.g., MVE, STEVE) to stochastic environments. They use an ensemble of probabilistic models to model the environment and use a lower confidence bound of the estimate to avoid risk. They found that their proposed method yields state-of-the-art performance over previous methods. + +The valid concerns by Reviewers 1 & 4 were not addressed by the authors and although the authors responded to Reviewer 3, they did not revise the paper to address their concerns. The ideas and results in this paper are interesting, but without addressing the valid concerns raised by reviewers, I cannot recommend acceptance.",ICLR2020, +SQ28nOFmeI3,1610040000000.0,1610470000000.0,1,w6p7UMtf-0S,w6p7UMtf-0S,Final Decision,Reject,"The submission proposes a transductive few-shot classification method on the basis of the simple Conditional Neural Adaptive Processes (CNAPS) introduced by Bateni et al. The paper received two borderline accept and two borderline reject reviews, indicating that the paper may not be yet ready for a publication. The meta reviewer recommends rejection based on the observations below. + +All reviewers indicated that the paper is well-motivated, clearly written and neatly organized. However, all four reviewers agree that the novelty of the paper compared to the CNAPS paper is limited. The main novelty of the method being transductive CNAPS extends the task encoder of CNAPS to incorporate both a support-set embedding and a query-set embedding through Long-Short Term memory (LSTM) network. Similarly, the classifier in CNAPS has been modified to operate in the transduction setting, i.e. it is extended to include the unlabeled examples in the query set. The reviewers indicate that extension of the task encoder via LSTM may not be enough technical novelty for such a competitive venue. Additionally, in terms of experimental evaluation, although R1 found the experimental evaluation adequate, R3 indicated some concerns about the unexpected behaviour of the method and R4&R2 found the benchmark evaluations limited. +",ICLR2021, +TjaTTCcVEw,1576800000000.0,1576800000000.0,1,SJexHkSFPS,SJexHkSFPS,Paper Decision,Accept (Poster),"This paper studies the setting in reinforcement learning where the next action must be sampled while the current action is still executing. This refers to continuous time problems that are discretised to make them delay-aware in terms of the time taken for action execution. The paper presents adaptions of the Bellman operator and Q-learning to deal with this scenario. + +This is a problem that is of theoretical interest and also has practical value in many real-world problems. The reviewers found both the problem setting and the proposed solution to be valuable, particularly after the greatly improved technical clarity in the rebuttals. As a result, this paper should be accepted.",ICLR2020, +ADuls484mPn,1610040000000.0,1610470000000.0,1,uFkGzn9RId8,uFkGzn9RId8,Final Decision,Reject,"This paper presents a refreshening insight into the classical idea of using external memory for reinforcement learning agents that learn and act in partially observable environments. 
The authors investigate a number of different memory architectures (Ok, OAk, Kk) and provide an insightful discussion on why we want to restrict the structure of the memory.

Reviewers generally appreciated the technical contribution of the paper, although they were not fully convinced that this work will have a significant impact on future work. AC is also not sure about the conclusion drawn from the paper, namely that policies with external memory could have better sample complexity compared to RNN-based policies. BPTT is computationally expensive, but it should give a better indication of which state to jump to, compared to the authors' approach where the gradients are stopped at every timestep. So there should be pros and cons to this approach, and AC suspects that the sample complexity improvement actually comes from the fact that the authors are explicitly limiting what can be stored in the memory, e.g. O or OA. This advantage can be broken in some other domains. AC admits that this is only a speculation at this point, but the motivation to use the external memory framework proposed in the paper needs to be more carefully investigated.
",ICLR2021,
KbSY705Mm,1576800000000.0,1576800000000.0,1,ByxQB1BKwH,ByxQB1BKwH,Paper Decision,Accept (Poster),"This paper presents a new method of constructing graph neural networks for IQ-style diagrammatic reasoning tasks, in particular Raven Progressive Matrices. The model first learns an object representation for parts of the image and then tries to combine them together to represent relations between different objects of the image. Using this model they achieve SOTA results (ignoring a concurrently submitted paper) on the PGM and Raven datasets. The improvement in SOTA is substantial.

Most of the critique made of the paper concerns writing style and presentation. The authors seem to have fixed several of these concerns in the newly uploaded version of the paper. I further request that the authors revise the paper for readability. 
However, since the paper presents both an interesting modeling and improved empirical results, I recommend acceptance.",ICLR2020, +jJrO00YPBn_,1642700000000.0,1642700000000.0,1,847CwJv9Vx,847CwJv9Vx,Paper Decision,Reject,"All reviewers agree that the presented investigation of existing person re-identification approaches is easy to read and can be used as a tutorial in the field. However, the reviewers raised a number of major concerns including inadequate discussion of insights made from the conducted survey, lack of some related important experimental studies and inadequate/ unconvincing conclusions made in the presented work. The authors’ rebuttal addressed some of the reviewers’ questions but failed to alleviate all reviewers’ concerns. Hence, I cannot suggest this paper for presentation at ICLR.",ICLR2022, +L_3LDfTqDUI,1610040000000.0,1610470000000.0,1,u8APpiJX3u,u8APpiJX3u,Final Decision,Reject,"All reviewers agree that the paper is well written and some of the experiments are interesting. However, the paper did not clearly highlight how this work fits in with prior research, neither did it show what the advantages of the presented homogeneous network are. The authors addressed some of these concerns in the rebuttal, but not enough to sway the reviewers. In the end all reviewers recommend rejection, and the AC sees no evidence to overturn this recommendation.",ICLR2021, +vx4wFaMSXz2,1610040000000.0,1610470000000.0,1,TgSVWXw22FQ,TgSVWXw22FQ,Final Decision,Accept (Poster),"The authors propose a new model to learn voice style transfer using an encoder-decoder framework with the aim of disentangling content and style representations. + +The strengths of the paper are: ++ the method is well-motivated with sound theoretical justification ++ the authors improve up on the prior work by augmenting the loss with an information-theoretic term ++ empirical evaluations demonstrate performance improvements in speaker verification and speech similarity tasks ++ demonstrate improvements in the challenging zero-shot task + +Several reviewers requested improvements in readability ++ “ideally the central intuitions and actual specific bottom-line criteria used would be much clearer.” ++ more clarify on empirical details including challenges that needed to be addressed +",ICLR2021, +gbRYb-R4IC,1576800000000.0,1576800000000.0,1,ryxOBgBFPH,ryxOBgBFPH,Paper Decision,Reject,"Although the reviewers appreciated the novelty of this work, they unanimously recommended rejection. The current version of the paper exhibits weak presentation quality and lacks sufficient technical depth. The experimental evaluation was not found to be sufficiently convincing by any of the reviewers. The submitted comments should help the authors improve their paper.",ICLR2020, +m1ayDtZnYUB,1610040000000.0,1610470000000.0,1,E9W0QPxtZ_u,E9W0QPxtZ_u,Final Decision,Reject,The reviewers brought up significant concerns that were not resolved by the authors' responses. The concerns are too significant for the paper to be accepted at this time.,ICLR2021, +B1ShH1arf,1517250000000.0,1517260000000.0,654,SkAK2jg0b,SkAK2jg0b,ICLR 2018 Conference Acceptance Decision,Reject,"Three reviewers recommended rejection, and there was no rebuttal.",ICLR2018, +CXA_JGka-jT,1642700000000.0,1642700000000.0,1,KEQl-MZ5fg7,KEQl-MZ5fg7,Paper Decision,Accept (Poster),"The paper examines neural architecture search for multi-task networks, by associating model hyperparameters with a coding space and building an MLP predictor for mapping codes to task performance. 
After the discussion phase, reviewers are marginally in favor or accepting the paper, pointing to the extensive experimental results as a convincing contribution.",ICLR2022, +SJhv716Sz,1517250000000.0,1517260000000.0,163,HkTEFfZRb,HkTEFfZRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Paper was well written and rebuttal was well thought out and convincing. + +The reviewers agree that the paper showed BNNs were good (relatively speaking) at resisting adversarial examples. Some question was raised about whether the methods would work on larger datasets and models. The authors offered some experiments in this regard in the rebuttal to this end. Also, a public comment appeared to follow up on CIFAR and report correlated results. ",ICLR2018, +s-h3mZabw,1576800000000.0,1576800000000.0,1,BkxackSKvH,BkxackSKvH,Paper Decision,Reject,"This paper proposes a method for learning sentence embeddings such that entailment and contradiction relationships between sentence pairs can be inferred by a simple parameter-free operation on the vectors for the two sentences. + +Reviewers found the method and the results interesting, but in private discussion, couldn't reach a consensus on what (if any) substantial valuable contributions the paper had proven. The performance of the method isn't compellingly strong in absolute or relative terms, yielding doubts about the value of the method for entailment applications, and the reviewers didn't see a strong enough motivation for the line of work to justify publishing it as a tentative or exploratory effort at ICLR.",ICLR2020, +HMvkpVdyBY,1576800000000.0,1576800000000.0,1,rJgzzJHtDB,rJgzzJHtDB,Paper Decision,Accept (Poster),"The authors develop a novel technique to train networks to be robust and accurate while still being efficient to train and evaluate. The authors propose ""Robust Dynamic Inference Networks"" that allows inputs to be adaptively routed to one of several output channels and thereby adjust the inference time used for any given input. They show + +The line of investigation initiated by authors is very interesting and should open up a new set of research questions in the adversarial training literature. + +The reviewers were in consensus on the quality of the paper and voted in favor of acceptance. One of the reviewers had concerns about the evaluation in the paper, in particular about whether carefully crafted attacks could break the networks studied by the authors. However, the authors performed additional experiments and revised the paper to address this concern to the satisfaction of the reviewer. + +Overall, the paper contains interesting contributions and should be accepted.",ICLR2020, +O9HULKUsGf,1576800000000.0,1576800000000.0,1,BJxt2aVFPr,BJxt2aVFPr,Paper Decision,Reject,"The paper proposes an iterative learning method that jointly trains both a model and a scorer network that places a non-uniform weights on data points, which estimates the importance of each data point for training. This leads to significant improvement on several benchmarks. The reviewers mostly agreed that the approach is novel and that the benchmark results were impressive, especially on Imagenet. There were both clarity issues about methodology and experiments, as well as concerns about several technical issues. The reviewers felt that the rebuttal resolved the majority of minor technical issues, but did not sufficiently clarify the more significant methodological concerns. 
Thus, I recommend rejection at this time.",ICLR2020, +Zn9CT3fodxH,1642700000000.0,1642700000000.0,1,Nct9j3BVswZ,Nct9j3BVswZ,Paper Decision,Reject,"The paper worked on fully unsupervised anomaly detection and proposed to use self-supervised representation learning to improve the performance of one-class classification. This is a borderline case close to acceptance but cannot make it. Specifically, it is useful, but its novelty is the main issue, since it is not surprising that self-supervised representation learning can improve one-class classification without representation learning (this part is still much of the taste of ICLR) and an ensemble of multiple models can improve upon a single model (which is just ""bootstrap aggregating"" or ""bagging"" used everyday in practice and known to machine learning and statistics societies a very long time ago). After seeing the rebuttal, the concerns were not really addressed well and the issues were only partially solved. Thus, the paper is not enough to guarantee an acceptance to ICLR unfortunately.",ICLR2022, +BkUD4JTSG,1517250000000.0,1517260000000.0,371,BkoXnkWAb,BkoXnkWAb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"the reviewers were not fully convinced of the setting under which the proposed bipolar activation function was found by the authors to be preferable, and neither am i.",ICLR2018, +DQcenQv7mq,1576800000000.0,1576800000000.0,1,HyxgBerKwB,HyxgBerKwB,Paper Decision,Reject,"This paper introduces an approach for estimating the quality of protein models. The proposed method consists in using graph convolutional networks (GCNs) to learn a representation of protein models and predict both a local and a global quality score. Experiments show that the proposed approach performs better than methods based on 1D and 3D CNNs. + +Overall, this is a borderline paper. The improvement over state of the art for this specific application is noticeable. However, a major drawback is the lack of methodological novelty, the proposed solution being a direct application of GCNs. It does not bring new insights in representation learning. The contribution would therefore be of interest to a limited audience, in light of which I recommend to reject this paper.",ICLR2020, +CCZJxNqrtG,1576800000000.0,1576800000000.0,1,SklwGlHFvH,SklwGlHFvH,Paper Decision,Reject,"This paper studies deep neural network (DNN) learning curves by leveraging recent connections of (wide) DNNs to kernel methods such as Gaussian processes. + +The bulk of the arguments contained in this paper are, thus, for the ""kernel regime"" rather than ""the problem of non-linearity in DNNs"", as one reviewer puts it. +When it comes to scoring this paper, it has been controversial. However a lot of discussion has taken place. On the positive side, it seems that there is a lot of novel perspectives included in this paper. On the other hand, even after the revision, it seems that this paper is still very difficult to follow for non-physicists. + +Overall, it would be beneficial to perform a more careful revision of the paper such that it can be better appreciated by the targeted scientific community. +",ICLR2020, +OBOm1mzc8R,1642700000000.0,1642700000000.0,1,AMpki9kp8Cn,AMpki9kp8Cn,Paper Decision,Accept (Poster),"This paper proposes an identifiable nonlinear ICA model based on volume-preserving transformations. The overall approach is very similar to the GIN method published @ ICLR 2020. 
There is a weak consensus among the reviewers that this paper has some merit, although none pushed for acceptance. After reviewing the paper myself, I agree that the contributions here appear to be incremental, but the results do push this growing field of identifiable latent variable models forward.",ICLR2022, +7VkgQvePCg7,1610040000000.0,1610470000000.0,1,37nvvqkCo5,37nvvqkCo5,Final Decision,Accept (Spotlight),"This paper got 3 acceptance and 1 marginally below the threshold. After the rebuttal, the rating was raised to above the threshold. All the reviewers are positive about this submission. They agree that the method proposed in the submission is novel, the experiments are comprehensive and convincing. AC agrees and recommend acceptance. +",ICLR2021, +gqpUjyeeJRi,1642700000000.0,1642700000000.0,1,dpXL6lz4mOQ,dpXL6lz4mOQ,Paper Decision,Accept (Poster),"This paper shows that (under some parameter range) graph convolutional networks learns communities in the stochastic block model. The result is clean, the proof techniques rely on partitioning neurons of three types and seems applicable to more general settings. The reviewers agree that the main theorems are interesting. There are some concerns among reviewers about the presentation of the paper, but many of them seem to be already addressed in the revised version, and I would recommend the authors to continue to improve the writing. There are also some concern about experiments, but the experiments are mostly used to validate the theorems, so clarifying how they are related would suffice. Overall the paper seems to have an interesting theoretical result on GCN.",ICLR2022, +SK0RgJYdcH,1610040000000.0,1610470000000.0,1,_TGlfdZOHY3,_TGlfdZOHY3,Final Decision,Reject,"This paper is right on the borderline. It questions the utility of episodic training from a novel perspective, driven by a comparison to NCA, with thorough experiments. The hypothesis that more pairwise comparisons per batch/episode benefit learning is also quite interesting, but some reviewers didn’t feel this was convincingly presented. + +Prototypical networks are indeed a popular method for FSL, but I do as well think that NCA is more closely related to matching networks, and that it makes more sense for that to be the focus of experimentation. Matching networks involve more direct pairwise comparisons, and so a leave-one-out baseline with this model would probably be a useful comparison. + +While I appreciate the desire to focus on a fundamental aspect of FSL and not chase state of the art, I think that it’s important to show where one should go from here. That is, as the reviewers pointed out there are many mechanisms beyond vanilla PNs that have yielded better results than those presented in this paper. I don’t think matching SOTA is necessary here, but it would be nice to show that the insights here complement other mechanisms in FSL. +",ICLR2021, +HkxK23JNgE,1544970000000.0,1545350000000.0,1,HylDpoActX,HylDpoActX,Area chair recommendation,Reject,"The submission proposes a hierarchical clustering approach (nested-means clustering) to determine good quantization intervals for non-uniform quantization. An empirical validation shows improvement over a very closely related approach (Zhu et al, 2016). + +There was an overall consensus that the literature review was insufficient in its initial form. The authors have proposed to extend it somewhat. Other concerns are related to the novelty of the technique (R4 was particularly concerned about novelty over Zhu et al, 2016). 
+
+Two reviewers were against acceptance, and one was positive about the paper. Due to the overall concerns about the novelty of the approach, which were confirmed in discussion after the rebuttal, this paper is unlikely to meet the threshold for acceptance to ICLR.",ICLR2019,5: The area chair is absolutely certain
CvdSMg7CqGP,1610040000000.0,1610470000000.0,1,KVTkzgz3g8O,KVTkzgz3g8O,Final Decision,Reject,"This work explores an auto-regressive density estimator based on transformer networks. The model is trained via MLE with an additional MMD regularization term.
Various experiments are performed on small benchmarks and show good results on density estimation. It is great to see that such a simple model is indeed very effective for density estimation on various small benchmarks (such as 2D density estimation and MNIST).

The ablation experiments are informative and justify some of the model choices (such as the use of RNN to encode ""positions""). Experiments are nicely chosen and paint a broad picture of the behaviour of the studied model.

The paper and author responses, however, excessively exaggerate the extent to which these results are relevant to the bigger picture in comparison to existing literature (e.g. flows and existing auto-regressive models).

As has been extensively discussed with the reviewers, the proposed model is a straightforward application of a transformer network to auto-regressive modelling; this is especially so in light of existing work on auto-regressive models with transformers [e.g. 1, 6, 8] and self-attention [e.g. 2]. BERT [7] itself can be used for auto-regressive modelling almost out-of-the-box (with the appropriate choice of masks during training).

At various points, the paper and author responses refer to flow models as ""complicated/expensive"" counterparts. These arguments are unfounded: auto-regressive models are particular cases of flows [3], and there are no obstructions to using transformer networks inside flows (in fact they have already been used, to achieve permutation equivariance and long-range correlations [e.g. 4]).
The paper leaves comparisons to spline-flows out, arguing they are ""hard to implement"". This is quite conspicuous, as not only are spline-flows straightforward to implement, they produce results entirely on par with the presented model (as an example, compare Fig 2 from [9] with Fig 1 from this paper).
Finally, the paper also misses an important discussion about the computational complexity of the proposed method. 
Auto-regressive models are considerably slower to sample from in relation to other types of directed models. Even more so with transformer networks as conditioners. For instance, flows [3, 5] allow for substantially faster sampling of large-dimensional data relative to auto-regressive models (by exploiting parallel sampling). + + +Extra comments: + +The paper says ""... Self-attention also enables +permutation equivariance and naturally enables TraDE to be agnostic to the ordering of the features ... "" +This is true only for a *single* conditional $p(x_i | \text{Transformer}(x_{0 \ldots (i-1)}))$, not for the *joint* density. It is actually not straightforward to build auto-regressive models that are permutation invariant or that incorporate other forms of domain knowledge in general. +As an example, see [4] for how transformers and spline-flows can be used to produce exact permutation-invariant densities. + + +[1] Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D. and Sutskever, I., 2020, November. Generative pretraining from pixels. In International Conference on Machine Learning (pp. 1691-1703). PMLR. + +[2] Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A. and Tran, D., 2018. Image transformer. arXiv preprint arXiv:1802.05751. + +[3] Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S. and Lakshminarayanan, B., 2019. Normalizing flows for probabilistic modeling and inference. arXiv preprint arXiv:1912.02762. + +[4] Wirnsberger, P., Ballard, A.J., Papamakarios, G., Abercrombie, S., Racanière, S., Pritzel, A., Rezende, D.J. and Blundell, C., 2020. Targeted free energy estimation via learned mappings. arXiv preprint arXiv:2002.04913. + +[5] Huang, C.W., Krueger, D., Lacoste, A. and Courville, A., 2018. Neural autoregressive flows. arXiv preprint arXiv:1804.00779. + +[6] Sun, C., Myers, A., Vondrick, C., Murphy, K. and Schmid, C., 2019. Videobert: A joint model for video and language representation learning. In Proceedings of the IEEE International Conference on Computer Vision (pp. 7464-7473). + +[7] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova; ACL 2019. + +[8] Child, R., Gray, S., Radford, A. and Sutskever, I., 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509. + +[9] Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G., 2019. Neural spline flows. In Advances in Neural Information Processing Systems (pp. 7511-7522).",ICLR2021, +b7BMFDzfOF,1576800000000.0,1576800000000.0,1,HJgb7lSFwS,HJgb7lSFwS,Paper Decision,Reject,"The paper proposes an approach for learning class-level and individual-level (token-level) representations based on Wasserstein distances between data subsets. The idea is appealing and seems to have applicability to multiple tasks. The reviewers voiced significant concerns with the unclear writing of the paper and with the limited experiments. The authors have improved the paper, but to my mind it still needs a good amount of work on both of these aspects. The choice of wording in many places is imprecise. The tasks are non-standard ones so they don't have existing published numbers to compare against; in such a situation I would expect to see more baselines, such as alternative class/instance representations that would show the benefit specifically of the Wasserstein distance-based approach. 
I cannot tell from the paper in its current form whether or when I would want to use the proposed approach. In short, despite a very interesting initial idea, I believe the paper is too preliminary for publication.",ICLR2020, +b42d8XgbIj,1610040000000.0,1610470000000.0,1,bIrL42I_NF8,bIrL42I_NF8,Final Decision,Reject,"The authors study the problem of (insufficient) generalization in gossip-type decentralized deep learning. Specifically, they establish an upper bound on the square of the consensus parameter distance, which the authors identify as a key quantity that influences both optimization and generalization. This upper bound (called the critical consensus distance) can be monitored and controlled during the training process via (e.g.) learning rate scheduling and tweaking the amount of gossip. A series of empirical results on decentralized image classification and neural machine translation are presented in support of this observation. + +Initial reviews were mixed. While all reviewers liked the approach, concerns were raised about the novelty of the results, the lack of theoretical depth, and the mismatch between theory and experiments. Overall, the idea of tracking consensus distance to control generalization seems to be a practically useful concept. During the discussion phase the authors were been able to (convincingly, in the area chair's view) respond to a subset of the criticisms. + +Unfortunately, concerns remained regarding the mismatch between the theoretical and empirical results, and in the end the paper fell just short of making the cut. + +The authors are encouraged to carefully consider the reviewers' concerns while preparing a future revision.",ICLR2021, +Bkxt-4inkN,1544500000000.0,1545350000000.0,1,B1fPYj0qt7,B1fPYj0qt7,ICLR 2019 decision,Reject,"This paper proposes using a tensor train low rank decomposition for compressing neural network parameters. However the paper falls short on multiple fronts 1)lack of comparison with existing methods 2) no baseline experiments. Further there are concerns about correctness of the math in deriving the algorithms, convergence and computational complexity of the proposed method. I strongly suggest taking the reviews into account before submitting the paper it again. ",ICLR2019,5: The area chair is absolutely certain +B1gSkApel4,1544770000000.0,1545350000000.0,1,ryljV2A5KX,ryljV2A5KX,"Nice idea, experiments lacking",Reject,"Strengths: This paper introduces a clever construction to build a more principled disentanglement objective for GANs than the InfoGAN. The paper is relatively clearly written. This method provides the possibility of combining the merits of GANs with the useful information-theoretic quantities that can be used to regularize VAEs. + +Weaknesses: The quantitative experiments are based entirely around the toy dSprites dataset, on which they perform comparably to other methods. Additionally, the qualitative results look pretty bad (in my subjective opinion). They may still be better than a naive VAE, but the authors could have demonstrated the ability of their model by comparing their models against other models both qualitatively and quantitatively on problems hard enough to make the VAEs fail. + +Points of contention: The quantitative baselines are taken from another paper which did zero hyperparameter search. However the authors provided an updated results table based on numbers from other papers in a comment. + +Consensus: Everyone agreed that the idea was good and the experiments were lacking. 
Some of the comments about experiments were addressed in the updated version but not all.",ICLR2019,2: The area chair is not sure +BG_mlaVO3bS,1610040000000.0,1610470000000.0,1,LIOgGKRCYkG,LIOgGKRCYkG,Final Decision,Reject,"I thank the authors and reviewers for the lively discussions. Reviewers found the work to be interesting but some concerns were raised regarding the significance of the results. In particular, two reviewers mentioned that authors did not fully address their concerns in the rebuttal period. Given all, I think the paper still needs a bit of work before being accepted. I recommend authors to address comments raised by the reviewers to improve their work. + +-AC ",ICLR2021, +CA5RoG3anA,1576800000000.0,1576800000000.0,1,rkeZ9a4Fwr,rkeZ9a4Fwr,Paper Decision,Reject,"This work a ""Seatbelt-VAE"" algorithm to improve the robustness of VAE against adversarial attacks. The proposed method is promising but the paper appears to be hastily written and leave many places to improve and clarify. This paper can be turned into an excellent paper with another round of throughout modification. + + +",ICLR2020, +0xLVVLI_rsD,1610040000000.0,1610470000000.0,1,QkRbdiiEjM,QkRbdiiEjM,Final Decision,Accept (Poster),"Three of the reviewers are very positive about this work, and R3 is slightly concerned about the datasets, writing, and notations etc. The authors responded to these concerns in detail and have agreed to take care of these comments. Thus an accept is recommended based on the understanding that the authors will fulfil their commitments.",ICLR2021, +mujx6qVC7fr,1642700000000.0,1642700000000.0,1,ULfq0qR25dY,ULfq0qR25dY,Paper Decision,Accept (Poster),"The paper introduces the maximum n-times coverage, a new NP-hard (and non-submodular) optimization problem. It is shown that the problem can naturally arise in ML-based vaccine design, and two heuristics are given to solve the problem. The results are used to produce a pan-strain COVID vaccine. + +The reviewers and I think that this is an interesting paper with a compelling application. There were some concerns about theoretical novelty and biological accuracy but these were addressed during the author response period. Given this, I am delighted to recommend acceptance. Please incorporate the feedback in the reviews in the final version of the paper.",ICLR2022, +HJxTGceZe4,1544780000000.0,1545350000000.0,1,S1en0sRqKm,S1en0sRqKm,Interestng empirical analysis but insights might be limited,Reject,"The paper presents an interesting empirical analysis showing that increasing the batch size beyond a certain point yields no decrease in time to convergence. This is an interesting finding, since it indicates that parallelisation approaches might have their limits. On the other hand, the study does not allow the practitioners to tune their hyperparamters since the optimal batch size is dependent on the model architecture and the dataset. Furthermore, as also pointed out in an anonymous comment, the batch size is VERY large compared to the size of the benchmark sets. Therefore, it would be nice to see if the observation carries over to large-scale data sets, where the number of samples in the mini-batch is still small compared to the total number of samples. ",ICLR2019,3: The area chair is somewhat confident +-3fOKs3Ml,1576800000000.0,1576800000000.0,1,SyepHTNFDS,SyepHTNFDS,Paper Decision,Reject,The authors propose a graph residual flow model for molecular generation. 
Conceptual novelty is limited since it is simple extension and there isn't much improvement over state of art.,ICLR2020, +6idK4XXJM_ho,1642700000000.0,1642700000000.0,1,iPHLcmtietq,iPHLcmtietq,Paper Decision,Accept (Poster),"This paper proposes that the superior performance of modern convolutional networks is partly due to a phase collapse mechanism that eliminates spatial variability while ensuring linear class separation. To support their hypothesis, authors introduce a complex-valued convolutional network (called Learned Scattering network) which includes a phase collapse on the output of its wavelet filters and show that such network has comparable performance to ResNets but its performance degrades if the phase collapse is replaced by a threshold operator. + +Reviewers are all in agreement about the novelty and significance of the work. They also find the empirical results compelling. The main weakness of this work which was highlighted by all reviewers is clarity. The paper can be significantly improved in terms of the writing. While I am recommending acceptance, I strongly recommend authors to take reviewers' feedback into account and improve the writing significantly for the final version so that more people would benefit from this paper and build on it in the future.",ICLR2022, +ryV2r16HG,1517250000000.0,1517260000000.0,653,HJDUjKeA-,HJDUjKeA-,ICLR 2018 Conference Acceptance Decision,Reject,All three reviewers recommended rejection and there was no rebuttal.,ICLR2018, +S1znzJ6rz,1517250000000.0,1517260000000.0,13,BJOFETxR-,BJOFETxR-,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"There was some debate between the authors and an anonymous commentator on this paper. The feeling of the commentator was that existing work (mostly from the PL community) was not compared to appropriately and, in fact, performs better than this approach. The authors point out that their evaluation is hard to compare directly but that they disagreed with the assessment. They modified their texts to accommodate some of the commentator's concerns; agreed to disagree on others; and promised a fuller comparison to other work in the future. + +I largely agree with the authors here and think this is a good and worthwhile paper for its approach. + +PROS: +1. well written +2. good ablation study +3. good evaluation including real bugs identified in real software projects +4. practical for real world usage + +CONS: +1. perhaps not well compared to existing PL literature or on existing datasets from that community +2. the architecture (GGNN) is not a novel contribution",ICLR2018, +JHpeyguEDo,1576800000000.0,1576800000000.0,1,B1l8L6EtDS,B1l8L6EtDS,Paper Decision,Accept (Poster),"This paper proposes a method for improving training of text generation with GANs by performing discrimination between different generated examples, instead of solely between real and generated examples. + +R3 and R1 appreciated the general idea, and thought that while there are still concerns, overall the paper seems to be interesting enough to warrant publication at ICLR. R2 has a rating of ""weak reject"", but I tend to agree with the authors that comparison with other methods that use different model architectures is orthogonal to the contribution of this paper. 
+ +In sum, I think that this paper would likely make a good contribution to ICLR and recommend acceptance.",ICLR2020, +m33UKm5RxE,1576800000000.0,1576800000000.0,1,H1lyiaVFwB,H1lyiaVFwB,Paper Decision,Reject,"The paper introduces a new method for 3d point cloud generation based upon auto encoders and GANs. + +Two reviewers voted for accept and one reviewer for outright reject. Both authors and reviewers posted thorough responses. Based upon these it is judged best to not accept the paper in the present. The authors should take the feedback into account in a an updated version of the paper. + +Rejection is recommended. ",ICLR2020, +S1IHEJaHf,1517250000000.0,1517260000000.0,343,SJDJNzWAZ,SJDJNzWAZ,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"I've summarized the pros and cons of the reviews below: + +Pros: +* The method for time representation in event sequences is novel and well founded +* It shows improvements on several but not all datasets that may have real-world applications + +Cons: +* Gains are somewhat small +* The task is also not of huge interest to ICLR in particular, and thus the paper might be of limited interest + +As a result, because the paper is well done, but drew little excitement from any of the reviewers, I suggest that this not be accepted to the main conference, but encouraged to present at the workshop track.",ICLR2018, +FLQtoo4KU3i,1642700000000.0,1642700000000.0,1,Y8Ivdg7typR,Y8Ivdg7typR,Paper Decision,Reject,"This paper presents a novel method for class-incremental learning (CIL) with the help of placebo data chosen from a free image stream. Such placebo data are unlabeled and easy to obtain in practice. To adaptively generate phase-specific functions as the accurate estimation of placebos' quality for KD, this paper applies reinforcement learning based on the constraints of the CIL. The effectiveness of the method has been verified on multiple datasets, including ImageNet-1k and ImageNet-Subset with both lower memory and higher accuracy than baselines. The major concern is about the novelty that the unlabeled auxiliary data is not quite new for CIL despite the minor difference in settings and methods. Moreover, the improvements over the baselines are not significant enough, which is a minor concern.",ICLR2022, +QEh3-eRxB,1576800000000.0,1576800000000.0,1,Byeq_xHtwS,Byeq_xHtwS,Paper Decision,Reject,The paper has several clarity and novelty issues.,ICLR2020, +2vL0W1t_Waf,1642700000000.0,1642700000000.0,1,zuDmDfeoB_1,zuDmDfeoB_1,Paper Decision,Reject,"The paper compares MAML and NAL for meta-learning, and provides theoretical explanations on some very simple models when MAML can be significantly better than NAL, related to a definition of task hardness. The findings are also supported by experimental results. +While the results are plausible and can mark the starting point of a useful analysis, the models analyzed in the paper are too simplistic to warrant publication at ICLR. The authors are encouraged to extend their methodology to more complicated task models, as well as to, e.g., multi-step versions of MAML (since the considered version of MAML makes a single step, the proposed problem hardness may not be applicable in more general situations). 
It is also not clear how the derived insights can guide the practical applications of MAML.",ICLR2022, +jgXzJRjl9X,1576800000000.0,1576800000000.0,1,Hkl_sAVtwr,Hkl_sAVtwr,Paper Decision,Reject,"This paper proposes a compressed sensing (CS) method which employs deep image prior (DIP) algorithm to recovering signals for images from noisy measurements using untrained deep generative models. A novel learned regularization technique is also introduced. Experimental results show that the proposed methods outperformed the existing work. The theoretical analysis of early stopping is also given. All reviewers agree that it is novel to combine the deep learning method with compressed sensing. The paper is well written and overall good. However the reviewers also proposed many concerns about method and the experiments, but the authors gave no rebuttal almost no revisions were made on the paper. I would suggest the author to consider the reviewers' concern seriously and resubmit the paper to another conference or journal.",ICLR2020, +SkuNQkTBG,1517250000000.0,1517260000000.0,119,ryup8-WCW,ryup8-WCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The authors make an empirical study of the ""dimension"" of a neural net optimization problem, where the ""dimension"" is defined by the minimal random linear parameter subspace dimension where a (near) solution to the problem is likely to be found. I agree with reviewers that in light of the authors' revisions, the results are interesting enough to be presented at the conference.",ICLR2018, +PvX2s_4eIF,1610040000000.0,1610470000000.0,1,8Sqhl-nF50,8Sqhl-nF50,Final Decision,Accept (Poster),"This paper provides a theoretically rigorous treatment of approximation properties and convergence analysis of LINEAR RNNs. The reviewers were divided in their evaluation. On the positive side, the presented relation between approximation error and required memory size is not obvious and interesting. On the less positive side, two of the reviewers raised the necessity of mathematical machinery that were invoked. Furthermore, its applicability is unclear in ML, since they aren't applicable to the usual nonlinear RNNs. However, given that the theoretical contributions are clear, the final decision was to accept.",ICLR2021, +ry7r3z8de,1486400000000.0,1486400000000.0,1,HyWG0H5ge,HyWG0H5ge,ICLR committee final decision,Invite to Workshop Track,"The paper presents a theoretical analysis of the convergence of the training error. The presented result is rather general and can potentially apply to many neural networks. + + Reviewers pointed out several important concerns regarding the mathematical rigor and precision of the claims. The authors' response partially addressed concerns about the scope of the claims and unclear arguments of the proofs. + + We invite the authors to submit a revision of the paper, where all statements and proofs are mathematically clear, to the workshop track.",ICLR2017, +FFSv_ojbN27,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Accept (Poster),"The paper sets up a complex algorithm for out-of-distribution generalization. The algorithm requires first, a generalization of identification results for variational autoencoders, the followed by second, a causal discovery subroutine, and third, learning an invariant predictor using the discovered causes. 
The procedure reads sound, and the results on common benchmarks look good, though I do not know how practical the approach would be in general.",ICLR2022, +xJM-xSYP-qb,1610040000000.0,1610470000000.0,1,oKWmzgO7bfl,oKWmzgO7bfl,Final Decision,Reject,"In this paper the authors propose an approach to improving the accuracy of the classification problem based on deep neural networks by detecting the in-domain data from background/noise. The strategy is designed in such a way that the detector and the classifier share the bottom layers of the network. Theoretical proof is given and experiments are conducted on a variety of datasets. The novelty of the work is to come up with a better estimate the pdf of the data and use it to help the classification based on the deep neural networks. There are concerns raised by the reviewers regarding the related work, the exposition and the experimental design. After the rebuttal from the authors, which is meticulous, some of the issues unfortunately still stand. The paper needs to make a stronger case in order to be accepted, especially, for instance, the theoretical and empirical comparison with the existing techniques sharing the similar idea. ",ICLR2021, +sHsS79kJf1b,1642700000000.0,1642700000000.0,1,4jUmjIoTz2,4jUmjIoTz2,Paper Decision,Reject,"The paper proposes a novel ensemble method, CDA^2, in which base models collaborate to defend against adversarial attacks. To do so the base models have two heads: the label head for predicting the label and the posterior probability density (PPD) head that is trained by minimizing binary cross entropy between it and the true-label logit given by the label head. During inference the base model with the highest PPD value is chosen to make the prediction. During training base models learn from the adversarial examples produced by other base models. + +The evaluation of the manuscript of different reviewers was very diverse, resulting in final scores ranging between 3 and 8 after the discussion period. While the rebuttal clearly addressed the concerns of one reviewer and several additional experimental results were added for different adversarial attacks, it did not fully addressed the concerns of another reviewer, who rated his confidence higher. He was also not convinced by the update in the revised version of the manuscript, in which crucial changes in the pseudocode describing the proposed algorithm were made, which contradicted some statements in the first version. Therefore, the paper can unfortunately not be accepted in its current version. In a future version of the manuscript, the description of the algorithm and of he role of the PPD head should be improved and experiments on another dataset next to CIFAR-10 could be added.",ICLR2022, +A-Rv6ur7lh,1576800000000.0,1576800000000.0,1,rkxmPgrKwB,rkxmPgrKwB,Paper Decision,Reject,"After communicating with each reviewer about the rebuttal, there seems to be a consensus that the paper contains a number of interesting ideas, but the motivation for the paper and the relationship to the literature needs to be expanded. The reviewers have not changed their scores, and so there is not currently enough support to accept this paper.",ICLR2020, +jj_J2TPVwhv,1642700000000.0,1642700000000.0,1,8c50f-DoWAu,8c50f-DoWAu,Paper Decision,Accept (Oral),"The paper is exceptionally well summarized by Reviewer QC5G which is difficult to improve up on. I will save the readers the effort of reading more text (without adding more substance). The reviewers unanimously rated this paper highly. 
The discussion has been robust and enlightening, and it has also improved the revised paper.",ICLR2022,
ryQZlWjDsvgP,1642700000000.0,1642700000000.0,1,jGmNTfiXwGb,jGmNTfiXwGb,Paper Decision,Reject,"## A Brief Summary
This paper uses offline algorithms that can see the entire time-series to approximate online algorithms that can only view the past of the time-series. Basically, the offline algorithm is used to provide discrete class targets to train the online algorithm. The paper presents results on synthetic and historical stock market data.

## Reviewer s1H9
**Strengths:**
- Practical problem.
- Novel approach.
- Clear presentation.
**Weaknesses:**
- No other baselines.
- No theoretical guarantees behind the approach.
- Writing could be improved.

## Reviewer EgW9
**Strengths:**
- Clear writing.
- Interesting research direction.
**Weaknesses:**
- The primary claim seems incorrect and unclear.
- Due to the unclarity about the primary claim of this paper, it is difficult to evaluate the paper.
- Lack of baselines.
- Lack of discussion of the related works.

## Reviewer gii5
**Strengths:**
- Interesting and novel approach.
**Weaknesses:**
- Difficult to evaluate, with no empirical baselines or theoretical evidence.
- The datasets used in the paper have not been used in the literature before. Authors should provide experimental results on datasets from the literature as well.
- The paper needs to compare against the other baselines discussed in the related works.
- More ablations and analysis of the proposed algorithm are required.
- Unsubstantiated claims regarding being SOTA on the task, since the paper doesn't compare against any other baselines on these datasets.
- The paper could be restructured to improve the flow and clarity.

## Reviewer zoKR
**Strengths:**
- Novel and interesting research topic.
- Bridging classical algorithms and ML.
- Clearly written.

**Weaknesses:**
- Lack of motivation for the problem.
- The approach only works with offline algorithms that operate on time-segmented data.

## Reviewer aaFn
**Strengths:**
- Novel algorithm.

**Weaknesses:**
- Potentially overfitting to the offline data.
- Data-hungry approach.
- Confusion related to the occurrence moments of predicted future actions.
- Section 2 is difficult to understand. 
+
+## Key Takeaways and Thoughts
Overall, I think the problem setup is very interesting. However, as pointed out by reviewers gii5 and EgW9, due to the lack of baselines it is tough to compare the proposed algorithm against other approaches, and this makes the paper difficult to evaluate. I would recommend that the authors include more ablations and baselines in a future version of the paper, and address the other issues pointed out above by the reviewers.",ICLR2022,
yZUzvJry5gp,1610040000000.0,1610470000000.0,1,NeRdBeTionN,NeRdBeTionN,Final Decision,Accept (Spotlight),"All four reviewers unanimously recommended acceptance (four 7s). They generally appreciated that the proposed idea is novel and the experiments are convincing. I think the paper tackles an important problem of evaluating GANs, and the idea of using self-supervised representations, as opposed to the conventional ImageNet-based representations, would lead to interesting discussions and follow-ups. ",ICLR2021,
PCqcUzemY2P,1610040000000.0,1610470000000.0,1,5g5x0eVdRg,5g5x0eVdRg,Final Decision,Reject,"During the discussion phase, although the reviewers acknowledged the superior empirical performance of the proposed method, they shared two major concerns:
1. Lack of theoretical or empirical justification/proof for the key statement: ""the current methods do not effectively maximize the MI objective because greedy SGD typically results in suboptimal local optima"".
2. Lack of comparisons with newer methods, e.g. from ECCV 2020.

In particular, the first point is crucial. As the reviewers pointed out, since it is the main contribution and the key message of this paper, it should be carefully examined theoretically and/or empirically. However, in its current state, there is no theoretical analysis, and the empirical evaluation is not convincing.

About the second point, although I think it cannot be a sole reason for rejection, it is better to cite and discuss those methods for completeness. 
+ +Overall, the contribution of this paper it not significant enough for publication. Hence I will reject the paper.",ICLR2021, +9rw_sI9X4NN,1642700000000.0,1642700000000.0,1,yBYVUDj7yF,yBYVUDj7yF,Paper Decision,Reject,"While the reviewers appreciated the theoretical analysis performed in this work, some concerns were raised during discussion, such as how relevant the obtained results are wrt current contrastive learning practices, and whether the comparison against a simple auto-encoder (basically PCA) is fair or insightful. The authors' response did not satisfactorily address these concerns. Overall, the current work appears to be preliminary, and important questions were left out: (a) how realistic the assumptions are (linear, spike covariance, Bernoulli random augmentation)? (b) performing better than PCA in a specifically designed setting may not be as impressive as it appears; what can we say about the optimality of CL against any algorithm? (c). validation on existing benchmark and CL algorithms would be welcome. The authors are encouraged to revise the current draft by incorporating the reviewers' comments and submit again. In its current form, we believe this work is not ready yet.",ICLR2022, +CwtCXuKewH,1576800000000.0,1576800000000.0,1,Hyg4kkHKwH,Hyg4kkHKwH,Paper Decision,Reject,"The paper proposes a neurally inspired model that is a variant of conv-LSTM called V1net. The reviewers had trouble gleaning the main contributions of the work. Given that it is hard to obtain state of art results in neurally inspired architectures, the bar is much higher to demonstrate that there is value in pursuing these architectures. There are not enough convincing results in the paper to show this. I recommend rejection.",ICLR2020, +awOI1TxL-,1576800000000.0,1576800000000.0,1,ryeFzT4YPr,ryeFzT4YPr,Paper Decision,Reject,"The authors present the task lift-the-flap where an agent (artificial or human) is presented with a blurred image and a hidden item. The agent can de-blur the parts of the image by clicking on it. The authors introduce a model for this task (ClickNet) and they compare this against others. +As reviewers point, this paper presents an interesting set of experiments and analyses. Overall, this type of work can be quite influential as it gives an alternative way to improve our models by unveiling human strategies and using those as inductive biases for our models. That being said, I find the conclusions of this paper quite narrow for the general audience of ICLR (as R2 and R3 also point), as authors look into an artificial task and show ClickNet performs well. But what have we learned beyond that? How do we use these results to improve either our models or our understanding of these models? I believe these are the type of questions that are missing from the current version of the paper and that if answered would greatly increase its impact and relevance to the ICLR community. At the moment though, I cannot recommend this paper for acceptance. +",ICLR2020, +Byec_jnVlE,1545030000000.0,1545350000000.0,1,rke_YiRct7,rke_YiRct7,Interesting investigation into the existence of spurious local minima in nonlinear networks,Accept (Poster),"This is an interesting paper that develops new techniques for analyzing the loss surface of deep networks, allowing the existence of spurious local minima to be established under fairly general conditions. 
The reviewers responded with uniformly positive opinions.",ICLR2019,5: The area chair is absolutely certain +Skgr1ubWe4,1544780000000.0,1545350000000.0,1,BJlpCsC5Km,BJlpCsC5Km,"Intersting idea, but novelty is limited and experimental analysis could be extended.",Reject,"The paper proposes to define the GAN discriminator as an explicit function of a invertible generator density and a structured Gibbs distribution to tackle the problems of spurious modes and mode collapse. The resulting model is similar to R2P2, i.e. it can be seen as adding an adversarial component to R2P2, and shows competitive (but no better) performance. Reviewers agree, that these limits the novelty of the contribution, and that the paper would be improved by a more extensive empirical evaluation. ",ICLR2019,3: The area chair is somewhat confident +M6mvZGwkco,1642700000000.0,1642700000000.0,1,fwzUgo0FM9v,fwzUgo0FM9v,Paper Decision,Accept (Poster),"The authors provide an interesting improvement on privacy attacks in federated learning, demonstrating the ability to extract individual points even over large batches. While there were some concerns about the technical difficulty of the approach, reviewers were broadly in support of the work. As I tend to agree, this is an interesting strengthening beyond what it appears we were able to do before. This is yet another piece of evidence against the canard in FL that only sharing gradient updates provides privacy guarantees.",ICLR2022, +HKH-REzpMrU,1642700000000.0,1642700000000.0,1,kEvhVb452CC,kEvhVb452CC,Paper Decision,Reject,"The paper proposes a method to accelerate training of an architectural hybrid of Transformers and CNNs: first train a CNN and then use the learned parameters to initialize a more general Transformed CNN (T-CNN) model; subsequently continue training the T-CNN. + +Reviewers ratings are marginal, with three ""marginally above threshold"" and one ""marginally below threshold"". However, no reviewer makes a compelling argument for acceptance, and all reviewers point to significant weaknesses in the work. Reviewer ojmG: ""novelty of the proposed method is limited"" and ""do not always reach the performance of end-to-end Transformers"". Reviewer Q4Pp: ""experiments are very limited"" and also (after rebuttal): ""it would good to provide some experiments on a dataset different to ImageNet"". Reviewer ZjBY: ""proposed model is not compared with many of the existing model architectures"" and (after rebuttal): ""would benefit from additional experimental analysis"". Reviewer zV42: ""limited novelty prevents me from giving a higher rating"". + +In summary, while reviewer ratings span either side of above/below the acceptance threshold, the reviewer comments point to limited novelty and limited experimental impact. Results appear not particularly surprising or significant: while the method provide some savings in training time, it does not seem to ultimately improve top accuracy on tasks and still lags behind the latest vision transformer architectures. The author response did not substantially change reviewer opinion. 
The AC has also taken a detailed look at the paper and does not believe the contribution to be of sufficient significance to warrant acceptance.",ICLR2022, +6Ta2M-tgB,1576800000000.0,1576800000000.0,1,rkg-mA4FDr,rkg-mA4FDr,Paper Decision,Accept (Poster),"This paper conducts a comprehensive study on different retrieval algorithms and show that the two-tower Transformer models with properly designed pre-training tasks can largely improve over the widely used BM-25 algorithm. In fact, the deep learning based two tower retrieval model is already used in the IR field. The main contribution lies in the comprehensive experimental evaluation. + +Blind Review #3 has a major misunderstanding of the paper; hence his review will be excluded. The other two reviewers tend to accept the paper with several minor comments. + +As the authors promise to release the code as a baseline for further works, I agree to accept the paper. +",ICLR2020, +r1gcvUSlxV,1544730000000.0,1545350000000.0,1,HkeyZhC9F7,HkeyZhC9F7,Borderline paper,Reject,"The paper proposes the use of reinforcement learning to learn heuristics in backtracking search algorithm for quantified boolean formulas, using a neural network to learn a suitable representation of literals and clauses to predict actions. The writing and the description of the method and results are generally clear. The main novelty lies in finding a good architecture/representation of the input, and demonstrating the use of RL in a new domain. While there is no theoretical justification for why this heuristic should work better than existing ones, the experimental results look convincing, although they are somewhat limited and the improvements are dataset dependent. In practice, the overhead of the proposed method could be an issue. There was some disagreement among the reviewers as to whether the improvements and the results are significant enough for publication.",ICLR2019,3: The area chair is somewhat confident +HJcFEk6Bf,1517250000000.0,1517260000000.0,401,HyDMX0l0Z,HyDMX0l0Z,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper presents a really interesting take on the mode collapse problem and argue that the issue arises because of the current GAN models try to model distributions with disconnected support using continuous noise and generators. The authors try to fix this issue by training multiple generators with shared parameters except for the last layer. + +The paper is well written and authors did a good job in addressing some of the reviewer concerns and improving the paper. + +Even though arguments presented are novel and interesting, reviewers agree that the paper lacks sufficient theoretical or experimental analysis to substantiate the claims/arguments made in the paper. Limited quantitative and subjective results are not always in favor of the proposed algorithm. More controlled toy experiments and results on larger datasets are needed. The central argument about ""tunneling"" is interesting and needs deeper investigation. Overall the committee recommends this paper for workshop.",ICLR2018, +B1QvVJprf,1517250000000.0,1517260000000.0,368,Sk4w0A0Tb,Sk4w0A0Tb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"although the authors argue that their experiments were selected from the earlier work from which major comparing approaches were taken, the reviewers found the empirical result to be weak. 
why not use some real tasks (I do not believe bAbI nor PTB could be considered real) that could clearly reveal the superiority of the proposed unit against existing ones?",ICLR2018,
+",ICLR2020, +NMxqliUXi6W,1610040000000.0,1610470000000.0,1,exa2mDqPb5E,exa2mDqPb5E,Final Decision,Reject,"Description: +The paper presents a weakly-supervised model CICGMO for disentangling category, shape and view information from images. Label information is not need as the weak supervision is done by grouping together different views of the same object. They show that this outperforms other techniques on tasks such as invariant clustering and one-shot classification. + +Strengths: +- Paper is well written +- Data category is explicitly modeled +- The weakly supervision approach is appealing, since the grouped data used as supervision information is easy to obtain +- Invariant clustering and one-shot classification results outperforms other methods significantly, showing CIGMO is doing a decent job at those tasks. This could be explained by CIGMO ability to better disentangle category-shape-view. + +Weaknesses: +- It is unclear how well (quality) the generative model is able to disentangle shape from view +- The reconstruction quality is quite low, such that it is difficult often times, in the MULTIPIE example, to clearly identify a face geometry. +- Generated results are not evaluated directly, but rather evaluation is done through down-stream tasks such as invariant clustering. This makes it difficult to show the quality of shape and view information.",ICLR2021, +rygMyJRTyV,1544570000000.0,1545350000000.0,1,HJgJS30qtm,HJgJS30qtm,Refinement of objective and comparison against prior work needed,Reject,"This paper proposes reducing so called ""negative transfer"" through adversarial feature learning. The application of DANN for this task is new. However, the problem setting and particular assumptions are not sufficiently justified. As commented by the reviewers and acknowledged by the authors there is miscommunication about the basic premise of negative transfer and the main assumptions about the target distribution and it's label distribution need further justification. The authors are advised to restructure their manuscript so as to clarify the main contribution, assumptions, and motivation for their problem statement. + +In addition, the paper in it's current form is lacking sufficient experimental evidence to conclude that the proposed approach is preferable compared to prior work (such as Li 2018 and Zhang 2018) and lacks the proper ablation to conclude that the elimination of negative transfer is the main source of improvements. + +We encourage the authors to improve these aspects of the work and resubmit to a future venue. ",ICLR2019,4: The area chair is confident but not absolutely certain +dQnFrGh3_rc,1610040000000.0,1610470000000.0,1,Ggx8fbKZ1-D,Ggx8fbKZ1-D,Final Decision,Reject,"The paper proposes an optimization framework that automatically adapts the learning rates at different levels of a neural network based on hypergradient descent. The AC and reviewers all found the approach interesting and promising and appreciate the author feedback. + +We strongly encourage the authors to incorporate the points and additional results provided in their response to the reviewers. + +Additionally, some concerns remain to be addressed regarding initialization of hyper-parameters of combination weights. Specifically it would be important to further investigate the impact of such initialization on the optimization performance. 
Furthermore, additional experiments with other network optimizers would be valuable as pointed out in the reviews.",ICLR2021, +ya2lRyXSep5,1610040000000.0,1610470000000.0,1,YZ-NHPj6c6O,YZ-NHPj6c6O,Final Decision,Reject,"This paper proposes a new metric to measure symmetry-based disentanglement and uses this metric to optimize diffusion VAEs on a set of small, synthetic datasets. In general, reviewers found the theoretical framework introduced to be interesting and relevant, but there were a number of concerns regarding the empirical evaluation in the paper and the clarity of many of the claims, particularly wrt the need for strong supervision (pairs of data points with a known transformation between them) for both evaluating the metric and for training by regularizing the proposed metric. I'd encourage the authors to focus on the improvement points suggested by reviewers, most notably by improving the empirical evaluation by adding detailed ablations and comparisons (e.g., exploring the relative amount of supervision needed, comparisons to previous approaches) and clarity regarding the supervision required. As such, I recommend that it be rejected in its current form. ",ICLR2021, +VwNxNNbFsYk,1642700000000.0,1642700000000.0,1,a61qArWbjw_,a61qArWbjw_,Paper Decision,Reject,"PAPER: This paper proposes a method to learn joint representations from potentially missing data when (1) cross-generation may be difficult, and/or (2) with large number of modalities. This is achieved by minimizing the divergence between a surrogate joint posterior and inferences from arbitrary subsets. +DISCUSSION: The reviews and discussion brought many relevant issues and concerns. The authors submitted a revised version that improved the clarity of the paper and added an important experiment with PolyMNIST. In their responses, authors also addressed some misunderstanding about JMVAE-KL. The comparison with a relatively similar work, from Sutter et al., 2020, was only mentioned in the related work, with no direct comparisons. Also, the authors did not directly address the issue of studying tradeoffs between quality of generated samples and their coherences. It should also be noted that the advantage of the proposed SMVAE is marginal when the number of modalities increases, for the latent representation experiments on PolyMNIST. +SUMMARY: Enthusiasm for this paper was not unanimous. The reviewers brought some concerns about its differentiation with priori work, such as Sutton et al., 2020, and about a more detailed analysis of the tradeoffs. While the clarity of the paper improved during the revision, a good number of issues remained. I am leaning towards rejection.",ICLR2022, +XFlHkvRo6Ml,1642700000000.0,1642700000000.0,1,h0OYV0We3oh,h0OYV0We3oh,Paper Decision,Accept (Poster),"The paper addresses the problem of generating images by combining visual components. These components are learned during pretraining, forming a dictionary of visual concpets which plays the role of text in DALLE. The technique is based on DALLE and slot attention approach to generate VQ codes in a way that is consistent. + +Reviewers had various concerns, including (1) that using synthetic images makes it easier to combine visual components' (2) that the novelty and relation to literature was not clear enough (3) missing ablations. The authors provided a detailed rebuttal which addressed reviewer concerns in a convincing way. + +One remaining issue of the paper is the writing. 
The paper fails to clearly explain the workflow (what the inputs and outputs are during pretraining, training, and inference) and how compositionality is controlled (what can be used for conditioning). As a consequence, it requires substantial effort to understand the idea of the paper and what real problems can be solved with the proposed approach.
One thing that is still quite puzzling is the strength of the ""AST nodes only baseline"", which the authors have given a few explanations for (using nodes helps focus on variables, and there is also an effect of combining things that are close together in the AST tree). Still, this result doesn't seem to mesh with the overall story of the paper all that well, and again opens up some obvious questions, such as whether a Transformer model trained on only AST nodes would have done similarly, and if not, why not.
proto value functions, for instance, present a very different take on what could also be considered semi-supervised RL).
+ +However, on the downside, issues raised where lukewarm performance; novelty (a direct application of graph networks); lack of generality of the approach; similarity to graph networks applied on mesh based physics simulations, and similarity to NN applied for speeding up elasticity simulations; application on the finest level only; memorization/lack of generalization; simplicity of baselines; simplicity of tasks (including the added more complex tree task). + +There seemed to be some confusion on whether one of the reviewers had read the initial NeurIPS submission only (which he also had reviewed) or also the ICLR submission; the authors seemed to be upset up this possibility and made it clear in their responses, but the AC can assure them that the proper version has been read, reviewed and discussed; the author's responses in that respect were not helpful. + +The authors provided responses to most of the raised issues, and several reviewers acknowledged that the paper had been improved, in particular by adding comparisons (e.g NN search). + +However, in spite of these improvements, the discussion among reviewers and AC revealed that the paper still has serious issues, in particular minor novelty, lukewarm improvements, and some issues re: comparisons to baselines. While the reviewers acknowledged merits in the idea, the weaknesses hindered them to champion the paper for acceptance at this point, and the AC concurs, recommending rejection.",ICLR2021, +ryed_g33JE,1544500000000.0,1545350000000.0,1,H1e0-30qKm,H1e0-30qKm,metareview,Reject,"The paper received mixed reviews. It proposes a variant of Siamese network objective function, which is interesting. However, it’s unclear if the performance of the unguided method is much better than other baselines (e.g., InfoGAN). The guided version of the method seems to require much domain-specific knowledge and design of the feature function, which makes the paper difficult to apply to broader cases. +",ICLR2019,4: The area chair is confident but not absolutely certain +gfUBUXlQjP,1642700000000.0,1642700000000.0,1,ovRQmeVFbrC,ovRQmeVFbrC,Paper Decision,Reject,"This paper a framework of learning with noisy labels named PARS that combines three types of approaches, i.e., sample selection, noise robust loss, and label correction. The framework leverages both original noisy labels and estimated pseudo labels of all samples for improving the training performance, and the empirical studies demonstrated competitive results on CIFAR datasets especially in high-noise and low-resource settings. + +Reviewers raised some major concerns about the weaknesses. For example, empirical gain in small noise regime are small or negligible, and no empirical gain against SOTA in large dataset with real-world noise (Clothing1M). While large gains in large noise regime (more than 80%), such setting may not be very realistic and there also lack of in-depth analysis on the sources of the gain (e.g., it is unknown if the gain is mainly because of using a better SSL or other factors since LNL becomes more similar to SSL when noise is very high). For technical novelty perspective, while the proposed approach is new, the overall novelty may not be very significant as this paper mainly combines existing techniques, e.g., negative learning and FixMatch (a semi-supervised learning method) in the proposed learning approach. + +Authors have made great efforts for addressing the reviewers’ concerns partly, but some major concerns on the technical novelty and empirical studies remain. 
Therefore, the paper is not recommended for acceptance in its current form. I hope the authors find the review comments and discussions useful and constructive, and I would like to see the paper accepted in the near future once these issues are fully addressed.",ICLR2022,
Specifically, EEwV believes that the paper is a case study which, though useful, does not bring deep new insights or findings. 63j7 believes that even after the revisions made to the paper, additional experiments are required to understand and examine the claims. Tk8a finds the submission unready for publication due to weak experimental analysis and suggests running additional experiments to examine the hypotheses made by the authors (e.g., relating to spurious features, the need for input harmonization, and the benefit of voting) and better tying the findings of this work to related work. Tk8a provides a list of concrete suggestions along these lines. Similar to EEwV, 28ox also thinks that the paper lacks novelty and does not really bring new insights on how to train the backbone.
+",ICLR2020, +rkeqzcYNgE,1545010000000.0,1545350000000.0,1,B1xWcj0qYm,B1xWcj0qYm,Nice results for learning a classifier from unlabeled data,Accept (Poster),"This paper studies the task of learning a binary classifier from only unlabeled data. They first provide a negative result, i.e., they show it is impossible to learn an unbiased estimator from a set of unlabeled data. Then they provide an empirical risk minimization method which works when given two sets of unlabeled data, as well as the class priors. + +The four submitted reviews were unanimous in their vote to accept. The results are impactful, and might make for an interesting oral presentation.",ICLR2019,5: The area chair is absolutely certain +N3ua2FJkNCa,1642700000000.0,1642700000000.0,1,8la28hZOwug,8la28hZOwug,Paper Decision,Accept (Poster),"This paper proposes a prototypical contrastive predictive coding by combining the prototypical method and contrastive learning, and presents its efficient implementation for three distillation tasks: supervised model compression, self-supervised model compression, and self-supervised learning via self-distillation. The paper is well-written, and the effectiveness of the proposed method is validated through extensive experiments. Reviewers generally agree the paper has clear merits despite some weaknesses for improvement. Overall, I would like to recommend it for acceptance and encourage authors to incorporate all the review comments and suggestions in the final version.",ICLR2022, +B1gKRcpzeV,1544900000000.0,1545350000000.0,1,HJgkx2Aqt7,HJgkx2Aqt7,Methodology could be improved but ideas are very intriguing,Accept (Poster),"This paper discusses the promising idea of using RL for optimizing simulators’ parameters. + +The theme of this paper was very well received by the reviewers. Initial concerns about insufficient experimentation were justified, however the amendments done during the rebuttal period ameliorated this issue. The authors argue that due to considered domain and status of existing literature, extensive comparisons are difficult. The AC sympathizes with this argument, however it is still advised that the experiments are conducted in a more conclusive way, for example by disentangling the effects of the different choices made by the proposed model. For example, how would different sampling strategies for optimization perform? Are there more natural black-box optimization methods to use? + +The reviewers believe that the methodology followed has a lot of space for improvement. However, the paper presents some fresh and intriguing ideas, which make it overall a relevant work for presentation at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +NctEFyY9Ai8,1642700000000.0,1642700000000.0,1,n54Drs00M1,n54Drs00M1,Paper Decision,Reject,"This work presents a new sentiment representation method with the use of affect control theory and BERT. Reviewers pointed out several major concerns towards the insufficient experiments and results, as well as the lack of ablation studies and related work discussion. I would like to encourage the authors to take into account the comments from reviewers to further improve their work for a stronger version for future submissions.",ICLR2022, +WuXLZkdTlB,1610040000000.0,1610470000000.0,1,bhCDO_cEGCz,bhCDO_cEGCz,Final Decision,Accept (Poster),"This paper received 4 reviews with mixed initial ratings: 6,7,7,5. 
The main concerns of R1, who gave an unfavorable score, included lack of clarity (the manuscript is hard to follow) and limited empirical evaluation (the method is tested on a single synthetic dataset, CLEVRER). The latter point is echoed in other reviews as well. In response to that, the authors submitted a new revision and provided detailed responses to each of the reviews separately, which seemed to have addressed these concerns. R1 upgraded the rating and recommended acceptance. +As a result, the final recommendation is to accept this submission for presentation at ICLR as a poster.",ICLR2021, +_7XeEO8vrY,1576800000000.0,1576800000000.0,1,B1lda1HtvB,B1lda1HtvB,Paper Decision,Reject,"The authors propose a method for feature selection in non linear models by using an appropriate continuous relaxation of binary feature selection variables. The reviewers found that the paper contains several interesting methodological contributions. However, they thought that the foundations of the methodology make very strong assumptions. Moreover the experimental evaluation is lacking comparison with other methods for non linear feature selection such as that of Doquet et al and Chang et al.",ICLR2020, +r14C_T1y4zn,1642700000000.0,1642700000000.0,1,ljCoTzUsdS,ljCoTzUsdS,Paper Decision,Reject,"The paper proposes a novel protocol for examining the inductive biases in learning systems, by quantifying the exemplar-rule trade-off (as measured by the exemplar-vs-rule propensity (EVR) defined in Eq. (2)) while controlling for feature-level bias. + +Reviewers mostly agree that the problem studied in this paper is practically relevant and that the two bias measures are potentially interesting and (jointly) more informative than existing measures such as spurious correlation. However, a shared concern among the reviewers (with confidences scores >=3) is the clarity of the exposition (e.g., many key concepts such as the data conditions are informally specified [Section 2 (Reviewer TPBn)], some key messages not clearly conveyed in the main paper [Section 3 (Reviewer RJtk)], and results inconclusive or not sufficiently supported by the experimental results [for both the synthetic setting (Reviewer RJtk) and the real-world setting (Reviewer yoH5)]. Based on the above concerns, the reviewers were not convinced that this work is well supported in its current state to merit acceptance for publication.",ICLR2022, +M73H5fLae0,1642700000000.0,1642700000000.0,1,jxTRL-VOoQo,jxTRL-VOoQo,Paper Decision,Reject,"The paper studies why existing deep GCNs suffer from poor performance and propose DGMLP to improve over existing GCNs. However, the reviewers think there are still many unjustified claims and the paper. Further, several reviewers question about the novelty of the proposed method, which seems to be a combination of existing approaches. + +I suggest the authors to revise the paper by defining terms like model degradation and smoothness mathematically and try to justify each claim (e.g., the effect of disentangling) with solid experiments. 
These will significantly improve the analysis part and make the conclusions stronger.",ICLR2022, +Nw0s1sDEno,1610040000000.0,1610470000000.0,1,Y5TgO3J_Glc,Y5TgO3J_Glc,Final Decision,Reject,"This paper was pretty borderline, but ultimately I am recommending rejection, for the following reason: + +The two most negative reviewers (in terms of original score) were concerned about both the quality of the evaluation +and whether the evaluation metrics actually meant what the paper claimed they meant. +The authors did make a good faith effort to update the paper to respond to some of the concerns about quality, +but R4 (who has read the rebuttal) is still not convinced that the results of the evaluation are meaningful, +and I think I agree with their concern (and I don't see any attempt to address that concern in the rebuttal?). +I view (maybe naively) the goal of this review process as being mostly a correctness check, +and I don't think this paper has passed the correctness check to my satisfaction or to the satisfaction of the majority of reviewers. + +However! This is a fixable issue. The paper is definitely cool and interesting, and I would urge the authors to think harder +about what sort of evaluation makes sense here and resubmit to the next suitable machine learning conference.",ICLR2021, +JgZ_IL9Gvu,1576800000000.0,1576800000000.0,1,BkeOp6EKDH,BkeOp6EKDH,Paper Decision,Reject,"This paper proposes a new dimensionality reduction technique that tries to preserve the global structure of the data as measured by the relative distances between triplets. As Reviewer 1 noted, the construction of the TriMap algorithm is fairly heuristic, making it difficult to determine how TriMap ought to behave “better” than existing dimensionality reduction approaches other than through qualitative assessment. Here, I share Reviewer 2’s concern that the qualitative behavior of TriMap is difficult to distinguish from existing methods in many of the figures. +",ICLR2020, +SAJB044NG,1576800000000.0,1576800000000.0,1,Byg9A24tvB,Byg9A24tvB,Paper Decision,Accept (Poster),"This paper proposes an alternative loss function, the max-mahalanobis center loss, that is claimed to improve adversarial robustness. + +In terms of quality, the reviewers commented on the convincing experiments and theoretical results, and were happy to see the sample density analysis. + +In terms of clarity, the reviewers commented that the paper is well-written. + +The problem of adversarial robustness is relevant to the ICLR community, and the proposed approach is a novel and significant contribution in this area. + +The authors have also convincingly answered the questions of the authors and even provided new theoretical and experimental results in their final upload. ",ICLR2020, +rkfcHkaBf,1517250000000.0,1517260000000.0,622,SJCq_fZ0Z,SJCq_fZ0Z,ICLR 2018 Conference Acceptance Decision,Reject,"The authors propose to use attention over past time steps to try and solve the gradient flow problem in learning recurrent neural networks. Attention is performed over a subset of past states by a hueristic that boils to selecting best time-steps. + +I agree with the authors that they offer a lot of comparisons, but like the reviewers, I am inclined to find the experiments not very convincing of the arguments they are attempting to make. 
The model that they propose has similarities to seq2seq in that they use attention to pass more information in the forward pass; in a sense this is a seq2seq model with the same encoder and decoder, and there are parallels to self-attention. The model also has similarities to clockwork RNNs and other skip-connection methods. However, the experiments offered do not tease out these effects. It is unsurprising that a fixed-size neural network is unable to do a long copy task perfectly, but an attention model can. What would have been more interesting would have been to explore whether other RNN models could have done so. The experiments on pMNIST aren't really compelling as the baselines are far from SOTA (example: https://arxiv.org/pdf/1606.01305.pdf reports a 0.041 error rate (95.9% test acc) with LSTMs and regularization). On Text8 the method also shows worse results than full BPTT on LSTM. If BPTT is consistently better than this method, it defeats the argument that gradient explosion and forgetting over long sequences is a problem for RNNs (one of the motivations offered for this attention model).
+ +It is a bit disappointing that performance is limited in the ideal case and that it does not more gracefully degrade to epsilon better than cross entropy loss. Rather, it seems to give performance epsilon worse than cross-entropy loss in an ideal case with clean labels and lots of data. Nevertheless, it is a step in the right direction for optimizing the error measure to be used during evaluation. The reviewers uniformly recommended acceptance.",ICLR2018, +#NAME?,1576800000000.0,1576800000000.0,1,BJe7h34YDS,BJe7h34YDS,Paper Decision,Reject,"This paper suggests stabilizing the training of GANs using ideas from control theory. The reviewers all noted that the approach was well-motivated and seemed convinced that that the problem was a worthwhile one. However, there were universal concerns about the comparisons with baselines and performance over previous works on Stabilizing GAN training and the authors were not able to properly address them.",ICLR2020, +Kb3LiMCIOARx,1642700000000.0,1642700000000.0,1,hcoswsDHNAW,hcoswsDHNAW,Paper Decision,Accept (Poster),"This paper improves the training speed and decrease the computation cost of AdvProp, which is a method that leverages the adversarial example to improve the image recognition accuracy. The method achieves the speedup by leveraging a collection of practical heuristics, including reusing some gradient computation during training. The paper is well written, well justified with empirical supports, and can be potentially useful in many vision tasks. On the other hand, some novelty of the method is incremental, and the issues regarding empirical results and claims pointed out by the reviewers need to be addressed in the revision.",ICLR2022, +2SzUVrg5Y-,1576800000000.0,1576800000000.0,1,HJgFW6EKvH,HJgFW6EKvH,Paper Decision,Reject,"This paper proposes a method called iterative proportional clipping (IPC) for generating adversarial audio examples that are imperceptible to humans. The efficiency of the method is demonstrated by generating adversarial examples to attack the Wav2letter+ model. Overall, the reviewers found the work interesting, but somewhat incremental and analysis of the method and generated samples incomplete, and I’m thus recommending rejection.",ICLR2020, +ufqHJmHCjZn,1642700000000.0,1642700000000.0,1,BduNVoPyXBK,BduNVoPyXBK,Paper Decision,Reject,"This paper develops a mechanism for learning modular state representations in RL that organize recurring patterns into composable schemas. The approach combines modular RNNs as in RIMs (Goyal et al., 2020) with a dynamic feature attention mechanism. There were a variety of concerns in the initial reviews that were addressed by the authors through a set of clarifications and improved empirical analysis, substantially improving the paper. However, there still remain some issues in clarity of presentation and inconsistent empirical results, especially in the form of clear take-aways from the empirical analysis and broader insights from the paper, as detailed in the individual reviews. The authors are encouraged to take these aspects into consideration in revising their manuscript.",ICLR2022, +SJgq3_vj1N,1544420000000.0,1545350000000.0,1,ByN7Yo05YX,ByN7Yo05YX,metareview: unconvincing experiments,Reject,"This paper proposes adaptive neural trees (ANT), a combination of deep networks and decision tress. Reviewers 1 leans toward reject the paper, pointing out several flaws. Reviewer 3 also raises concerns, despite later increasing rating to marginally above threshold. 
Of particular note is the weak experimental validation. + +The paper reports results only on MNIST and CIFAR-10. MNIST performance is too easily saturated to be meaningful. The CIFAR-10 results show ANT models to have far greater error than the state-of-the-art deep neural network models. + +As Reviewer 1 states, ""performance of the proposed method is also not the best on either of the tested datasets. Please clearly elaborate on why and how to address this issue. It would be more interesting and meaningful to work with a more recent large datasets, such as ImageNet or MS COCO."" + +The rebuttal fails to offer the type of additional results that would remedy this situation. Without a convincing experimental story, it is not possible to recommend acceptance of this paper.",ICLR2019,5: The area chair is absolutely certain
H1x93GI_e,1486400000000.0,1486400000000.0,1,HJOZBvcel,HJOZBvcel,ICLR committee final decision,Invite to Workshop Track,"The authors provide a modern twist to the classical problem of graphical model selection. Traditionally, the sparsity priors used to encourage selection of specific structures are hand-engineered. Instead, the authors propose using a neural network to learn these priors. Since graphical models are useful in the small-sample regime, using neural networks directly on the training data is not effective. Instead, the authors propose generating data based on the desired graph structures to train the neural network. + + While this is a nice idea, the paper is not clear and convincing enough to be accepted to the conference, and we instead recommend it for the workshop track.",ICLR2017, 
HkenHJVzxN,1544860000000.0,1545350000000.0,1,r1xFE3Rqt7,r1xFE3Rqt7,"The paper is clearly written, but there are remaining concerns on contributions and comparisons",Reject,"The paper is clearly written and well motivated, but there are remaining concerns on contributions and comparisons. + +The paper received mixed initial reviews. After extensive discussions, while the authors successfully clarified several important issues (such as computation efficiency w.r.t. splitting) pointed out by Reviewer 4 (an expert in the field), they were not able to convince him/her about the significance of the proposed network compression method. + +Reviewer 4 has the following remaining concerns: + +1) This is a typical paper showing only FLOPs reduction but with an intent of real-time acceleration. However, wall-clock speedup is different from FLOPs reduction. It may not be beneficial to change the current computing flow optimized in modern software/hardware. This is one of the major reasons why the reported wall-clock time even slows down. The problem may be alleviated with optimization efforts on software or hardware, but then it is unclear how good/worse it will be when compared with fine-grain pruning solutions (Han et al. 2015b, Han et al. 2016 & Han et al. 
2017), which achieved a higher FLOP reduction and a great wall-clock speedup with optimized hardware (using ASIC and FPGA); + +2) If it is OK to target FLOPs reduction (without comparison with fine-grain pruning solutions), + 2.1) In the LSTM experiments, the major producer of FLOPs -- the output layer -- is excluded, and this exclusion was hidden in the first version. Although the author(s) claimed that an output layer could be compressed, it is not shown in the paper. Compressing the output layer will reduce model capacity, making other layers more difficult to compress. + 2.2) In the CNN experiments, the improvement on CIFAR-10 is within a random range and not statistically significant. In Table 2, ""Regular low-rank MobileNet"" improves the original MobileNet, showing that the original MobileNet (an arXiv paper) is not well designed. ""Adaptive Low-rank MobileNet"" improves accuracy upon ""Regular low-rank MobileNet"", but uses 0.3M more parameters. The trade-off is unclear. + +In addition to these remaining concerns of Reviewer 4, the AC feels that the paper essentially modifies the original network structure in a very specific way: adding a particular nonlinear layer between two adjacent layers. Thus it seems a little bit unfair to mainly use low-rank factorization (which can be considered a compression technique that barely changes the network architecture) for comparison. Adding comparisons with fine-grain pruning solutions (Han et al. 2015b, Han et al. 2016 & Han et al. 2017) and a large number of more recent related references inspired by the low-rank baseline (M. Jaderberg et al. 2014), as listed by Reviewer 4, would make the proposed method much more convincing. ",ICLR2019,4: The area chair is confident but not absolutely certain
SYLj1sHTj,1576800000000.0,1576800000000.0,1,H1gzR2VKDH,H1gzR2VKDH,Paper Decision,Accept (Poster),"This paper proposes a method that uses subgoals for planning when using video prediction. The reviewers thought that the paper was clearly written and interesting. The reviewer questions and concerns were mostly addressed during the discussion phase, and the reviewers are in agreement that the paper should be accepted.",ICLR2020, 
35viKf4BF,1576800000000.0,1576800000000.0,1,S1eZYeHFDS,S1eZYeHFDS,Paper Decision,Accept (Spotlight),"The paper presents a deep learning approach for tasks such as symbolic integration and solving differential equations. + +The reviewers were positive and the paper has had extensive discussion, which we hope has been positive for the authors. + +We look forward to seeing the engagement with this work at the conference.",ICLR2020, 
2WL0uqxQCXJ,1642700000000.0,1642700000000.0,1,yjxVspo7gXt,yjxVspo7gXt,Paper Decision,Reject,"The manuscript performs an empirical analysis of existing bias mitigation methods on two large datasets, CelebA and ImageNet People Subtree, where there are multiple sensitive attributes and some unavailable sensitive attribute labels. The results show that existing methods can mitigate intersectional bias at scale but unlabeled mitigation methods generalize poorly. The manuscript further proposes a knowledge distillation approach which can augment other labeled mitigation approaches. + +On the positive side, the manuscript studies an important problem: intersectional subgroups in deep learning methods. Reviewers acknowledged that an empirical study on this problem is an opportunity to make a contribution as it can highlight previously unknown issues. + +There are, however, several major concerns, including: +1. 
Methodological contribution (knowledge distillation) is under-developed, while the empirical investigation is interesting but can be further developed; +2. The fairness metrics adopted in this manuscript need to be clarified; +3. A discussion on hyperparameter tuning is needed, maybe involving a fairness-accuracy tradeoff; +4. The claimed O(1) complexity for the knowledge distillation approach is implausible because it assumes the availability of G group-specific models. The rebuttal clarified that the claim is only for the inference complexity, and the approach does not improve the training complexity. + +Reviewers also concluded that, while the empirical analysis is interesting, the results on CelebA are of limited use because the sensitive attributes are ""purely illustrative."" It's not clear that the insights from these illustrative intersectional groups (e.g. big nose & attractive) will hold for groups that are meaningful in a fairness sense.",ICLR2022, 
SJxLAQjbxN,1544820000000.0,1545350000000.0,1,rygFmh0cKm,rygFmh0cKm,Meta-Review,Reject,"The paper proposes new methods for optimization of KL(student_model||teacher_model). + +The topic is relevant. The paper also contains interesting ideas and the proposed methods are interesting; they are elegant and seem to work reasonably well on the tasks tried. + +However, the reviewers do not all agree that the paper is well written. The reviewers have pointed out several issues that need to be addressed before the paper can be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain
2GyfBUYfnJ,1610040000000.0,1610470000000.0,1,Lc28QAB4ypz,Lc28QAB4ypz,Final Decision,Accept (Poster),"The paper examines the idea that knowledge and rewards are stationary and reusable across tasks. An interesting paper that combines a number of related topics (meta RL, HRL, time scale in RL, and attention), improving the speed of training. + +The authors have addressed the reviewer comments, strengthening the paper. The reviewers agree, and I concur, that the paper contributes a novel model, valuable to the ICLR community. It is well thought-out, presented, and evaluated. ",ICLR2021, 
HyxfwCuZxE,1544810000000.0,1545350000000.0,1,SkgToo0qFm,SkgToo0qFm,No strong reviewer support,Reject,"Two out of three reviews for this paper were provided in detail, but all three reviewers agreed unanimously that this paper is below the acceptance bar for ICLR. The reviewers admired the clarity of writing, and appreciated the importance of the application, but none recommended the paper for acceptance, due largely to concerns on the experimental setup.",ICLR2019,4: The area chair is confident but not absolutely certain
CGRpO1wco0,1576800000000.0,1576800000000.0,1,BylQm1HKvB,BylQm1HKvB,Paper Decision,Reject,"This paper is very different from most ICLR submissions, and appears to be addressing interesting themes. However, the paper seems poorly written and generally unclear. The motivation, task, method and evaluation are all unclear. 
I recommend that the authors add explicit definitions, equations, algorithm boxes, and more examples to make their paper clearer.",ICLR2020, 
S1HuVJTrf,1517250000000.0,1517260000000.0,384,Syr8Qc1CW,Syr8Qc1CW,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The method proposed in the paper for latent disentanglement and attribute-conditional image generation is novel to the best of my understanding, but reviewers (Anon1 and Anon3) have expressed concerns on the quality of results (CelebA images) as well as on the technical presentation and claims in the paper. + +Given the novelty of the proposed method, I would *not* like to recommend a ""reject"" for this paper, but the concerns raised by the reviewers on the quality of results and lack of quantitative results seem valid. Authors rule out the possibility of any quantitative results in their response but I am not fully convinced -- in particular, the effectiveness of attribute-conditional image generation can be captured by first training an attribute classifier on the generated images and then measuring how often the predicted attributes are flipped when the conditioning signal is changed. There are also other metrics in the literature for evaluating generative models. + +I would recommend inviting it to the workshop track, given that the work is novel and interesting but has scope for improvements. 
",ICLR2018, 
sNMqllqTYS,1642700000000.0,1642700000000.0,1,45L_dgP48Vd,45L_dgP48Vd,Paper Decision,Accept (Spotlight),"The paper tackles the problem of detecting anomalies in multiple time series. All the reviewers agreed that the methodology is novel, sound and very interesting. Initially, there were some concerns regarding the experimental evaluation; however, the rebuttal and subsequent discussion cleared up these concerns to some extent and all reviewers are eventually supporting or strongly supporting acceptance.",ICLR2022, 
9jjSWkFvGw,1576800000000.0,1576800000000.0,1,rJe_cyrKPB,rJe_cyrKPB,Paper Decision,Reject,"The authors use a Tucker decomposition to represent the weights of a network, for efficient computation. The idea is natural, and preliminary results promising. The main concern was the lack of empirical validation and comparisons. While the authors have provided partial additional results in the rebuttal, which is appreciated, a thorough set of experiments and comparisons would ideally be included in a new version of the paper, and then considered again in review. 
",ICLR2020, 
e52kwC7Ozd,1576800000000.0,1576800000000.0,1,Bye2uJHYwr,Bye2uJHYwr,Paper Decision,Reject,"This paper aims to address transfer learning by importance weighted ERM that estimates a density ratio from the given sample and some auxiliary information on the population. Several learning bounds were proven to promote the use of importance weighted ERM. + +Reviewers and AC feel that the novelty of this paper is modest given the rich relevant literature, and the practical use of this paper may be limited. The discussion of related theoretical work, such as generalization bounds for PU learning, could be expanded significantly. The presentation can be largely improved, especially in the experiment part. The rebuttal is somewhat subjective and does not convincingly address the concerns. 
Hence I recommend rejection.",ICLR2020, 
#NAME?,1576800000000.0,1576800000000.0,1,Hkxi2gHYvH,Hkxi2gHYvH,Paper Decision,Reject,"The paper proposes to use the representation learned via CPC to do reward shaping, by clustering the embedding and providing a reward based on the distance from the goal. + +The reviewers point out some conceptual issues with the paper, the key one being that the method is contingent on a random policy being able to reach the goal, which is not true for the difficult environments that the paper claims to be motivated by. One reviewer noted limited experiment runs and a lack of comparisons with other reward shaping methods. + +I recommend rejection, but hope the authors find the feedback helpful and submit a future version elsewhere.",ICLR2020, 
TycDN3Rs-jD,1610040000000.0,1610470000000.0,1,xHqKw3xJQhi,xHqKw3xJQhi,Final Decision,Reject,"The authors propose a new approach to topology optimization to address over-smoothing in GCNs. This is a borderline paper. Topology optimization is clearly important and relevant, and the approach tries to optimize the topology (add/delete edges) by viewing the problem as a latent variable model and aiming to optimize the graph together with the GCN parameters to maximize the likelihood of observed node labels. A number of related joint topology optimization approaches exist, however, as discussed in the reviews and the responses. The proposed methodology is termed variational EM but is a bit heuristic in the sense that the E and M steps do not follow a consistent criterion (the direction of the KL is flipped between the steps). A number of comparisons are provided with consistent gains, though the gains appear relatively small. No error bars are provided despite a request to add them to better assess the significance of these results. It remains unclear whether the gains are worth the added complexity. 
",ICLR2021, 
rzcOe77Iv5V,1642700000000.0,1642700000000.0,1,V0A5g83gdQ_,V0A5g83gdQ_,Paper Decision,Accept (Poster),"This paper presents a tensor diagram view of the multi-headed self-attention (MHSA) mechanism used in Transformer architectures, and by modifying the tensor diagram, introduces a strict generalization of MHSA called the Tucker-head self-attention (THSA) mechanism. While there is some concern regarding the incremental nature of the proposition, the identification of where to usefully add the additional parameter that converts from MHSA to THSA was nontrivial, and the experimental results on the performance benefits across multiple tasks are convincing.",ICLR2022, 
H1s73z8Ol,1486400000000.0,1486400000000.0,1,HJlgm-B9lx,HJlgm-B9lx,ICLR committee final decision,Reject,"The consensus amongst reviewers was that this paper, incorporating global context into classification, is not ready for publication. It provides no novelty over similar methods. The evaluation did not convince most of the reviewers. The paper seems peppered with unjustified and (as rather bluntly, but accurately, put by one reviewer) unscientific claims. Disappointingly, the authors did not respond to pre-review questions. Perhaps more understandably, they did not respond to the uniformly negative reviews of their paper to defend it. 
I see no reason to diverge from the reviewers' recommendation, and advocate rejection of this paper.",ICLR2017, +B-wu7JbbUM,1576800000000.0,1576800000000.0,1,S1lk61BtvB,S1lk61BtvB,Paper Decision,Reject,"This paper proposed an improvement on VAE-GAN which draws multiple samples from the reparameterized latent distribution for each inferred q(z|x), and only backpropagates reconstruction error for the resulting G(z) which has the lowest reconstruction. While the idea is interesting, the novelty is not high compared with existing similar works, and the improvement is not significant.",ICLR2020, +NaR5pfs6EDL,1610040000000.0,1610470000000.0,1,6s7ME_X5_Un,6s7ME_X5_Un,Final Decision,Accept (Spotlight),"Reviewers agreed that connecting neural networks with dynamical systems to create a new kind of optimizer is an interesting idea. After the authors' improvements, this is a strong submission of wide interest.",ICLR2021, +tG1zbTUaSh,1576800000000.0,1576800000000.0,1,SkgscaNYPS,SkgscaNYPS,Paper Decision,Accept (Poster),"This paper studies the spectrum of the Hessian through training, making connections with the NTK limit. While many of the results are perhaps unsurprising, and more empirically driven, together the paper represents a valuable contribution towards our understanding of generalization in deep learning. Please carefully account for the reviewer comments in the final version.",ICLR2020, +EY8fS8_2cRi,1642700000000.0,1642700000000.0,1,8Py-W8lSUgy,8Py-W8lSUgy,Paper Decision,Accept (Spotlight),"The paper describes a novel learning scenario where there are many related tasks, some seen at test time, and some seen only at training time, where additionally the task labels can be hidden or present. This approach generalizes both a ""relational setting"" (where auxiliary task labels could be used as features) and a ""meta setting"" (where new tasks need to be solved in a zero-shot setting using data from related tasks only). The idea behind the method is to do MTL with a common representation and a set of task-specific heads, and build a graph where (1) tasks are nodes associated with the parameters of their task-specific ""heads"" and (2) edges link examples to tasks with known labels. A GNN method is then used to find regularities in the graph. + +Pros + - The setting is innovative and the approach is novel + - The experimental results are strong + +Cons + - Some of the terminology seems awkward and/or strained (eg ""knowledge graph"" for the task-example graph)",ICLR2022, +O71KGil1YK,1576800000000.0,1576800000000.0,1,rkgU1gHtvr,rkgU1gHtvr,Paper Decision,Accept (Poster),"The authors present a method to address off-policy policy evaluation in the infinite horizon case, when the available data comes from multiple unknown behavior policies. Their solution -- the estimated mixture policy -- combines recent ideas from both infinite horizon OPE and regression importance sampling, a recent importance sampling based method. At first, the reviewers were concerned about writing clarity, feasibility in the continuous case, and comparisons to contemporary methods like DualDICE. After the rebuttal period, the reviewers agreed that all the major issues had been addressed through clarifications, rewriting, code release, and additional empirical comparisons. 
Thus, I recommend to accept this paper.",ICLR2020, +6i3sUc815s,1610040000000.0,1610470000000.0,1,nIqapkAyZ9_,nIqapkAyZ9_,Final Decision,Reject,"The paper presents a new regularizer based on singular value decomposition in embedding space to avoid model collapse. The reviewes liked the simplicity of the idea, but there were some remaining concerns regarding the experiments. Moreover, two reviewers mentionned some concerns with respect to the clarity of the paper. While some concerns have been addressed by the rebuttal, in particular regarding the clarity of the paper, the concerns regarding the experiments remained, and the reviewers agreed that the paper needs a revision before publication. + +The main directions of improvement are to make the comparison with previous published results clearer, in particular comparing different methods with better hyperparameter tuning, and test on larger datasets. ",ICLR2021, +ByldPHwxxV,1544740000000.0,1545350000000.0,1,B1eKk2CcKm,B1eKk2CcKm,Not ready for publication,Reject,"This paper proposes to learn continuous of k-mer embeddings for RNA-seq analysis. Major concerns of the paper include: 1. novelty seems limited; 2. questions about the scalability of the approach; 3. evaluation experiments were not suitable for supporting the aim. Overall, this paper cannot be accepted yet. ",ICLR2019,5: The area chair is absolutely certain +r1lqM84kxE,1544660000000.0,1545350000000.0,1,SyVuRiC5K7,SyVuRiC5K7,A new transductive few-shot learning algorithm with strong empirical results,Accept (Poster),"As far as I know, this is the first paper to combine transductive learning with few-shot classification. The proposed algorithm, TPN, combines label propagation with episodic training, as well as learning an adaptive kernel bandwidth in order to determine the label propagation graph. The reviewers liked the idea, however there were concerns of novelty and clarity. I think the contributions of the paper and the strong empirical results are sufficient to merit acceptance, however the paper has not undergone a revision since September. It is therefore recommended that the authors improve the clarity based on the reviewer feedback. In particular, clarifying the details around learning \sigma_i and graph construction. It would also be useful to include the discussion of timing complexity in the final draft.",ICLR2019,4: The area chair is confident but not absolutely certain +RUrWsk2bO,1576800000000.0,1576800000000.0,1,rkeIq2VYPr,rkeIq2VYPr,Paper Decision,Accept (Poster),"Most reviewers seems in favour of accepting this paper, with the borderline rejection being satisfied with acceptance if the authors take special heed of their comments to improve the clarity of the paper when preparing the final version. From examination of the reviews, the paper achieves enough to warrant publication. Accept.",ICLR2020, +HDZJyl5by00,1610040000000.0,1610470000000.0,1,rcQdycl0zyk,rcQdycl0zyk,Final Decision,Accept (Spotlight),"The authors propose a new parameterization which (across multiple architectures) generalized hypercomplex multiplication and provides for small low dimensions strong performance at substantial parameter savings. 
All reviewers are happy with the theoretical contributions of the work, but would appreciate additional empirical evidence.",ICLR2021, +YrUjsMc6Cq,1576800000000.0,1576800000000.0,1,SJl3h2EYvS,SJl3h2EYvS,Paper Decision,Reject,"This paper demonstrates that per-image semantic supervision, as opposed to class-only supervision, can benefit zero-shot learning performance in certain contexts. Evaluations are conducted using CUB and FLOWERS fine-grained zero-shot data sets. In terms of evaluation, the paper received mixed final scores (two reject, one accept). + +During the rebuttal period, both reject reviewers considered the author responses, but in the end did not find the counterarguments sufficiently convincing. For example, one reviewer maintained that in its present form, the paper appeared too shallow without additional experiments and analyses to justify the suitability of the contrastive loss used for obtaining embeddings applied to zero-shot learning. Another continued to believe post-rebuttal that reference Reed et al., (2016) undercut the novelty of the proposed approach. + +And consistent with these sentiments, even the reviewer who voted for acceptance alluded to the limited novelty of the proposed approach; however, the author response merely states that a future revision will clarify the novelty. But this then requires another round of reviewing to determine whether the contribution is sufficiently new, especially given that all reviewers raised this criticism in one way or another. Furthermore, the rebuttal also mentions the inclusion of some additional experiments, but again, we don't know how these will turn out. + +Based on these considerations then, the AC did not see sufficient justification for accepting a paper with aggregate scores that are otherwise well below the norm for successful ICLR submissions.",ICLR2020, +Fnmvsgx3pG,1610040000000.0,1610470000000.0,1,4SiMia0kjba,4SiMia0kjba,Final Decision,Reject,"The paper presents a spatial-temporal prediction framework with causal effects of predictors for better interpretability. The idea is interesting and the touch on modeling causal relations could be useful in practical applications. The paper receives mixed ratings and therefore there has been extensive discussion. We agree that while the paper has some merits, it falls short on the following aspects: + +1, One central issue pointed out by all reviewers is the evaluation. For example, the contribution on efficient attention was not compared to any previous work; most of the baselines do no have access to the spatial information, which makes the comparison unfair. The authors did add two more baselines with access to spatial information. However, there are not enough details and discussions to make the results convincing; In addition, other stronger baselines should be added. + +2. The notation and technical presentation was extremely lacking in the submitted version, the amount of unintroduced notations. Even in their core contribution equations had major issues with norm and vectors mixed together (see the difference between the corrected equation in the Taylor equation and the one in the original submission) + +After the discussion, all reviewers agree that the paper fails to provide a fair and convincing evaluation, and the ratings will be adjusted to reflect the discussion. We hope that the reviews can help the authors improve the draft for a stronger submission in the future. 
",ICLR2021, +d65vCXdbrVG,1642700000000.0,1642700000000.0,1,26gKg6x-ie,26gKg6x-ie,Paper Decision,Accept (Spotlight),"Thanks for your submission to ICLR. + +Three of the four reviewers are ultimately (particularly after discussion) very enthusiastic about the paper, and feel that their concerns have been adequately addressed. The fourth reviewer has not updated his/her score but has indicated that their concerns were at least somewhat addressed. I took a look at their review and agree that the authors have addressed these concerns sufficiently. I am happy to recommend this paper for acceptance at this point. Note that I really appreciate the time and effort that the authors went into adding additional results and clarifications for the reviewers.",ICLR2022, +gb78d2sm0sx,1642700000000.0,1642700000000.0,1,TfhfZLQ2EJO,TfhfZLQ2EJO,Paper Decision,Accept (Poster),"The topic of learning reward functions from preferences and how to do this efficiently is of high interest to the ML/RL community. All reviewers appreciate the suggested technical approach and the thorough evaluations that demonstrate clear improvements. While the technical novelty of the paper is not entirely compelling, all reviewers recommend acceptance of the paper.",ICLR2022, +r1l_1dGglN,1544720000000.0,1545350000000.0,1,SkGtjjR5t7,SkGtjjR5t7,meta-review,Reject,"The authors present a method for training a policy for a self-driving car. The inputs to the policy are map-based perceptual features and the outputs are waypoints on a trajectory, and the method is an augmented imitation learning framework that uses perturbations and additional losses to make the policy more robust and effective in rare events. The paper is clear and well-written and the authors do demonstrate that it can be used to control a real vehicle. However, the reviewers all had concerns about the oracle feature representation which is the input and also concerns about the lack of baselines such as optimization based methods. They also felt that the approach was limited to self-driving cars and thus would have limited interest for the community.",ICLR2019,5: The area chair is absolutely certain +Hkxf2PcZlN,1544820000000.0,1545350000000.0,1,Hkg4W2AcFm,Hkg4W2AcFm,metareview,Accept (Poster),"The paper proposes a new way to tackle the trade-off between disentanglement and reconstruction, by training a teacher autoencoder that learns to disentangle, then distilling into a student model. The distillation is encouraged with a loss term that constrains the Jacobian in an interesting way. The qualitative results with image manipulation are interesting and the general idea seems to be well-liked by the reviewers (and myself). + +The main weaknesses of the paper seem to be in the evaluation. Disentanglement is not exactly easy to measure as such. But overall the various ablation studies do show that the Jacobian regularization term improves meaningfully over Fader nets. Given the quality of the results and the fact that this work moves the needle in an important (albeit hard to define) area of learning disentangled representations, I think would be a good piece of work to present at ICLR so I recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +5tX-3e3N9pl,1610040000000.0,1610470000000.0,1,lmTWnm3coJJ,lmTWnm3coJJ,Final Decision,Accept (Poster),"This paper has been thoroughly evaluated by four expert reviewers and it had received one public comment. 
The authors provided extensive explanations and added technical updates to the contents of their submission in response to constructive critiques from the reviewers. Even though some minor issues have not been fully resolved in the discussion between the authors and the reviewers, I consider this paper worthy of inclusion in the program of ICLR 2021 since, albeit marginally, the apparent strengths outweigh its outstanding limitations. ",ICLR2021, +HyemsGSgxE,1544730000000.0,1545350000000.0,1,SJeXSo09FQ,SJeXSo09FQ,"Strong paper, well received by reviewers -- accept",Accept (Poster),"All reviewers gave an accept rating: 9, 7 &6. +A clear accept -- just not strong enough reviewer support for an oral.",ICLR2019,4: The area chair is confident but not absolutely certain +dYr0Gp9aYo,1576800000000.0,1576800000000.0,1,Sye2s2VtDr,Sye2s2VtDr,Paper Decision,Reject,"The authors propose a simple but effective method for feature crossing using interpretation inconsistency (as defined by the authors). + +I think this is a good work and the authors as well as the reviewers participated well in the discussions. However, there is still disagreement about the positioning of the paper. In particular, all the reviewers felt that additional baselines should be tried. While the authors have strongly rebutted the necessity of these baselines the reviewers are not convinced about it. Given the strong reservations of the all the 3 reviewers at this point I cannot recommend the acceptance of this paper. I strongly suggest that in subsequent submissions the authors should position their work better and perhaps compare with some of the related works recommended by the reviewers.",ICLR2020, +PdPAfR3uwCE,1610040000000.0,1610470000000.0,1,Dmpi13JiqcX,Dmpi13JiqcX,Final Decision,Reject,"This paper explores a methodology for learning disentangled representations using a triplet loss to find subnetworks within a transformer. The authors compare against several other methods and find that their method performs well without needing to train from scratch. The reviewers thought this paper was well written and the authors were very responsive during the review period. However, there were some questions about the experimental setup and empirical performance of the paper, leaving the reviewers wondering if the performance was convincing. We agree that there is value in exploring disentangled representations even if they do not necessarily improve performance (as the authors point out), but clearly explaining the reasoning behind all analyses (e.g. specifically choosing domains to introduce a spurious correlation), and justifying differences in performance is particularly important in these cases.",ICLR2021, +Exp_4x29Dc,1610040000000.0,1610470000000.0,1,9wHe4F-lpp,9wHe4F-lpp,Final Decision,Reject,"## Description + +The paper proposes an improvement to binary neural networks with real-valued skip connections between pre-activations, by introducing more flexible learnable non-linearities on the real-valued connections. The parametric non-linearity is actually linear at initialization, which makes the training easier at the beginning. Due to learnable parameters it eventually adjusts to a more complex one, able to refine the accuracy. I think this idea is a good finding. + +## Review Process and Decision +The reviewers initially gave low ratings to the paper, indicating that the contribution is incremental and not fully clearly presented. 
+There was no detailed discussion with the authors, since the author's response and the rebuttal revision came in the very end of the discussion period. In the subsequent discussion phase the reviewer board has not indicated any major changes to the initial reviews/ranking. The AC checked the paper and supports rejection. + +## Details + +The authors are encouraged to improve the paper carefully addressing points proposed by reviewers. + +I think the argumentation of the paper should be improved. Some explanations are intuitive, but operating with fuzzy notions and may in fact be incorrect or irrelevant. The paper should be made more precise, based on verifiable arguments. + +I think the following is crucial and not made clear in the paper: +The non-linearities inserted before the sign function *do not affect the result of sign*. They indeed affect only the residual connections. Furthermore, the structure of residual connections should be fully clarified to reveal that there are complete real-valued paths all the way from the input to the network to its output, made of the residual connections with their own learnable parameters (and 1x1 convolutions) and (learnable) non-linearities and an intake from binary convolutions on the way. The learnable non-linearities can in principle improve performance just because the real-valued paths can learn better. + +I paste below feedback by reviewers to author's response (I believe they would agree to share it with authors but did not find a suitable way of doing it): + +## Response by R1: +I acknowledge that I read and appreciated the authors' answers to my questions. I think the idea of analyzing the role of non-linearities is nice and I tend to confirm my score. But I also agree with other reviewers that, as it is, the paper has some unclear parts and would not complain if it is rejected. + +## Response by R3: +Thanks for your responses to answer my questions for the paper. I agree with the results of the proposed FBTN for improved Binary Neural Networks (BNNs). However, my concern about the novelty of using group convolution modules in BNNs has not been addressed. I think the paper is not sufficient enough to publish at the conference. So, I do not change my rating of the paper. + +## Response by R4: +I maintained my rating when combining other reviews and responses to them, despite of their well response. It is still questionable whether FPReLU, one of the main contributions they claimed, actually improves the performance of BNN remarkably. In particular, this is supported by the fact that the performance of BNN on ResNet-34 which the techniques in this paper were applied does not show much difference from 'Real-to-Bin' model.",ICLR2021, +80NGYsHl2Ob,1642700000000.0,1642700000000.0,1,sOK-zS6WHB,sOK-zS6WHB,Paper Decision,Accept (Spotlight),"The paper proposes and studies a method for the responsible disclosure of a fingerprint along with samples generated by a generative model, which has important applications in identifying ""deep fakes"". The authors establish both the detectability of their fingerprint-without significant loss of fidelity-as well as the robustness to perturbations. The reviewers found the problem and contributions to be important and significant, well substantiated by an extensive experimental study.",ICLR2022, +drJCq17qTk,1576800000000.0,1576800000000.0,1,HJeEP04KDH,HJeEP04KDH,Paper Decision,Reject,"The paper investigates quantization for speeding up RL. 
While the reviewers agree that the idea is a good one (it should definitely help), they also have a number of concerns about the paper and presentation. In particular, the reviewers feel that the authors should have provided more insight into the challenges of quantization in RL and the tradeoffs involved. After having read the rebuttals, the reviewers believe that the authors are on the right track, but that the paper is still not ready for publication. If the authors take the reviewer comments and concerns seriously and update their paper accordingly, the reviewers believe that this could eventually result in a strong paper.",ICLR2020, +irzTSku4Eef,1642700000000.0,1642700000000.0,1,J9_7t9m8xRj,J9_7t9m8xRj,Paper Decision,Reject,"The work proposed multi-view learning framework that combines diversity and consistency objectives for semi-supervised learning. While reviewers appreciated that simplicity of the proposed method, they raised concerns on the limited contribution on top of the original Bayesian Co-Training work. Although authors provided detailed rebuttals that addressed some of the reviewers' concerns, and one reviewer did raise their score, the other reviewers' scores remained unchanged. Given the work is closely based off the BCT work, I would like to see more detailed analyses on the importance of the changes brought in this work, such as changing the base learners and introduction of diversity objectives as pointed out by the authors.",ICLR2022, +qi-YR3wl6,1576800000000.0,1576800000000.0,1,H1gmHaEKwB,H1gmHaEKwB,Paper Decision,Accept (Poster),"The rebuttal period influenced R1 to raise their rating of the paper. +The most negative reviewer did not respond to the author response. +This work proposes an interesting approach that will be of interest to the community. +The AC recommends acceptance.",ICLR2020, +UzZkPJaM9Pz,1610040000000.0,1610470000000.0,1,xCcdBRQEDW,xCcdBRQEDW,Final Decision,Accept (Spotlight),"This paper proposes a new differentiable physics benchmark for soft-body manipulation. The proposed benchmark is based on the DiffTaichi system. Several existing reinforcement learning algorithms are evaluated on this benchmark. The paper identify a set of key challenges that are posed by this specific benchmark to RL algorithms. Short horizon tasks are shown to be feasible by optimizing the physics parameters via gradient descent. The reviewers agree that this paper is very well-written, the problem tackled in it is quite interesting and challenging, and the use of differentiable physics in RL for manipulating soft objects quite intriguing.",ICLR2021, +MsLyUVyqQO,1642700000000.0,1642700000000.0,1,Snqhqz4LdK,Snqhqz4LdK,Paper Decision,Reject,"This paper proposes to generate 3D molecules using a step by step approach. The reviewers raised major concerns on the experiments, novelty, writing and technical details. The authors also were not aware of many of the important references, part (but not all) of which have been included during discussions. It is clear that this work is not ready to be accepted by ICLR.",ICLR2022, +jOeOc1sGGH,1576800000000.0,1576800000000.0,1,r1gixp4FPH,r1gixp4FPH,Paper Decision,Accept (Poster),"The authors provide an empirical and theoretical exploration of Nesterov momentum, particularly in the over-parametrized settings. Nesterov momentum has attracted great interest at various times in deep learning, but its properties and practical utility are not well understood. 
This paper makes an important step towards shedding some light on this approach for training models with a large number of parameters. ",ICLR2020, +B1gGK2wxlE,1544740000000.0,1545350000000.0,1,Skl6k209Ym,Skl6k209Ym,Potentially interpretable few-shot learning algorithm.,Reject,"The reviewers are polarized on this paper and the overall feeling is that it is not quite ready for publication. There is also an interesting interpretability aspect that, while given as a motivation for the approach, is never really explored beyond showing some figures of alignments. One of the main concerns of the method’s effectiveness in practice is the computational cost. There is also concern from one of the reviewers that the formulation could result in creating sparse matching maps where only a few pixels get matched. The authors provide some justification for why this wouldn’t happen, and this should be put in a future draft. Even better would be to show statistics to demonstrate empirically that this doesn’t happen. + +There were a number of clarifications that were brought up during the discussion, and the authors should go over this carefully and update the draft to resolve these issues. There is also a typo in the title that should be fixed. +",ICLR2019,3: The area chair is somewhat confident +Hy-RG1THG,1517250000000.0,1517260000000.0,41,HJzgZ3JCW,HJzgZ3JCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper presents a modification of the Winograd convolution algorithm that reduces the number of multiplications in a forward pass of a CNN with minimal loss of accuracy. The reviewers brought up the strong results, the readability of the paper, and the thoroughness of the experiments. One concern brought up was the applicability to deeper network structures. This was acknowledged by the authors to be a subject of future work. Another issue raised was the question of theoretical vs. actual speedup. Again, this was acknowledged by the authors to be an eventual goal but subject to further systems work and architecture optimizations. The reviewers were consistent in their support of the paper. I follow their recommendation: Accept. +",ICLR2018, +ryxNmu6ge4,1544770000000.0,1545350000000.0,1,SJgEl3A5tm,SJgEl3A5tm,metareview: interesting approach,Accept (Poster),"This work develops a method for learning camouflage patterns that could be painted onto a 3d object in order to reliably fool an image-based object detector. Experiments are conducted in a simulated environment. + +All reviewers agree that the problem and approach are interesting. Reviewers 1 and 3 are highly positive, while Reviewer 2 believes that real-world experiments are necessary to substantiate the claims of the paper. While such experiments would certainly enhance the impact of the work, I agree with Reviewers 1 and 3 that the current approach is sufficiently interesting and well-developed on its own.",ICLR2019,4: The area chair is confident but not absolutely certain +aUamXx1lkI,1576800000000.0,1576800000000.0,1,rkgdYhVtvH,rkgdYhVtvH,Paper Decision,Reject,The authors attempt to unify graph convolutional networks and label propagation and propose a model that unifies them. The reviewers liked the idea but felt that more extensive experiments are needed. 
The impact of labels needs to be specially studied more in-depth.,ICLR2020, +H1Qg7kaBM,1517250000000.0,1517260000000.0,58,HyjC5yWCW,HyjC5yWCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"R3 summarizes the reasons for the decision on this paper: ""The universal learning algorithm approximator result is a nice result, although I do not agree with the other reviewer that it is a ""significant contribution to the theoretical understanding of meta-learning,"" which the authors have reinforced (although it can probably be considered a significant contribution to the theoretical understanding of MAML in particular). Expressivity of the model or algorithm is far from the main or most significant consideration in a machine learning problem, even in the standard supervised learning scenario. Questions pertaining to issues such as optimization and model selection are just as, if not more, important. These sorts of ideas are explored in the empirical part of the paper, but I did not find the actual experiments in this section to be very compelling. Still, I think the universal learning algorithm approximator result is sufficient on its own for the paper to be accepted.""",ICLR2018, +6xiu8ERAfoZu,1642700000000.0,1642700000000.0,1,tQ2yZj4sCnk,tQ2yZj4sCnk,Paper Decision,Reject,"The paper presents multi-agent RL framework that uses the divergence between the learned policies and a target policy as a penalty that pushes the agent to learn cooperative strategies. The proposed method is built on top of an existing one (DAPO, Wang et al., 2019). Empirical experiments clearly show the advantage of the proposed method. + +The reviews for this paper are mixed and borderline. The reviewers appreciate the experiments reported in the paper and that indicate the advantage of the proposed method. But two reviewers do not think that the proposed analysis is sufficiently novel compared to an existing one (DAPO). The responses provided by the authors were appreciated, but did not dissipate these concerns.",ICLR2022, +2X4hzmAJk6,1576800000000.0,1576800000000.0,1,Skek-TVYvr,Skek-TVYvr,Paper Decision,Reject,"The authors received reviews from true experts and these experts felt the paper was not up to the standards of ICLR. + +Reviewer 3 and Reviewer 1 disagree as to whether the new notion of generalization error is appropriate. I think both cases can be defended. I think the authors should aim to sharpen their argument in this regard.Several reviewers at one point remark that the results follow from standard techniques: shouldn't this be the case? I believe the actual criticism being made is that the value of these new results do not go above and beyond existing ones. There is also the matter of what value should be attributed to technical developments on their own. On this matter, the reviewers seem to agree that the derivations lean heavily on prior work. ",ICLR2020, +SJxKlQDZeE,1544810000000.0,1545350000000.0,1,S1lDV3RcKm,S1lDV3RcKm,Intersting idea with practical impact ,Accept (Poster),"The paper proposes an adversarial framework that learns a generative model along with a mask generator to model missing data and by this enables a GAN to learn from incomplete data. 
+The method builds on AmbientGAN but it is a novel and clever adjustment to the specific problem setting of learning from incomplete data, that is of high practical interest.",ICLR2019,3: The area chair is somewhat confident +hm5hFPr0Iju,1610040000000.0,1610470000000.0,1,nzpLWnVAyah,nzpLWnVAyah,Final Decision,Accept (Poster),"This paper identifies the causal factors behind a major known issue in deep learning for NLP: Fine-tuning models on small datasets after self-supervised pretraining can be extremely unstable, with models needing dozens of restarts to achieve acceptable performance in some cases. The paper then introduces a simple suggested fix. + +Pros: +- The motivating problem is important: A large fraction of all computing time used on language-understanding tasks involves fine-tuning runs under the protocol studied here, and the problem of fine-tuning self-supervised models should be of broader interest at ICLR. +- The proposed fix is simple and well-demonstrated. It consists of only an adjustment to the range of values considered in hyperparameter tuning (which is significant, since BERT and related papers *explicitly advise* users to use inappropriate values) and an adjustment to the implementation of the optimizer. + +Cons: +- The method is demonstrated on a relatively small set of difficult text-classification datasets, so the behavior studied here may be different in very different dataset size, task difficulty, or label entropy regimes. + +This paper was divisive, so I gave it a fairly close look myself, and I'm persuaded by R1 and the other two positive reviewers: This is a classic example of a 'strong baselines paper', in that demonstrates that a more careful use of established methods can obviate the need for additional tricks. + +R3 raised two major concerns that they presented as potentially fatal, but that I find unpersuasive. +- This paper studies stability in model performance, not stability in predictions on individual data points. R3 argues that the latter sense of stability is the more important problem. Stability is an ambiguous term in this context, and both versions of the problem are interesting. However, as the authors pointed out, the definition of stability that is used here is consistent with previous work, and is widely accepted to be a major practical problem in NLP. I don't think this is a weakness of this paper, rather, it's an opportunity for someone else to write another, different paper on a different problem. +- R3 claims that the results are described as being more positive than they actually are, and the figure is potentially misleading. Looking at the quantitative results with R3's points in mind, I still see clear support for both of the paper's main suggestions. R3 opened up some potentially important questions about the handling of outliers in particular, but these questions were raised too late for the authors to be allowed to respond, and I don't see any evidence in the paper that anything improper was done. The marked outliers are clearly much farther from the mean/median in terms of standard deviations than the unmarked outliers. So, I don't see any evidence that these concerns reflect real methodological problems.",ICLR2021, +zlWYhtofdY,1610040000000.0,1610470000000.0,1,f0sNwNeqqxx,f0sNwNeqqxx,Final Decision,Reject,"This paper studies differentially private, communication-efficient training methods for federated learning. 
While the problem studied in this paper is well-motivated and interesting, the reviewers raised several concerns about the paper. Despite the authors' reconstruction protection explanation, the concern over large values of epsilon at the scale of 400 persists. There is not too much technical novelty since the main technique is given by prior work. ",ICLR2021, +9ZmvdOWJQ-,1576800000000.0,1576800000000.0,1,rkeuAhVKvB,rkeuAhVKvB,Paper Decision,Accept (Poster),"The paper presents a graph neural network model inspired by the consciousness prior of Bengio (2017) and implements it by means of two GNN models: the inattentive and the attentive GNN, respectively IGNN and AGNN. The reviewers think +- The idea of learning an input-dependent subgraph using GNN seems new. +- The proposed way to reduce the complexity by restricting the attention horizon sounds interesting. ",ICLR2020, +rkgz6MLOg,1486400000000.0,1486400000000.0,1,Hk-mgcsgx,Hk-mgcsgx,ICLR committee final decision,Reject,"The reviewers agree that there are issues in the paper (in particular, the weakness of the experimental part), and that it is not ready for publication.",ICLR2017, +kjZZHhM0SZ,1576800000000.0,1576800000000.0,1,ryeG924twB,ryeG924twB,Paper Decision,Accept (Poster),"This paper tackles the challenge of incentivising selfish agents towards a collaborative goal. In doing so, the authors propose several new modules. + +The reviewers commented on experiments being extremely thorough. One reviewer commented on a lack of ablation study of the 3 contributions, which was promptly provided by the authors. The proposed method is also supported by theoretical derivations. The contributions appear to be quite novel, significantly improving performance of the studied SMGs. + +One reviewer mentioned the clarity being compromised by too much material being in the appendix, which has been addressed by the authors moving some main pieces of content to the main text. + +Two reviewer commented on the relevance being lower because of the problem not being widely studied in RL. I would disagree with the reviewers on this aspect, it is great to have new problem brought to light and have fresh and novel results, rather than having yet another paper work on Atari. I also think that the authors in their rebuttal made the practical relevance of their problem setting sufficiently clear with several practical examples. ",ICLR2020, +QSKzdFINusO,1610040000000.0,1610470000000.0,1,17VnwXYZyhH,17VnwXYZyhH,Final Decision,Accept (Poster),"The paper introduces a new method to probe contextualized word embeddings for syntax and sentiment properties using hyperbolic geometry. The paper is written well and relevant to the ICLR community. Reviewers highlight that the proposed Poincaré probe offers solid results, extensive experiments that support the benefits of the approach, and proposes a new approach to analyze the geometry of BERT models. The revised version clarified various concerns of the initial reviews and improved the manuscript (comparison to Euclidean probes, low dimensional examples, new results on edge length distributions etc.). Overall, the paper makes valuable contributions to probing contextualized word embeddings and the majority of reviewers and the AC support acceptance for its contributions. 
Please revise your paper to take feedback from reviewers after rebuttal into account (especially to further improve clarity and discussion of the method).",ICLR2021, +3r0lgh0tIJ,1576800000000.0,1576800000000.0,1,SkxcSpEKPS,SkxcSpEKPS,Paper Decision,Reject,"The paper studies Positron Emission Tomography (PET) in medical imaging. The paper focuses on the challenges created by gamma-ray photon scattering, that results in poor image quality. To tackle this problem and enhance the image quality, the paper suggests using generative adversarial networks. Unfortunately due to poor writing and severe language issues, none of the three reviewers were able to properly assess the paper [see the reviews for multiple examples of this]. In addition, in places, some important implementation details were missing. + +The authors chose not to response to reviewers' concerns. In its current form, the submission cannot be well understood by people interested in reading the paper, so it needs to be improved and resubmitted. ",ICLR2020, +9_LKMSAY9B,1576800000000.0,1576800000000.0,1,BygSZAVKvr,BygSZAVKvr,Paper Decision,Reject,"This paper extends previous work on searching for good neural architectures by iteratively growing a network, including energy-aware metrics during the process. There was discussion about the extent of the novelty of this work and how well it was evaluated, and in the end the reviewers felt it was not quite ready for publicaiton.",ICLR2020, +POqM3srbmjs,1610040000000.0,1610470000000.0,1,C3qvk5IQIJY,C3qvk5IQIJY,Final Decision,Accept (Poster),"### Paper summary +This paper investigates theoretically and empirically the effect of increasing the number of parameters (""overparameterization"") in GAN training. By analogy to what happens in supervised learning with neural networks, overparameterization does help to stabilize the training dynamics (and improve performance empirically). This paper provides an explicit threshold for the width of a 1-layer ReLU network generator so that gradient-ascent training with a linear discriminator yields a linear rate of convergence to the global saddle point (which corresponds to the empirical mean of the generator matching the mean of the data). The authors also provides a more general theorem that generalizes this result to deeper networks. + +### Evaluation +The reviewers had several questions and concerns which were well addressed in the rebuttal and following discussion, in particular in terms of clarifying the meaning of ""overparameterization"". After discussing the paper, R1, R2 and R4 recommend acceptance while R3 recommends rejection. The main concern of R3 is that the GAN formulation analyzed in the paper is mainly doing moment matching between the generator distribution (produced from a *fixed* set of latent variables z_i) and the empirical mean of the data. R3 argues that this is not sufficient to ""understanding the training of GANs"". At least two aspects are missing: how the distribution induced by the generator converges according to other notion of divergence (like KL, Wasserstein, etc.); and what about the true generator distribution (not just its empirical version from a fixed finite set of samples z_i)? While agreeing these are problematic, the other reviewers judged that the manuscript was useful first step in understanding the role of overparameterization in GANs and thus still recommend acceptance. And importantly, this paper is the first to study this question theoretically. + +I also read the paper in more details. 
I have a feeling that some aspects of this work were already developed in the supervised learning literature, but the gradient descent-ascent dynamics appear novel to me, and the important question of the role of overparameterization here is both timely and quite interesting. I side with R1, R2 and R4: this paper is an interesting first step, and thus I recommend acceptance. See below for additional comments to be taken into consideration for the camera-ready version.

### Some detailed comments
- Beginning of section 2.3: please be clearer early on that you keep V fixed at a random initialization rather than learning it. The fact that this is standard in some other papers is not a reason to be unclear about it.
- Theorem 2.2: in the closed form of the objective when $d$ is explicitly optimized, we are back to a more standard supervised learning formulation; for example, (5) could look like regression. The authors should be clearer about this, and should also mention in the main text that the core technical part used to prove Theorem 2.2 is from Oymak & Soltanolkotabi 2020 (which considers supervised learning). This should also be made a bit clearer in the introduction -- it seems to me that the main novelty of the work is to study the gradient-descent dynamics, which are a bit different from the supervised learning setup, even though some parts are quite related (like the full maximization with respect to $d$).
- p.6 equation (8): typo -- the $-\mu d_t$ term is redundant and should be removed, as it is already included in $\nabla_d h(d,\theta)$ (see the sketch after this list).
- p.7 ""numerical validations"" paragraph: please describe more clearly what ""final MSE"" means. Is this a global saddle point (and thus a limit on how well the generator can match the empirical mean), or does it come from slow convergence of the method (e.g. after a fixed number of iterations, or according to some stopping criterion)? Please clarify.
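To spell out the redundancy flagged in the equation (8) bullet: the exact form of $h$ below is our own assumption (a linear discriminator objective with the $\ell_2$ penalty already folded into $h$), used only for illustration.

```latex
% Assumed regularized discriminator objective and its gradient-ascent step:
\begin{aligned}
h(d,\theta) &= d^{\top}\big(\hat{\mu}_{\mathrm{data}} - \hat{\mu}_{\theta}\big) - \tfrac{\mu}{2}\,\lVert d\rVert^{2}, \\
\nabla_d h(d_t,\theta_t) &= \hat{\mu}_{\mathrm{data}} - \hat{\mu}_{\theta_t} - \mu d_t, \\
d_{t+1} &= d_t + \eta\,\nabla_d h(d_t,\theta_t),
\end{aligned}
```

so appending a separate $-\mu d_t$ term to the update would count the penalty twice.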
All the reviewers agree that although the question is interesting, the paper is significantly lacking in terms of presentation and would benefit from another round of revision.",ICLR2022,
Reject.",ICLR2022,
Finally, Transfer(0, 1) beating Transfer(1, 0) 7/10 times is not statistically significant.

At this time, the paper is not ready for publication, but it is moving in the right direction, and I encourage the authors to submit a revised version to a future venue.",ICLR2022,
In summary, the reviewers like this paper a bit more than I do: I see it as borderline with a preference to accept, while the reviews indicate a more confident accept. However, the reviewers are also experienced experts in this area, and I do think that the authors handled the rebuttal stages well and addressed my more pressing concerns. I would encourage the authors to improve the writing if accepted, but I would prefer to accept this if possible.",ICLR2021,
From this observation, they show that the use of such priors leads to improved performance compared to the iid Gaussian prior in some experimental settings.

Reviewers have conflicting views on this paper, which have not been reconciled after the authors' response and the discussion.
On the plus side, the paper is very well written, the experimental part is carefully conducted, and it provides some insights on the choice of the prior in Bayesian neural networks, which could lead to further developments.
On the negative side, the claims made in the introduction are not fully supported by the experiments (the claims have been slightly amended in the revised version), and the take-home message is not so clear. In particular, Bayesian approaches with the proposed priors still underperform compared to SGD without tempering. The authors could also have considered a broader set of experiments.
Overall, I think the contributions outweigh the limitations of this paper, and I would recommend acceptance.",ICLR2022,
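To make the two prior families at issue concrete, here is a minimal illustrative sketch; the Student-t degrees of freedom, kernel length scale, and shapes are our own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavy-tailed prior for fully-connected weights:
# a Student-t draw in place of the usual iid Gaussian.
fc_weights = 0.05 * rng.standard_t(df=3.0, size=(256, 256))

# Spatially correlated prior for a 3x3 conv filter:
# nearby filter taps are correlated via a squared-exponential kernel.
coords = np.stack(np.meshgrid(np.arange(3), np.arange(3)), axis=-1).reshape(-1, 2)
sq_dists = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
cov = np.exp(-sq_dists / 2.0)                      # 9 x 9 covariance over taps
conv_filter = rng.multivariate_normal(np.zeros(9), cov).reshape(3, 3)
```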
In describing the key contribution of the paper, in both the revised paper and discussion, the authors fall back on phrases like ""elaborately designed"" and ""exquisite cooperation of parameter sharing and pre-training tasks"". **What do ""elaborately designed"" and ""exquisite cooperation"" mean?!?** At a minimum, I think you can clearly explain the benefits of having an objective like IS-MLM for doing better on NLU tasks than the approach taken in mBART. You could argue for the advantages of MLM over LM generation, which has been shown in other papers, including the original BERT paper and ELECTRA. Concretely, I wonder if you should reverse the contents of section 2 and start with equation (8), explaining why that is a good objective for your system and better than ones that have been used previously. This discussion should be at a higher level than the current discussion under (8), which tends to be in the weeds. I haven't worked out all the details, but I think you could then describe the objectives of section 2.2 before describing the implementation in section 2.1, and the result might be clearer? It would certainly emphasize the importance of these loss functions.
- The initial version didn't have important details like the number of languages covered in the main paper; the current version fixes this to the extent of saying you have 50, but still doesn't give the context of how this compares with XLM-R and mBART. And several reviewers had questions about the number of parameters of different models. I think you could fix a lot of these concerns by moving Table 8 to the main paper in a future resubmission. It doesn't take up much space and helps a lot in providing these details and easy-to-find citations for the models compared in other papers. ",ICLR2021,
The most serious concerns relate to novelty and the assumptions that individual functions share a global minima with respect to which the path of iterates generated by SGD satisfies the star convexity property. I'm inclined to accept the authors rebuttal, although it would have been nicer had the reviewer re-engaged. Overall, the paper is on the borderline.",ICLR2019,3: The area chair is somewhat confident +97-3Mls9IZP,1610040000000.0,1610470000000.0,1,zeFrfgyZln,zeFrfgyZln,Final Decision,Accept (Poster),"The paper explores how to effectively conduct negative sampling in learning for text retrieval. The paper shows that negative examples sampled locally are not informative, and proposes ANCE, a new learning mechanism that samples hard negative examples globally, using an asynchronously updated ANN index. + +Pros • The problem studied is important. • Paper is generally clearly written. • Solid experimental results. • There is theoretical analysis. + +Cons • The idea might not be so new. The contribution is mainly from its empirical part. + +During rebuttal, the authors have addressed the clarity issues pointed out by the reviewers. They have also added additional experimental results.",ICLR2021, +W_ekdLMvLN,1576800000000.0,1576800000000.0,1,r1lczkHKPr,r1lczkHKPr,Paper Decision,Reject,"The authors propose TD updates for Truncated Q-functions and Shifted Q-functions, reflecting short- and long-term predictions, respectively. They show that they can be combined to form an estimate of the full-return, leading to a Composite Q-learning algorithm. They claim to demonstrated improved data-efficiency in the tabular setting and on three simulated robot tasks. + +All of the reviewers found the ideas in the paper interesting, however, based on the issues raised by Reviewer 3, everyone agreed that substantial revisions to the paper are necessary to properly incorporate the new results. As a result, I am recommending rejection for this submission at this time. I encourage the authors to incorporate the feedback from the reviewers, and believe that after that is done, the paper will be a strong submission. ",ICLR2020, +d7CKVDyMGZi,1642700000000.0,1642700000000.0,1,RLtqs6pzj1-,RLtqs6pzj1-,Paper Decision,Accept (Poster),"#### Summary + +The goal of this work is to reduce the costs of inference in ensembled models by ensembling sparse models. The paper also aims to reduce the costs of training these ensembles as well. The proposed techniques (DST and EDST) each these goals, respectively. + +#### Discussion + +As noted by the reviewers, the paper is interesting and timely. The authors provided significant clarifications in the response that satisfied the reviewers' concerns. There is still significant room to revise the remaining points and polish the text of the paper for the camera-ready (I highly recommend proofreading from an individual who is not an author on the paper; there are still typos in the revised edits) + +#### Recommendation. + +I recommend Accept, due to the strengths above and the reasonably scoped remaining work to do going into the camera-ready.",ICLR2022, +DMVgMvn62z,1576800000000.0,1576800000000.0,1,H1gpET4YDB,H1gpET4YDB,Paper Decision,Reject,"This paper proposes blockwise masked attention mechanisms to sparsify Transformer architectures, the main motivation being reducing the memory usage with long sequence inputs. The resulting model is called BlockBERT. 
The paper falls into a trend of recent papers compressing/sparsifying/distilling Transformer architectures, a very relevant area of research given the daunting resources needed to train these models.

While the proposed contribution is very simple and interesting, it also looks like a rather small increment over prior work, namely the Sparse Transformer and the Adaptive Span Transformer, among others. Experiments are rather limited, and the memory/time reduction is not overwhelming (18.7-36.1% less memory, 12.0-25.1% less time), while final accuracy is sometimes sacrificed by a few points. There is no comparison to other adaptively sparse attention Transformer architectures (Correia et al. EMNLP 19 or Sukhbaatar et al. ACL 19), which should also provide memory reductions due to the sparsity of the gradients, which requires fewer activations to be cached. I suggest addressing these concerns in an eventual resubmission of the paper.",ICLR2020,
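For readers less familiar with the mechanism under discussion, a minimal sketch of a blockwise attention pattern follows (our own illustration, not the paper's code):

```python
import torch

def block_diagonal_mask(seq_len, block_size):
    # Boolean (seq_len, seq_len) mask allowing attention only within blocks.
    block_ids = torch.arange(seq_len) // block_size
    return block_ids.unsqueeze(0) == block_ids.unsqueeze(1)

# Usage: mask out-of-block scores before the softmax. Note that the actual
# memory saving comes from never materializing the masked blocks (e.g. with
# block-sparse kernels); a dense mask like this only illustrates the pattern.
scores = torch.randn(1, 8, 512, 512)          # (batch, heads, query, key)
mask = block_diagonal_mask(512, 128)
attn = scores.masked_fill(~mask, float('-inf')).softmax(dim=-1)
```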
It needs a stronger argument for its added theoretical contribution, and clearer experiments to support the claim that the presented theory is indeed what drives the empirical behavior of these iterative algorithms.",ICLR2022,
The proposed method is novel, simple to implement, and effective in practice. The authors have a deep and thorough discussion with reviewers, and most concerns were well addressed. After rebuttal and discussion, reviewers increased their scores, and all agreed with acceptance. AC checked the paper and all relevant information, and found sufficient ground for acceptance.",ICLR2022, +3-P6WGQ-0b,1576800000000.0,1576800000000.0,1,ByxKo04tvr,ByxKo04tvr,Paper Decision,Reject,This paper investigates convolutional LSTMs with a multi-grid structure. This idea in itself has very little innovation and the experimental results are not entirely convincing.,ICLR2020, +ktVqAvJ3XC,1576800000000.0,1576800000000.0,1,rJe1DTNYPH,rJe1DTNYPH,Paper Decision,Reject,"All reviewers suggest rejection. Beyond that, the more knowledgable two have consistent questions about the motivation for using the CCKL objective. As such, the exposition of this paper, and justification of the work could use improvement, so that experienced reviewers understand the contributions of the paper.",ICLR2020, +s5sLmlmauUh,1610040000000.0,1610470000000.0,1,zI38PZQHWKj,zI38PZQHWKj,Final Decision,Reject,"Motivated by (1) the problem of scaling up optimal transport to high-dimensional problems and (2) being able to tolerate noisy features, this paper introduces a new optimization problem that they call feature-robust optimal transport where they find a transport plan with discriminative features. They show that the min-max optimization problem admits a convex formulation and solve it using a Frank-Wolfe method. Finally they apply it to the layer selection problem and show that it achieves state-of-the-art performance for semantic correspondence datasets. + +The reviews were mixed for this paper. The main negative, which was brought up in all the reviews, is the lack of novelty compared to earlier methods like SRW which already combine dimensionality reduction and optimal transport. The new method in this paper still does have value since it can scale up to larger dimensional problems. It would have been nice to have a wider range of experiments, which would present a more compelling case for its applicability. Another reviewer brought up a correctness issue, however it is not clear if this is actually a bug or merely a misunderstanding about how the pieces in the overall proof fit together. In any case, the reviewers pointed out various places where the writing could be improved. ",ICLR2021, +CcBVtUNuwD,1576800000000.0,1576800000000.0,1,r1lEd64YwH,r1lEd64YwH,Paper Decision,Reject,"What is investigated is what kind of representations are formed by embodied agents; it is argued that these are different than from non-embodied arguments. This is an interesting question related to foundational AI and Alife questions, such as the symbol grounding problem. Unfortunately, the empirical investigations are insufficient. In particular, there is no comparison with a non-embodied control condition. The reviewers point this out, and the authors propose a different control condition, which unfortunately is not sufficient to test the hypothesis. 
+ +This paper should be rejected in its current form, but the question is interesting and hopefully the authors will do the missing experiments and submit a new version of the paper.",ICLR2020, +fLW7S0Udx-b,1610040000000.0,1610470000000.0,1,uSYfytRBh-f,uSYfytRBh-f,Final Decision,Reject,"This paper studies how to efficiently expose failures of ""top-performing"" segmentation models in the real world and how to leverage such counterexamples to rectify the models. The key idea is to discover most ""controversial"" samples from massive online unlabeled images. The approach is sound, well grounded, and quite logical. Results demonstrate the effectiveness. + +However, there exists some limitations coming from R2 and R3, for example, 1) Segmentation benchmarks may not require pixel-level dense annotation. There are also examples of benchmarks where the groundtruth consists of computer segmentations corrected by humans. 2) It is much harder for segmentation data to be class-balanced in the pixel level, making highly skewed class distributions common for this particular task. 3) Citing the field of computer-assisted annotation as relevant work. + +In the end, I think that this paper may not be ready for publication at ICLR, but the next version must be a strong paper if above limitations can be well addressed.",ICLR2021, +Y_Hx7sgEFF,1576800000000.0,1576800000000.0,1,B1l6y0VFPr,B1l6y0VFPr,Paper Decision,Accept (Poster),"The paper studies the effect of various hyperparameters of neural networks including architecture, width, depth, initialization, optimizer, etc. on the generalization and memorization. The paper carries out a rather through empirical study of these phenomena. The authors also rain a model to mimic identity function which allows rich visualization and easy evaluation. The reviewers were mostly positive but expressed concern about the general picture. One reviewer also has concerns about ""generality of the observed phenomenon in this paper"". The authors had a thorough response which addressed many of these concerns. My view of the paper is positive. I think the authors do a great job of carrying out careful experiments. As a result I think this is a good addition to ICLR and recommend acceptance.",ICLR2020, +DSAGURM1td,1576800000000.0,1576800000000.0,1,BkgeQ1BYwS,BkgeQ1BYwS,Paper Decision,Reject,"There is insufficient support to recommend accepting this paper. The authors provided detailed responses, but the reviewers unanimously kept their recommendation as reject. The novelty and significance of the main contribution was not made sufficiently clear, given the context of related work. Critically, the experimental evaluation was not considered to be convincing, lacking detailed explanation and justification, and a sufficiently thorough comparison to strong baselines, The submitted reviews should help the authors improve their paper.",ICLR2020, +BkejuntglV,1544750000000.0,1545350000000.0,1,HJeKCi0qYX,HJeKCi0qYX,Reject,Reject,"Significant spread of scores across the reviewers and unfortunately not much discussion despite prompts from the area chair and the authors. The most positive reviewer is the least confident one. Very close to the decision boundary but after careful consideration by the senior PCs just below the acceptance threshold. There is significant literature already on this topic. 
The ""thought delta"" created by this paper and the empirical results are also not sufficient for acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +7EX-1OUtTPa,1642700000000.0,1642700000000.0,1,2aC0_RxkBL_,2aC0_RxkBL_,Paper Decision,Reject,"This paper investigates the role of representation learning when the distribution over the feature space has a long tail. The main motivation is to determine how much of the overall learning, in this case, is bottlenecked specifically by representation learning. The main findings are that vanilla learning gives brittle long-tailed representations, harming overall performance. The paper suggests a form of data augmentation to remedy this. Reviewers acknowledge that this investigation is worthwhile. However, many concerns were raised as to whether experiments support the drawn conclusions. A more principled approach to the data augmentation methodology is also needed. The authors address some of these, providing further experiments, but these were not enough to sway reviewers. Since results are fundamentally empirical in nature, this shortcoming indicates that the paper is not ready to share with the community just yet. Stronger experiments with clearer evidence are needed to fully support the thesis of the work.",ICLR2022, +TKbHXcH1tvs,1642700000000.0,1642700000000.0,1,qnQN4yr6FJz,qnQN4yr6FJz,Paper Decision,Accept (Oral),"While generative model can be used to input data, this work propose to a novel discriminative learning approach to optimize this data imputation phase by deriving a discriminative version of the traditional variational lower bound (ELBO). The resulting bound can be estimated without bias with Monte Carlo estimation leads to a practical approach, leading to encouraging experimental performances. + +The reviewers recognised the novelty and suggest that this approach, given its novelty and wide applicability, could be considered for an oral presentation.",ICLR2022, +rJI3oGLOl,1486400000000.0,1486400000000.0,1,ByQPVFull,ByQPVFull,ICLR committee final decision,Reject,"This paper was reviewed by three experts. While they find interesting ideas in the manuscript, all three point to deficiencies (lack of clean experiments, clarity in the manuscript, etc) and recommend rejection. I believe there are promising ideas here, and this manuscript will be stronger for a future deadline.",ICLR2017, +J6Z1erl2K_,1610040000000.0,1610470000000.0,1,uAX8q61EVRu,uAX8q61EVRu,Final Decision,Accept (Oral),"+ Interesting method for binaural synthesis from moving mono-audio ++ Nice insight into why l2 isn't the best loss for binaural reconstructions. ++ Interesting architectural choice with nice results. ++ Nicely motivated and clearly presented idea -- especially after addressing the reviewers comments. + +I agree with the idea of a title change. While I think its implied that the source is probably single source, making it explicit would make it clearer for those not working in a closely related topic. Hence, ""Neural Synthesis of Binaural Speech from Mono Audio"" as suggested in the review process sounds quite reasonable. +",ICLR2021, +HkllvxwklN,1544680000000.0,1545350000000.0,1,rJevYoA9Fm,rJevYoA9Fm,Meta-review,Accept (Poster),"This paper proposes an efficient method to compute the singular values of the linear map represented by a convolutional layer. It makes uses of the special block-matrix form of convolutional layers to construct their more efficient method. 
Furthermore, it shows that this method can be used to devise new regularization schemes for DNNs. The reviewers did note that the diversity of the experiments could be improved, and R2 raised concerns that the wrong singular values were being computed. The authors should add a section clarifying why the singular values of a convolutional linear map are not found directly by performing SVD on the reshaped kernel - indeed the number of singular values would be wrong. A contrast with the singular values obtained by simple reshaping of the kernel would also be helpful.",ICLR2019,3: The area chair is somewhat confident +tD7HMqvyv6U,1642700000000.0,1642700000000.0,1,qw674L9PfQE,qw674L9PfQE,Paper Decision,Reject,"Four experts reviewed the paper and provided mixed recommendations. All reviewers found the experimental results strong, but they have different views about the technical novelty. Three reviewers considered the technical novelty as a weakness of the paper, but Reviewer z4BR was less concerned about it than the other two. After AC carefully read the paper and the authors' responses, AC agreed with the reviewers that the combination of InfoLOOB and modern Hopfield networks, which were both existing works, is incremental despite the empirical results. Besides, AC agreed with Reviewer jmHN that the theoretical results are not significant enough and could be moved to Appendices. While the empirical results are strong, they could not answer how the trend would change with bigger models and bigger datasets. Hence, while the paper clearly has merit, the decision is not to recommend acceptance. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2022, +PB5z5DeAtX,1576800000000.0,1576800000000.0,1,H1eo9h4KPH,H1eo9h4KPH,Paper Decision,Reject,"This works relates adversarial robustness and Lipschitz constant regularization. After the rebuttal period reviewers still had some concerns. In particular it was felt that Theorem 1 could likely be deduced from known results in optimal transport, and it would be nice to make this connection explicit. There were still concerns about scalability. The authors are encouraged to continue with this work, considering the above points in future revisions. +",ICLR2020, +r4z7ByMW-oO,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"This paper proposed to theoretically explain why a pre-trained embedding network with self-supervised training (SSL) can provide representation for downstream few-shot learning (FSL) tasks. The review process finds that the paper may over-claim the results and that the results seem unsatisfactory. Both Reviewer 4 and Reviewer 5 expressed concerns regarding the writing, organizing, and grammar errors of this paper. The paper needs a substantial revision to improve clarity and accessibility. As pointed out by Nikunj Saunshi’s public comment, this paper may benefit from discussing the differences from the previous works, including [1]. + +[1] Arora et al., A Theoretical Analysis of Contrastive Unsupervised Representation Learning, ICML 2019",ICLR2021, +mwoN47wjpyh,1610040000000.0,1610470000000.0,1,LXMSvPmsm0g,LXMSvPmsm0g,Final Decision,Accept (Poster),"This work extends the lottery ticket hypothesis to lifelong learning and, in particular, it tackles the problem of class incremental learning. This is an important and difficult problem, and of great interest to the community. The authors considered top down and bottom-up pruning strategies. 
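To illustrate the contrast requested above, a sketch of the count argument (our own illustration, assuming circular padding on an n x n input): the conv map has n * n * min(c_in, c_out) singular values, obtained per FFT frequency, whereas an SVD of the reshaped kernel alone yields far fewer.

```python
import numpy as np

def conv_singular_values(kernel, n):
    # kernel: (k, k, c_in, c_out); n: spatial size of the input.
    # Zero-pad the kernel to n x n, take a 2D FFT over the spatial axes,
    # and collect the singular values of every per-frequency c_in x c_out
    # matrix -- rather than one SVD of the flattened kernel.
    padded = np.zeros((n, n) + kernel.shape[2:], dtype=complex)
    padded[:kernel.shape[0], :kernel.shape[1]] = kernel
    per_freq = np.fft.fft2(padded, axes=(0, 1))    # (n, n, c_in, c_out)
    return np.linalg.svd(per_freq, compute_uv=False).ravel()
```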
The proposed approaches were validated on existing benchmarks (CIFAR10, CIFAR100, and Tiny-ImageNet), reaching state-of-the-art results and showing that catastrophic forgetting could be alleviated. While some questions remain in terms of practical relevance, the authors showed the existence of winning tickets in the continual setting. There were concerns regarding clarity and requests for additional experiments, but all were convincingly addressed, and the clarifications provided by the authors in their rebuttal further strengthened the paper.",ICLR2021,
After thorough, well-reasoned, and well-intentioned discussion between all four reviewers, the reviews land just barely in favor of acceptance, but with a substantial divide. After considering the paper, reviews, rebuttal, and discussion, I am swayed by the argument that (a) these experimental results are largely unexpected, (b) they are both extremely positive and offer a new trade-off between test and train compute in MT, and (c) the paper may therefore inspire substantial discussion and follow-up work in the community. Thus I lean in favor of acceptance overall.",ICLR2021,
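To make the test-time side of that train/test compute trade-off concrete, a hedged sketch of one way retrieved neighbors can weigh in on the next-token distribution at decoding time; the interpolation form, weight lam, temperature, and names are all our own illustrative assumptions.

```python
import torch

def knn_interpolated_logprobs(model_logprobs, knn_dists, knn_token_ids,
                              lam=0.5, temp=10.0):
    # model_logprobs: (vocab,) log-probs from the parametric decoder.
    # knn_dists: (k,) distances from the decoder query to retrieved keys.
    # knn_token_ids: (k,) long tensor of target tokens stored with those keys.
    weights = torch.softmax(-knn_dists / temp, dim=-1)
    p_knn = torch.zeros_like(model_logprobs).scatter_add_(0, knn_token_ids, weights)
    return torch.log(lam * p_knn + (1.0 - lam) * model_logprobs.exp())
```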
However, the merit of the work with respect to StyleGAN and StyleFlow has not been well established. AR3 made the following comment: “Specially, compared with the style-based generator[1,2], …, I don’t find superiorities of the proposed method.” The authors responded to the comment briefly (but not convincingly) in their rebuttal. There is no mention of it in the revised paper. A proper account of the issue would require major revision to the paper.",ICLR2021, +4Nhb-4xKKsU,1610040000000.0,1610470000000.0,1,w_BtePbtmx4,w_BtePbtmx4,Final Decision,Reject,"In this paper, the authors proposed a new approach by the name of LoCal + SGD (Localized Updates) to replace the traditional Backpropagation method. The key idea is to selectively update some layers’ weights using localized learning rules, so as to reduce the computational complexity of training these layers so as to achieve a better tradeoff between overall speed and accuracy. +The paper received quite mixed reviewers. Some reviewers criticized the incremental nature of the proposed technology, while some other reviewers thought that this is one of the very early papers that demonstrates the practical effectiveness of localized learning. + +The reviewers have made several rounds of discussions, and as a result of that, we think while this direction (localized learning) is very important and promising, this particular paper might not have provided a sufficiently novel and good solution to it. Specifically, in terms of localized learning, this paper has not proposed brand new concepts or methodologies, instead it adopts existing methods in selective layers. In this sense, it does not really resolve the accuracy issue of localized learning, rather, it achieves the tradeoff by only applying localized learning in some layers. In other words, the current results still heavily rely on BP and has not brought a real breakthrough to localized learning. +",ICLR2021, +0UJ0v0KhtO,1576800000000.0,1576800000000.0,1,SJx4Ogrtvr,SJx4Ogrtvr,Paper Decision,Reject,"The article studies the behaviour of binary and full precision ReLU networks towards explaining differences in performance and suggests a random bias initialisation strategy. The reviewers agree that, while closing the gap between binary networks and full precision networks is an interesting problem, the article cannot be accepted in its current form. They point out that more extensive theoretical analysis and experiments would be important, as well as improving the writing. The authors did not provide a rebuttal nor a revision. ",ICLR2020, +QVONR6908j6,1642700000000.0,1642700000000.0,1,FFM_oJeqZx,FFM_oJeqZx,Paper Decision,Reject,"The reviewers were split on this paper: the positive review appreciated (a) how adaptive weighing can be viewed as part of energy minimization, (b) the flexibility of the model to work with different model backbones, (c) the demonstration that even in no-noise settings the method generates noticeable improvements. However, all reviews saw important shortcomings in the (a) few out-of-distribution results, (b) limited ablation studies, (c) clarity of the writing, particularly in notation, (d) explanations of experimental results (e.g., why using pseudolabels sometimes deteriorates performance), (e) assumptions behind the proposed method, (f) lack of self-training baselines, (g) limited technical novelty. Ultimately, the number and severity of the shortcomings outweigh the positive parts of the paper. 
If the authors take the reviewers’ recommendations into account, the paper will be a much stronger submission.",ICLR2022,
ZeHGsJIdS_I,1642700000000.0,1642700000000.0,1,msRBojTz-Nh,msRBojTz-Nh,Paper Decision,Accept (Poster),"The paper compares different architectures of deep neural nets for learning full 3D turbulence simulations. On coarse grids, the proposed method predicts more accurately than the classical solvers, especially in preserving the high-frequency information. The reviewers think the paper is clearly written with strong experiments. Please include the suggested references in the final version.",ICLR2022,
HyBsUkTSM,1517250000000.0,1517260000000.0,855,BJlrSmbAZ,BJlrSmbAZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper shows that batch normalization can be cast as approximate inference in deep neural networks. This is an appealing result, as batch normalization is used in practice in a wide variety of models. The reviewers found the paper well written and easy to understand and were motivated by the underlying idea. However, they found the empirical analysis lacking and found that there was not enough detail in the main text to verify whether the claims were true. The authors empirically compared to a recent method showing that dropout can be cast as approximate inference, with the claim that by transitivity they were comparing to a variety of recent methods. AnonReviewer1 casts significant doubt on the results of that work. This is very unfortunate and not the fault of the authors of this paper. The authors have since gone to great lengths to compare to Louizos and Welling, 2017. Unfortunately, that comparison doesn't appear to be complete in the manuscript. The main text was also lacking specific detail relating to fundamental parts of the proposed method (noted by all reviewers). Overall, this paper seems to be tremendously promising and the underlying idea potentially very impactful. However, given the reviews, it doesn't seem that this paper would achieve its potential impact. The response from the authors is appreciated and goes a long way to improving the paper. Taking the reviews into account, adding specific detail about the methodology and model (e.g. the prior), and completing careful empirical analysis will make this a strong paper that should be much more impactful.",ICLR2018,
ryxLhu1bx4,1544780000000.0,1545350000000.0,1,BJl6TjRcY7,BJl6TjRcY7,"reviews on balance lean negative, but recommend accept (is this excessive influence of the AC opinion?)",Accept (Poster),"Strengths: One-shot physics-based imitation at a scale and with efficiency not seen before. Clear video, paper, and related work.
Weaknesses described include: the description of a secondary contribution (LFPC) takes up too much space (R1,4); results are not compelling (R1,4); prior art in graphics and robotics (R2,6); concerns about the potential limitations of the linearization used by LFPC.
The original reviews are negative overall (6,3,4). The authors have posted detailed replies. R1 has posted a follow-up, standing by their score. We have not heard more from R2 and R3.
The AC has read the paper, watched the video, and read all the reviews. Based on expertise in this area, the AC endorses the author's responses to R1 and R2. Being able to compare LFPC to more standard behavior cloning is a valuable data point for the community; there is value in testing simple and efficient models first. 
The AC identifies the following recent (Nov 2018) paper as the closest work, which is not identified by the authors or the reviewers. The approach proposed in the submitted paper demonstrates equal-or-better scalability, learning efficiency, and motion quality, and includes examples of learned high-level behaviors. An elaboration on HL/LL control: the DeepLoco work also learns mocap-based LL control with learned HL behaviors, although with a more dedicated structure. Physics-based motion capture imitation with deep reinforcement learning: https://dl.acm.org/citation.cfm?id=3274506
Overall, the AC recommends this paper to be accepted as a paper of interest to ICLR. This does partially discount R3 and R1, who may not have worked as directly on these specific problems before.
The AC is rating its confidence as ""not sure"" to flag this for the program committee chairs, in light of the fact that this discounts the R1 and R3 reviews. The AC is quite certain in terms of the technical contributions of the paper.",ICLR2019,2: The area chair is not sure
U_bmRNn4UHx,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"This paper introduces a novel convolution-like operator called ""optimal separable convolution"", which is based on minimizing the number of operations given a fixed receptive field. The authors provide further empirical results to show the effectiveness of their proposed operator. Overall, this is a very interesting work. There is a consensus among reviewers that this work is well-motivated, novel, and principled. However, reviewers have pointed to several issues that make this a borderline paper, and consequently none of the reviewers were willing to argue for acceptance. After reading the paper, the reviewers' comments, and the authors' response, I would summarize the main areas of improvement as follows:
1- The ""optimal separable convolution"" is derived theoretically using a ""volumetric receptive field condition"". However, this condition is not discussed and motivated enough in the paper. For example, different parametrizations with the same volumetric receptive field could impose very different expressive power or implicit bias. Why is this not important? Adding discussions/experiments to motivate this condition would improve the paper.
2- The derivations in Sections 2.3 and 2.4 are not well presented and are hard to follow. I suggest the authors use the convention of having a formal theorem statement followed by the proof. This is important since one of the main contributions of the paper is a principled derivation.
3- All reviewers were concerned with the wall-clock time. The authors responded that the theoretical #FLOPs is more important because wall-clock time is hardware dependent. However, the authors reported the wall-clock time using CPUs. I understand that wall-clock time is hardware dependent, but that only means algorithms that can achieve better wall-clock time on the current hardware are more likely to be useful, because there is no guarantee that the hardware would be adjusted based on one algorithm, especially if the promised improvement is not large enough. Therefore, I think reporting wall-clock time on GPUs is important, which was not done here.
4- Even though the authors mention several operators in Table 1, they only compare against depthwise separable convolution in the experiments. Even based on FLOPs, the current empirical results are not very promising. 
For example:
a) The gap between o-ResNet (the proposed method) and d-ResNet is not significant in Fig 3. In particular, when #FLOPs is low, d-ResNet and o-ResNet have similar performance.
b) In Tables 2 and 4, o-ResNet shows small improvements but uses more FLOPs. Even if the authors can't exactly match #FLOPs, they should make sure that the proposed method uses fewer FLOPs than the others, not the other way around.
c) In Table 3, the authors only compare to ResNet, and d-ResNet is removed.
Considering the above issues, I think the paper is marginally below the acceptance threshold. Given the novelty of the work, I want to encourage the authors to improve the paper by taking the reviewers' comments into account and to resubmit their work.",ICLR2021,
4TpMS-im8P,1576800000000.0,1576800000000.0,1,rkly70EKDH,rkly70EKDH,Paper Decision,Reject,"The paper studies the amount of over-parameterization needed for a quadratic 2/3-layer neural network to memorize a separable training data set with arbitrary labels. While the reviewers agree that this paper contains interesting results, the review process uncovered highly related prior work, which requires a major revision to put the current paper into perspective, and generally various clarifications. The paper will benefit from a revision and resubmission to another venue, and is in its current form not ready for acceptance at ICLR-2020.",ICLR2020,
gf0H_mZzei-,1642700000000.0,1642700000000.0,1,e-JV6H8lwpl,e-JV6H8lwpl,Paper Decision,Reject,"The paper uses neural networks for system identification. The novelty of its contributions seems to be marginal, and its usefulness is not experimentally validated well enough.",ICLR2022,
hFYWTrdljP,1576800000000.0,1576800000000.0,1,S1lOTC4tDS,S1lOTC4tDS,Paper Decision,Accept (Spotlight),"This paper presents an approach to model-based reinforcement learning in high-dimensional tasks. The approach involves learning a latent dynamics model and performing rollouts thereof with an actor-critic model to learn behaviours. This is extensively evaluated on 20 visual control tasks. This paper was favourably received, but there were concerns around it being incremental (relative to PlaNet and SVG). The authors highlighted the differences in the rebuttal, clarifying the novelty of this work. Given the interesting ideas presented, and the convincing results, this paper should be accepted.",ICLR2020,
h28fFsRRgT5,1642700000000.0,1642700000000.0,1,f9MHpAGUyMn,f9MHpAGUyMn,Paper Decision,Accept (Poster),"A new method for dynamic token normalization in ViTs (both within and across tokens) is introduced in the paper. As noted by the reviewers, the proposed method is technically sound, with a clear and solid motivation. The main concerns raised included the lack of experiments using larger models, the unclear reason for the accuracy gains, and the lack of experiments on tasks beyond classification, such as detection and segmentation. The authors’ response was strong, clarifying other questions and providing additional experiments, for example, showing the effectiveness of the method on object detection, and when applied to larger models or architectures that explicitly model local context. Two reviewers recommend borderline rejection, but they did not participate in the discussion nor update their reviews after the author response. The AC considers that their concerns were adequately addressed by the rebuttal, and agrees with the other two reviewers that the paper passes the acceptance bar of ICLR. 
The authors should carefully proofread the paper for the final version.",ICLR2022,
2OG8rGOVjB,1610040000000.0,1610470000000.0,1,DE0MSwKv32y,DE0MSwKv32y,Final Decision,Reject,"After reading the reviews and the authors' comments, the meta-reviewer thinks the paper is not ready for publication in a high-impact conference like ICLR. The paper is not well positioned with respect to the literature, and the proposed techniques are not well discussed in relation to predominant paradigms like optimism in the face of uncertainty.",ICLR2021,
Z5bgo8BaQqp,1610040000000.0,1610470000000.0,1,MD3D5UbTcb1,MD3D5UbTcb1,Final Decision,Reject,"The paper argues that GNNs can be understood as graph signal denoising. While this interpretation is not surprising and not novel, the unified view does seem insightful according to some reviewers. Yet, it is not clear how much insight can be drawn from the presented theory, as no significantly better architecture or experimental results are presented. Additional criticism was raised regarding the unclear relation between GAT and graph signal denoising, the fact that the analysis focuses on one layer and does not explain the relations between layers or how nonlinear activation functions affect these theoretical findings, and that the objective of a GNN cannot be viewed as a simple combination of graph denoising problems. Several reviewers complained that the paper is hard to follow. In light of the above, despite the significant efforts of the authors to address these issues in the rebuttal, we believe the paper is below the bar and recommend rejection.",ICLR2021,
s9nlt8thbT,1576800000000.0,1576800000000.0,1,Skl3SkSKDr,Skl3SkSKDr,Paper Decision,Reject,"This paper proposes a parametrisation of Euclidean distance matrices amenable to use within a differentiable generative model. The resulting model is used in a WGAN architecture and demonstrated empirically in the generation of molecular structures. Reviewers were positive about the motivation from a specific application area (generation of molecular structures). However, they raised some concerns about the actual significance of the approach. The AC shares these concerns; the methodology essentially amounts to constraining the output of a neural network to be symmetric and positive semidefinite, which is in turn equivalent to producing a non-negative diagonal matrix (corresponding to the eigenvalues). As a result, the AC recommends rejection, and encourages the authors to include simple baselines in the next iteration.",ICLR2020,
Au-lh0FsX06,1642700000000.0,1642700000000.0,1,nO5caZwFwYu,nO5caZwFwYu,Paper Decision,Accept (Poster),"This paper gives a framework for using learning in combinatorial optimization problems. In particular, active search is used to learn heuristics. The reviewers thought the paper had nice conceptual contributions for this approach and that the results would be very interesting to the community.",ICLR2022,
AMI3MH4IkBN,1610040000000.0,1610470000000.0,1,zUMD--Fb9Bt,zUMD--Fb9Bt,Final Decision,Reject,"Four reviewers have reviewed and discussed this submission. After the rebuttal, two reviewers felt the paper is below the acceptance threshold. Firstly, Rev. 1 and Rev. 2 were somewhat disappointed in the lack of analysis regarding non-linearities, although the authors suggested this was resolved in the revised manuscript; e.g., Rev. 2 argued the paper without such an analysis is too similar to existing 'linear' models, e.g. APPNP, SGC, and so on. While Rev. 
3 was mildly positive about the paper, they also noted that combining several linear operators is somewhat trivial. Overall, all reviewers felt there is some novelty in the proposed regularization term but also felt that the contributions of the paper could have been stronger. While the AC sympathizes with this submission and hopes that the authors can improve this work, in its current form it appears marginally below the acceptance threshold.",ICLR2021,
JjNxacG9vkV,1610040000000.0,1610470000000.0,1,ct8_a9h1M,ct8_a9h1M,Final Decision,Accept (Poster),"This paper proposes an input-dependent dropout strategy, using variational inference to infer the rates. 
While the idea is a fairly straightforward variant of recent probabilistic dropout methods, the paper demonstrates consistent improvements across several types of NN layers (dense, convolutional, and attention) in large-scale experiments (e.g. ImageNet). The reviewers unanimously agreed on accepting the paper.",ICLR2021,
LI7Lo5ef3,1576800000000.0,1576800000000.0,1,S1e0ZlHYDB,S1e0ZlHYDB,Paper Decision,Reject,"Main content: Introduces Progressive Compressed Records (PCR), a new storage format for image datasets for machine learning training.
Discussion:
Reviewer 4: Interesting application of progressive compression to reduce the disk I/O overhead. Main concern is that the paper could be clearer about the setting.
Reviewer 5 (not knowledgeable about the area): Well-written paper. The concern is that the related work could be better, including the state of the art on the topic.
Reviewer 2: Likes the topic but discusses many areas for improvement (stronger experiments, better metrics reported, etc.). This is probably the most experienced reviewer marking reject.
Reviewer 3: The paper is well written. The main issue is that experiments are limited to image classification tasks, and it's not clear how the method works at larger scale.
Recommendation: Interesting idea but the experiments could be stronger. I lean to Reject.",ICLR2020,
rJKy2ML_g,1486400000000.0,1486400000000.0,1,BJa0ECFxe,BJa0ECFxe,ICLR committee final decision,Reject,"The reviewers all agree that the theory presented in the paper is of high quality and is promising, but the experiments are not compelling. The reviewers are concerned that the presented idea and connections to existing methods, while neat, may not be impactful, as the promise of the theory does not bear out in practice. One reviewer is concerned that the presented theory is still not useful, stating that the ""information bottleneck thus only becomes meaningful when the capacity of the encoding network is controlled in some measurable way, which is not discussed in the paper"". 
In general, they seem to agree that the experimental evaluation is still preliminary and unfinished. As such, it would seem that the authors could make the paper far more compelling by demonstrating stronger improvements on benchmark experiments and submitting to a future conference.",ICLR2017,
KzQDTZlZ1z,1610040000000.0,1610470000000.0,1,F9sPTWSKznC,F9sPTWSKznC,Final Decision,Reject,"This paper introduces a dataset and a trained evaluation metric for evaluating discourse phenomena for MT. Several context-aware MT models are compared against a sentence-level baseline. The paper develops metrics which evaluate the models according to their performance on four discourse phenomena: anaphora, lexical consistency, coherence and readability, and discourse connective translation. Data is released for three language pairs (all using English as the target language).
First, I’d like to point out that creating datasets and benchmarks for analyzing/evaluating discourse-level errors in machine translation is an extremely valuable contribution. This paper is addressing a very relevant problem, and even though there is no new model/method/algorithm being proposed, this work *fits* this conference - it is my opinion that the community should welcome and value more than it currently does the efforts spent in creating high-quality datasets that can help make progress in the field.
There was substantial discussion among reviewers about this paper. The main weaknesses raised by the reviewers were:
- Limited information about the process used to create the anaphora test, which was a contribution of prior work (Jwalapuram et al., 2019); this was addressed in the updated version, but the anaphora challenge sets seem to be only a minor update over previous work.
- All language pairs use English as the target language, and it is not simple to extend this approach beyond English target languages.
- Lack of detail on how BLEU scores were computed (tokenised? true cased? My recommendation is to use sacrebleu) - this was clarified in the rebuttal.
- The evaluated NMT models all date from 2018 or earlier.
- Two of the 4 benchmarks (anaphora and coherence) are evaluated by neural models trained on WMT outputs, which makes the interpretation of scores opaque, and their validity is unclear.
While the creation of a benchmark for discourse evaluation of MT is a laudable effort as mentioned above, it is my opinion that due to some of the weaknesses above the current version of this work is not yet ready for publication. However, I strongly encourage the authors to improve upon these points and resubmit their work to another venue. I list some suggestions below to improve this paper.
My biggest concern with the current version is the last weakness above. As pointed out by a reviewer, the framework of Jwalapuram et al. (2019) provides empirical support for the model's sensitivity (if there is a pronoun error, does the metric pick it up?). But they don’t necessarily capture model *specificity* (if the metric ranks one output higher, can we be confident that this is because of a pronoun translation error?). For the coherence metric, the authors make an argument that their metric is sensitive to coherence issues, but concerns remain about whether it is sufficiently specific to these issues. 
In the rebuttal, the authors argue that BLEURT is sentence-level, but they could easily aggregate sentence-level judgments and report the correlation between BLEURT and human coherence judgments, to show that their metric correlates better with human coherence judgments than BLEURT or even just BLEU. Besides BLEURT, I would add that there are other recently proposed metrics that may capture discourse phenomena (neural metrics trained against MQM annotations or sentence-level human assessments with document context) and should be compared against: check COMET [1] or PRISM [2] (the latter is sentence-based but could be adapted for paragraphs or documents).
There is also prior work comparing various context-aware machine translation approaches against a sentence-level baseline, some with negative findings [3,4,5]. I suggest the authors look at this related work in future iterations of their paper.
[1] https://arxiv.org/pdf/2009.09025.pdf
[2] https://arxiv.org/pdf/2004.14564.pdf
[3] https://www.aclweb.org/anthology/2020.eamt-1.24.pdf
[4] https://arxiv.org/pdf/1910.00294.pdf
[5] https://www.aclweb.org/anthology/2020.emnlp-main.81.pdf",ICLR2021,
b5LZ27x4sN6,1642700000000.0,1642700000000.0,1,iC4UHbQ01Mp,iC4UHbQ01Mp,Paper Decision,Accept (Oral),"The paper studies attacks on the self-supervised training pipeline of multi-modal models, e.g., CLIP and related models. The reviewers agree that the poisoning results are impressive in that they achieve good poisoning success with a fairly small number of samples. The threat model is fairly specific to one (high-profile) type of self-supervised training, but the concepts presented are likely portable to the study of other related training pipelines.",ICLR2022,
LnBDFM0v-D,1576800000000.0,1576800000000.0,1,Bkxonh4Ywr,Bkxonh4Ywr,Paper Decision,Reject,"This paper presents a method for speeding up Gaussian process inference by leveraging locality information through k-nearest neighbours. The key idea is well motivated intuitively; however, the way in which it is implemented seems to introduce new complications. One such issue is KNN overhead in high dimensions, but R1 outlines other potential issues too. Moreover, the method's merit is not demonstrated in a convincing way through the experiments. The authors have provided a rebuttal for those issues, but it does not seem to resolve the concerns entirely.",ICLR2020,
cavq2tiijOp,1642700000000.0,1642700000000.0,1,rUwm9wCjURV,rUwm9wCjURV,Paper Decision,Accept (Poster),"The paper considers the problem of learning to carry out novel, multi-task instructions specified via temporal logic using deep reinforcement learning. A specific focus of the paper is improving generalization to test-time instructions that differ from those encountered during training. To facilitate this generalization, the proposed architecture encodes a latent specification of the goal according to the given instruction and environment state, which is then combined with a task-agnostic environment embedding. Experiments on grid-like domains demonstrate that the proposed framework outperforms recent deep RL approaches to satisfying temporal logic-based instructions. The instruction-following problem has long been of interest in the robotics, ML, and broader AI communities, dating back several decades. The problem has received renewed attention in the last few years, largely as a target for neural network-based multi-view and RL learning architectures. 
The primary contribution of this paper is the proposed extension of existing deep RL approaches to reason over a learned, latent goal specification as a means of improving generalization to novel test-time utterances. The approach is sound, and several reviewers agree that the ablation studies, together with comparisons to contemporary deep RL architectures, support the advantage of these inductive biases. The reviewers raised initial concerns regarding the statistical significance of the results and the clarity of the presentation. The authors provided detailed feedback to the reviewers and updated the paper to address many of these concerns, largely satisfying two of the reviewers. However, concerns remain that the paper doesn't adequately position this work in the context of the decades' worth of research in instruction-following. Early work in this area focused on interpreting highly structured instructions (e.g., formal logic-based), first using rule-based methods, and then parsers trained via supervised learning. Over the past decade, however, the field has largely moved towards learning to follow instructions conveyed in ""natural"" language, which brings with it a significant number of challenges, including the assumption that test-time instructions will inherently be out-of-distribution. That is not to say that the contributions of the paper aren't interesting---they are, but in the relatively narrow scope of deep RL-based approaches to following structured, temporal logic instructions.",ICLR2022,
z6UJ8Id5cX,1576800000000.0,1576800000000.0,1,Hke_f0EYPH,Hke_f0EYPH,Paper Decision,Reject,"This paper studies the problem of certified robustness to adversarial examples. It first demonstrates that many existing certified defenses can be viewed under a unified framework of regularization. Then, it proposes a new double margin-based regularizer to obtain better certified robustness. Overall, it has major technical issues, and the rebuttal is not satisfactory.",ICLR2020,
eVEOVWJhk28,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"The paper proposes a new approach to inductive rule prediction for knowledge graph completion. Reviewers highlighted as strengths that the paper proposes an interesting approach to an important problem that is relevant to the ICLR community. However, reviewers also raised concerns regarding model design and correctness, as well as clarity of presentation (e.g., motivation, analysis, comparison to related work, evaluation). After the author response and discussion, all reviewers and the AC agree that the paper is not yet ready for publication at ICLR due to the aforementioned issues.",ICLR2022,
xR6-yN0JpPd,1642700000000.0,1642700000000.0,1,AsDSpwXYGeT,AsDSpwXYGeT,Paper Decision,Reject,"The paper provides an analysis of the well-known method of Iterative Magnitude Pruning (IMP) for DNN compression. The problem tackled is undoubtedly an important one, and IMP is likely one of the best-known solutions for DNN compression. As such, there is no doubt that the paper is well motivated. Beyond the well-motivated task, the reviews indicate that the paper is well written and provides a thorough review of the related literature, making the paper easy to read and follow. The main weakness of the paper seems to be its novelty, as it seems that similar analyses have been done in the past. 
This issue was raised by the reviewers and remained after the correspondence with the authors:
WMeJ: ""As I described previously the consistent references and experimental structure borrowed from existing work hinder the novelty of the work""; dL1d: ""While the paper introduces inspiring findings on how SLR (or CLR) help IMP, most components are from existing techniques"".
Given the discussion and the concerns related to the novelty of the paper, I feel that the paper requires too major a revision to be accepted, either improving its core analysis or presenting it in a way that clearly distinguishes it from previous art.",ICLR2022,
fPjrfR2Q1_,1576800000000.0,1576800000000.0,1,HyeYTgrFPB,HyeYTgrFPB,Paper Decision,Accept (Poster),"This paper describes a new method for creating word embeddings that can operate on corpora from more than one language. The algorithm is simple, but rivals more complex approaches. The reviewers were happy with this paper. They were also impressed that the authors ran the requested multi-lingual BERT experiments, even though they did not show positive results. One reviewer did think that non-contextual word embeddings were of less interest to the NLP community, but thought the authors' arguments for the method's computational efficiency were convincing.",ICLR2020,
GDrWHSQCYDg,1610040000000.0,1610470000000.0,1,tq5JAGsedIP,tq5JAGsedIP,Final Decision,Reject,"The paper is concerned with learning representations for time-varying graphs, which is an important problem that is relevant to the ICLR community. For this purpose, the authors propose a new method to extend skip-gram with negative sampling to higher-order tensors, with the goal of performing an implicit tensor factorization of time-varying graphs. The proposed approach shows promising experimental improvements compared to previous methods. Reviewers also highlighted the tasks considered in the paper, as well as the theoretical and qualitative analysis, as further positive aspects. However, there still exist concerns regarding the current version of the manuscript. In particular, reviewers raised concerns regarding the novelty of the approach (SGNS, its extension to higher-order tensors, as well as the connection to PMI have been studied in the literature). As such, the new technical contributions are limited. Reviewers also raised concerns regarding the scalability of the method and its applicability to large graphs. The revised version addresses this concern to some extent by showing experiments on mid-sized graphs with 2000/5000 nodes. While this clearly improves the paper, I agree with the majority of the reviewers that the manuscript requires an additional revision to iron out the points raised in this round of reviews. However, the presented results are indeed promising, and I'd encourage the authors to revise and resubmit their work considering the reviewers' feedback.",ICLR2021,
VqsOzU6P66,1610040000000.0,1610470000000.0,1,tckGH8K9y6o,tckGH8K9y6o,Final Decision,Reject,"In this paper, the authors proposed a new variant of the Wasserstein autoencoder (WAE), which matches the joint distribution of data and latent codes induced by the encoder with the joint distribution induced by the decoder, in the framework of optimal transport. Because it matches distributions that are not considered by existing autoencoders like WAEs or VAEs, I agree with the authors that the proposed method is novel to some degree. However, the experimental part does not support the superiority of the method well. 
For example, some reviewers (including me) think the results of the baselines shown in Figures 8-10 are underestimated. In my personal experience, the WAE should perform much better on CelebA than shown in Figure 10. The experiments in Figures 11-17 provide more reasonable results, but the advantage of the proposed method is not convincing. Here are my suggestions:
1) Because the proposed method can achieve a flexible prior, besides randomly generating data, the authors could consider adding some experiments on conditional generation, i.e., generating data from a single modality of the learned prior. I believe the proposed method will be more convincing if it can show some advantages in the conditional generation task.
2) The runtime comparison between the method and the baselines in the training phase should be discussed.
3) The short name ""SWAE"" is in conflict with an earlier work, ""Sliced Wasserstein Autoencoder"", which is also called ""SWAE"".",ICLR2021,
uA_KA4HMttZ,1610040000000.0,1610470000000.0,1,49mMdsxkPlD,49mMdsxkPlD,Final Decision,Reject,"I think there is a lot to commend in this paper: the general approach for training f_phi in this way is creative and interesting, the discussion of the amortization gap is thought-provoking, and the general idea is not something that I have seen in the literature before. That said, the reviewers raise a number of important concerns about the approach, chief of which is that the paper's explanation for why the method works is questionable. The proposed method can be viewed as simply a more expressive policy architecture. In fact, I suspect strongly that the modest increase in performance is likely explained by this alone. The discussion with Reviewer 2 in particular makes this issue very clear. I don't think the authors offered a very compelling response to this. Therefore, I think there are just too many question marks about this approach to accept the paper for publication at this time. I do however think that this line of research is very promising, and I would encourage the authors to continue this work and flesh out the evaluation to be more rigorous and complete, understand whether the increase in performance comes down simply to increased expressivity or if the discussion of amortization is closer to reality, and also address the other concerns raised by R2 and the other reviewers.",ICLR2021,
rB9MrR0YC0_,1642700000000.0,1642700000000.0,1,gxRcqTbJpVW,gxRcqTbJpVW,Paper Decision,Reject,"The paper proposes a pruning approach that regularizes the Gram matrix of convolutional kernels to encourage kernel orthogonality among the important filters while driving the unimportant weights towards zero. While the reviewers found the proposed method well-motivated and intuitive, they believe that the proposed claims are of limited novelty and are not supported well by the experiments. Analyzing and explaining the effect of different parts of the proposed method, i.e., orthogonalization and regularization of batch normalization parameters, on the accuracy of the pruned models would significantly improve the manuscript.",ICLR2022,
5OPLk_z4sS,1576800000000.0,1576800000000.0,1,BJl8ZlHFwr,BJl8ZlHFwr,Paper Decision,Reject,"This paper proposes a relation-based model that extends the VAE to explicitly alleviate the domain bias problem between seen and unseen classes in the setting of generalized zero-shot learning. 
+ +Reviewers and AC think that the studied problem is interesting, the reported experimental results are strong, and the writing is clear, but the proposed model and its scientific reasoning for convincing why the proposed method is valuable is somewhat limited. Thus the authors are encouraged to further improve in these directions. In particular: + +- The idea of using a variant of the widely-used domain discriminator to make seen and unseen classes distinguishable is somewhat contradicted to the basic principle of zero-shot learning. How to trade off the balance between seen and unseen classes has been an important problem in generalized ZSL. These problems need further elaboration. + +- The proposed model itself is not a real ""VAE"", making the value of an extensive derivation based on variational inference less prominent. + +- There is also the need to compare with the baselines mentioned by the reviewers. + +Overall, this is a borderline paper. Since the above concerns were not addressed convincingly in the rebuttal, I am leaning towards rejection.",ICLR2020, +vpJUj0AYD,1576800000000.0,1576800000000.0,1,BJepq2VtDB,BJepq2VtDB,Paper Decision,Reject,"The paper presented an adaptive stochastic gradient descent method with layer-wise normalization and decoupled weight decay and justified it on a variety of tasks. The main concern for this paper is the novelty is not sufficient. The method is a combination of LARS and AdamW with slight modifications. Although the paper has good empirically evaluations, theoretical convergence proof would make the paper more convincing. ",ICLR2020, +42owlipPkB,1576800000000.0,1576800000000.0,1,H1x-3xSKDr,H1x-3xSKDr,Paper Decision,Reject,"This article studies the effects of BN on robustness. The article presents a series of experiments on various datasets with noise, PGD adversarial attacks, and various corruption benchmarks, that show a drop in robustness when using BN. It is suggested that a main cause of vulnerability is the tiling angle of the decision boundary, which is illustrated in a toy example. +The reviewers found the contribution interesting and that the effect will impact many DNNs. However, they the did not find the arguments for the tiling explanation convincing enough, and suggested more theory and experimental illustration of this explanation would be important. In the rebuttal the authors maintain that the main contribution is to link BN and adversarial vulnerability and consider their explanation reasonable. In the initial discussion the reviewers also mentioned that the experiments were not convincing enough and that the phenomenon could be an effect of gradient masking, and that more experiments with other attack strategies would be important to clarify this. In response, the revision included various experiments, including some with various initial learning schedules. The revision clarified some of these issues. However, the reviewers still found that the reason behind the effect requires more explanations. In summary, this article makes an important observation that is already generating a vivid discussion and will likely have an impact, but the reviewers were not convinced by the explanations provided for these observations. +",ICLR2020, +5IiGJ5BP83-,1610040000000.0,1610470000000.0,1,qrwe7XHTmYb,qrwe7XHTmYb,Final Decision,Accept (Poster),"This paper is a study of neural network scaling, with models containing hundred of billions of parameters. 
To that end, the paper introduces a new module called GShard, consisting of annotation APIs for how to split computations across accelerators, which is integrated into the XLA compiler. This enables the training of models with hundreds of billions of parameters. To scale efficiently to very large models, the paper proposes to use transformer networks where every other feed-forward sub-layer is replaced by a sparse mixture of experts (similar to Shazeer et al. 2017). This model is then evaluated on a multilingual machine translation task, from 100 languages to English. On the one hand, I believe that the contributions of the paper are significant: scaling to 600B parameters, and showing that this leads to better translation quality, are important achievements. The analysis of transformer network scaling could also have an important impact. Finally, I think that GShard and its integration in XLA could be very valuable. On the other hand, I agree with some of the concerns raised by the reviewers regarding the writing of the paper and reproducibility. I found the paper not well written, and it is hard to identify the differences with previous work. As GShard is one of the main contributions, I would expect a better description of it in the main text (compared to the MoE, which seems more incremental). Regarding reproducibility, I do not think that the authors provided a good reason not to evaluate on standard benchmarks: the test sets could be excluded from the train set through various deduplication heuristics. To conclude, I am leaning toward accepting the paper, but believe it is borderline. The reason is that the contributions are significant and worth publishing, but I would not oppose a rejection based on the reproducibility and writing issues.",ICLR2021,
Z1D2IIuo5ab,1610040000000.0,1610470000000.0,1,5NA1PinlGFu,5NA1PinlGFu,Final Decision,Accept (Poster),"The paper initially received a mixed rating, with two reviewers rating the paper below the bar and two above the bar. The concerns raised include the need for an autoregressive model for upsampling and the effect of batch sizes. These concerns were well addressed in the rebuttal. Both of the reviewers that originally rated the paper below the bar raised their scores. After consulting the paper, the reviews, and the rebuttal, the AC agrees that the paper has its merits and is happy to accept the paper.",ICLR2021,
KruXrImMDDd,1610040000000.0,1610470000000.0,1,N0M_4BkQ05i,N0M_4BkQ05i,Final Decision,Accept (Poster),"This is an interesting paper discussing the impact of classifier abstention on the performance obtained for different groups of data. The reviewers are either very (scores of 8, 7 and 7) or moderately (score of 5) positive about the paper. The main concern is that the paper does not directly propose a solution for the discovered problems. Nevertheless, it can initiate interesting discussions and research around them.",ICLR2021,
YwgJcxOgkVr,1610040000000.0,1610470000000.0,1,ccwT339SIu,ccwT339SIu,Final Decision,Reject,"The paper initially had mixed reviews (4,5,6). The main issues raised were:
1) limited novelty (re-using/integrating components) [R2];
2) limited generalization ability, since the model needs to be retrained on every video [R2, R3];
3) limited applicability - experiments limited to a certain domain of video, while results on videos with large motion are not convincing [R2, R3];
4) missing ablation studies / experiments [R3, R4].
The author response partially addressed some concerns, but the main points 1-3 are still problematic. 
In addition, the AC noted that the technical aspect was lacking:
- Training with a contrastive loss on a single video may overfit the embedding to that video, which leads to a meaningless embedding where all non-neighboring segments are orthogonal in the embedding space. While changing the softmax temperature can yield higher-entropy transition probabilities, the induced probability distribution is probably highly noisy. It would be better to train this on a large video corpus, which would prevent overfitting. Also, a contrastive loss is typically used to build a discriminative embedding space for classification/recognition, not a smooth embedding space for generation (where distances between embedding vectors are strongly correlated with similarity). Thus, other embedding smoothness terms could be added during contrastive learning.
- The learning is only on the transition probabilities, while the video generation is separate. It would have been more convincing to learn the transition probabilities with the video generation process in an end-to-end manner. Perhaps a discriminator could be placed after the video generator so that the transition probabilities could be learned so as to better mimic real video. Other loss terms based on video temporal smoothness could also be added to ensure smoother transitions between clips (e.g., motion consistency).
The negative reviewers remained unconvinced by the author response, and the AC agreed with their concerns. Thus, the paper was recommended for rejection.",ICLR2021,
vjdF-_3dpfk,1642700000000.0,1642700000000.0,1,cBu4ElJfneV,cBu4ElJfneV,Paper Decision,Accept (Poster),"Three experts reviewed the paper. Two reviewers recommended acceptance, as they liked that the work identified a legacy design in object detection networks and resolved it with a new module. All reviewers found the empirical results strong. Reviewer MDN5 recommended rejection, mainly for the concern that this newly designed module is a standard exploitation of network architectures. The AC sided with the positive reviewers because of the paper's identification of a legacy design in object detection and the strong experimental results. Hence, the decision is to recommend the paper for acceptance. The reviewers did raise some valuable concerns that should be addressed in the final camera-ready version of the paper. The authors are encouraged to make the necessary changes to the best of their ability. We congratulate the authors on the acceptance of their paper!",ICLR2022,
sIkdp3GQoaW,1642700000000.0,1642700000000.0,1,kK3DlGuusi,kK3DlGuusi,Paper Decision,Reject,"Reviewer rRp9 expressed concerns regarding the theoretical results included in Appendix A. In the discussion (not visible to the authors), the AC and Reviewer zn4a agree that the exposition in the original manuscript was confusing and could lead readers to assume these results were valid for the proposed algorithm. Also, in the original manuscript the presentation of the theoretical results in the appendix was quite poor (e.g. Proposition A.1). Having said that, the contributions and main points of the work are not affected by these observations, as it is mainly an empirical study. Following from the previous point, Reviewers rRp9 and zn4a pointed out that the overall presentation of the method, particularly the mathematical presentation, could be improved. Reviewer zn4a points out that the method is not particularly novel; this was also indicated as a weakness by Reviewer iyVU. The main contributions of the work are to simultaneously solve the tensor factorization and vector quantization problems using a form of projected gradient descent (with hard-thresholding). While the empirical results seem promising, they are somewhat limited. The authors could make them stronger by studying other applications on top of image classification (e.g. semi-supervised settings, object detection, or segmentation). In the discussion (not visible to the authors), Reviewer iyVU stated that, in light of the other reviews, he/she does not oppose rejecting the work. Overall, the method is technically sound and produces promising results. In its current form, however, the paper is not yet ready for publication. 
The AC encourages the authors to incorporate the feedback and resubmit the work to a different venue.",ICLR2022, +H1xonh9BlN,1545080000000.0,1545350000000.0,1,BJl4f2A5tQ,BJl4f2A5tQ,"A valuable direction, needs more systematic analysis into possible causes of negative results ",Reject,"The paper addresses questions on the relationship between model-free and model-based reinforcement learning, in particular focusing on planning using learned generative models. The proposed approach, GATS, uses learned generative models for rollouts in MCTS, and provide theoretical insights that show a favorable bias-variance tradeoff. Despite this theoretical advantage, and high-quality models, the proposed approach fails to perform well empirically. This surprising negative results motivates the paper and providing insights on it is the main contribution. + +Based on the initial submitted version, the reviewers positively emphasized the need to understand and publish important negative results. All reviewers and the AC appreciate the import role that such a contribution can bring to the research community. Reviewers also note the careful discussion of modeling choices for the generative models. + +The reviewers also noted several potential weaknesses. Central were the need to better motivate and investigate the hypothesis proposed to explain the negative results. Several avenues towards a better understanding were proposed, and many of these were picked up by the authors in the revision and rebuttal. A novel toy domain ""goldfish and gold bucket"" was introduced for empirical analysis, and experiments there show that GATS can outperform DQN when a longer planning horizon is used. + +The introduced toy domain provides additional insights into the relationship between planning horizon and GATS / MCTS performance. However, it does not address key questions around why the negative result is maintained. The authors hypothesize that the Q-value is less accurate in the GATS setting - this is something that can be empirically evaluated, but specific evidence for this hypothesis is not clearly shown. Other forms of analysis that could shed further light on why the specific negative result occurs could be to inspect model errors. For example, if generated frames are sorted by the magnitude of prediction errors - what are the largest mistakes? Could these cause learning performance to deteriorate? + +The reviewers also raised several issues around the theoretical analysis, clarity (especially of captions) and structure - these were largely addressed by the revision. The concern that most strongly affected the final evaluation is the limited insight (and evidence) of the factors that influence performance of the proposed approach. Due to this, the consensus is to not accept the paper for publication at ICLR at this stage.",ICLR2019,4: The area chair is confident but not absolutely certain +rylZtocMlN,1544890000000.0,1545350000000.0,1,ByxkijC5FQ,ByxkijC5FQ,"A topological complexity measure of neural networks based on persistent 0-homology of weights, with a new early stopping criterion.",Accept (Poster),"The paper presents a topological complexity measure of neural networks based on persistence 0-homology of the weights in each layer. Some lower and upper bounds of the p-norm persistence diagram are derived that leads to normalized persistence metric. 
The main discovery of such a topological complexity measure is that it leads to a stability-based early stopping criterion without a statistical cross-validation, as well as distinct characterizations on random initialization, batch normalization and drop out. Experiments are conducted with simple networks and MNIST, Fashion-MNIST, CIFAR10, IMDB datasets. + +The main concerns from the reviewers are that experimental studies are still preliminary and the understanding on the observed interesting phenomenon is premature. The authors make comprehensive responses to the raised questions with new experiments and some reviewers raise the rating. + +The reviewers all agree that the paper presents a novel study on neural network from an algebraic topology perspective with interesting results that has not been seen before. The paper is thus suggested to be borderline lean accept. +",ICLR2019,5: The area chair is absolutely certain +7O7QiO5vFI,1576800000000.0,1576800000000.0,1,rkeu30EtvS,rkeu30EtvS,Paper Decision,Accept (Spotlight),"This paper presents a feature normalization method for CNNs by decorrelating channel-wise and spatial correlation simultaneously. Overall all reviewers are positive to the acceptance and I support their opinions. The idea and implementation is relatively straightforward but well-motivated and reasonable. Experiments are well-organized and intensive, providing enough evidence to convince its effectiveness in terms of final accuracy and convergence speed. Also, it’s analogy to biological center-surrounded structure is thought provoking. The novelty of the method seems somewhat incremental considering that there already exists a channel-wise decorrelation method, but I think the findings of the paper are interesting and valuable enough for ICLR community and would like to recommend acceptance. +Minor comments: I recommend authors to mention about zero-component analysis (ZCA) normalization, which has been a standard input normalization method for CIFAR datasets. I guess it is quite similar to the proposed method considering 1x1 convolution. Also, comparison with other recent normalization methods (e.g., Group Norm) would be useful. +",ICLR2020, +rRwTsnDn0D,1610040000000.0,1610470000000.0,1,1Q-CqRjUzf,1Q-CqRjUzf,Final Decision,Reject,"Fitting a neural net is a stochastic process, with many sources of stochasticity, including initialization, batch presentation, data augmentation, non-deterministic low-level operations and the non-associativity of rounding errors in multi-threads systems such as GPUs and TPUs. In this paper, the authors aim to alleviate this randomness by incorporating specific regularizers during learning or by using co-distillation. + +As the reviewers pointed out, the paper is quite clearly written, but the motivation for this work is not clear. The example of system updates does not correspond to the current study that targets the internal variability of the learning process. Reproducibility is an important issue, but in a statistical context, why would it be relevant to assess reproducibility by the individual decisions made by a single estimate? The usual way of assessing learning algorithms is to look at (a summary of) the distribution of performance for a given learning problem characterized by a data distribution, not to look at individual decisions made by a particular estimate. Furthermore, this study ignores the randomness due to the selection of hyper-parameters. 
Why would the partial reproducibility studied here, for a fixed choice of hyper-parameters, be of particular interest? As is, either the work is ill-defined and incomplete, or it lacks a clear rationale, and I thus recommend rejection.
I would also like to point out a reproducibility issue in the proposed experimental study. The exact meaning of the variability measures reported in the tables is not given, but I assume it is the standard deviation across the different runs (for example, the 5 replicates in Table 1). These figures are not directly related to the variability of each setup, as they ignore the variability due to the random selections made during the ablation study (for example, as I understand it, the last result of Table 1 was obtained for a single arbitrary initialization and a single arbitrary batch order).",ICLR2021,
8uYQ9W1Nxz,1642700000000.0,1642700000000.0,1,YfFWrndRGQx,YfFWrndRGQx,Paper Decision,Reject,"This paper looks at a formulation of an online multi-objective optimization problem. All reviewers agree on the score, 6, which is quite rare but not really informative; none of them is very excited about the paper, but they all find it interesting. I have read it myself as well. The paper is rather clear and well written. I have three major concerns.
1) I am not fully convinced by the objective R_{MOD}, as it reduces to the dynamic regret in the single-objective problem, and the latter cannot be minimized unless we make a strong stationarity assumption. This is obviously the case here (see Assumption 2). 
Then the choice of parameters would depend on some ""stationarity"" quantity (V_T). I am not really enthusiastic about this either. +2) The analysis is rather classical once the problem is reduced to a single-objective one, so it is not really breathtaking. Yet I admit that I quite enjoyed reading about this reduction; the idea is quite neat. +3) Multi-objective online optimization has already been considered in online learning, but the related work did not really mention it. For instance, Blackwell approachability is such an example [1,2,3] (yet I am not sure that it can cover the Pareto front idea). It would be interesting to see how those approaches compare (notably, online mirror descent has been widely studied in that case). + +All in all, I do understand the reviewers, and this paper is certainly borderline, but I do not think it reaches the acceptance bar yet. As a consequence, I would rather recommend rejection this year. + +[1] J. Abernethy, P. Bartlett, and E. Hazan. Blackwell approachability and no-regret learning are equivalent. Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:27–46, 2011. +[2] V. Perchet. Approachability, regret and calibration: Implications and equivalences. Journal of Dynamics & Games, 181–254, 2014. +[3] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Beyond regret. Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:559–594, 2011.",ICLR2022, +BkSHNypHf,1517250000000.0,1517260000000.0,342,rk3pnae0b,rk3pnae0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The pros and cons of the paper under consideration can be summarized below: + +Pros: +* Reviewers thought the underlying model is interesting and intuitive +* Main contributions are clear + +Cons: +* There is confusion between keywords and topics, which leads to a somewhat confused explanation and a lack of clear comparison with previous work. Because of this, it is hard to tell whether the proposed approach is clearly better than the state of the art. +* Typos and grammatical errors are numerous + +As the authors noted, the concerns about the small dataset are not necessarily warranted, but I would encourage the authors to measure the statistical significance of differences in results, which would help alleviate these concerns. + +An additional comment: it might be worth noting the connections to query-based or aspect-based summarization, which also have a similar goal of performing generation based on specific aspects of the content. + +Overall, the quality of the paper as-is seems to be somewhat below the standards of ICLR (although perhaps on the borderline), but the idea itself is novel and the results are good. I am not recommending it for acceptance to the main conference, but it may be an appropriate contribution for the workshop track.",ICLR2018, +S1Ll7yaHM,1517250000000.0,1517260000000.0,61,B1EA-M-0Z,B1EA-M-0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents several theoretical results linking deep, wide neural networks to GPs. It even includes illuminating experiments. + +Many of the results were already developed in earlier works. However, many at ICLR may be unaware of these links, and we hope this paper will contribute to the discussion. +",ICLR2018, +BkxQTPVxeV,1544730000000.0,1545350000000.0,1,BJGjOi09t7,BJGjOi09t7,Somewhat incremental and Missing NMF baselines,Reject,"The paper introduces a variant of the variational autoencoder (VAE) for probabilistic non-negative matrix factorization. The main idea is to use a Weibull distribution in the latent space. 
+ +During the discussion, some reviewers are still concerned about the empirical results, which do not match well with published results (even though the authors provided an explanation for it). In addition, the proposed method is only tested on the health care datasets, but the improvement is limited. Therefore it would be worthwhile investigating other time series datasets, and most important answering the important question in terms of what datasets/applications the proposed method works well. + +The paper is one step away for being a strong publication. We hope the reviews can help improve the paper for a strong publication in the future. ",ICLR2020, +wKAZxL2vvxp,1610040000000.0,1610470000000.0,1,mLeIhe67Li6,mLeIhe67Li6,Final Decision,Reject,"This paper gives a way to learn one-hidden-layer neural networks on when the input comes from Gaussian mixture model. The main algorithm uses [Janzamin et al. 2014] as an initialization and then performs gradient descent. The main contribution of this paper is 1. to give a characterization of sample complexity for estimating the moment tensors when the input distribution comes from a mixture of Gaussian; 2. to give a local convergence result when the samples come from a mixture of Gaussian. The paper claims certain behavior in the input data would make the problem harder and slow down the convergence, although the claim is based on an upperbound and would be stronger if there is some corresponding lowerbound.",ICLR2021, +vjdF-_3dpfk,1642700000000.0,1642700000000.0,1,cBu4ElJfneV,cBu4ElJfneV,Paper Decision,Accept (Poster),"Three experts reviewed the paper. Two reviewers recommended acceptance as they liked that the work identified a legacy design in object detection networks and resolved it by a new module. All reviewers found the empirical results strong. Reviewer MDN5 recommended rejection main for the concern that this newly designed module is a standard exploitation of network architectures. AC sided with the positive reviewers because of the paper's identification of a legacy design in object detection and the strong experimental results. Hence, the decision is to recommend the paper for acceptance. The reviewers did raise some valuable concerns that should be addressed in the final camera-ready version of the paper. The authors are encouraged to make the necessary changes to the best of their ability. We congratulate the authors on the acceptance of their paper!",ICLR2022, +Z1D2IIuo5ab,1610040000000.0,1610470000000.0,1,5NA1PinlGFu,5NA1PinlGFu,Final Decision,Accept (Poster),"The paper initially received a mixed rating, with two reviewers rate the paper below the bar and two above the bar. The raised concerns include the need for an autoregressive model for upsampling and the effect of batch sizes. These concerns were well-addressed in the rebuttal. Both of the reviewers that originally rated the paper below the bar raise the scores. After consulting the paper, the reviews, and the rebuttal, the AC agrees that the paper has its merits and is happy to accept the paper. + + +",ICLR2021, +ByxQTPVxeV,1544730000000.0,1545350000000.0,1,BJGjOi09t7,BJGjOi09t7,Somewhat incremental and Missing NMF baselines,Reject,"The paper introduces a variant of the variational autoencoder (VAE) for probabilistic non-negative matrix factorization. The main idea is to use a Weibull distribution in the latent space. 
There is agreement among the reviewers that the paper is technically sound and well written, but that it lacks in motivation and demonstration of utility of the proposed method. +All the reviewers think the approach is not particularly novel and somewhat incremental. The main issue is that the empirical evaluation of the algorithm is also quite limited. Specifically, it should have been compared with Bayesian NMF. Many papers have addressed Bayesian NMF with variational inference (Cemgil; Fevotte & Dikmen; Hoffman, Blei & Cook) like in VAE. Experimentally, Bayesian NMF and the proposed PAE-NMF could easily be quantitatively compared on matrix completion tasks. Overall, there was consensus among the reviewers that the paper is not ready for publication. +",ICLR2019,4: The area chair is confident but not absolutely certain +HJnisMIdg,1486400000000.0,1486400000000.0,1,HyAbMKwxe,HyAbMKwxe,ICLR committee final decision,Accept (Poster),"This is a well written paper that proposes the adaptation of the loss function for training during optimization, based on a simple and effective tighter bound on classification error. The paper could be improved in terms of a) review of related works; b) more convincing experiments.",ICLR2017, +ByCYByarM,1517250000000.0,1517260000000.0,619,rkmtTJZCb,rkmtTJZCb,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a method for forward prediction in videos. The paper insufficiently motivates the proposed method and presents very limited empirical evaluations (no ablation studies, etc.) to backup its claims. This makes it difficult for the reader to put the work into the context of the broader research around learning from unsupervised video data; leading reviewers to complete about perceived lack of novelty and clarity.",ICLR2018, +Db43sfwc-kf,1642700000000.0,1642700000000.0,1,pFyXqxChZc,pFyXqxChZc,Paper Decision,Accept (Spotlight),This paper proposes a theoretically sound and practically effective method to compress quantized gradients and reduce communication in distributed optimization. The method is interesting and worth publication.,ICLR2022, +_Rum0GVya6c,1642700000000.0,1642700000000.0,1,Bl8CQrx2Up4,Bl8CQrx2Up4,Paper Decision,Accept (Poster),"This paper introduces a new linear attention mechanism for transformer based models. This is accomplished by replacing the softmax in the standard transformer self-attention with a cosine-based re-weighting mechanism. The empirical results are good, and cosFormer generally outperforms existing efficient transformers for autoregressive language modeling, fine-tuning, and on the long range arena. + +The reviewers were generally positive regarding the paper, with all reviewers voting to accept. The discussion period focused on particular choices regarding the ReLU activation function vs. other non-negative activation functions, further motivating the cosine operation, and comparing the speed of cosFormer vs. other efficient transformers. The authors responded by providing additional ablations to empirically validate the choice of ReLU, motivated the cosine operation by noting that it introduces a locality bias, and further described the computation requirements of their transformer vs. prior work. 
+ +Overall, this is an interesting addition to the linear / efficient transformer literature, with solid empirical results supporting the various design decisions.",ICLR2022, +HklJRXtQeE,1544950000000.0,1545350000000.0,1,SkzK4iC5Ym,SkzK4iC5Ym,A sound idea but no sufficient evidence of its effectiveness,Reject,"The paper introduces a modification of the batch normalization technique. In contrast to the original batch normalization, which normalizes minibatch examples using their mean and standard deviation, this modification uses a weighted average
The paper says ""At a naive first glance, studying the convergence properties of locally asynchronous SGD would be an incremental to existing analyses for local SGD"" but then it does not satisfactorily explain _why_ the approach is _not_ incremental. Not enough is done in the paper to explain why the analysis is not just a trivial combination of the local SGD with the standard approach to make an algorithmic analysis asynchronous. (Or, if the theoretical result _is_ incremental, the paper should make less of a big deal out of it.) + - The description of the algorithm in Section 1.2 is confusing. I think it would benefit from being more concrete. + +The paper should also compare against the paper ""Asynchronous Decentralized Parallel Stochastic Gradient Descent"" (Lian et al, 2017). It is actually not clear to me whether the method proposed here is a subset (or superset) of the method described in that paper, but they seem _very_ similar.",ICLR2021, +L0UmfQZMHZ2,1610040000000.0,1610470000000.0,1,ee6W5UgQLa,ee6W5UgQLa,Final Decision,Accept (Poster),"The paper presents a new dataset for multimodal QA that is deemed interesting, relevant and well executed by all reviewers. Multimodality in NLP (QA included) is an increasingly important topic and this paper provides a potentially impactful benchmark for research in it. All reviewers acknowledge that. + +We hence recommend to accept this paper as a poster. We recommend the authors to further improve the draft before camera ready by using the recommendations made by the reviewers with a particular focus on an extended discussion wrt prior work on VQA and other. The paper should also add more precisions on the license(s) related to the images used in the dataset. ",ICLR2021, +rJDGVJTrf,1517250000000.0,1517260000000.0,307,ByOnmlWC-,ByOnmlWC-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"At least two of the reviewers found the proposed approach novel and interesting and worthy of publication at ICLR. The reviewers raised concerns regarding the paper's terminology, which may lead to some misunderstanding. I agree that upon a quick skim, a reader may think that the paper performs the crossover operation outlined at the bottom right of Figure 1. Please consider improving the figure and the caption to prevent such a misunderstanding. You can even slightly change the title to reflect the policy distillation operation rather than naive crossover. Finally, including some more complex baselines benefits the paper. I am curious whether performing policy gradient on an ensemble of 8 policies + periodic removal of the bottom half of the policies will provide similar gains.",ICLR2018, +S1gh5y17gN,1544900000000.0,1545350000000.0,1,rkxt8oC9FQ,rkxt8oC9FQ,"A well composed, novel contribution to counterfactual inference with neural nets but lingering questions remain about empirical significance.",Reject,"The reviewers found the paper to be well written, the work novel and they appreciated the breadth of the empirical evaluation. However, they did not seem entirely convinced that the improvements over the baseline are statistically significant. Reviewer 1 has lingering concerns about the experimental conditions and whether propensity-score matching within a minibatch would provide a substantial improvement over propensity-score matching across the dataset. Overall the reviewers found this to be a good paper and noted that the discussion was illuminating and demonstrated the merits of this work and interest to the community. 
However, no reviewers were prepared to champion the paper and thus it falls just below borderline for acceptance.",ICLR2019,5: The area chair is absolutely certain +xFMIk0l03vm,1610040000000.0,1610470000000.0,1,INXUNEmgbnx,INXUNEmgbnx,Final Decision,Reject,"Summary: The authors propose a method for representing a posterior +over discrete latent variables in representation learning problems +using a neural network. Two applications are discussed: One are certain +clustering problems, in which clusters are sufficiently +separated. Another is the computation of mutual information of +discrete random variables. This is applied to learning image representations. + +Discussion: The authors have not provided a response to the reviews. + +Recommendation: Four detailed reviews unanimously recommend rejection. Main +points of criticism are lack of novelty, limited and unconvincing +experimental evaluation, and a poor presentation that also lacks +technical detail. This work is clearly not ready for publication. +",ICLR2021, +A7IESbVAGCi,1642700000000.0,1642700000000.0,1,W3-hiLnUYl,W3-hiLnUYl,Paper Decision,Reject,"The paper performs an empirical evaluation of deterministic methods for the quantification of epistemic uncertainty. There is no new algorithm. The main contribution is the empirical evaluation. This empirical evaluation will be useful for the community. It is an independent evaluation that casts some doubts on the calibration of several existing deterministic techniques, which will spur additional research. However, the paper is not well written. As pointed out by the reviewers, the paper does not provide much background. It refers to many concepts without defining them. The concepts are not new (references are provided for each concept), but since the paper does not describe any new technique it should do a good job at explaining those concepts. The authors added some explanations in the supplementary material, but some of those explanations should really be in the main paper. The most important issue with the paper is that it does not explain why the deterministic techniques do not seem to be well calibrated. The authors added a ""theoretical justification"" in section 6.1, but it amounts to saying that deterministic methods make a point estimate, which is too general to explain much. An important factor for proper generalization and calibration is the inductive bias of the model. At the end of the day, if we generate data from a model, then that model will be better calibrated than the other models. So a discussion of the inductive bias of each model and how this inductive bias relates to the properties of each dataset would have been much more insightful.",ICLR2022, +BJxmhd7kgE,1544660000000.0,1545350000000.0,1,B1l6e3RcF7,B1l6e3RcF7,Metareview,Reject,The reviewers agree that the paper needs significantly more work to improve presentation and is not fully empirically and conceptually convincing.,ICLR2019,5: The area chair is absolutely certain +Ki4tlY21qy,1576800000000.0,1576800000000.0,1,SylkzaEYPS,SylkzaEYPS,Paper Decision,Reject,"This paper presents an encoder-decoder based architecture to generate summaries. The real contribution of the paper is to use a recoder matrix which takes the output from an existing encoder-decoder network and tries to generate the reference summary again. The output here is basically the softmax layer produced by the first encoder-decoder network which then goes through a feed-forward layer before being fed as embeddings into the recoder. 
So, since there is no discretization, the whole model can be trained jointly. (the original loss of the first encoder-decoder model is used as well anyway). + +I agree with the reviewers here, that this whole model can in fact be viewed as a large encoder-decoder model, its not really clear where the improvements come from. Can you just increase the number of parameters of the original encoder-decoder model and see if it performs as good as the encoder-decoder + recoder? The paper also does not achieve SOTA on the task as there are other RL based papers which have been shown to perform better, so the choice of the recorder model is also not empirically justified. I recommend rejection of the paper in its current form.",ICLR2020, +vTKgtbkuLbb,1610040000000.0,1610470000000.0,1,Atpv9GUhRt6,Atpv9GUhRt6,Final Decision,Reject,"This paper received mixed reviews. One reviewer is positive, while the remaining three reviewers are either negative or feel that the paper is below the threshold for acceptance. + +The ideas presented in the paper are interesting and novel - this was acknowledged by three of the reviewers, even those who did not recommend acceptance. The AC also recognizes the technical novelty presented. However, as all the reviewers pointed out to varying degrees, the experimentation is problematic and the AC in agreement with this. In particular, the heavy focus on improvement on top of SLIC makes it the applicability of the proposed approach highly limited and also not so convincing. + +Recommendation for the paper is to reject and resubmit with improved experimentation.",ICLR2021, +LpfxO0ZhHU,1642700000000.0,1642700000000.0,1,EQmAP4F859,EQmAP4F859,Paper Decision,Accept (Poster),"*Summary:* Study gradient flow dynamics of empirical and population square risk in kernel learning. + +*Strengths:* +- Empirical results studying several cases in MSE curves. +- Explaining / solving certain phenomena in DL using kernels. + +*Weaknesses:* +- More motivations would be appreciated. +- Technical innovation not so high. + +*Discussion:* + +Ud7D found that the main strength of this paper is the take-home message rather than innovations. They concluded 7 might be appropriate for the evaluation. This opinion was seconded by WyHh who considered 7 the most appropriate rating. 5uQz also found that 7 would be the most appropriate rating. qXRH maintained concerns about the novelty of the work and rating 5. Nonetheless, they agreed the study is valuable and would not oppose acceptance. + +*Conclusion:* + +Three reviewers found this paper is definitely above the acceptance threshold (suggesting rating 7) and one more reviewer found it marginally below the acceptance threshold however not opposing acceptance. I found the general impressions from the discussion well described in a comment from Ud7D, who indicates that although this is not a breakthrough paper, it is a nice paper showing that a lot of DL phenomena are can be explained by Kernels. I conclude that the paper makes a sufficiently valuable contribution and hence I am recommending accept. I suggest the authors take the reviewers’ comments carefully into account when preparing the final version of the manuscript.",ICLR2022, +IscnxdJ-Ahn,1642700000000.0,1642700000000.0,1,rzvOQrnclO0,rzvOQrnclO0,Paper Decision,Accept (Poster),"The paper considers model-based RL, and focuses on approaches that benefit from the differentiability of the model in order to compute the policy gradient. It theoretically shows that the error in the gradient of the model w.r.t. 
its input appears in an upper bound of the error in the policy gradient computing using the learned model. Motivated by this, it suggests a MBRL approach that learns two models, one of them minimizes the next-state prediction error (as commonly done) and the other minimizes a combination of prediction error and the gradient error. +The paper empirically studies the method through extensive experiments. + +Reviewers are generally positive about this work. They believe that the paper is insightful and the method is original. At first, there were some important concerns raised by the reviewers, but the authors revised their paper in the discussion period, and it appears that the reviewers are all satisfied now. I also read the paper during the rebuttal phase, and I should say that I have some concerns myself, especially on the theory part of the paper. Given that the authors did not have an opportunity to answer my questions, I do not put much weight on my concerns (and I believe most of them can be addressed with some clarifications). Considering the positive response of reviewers and promising results, I am going to recommend **acceptance** of this paper. + +I strongly encourage the authors to consider the comments by reviewers, as well as the following ones, in the revision of their paper. + + +**Comments** + +1) The true dynamics $f$ is defined as a stochastic one, i.e., $s_{t+1} = f(s_t, a_t, \epsilon_t)$ (just before Eq. 1), and similarly for the learned model. Here $\epsilon_t$ is the noise causing the stochasticity of the model. But later, when the errors on the model and its gradient are introduced (i.e., $\epsilon_f$ and $\epsilon_f^g$), the role of stochasticity becomes unclear. +For example, we have +$\|| \tilde{f}(s,a) - f(s,a) \|| \leq \epsilon_f$. + +What happened to the noise term? + +The same is true for Eq. (5). The next-state s' (either according to the true dynamics or the learned model) is random. In that case, it is not obvious how to interpret Eq. (5). Is it the error of the expected gradient of the next state? Or is it something else? + +In case the dynamics is assumed to be deterministic, this should be clarified early in the paper. + +2) The upper bound in Theorem 1 might be vacuous if the Lipschitz constant $L_f$ of the model is larger than 1. +To see this, consider Lemma 1. The constant $C_0$ is $\min [D/\epsilon_f, (1-L_f^{t+1})/(1 - L_f)]$. +If $L_f$ is larger than 1, for large enough t, the term $(1-L_f^{t+1})/(1 - L_f)$ blows up and $C_0$ becomes $D/\epsilon_f$. Therefore, the upper bound of Lemma 1 becomes $D$. Here $D$ is the diameter of the state space, which is assumed to be bounded. + +This carries to in the next lemmas. In Lemma 4, $C_5$ would be of the same order as $C_0$ (multiplied by an extra $L_1 L_f / (1 - \gamma) )$, so the upper bound of this lemma becomes proportional to D too. + +The $C_0$'s appearance continues in the proof of Theorem 1, in which $C_8$ is proportional to $C_0$ and $C_5$. So, $C_8$ is also become proportional to $D/\epsilon_f$. When we have $C_8 \epsilon_f$ in Eq. (34), we get a constant term $D$. +A similar dependence appears in the proof of Theorem 2, where B_3 is proportional to $C_8 \epsilon_f$, which can be as large as $D$. And in Eq. (47), we have $B_3^2$. So the upper bound in Eq. (47), which seems to the be upper bound of Theorem 2, is proportional to $D^2$. This means that if $L_f$ is larger than one, the upper bound does not go to zero, no matter how small the model error $\epsilon_f$ is (unless it is actually zero). 
This makes the bound meaningless. + +This might be unavoidable. I am not sure about it at the moment. But it definitely requires a discussion. + +3) Assumption 2 has a term in the form of $E[\frac{s_{t_2}}{ s_{t_1}} ]$ (I have simplified the form). The states $s_{t_2}$ and $s_{t_1}$ are vectors in general. How is the division defined here? + +4) Please improve the clarify of the proofs. For example, in Lemma 2 it seems that a negative sign is missing in Eq. (49). Also how do we get Eq. (50) and Eq. (52)? (I couldn't easily verify them). + +5) I believe the ""periodicity property"" used in Assumption 1 should be ""ergodicity property"". + +6) The paper still has a lot of typos, e.g., ""To optimize the objective, One can ..."" (P3), ""argument data"" (instead of augmented) (p4), ""Superpose"" (p5), ""funcrion"" (p6).",ICLR2022, +8Q2T8yT1vr,1576800000000.0,1576800000000.0,1,SJl47yBYPS,SJl47yBYPS,Paper Decision,Reject,"The paper studies the role of entropy in maximum entropy RL, particularly in soft actor-critic, and proposes an action normalization scheme that leads to a new algorithm, called Streamlined Off-Policy (SOP), that does not maximize entropy, but retains or exceeds the performance of SAC. Independently from SOP, the paper also introduces Emphasizing Recent Experience (ERE) that samples minibatches from the replay buffer by prioritizing the most recent samples. After rounds of discussion and a revised version with added experiments, the reviewers viewed ERE as the main contribution, while had doubts regarding the claimed benefits of SOP. However, the paper is currently structured around SOP, and the effectiveness of ERE, which can be applied to any off-policy algorithm, is not properly studied. Therefore, I recommend rejection, but encourage the authors to revisit the work with an emphasis on ERE.",ICLR2020, +fUcThWQGyg,1576800000000.0,1576800000000.0,1,BJlPOlBKDB,BJlPOlBKDB,Paper Decision,Reject,"The author responses and notes to the AC are acknowledged. A fourth review was requested because this seemed like a tricky paper to review, given both the technical contribution and the application area. Overall, the reviewers were all in agreement in terms of score that the paper was just below borderline for acceptance. They found that the methodology seemed sensible and the application potentially impactful. However, a common thread was that the paper was hard to follow for non-experts on MRI and the reviewers weren't entirely convinced by the experiments (asking for additional experiments and comparison to Zhang et al.). The authors comment on the challenge of implementing Zhang is acknowledged and it's unfortunate that cluster issues prevented additional experimental results. While ICLR certainly accepts application papers and particularly ones with interesting technical contribution in machine learning, given that the reviewers struggled to follow the paper through the application specific language it does seem like this isn't the right venue for the paper as written. Thus the recommendation is to reject. Perhaps a more application specific venue would be a better fit for this work. Otherwise, making the paper more accessible to the ML audience and providing experiments to justify the methodology beyond the application would make the paper much stronger.",ICLR2020, +j_QtYqm_kk5,1610040000000.0,1610470000000.0,1,R4aWTjmrEKM,R4aWTjmrEKM,Final Decision,Accept (Spotlight),"This paper proposes a method to improve the convergence time of PSRO. 
The paper was well received by all reviewers and is likely to be of interest to a similar sub-community within ICLR, but may be of less relevance to the wider community not focused on multi-agent learning. + +A number of issues were raised by reviewers regarding the clarity of the originally submitted version of the paper. I encourage the authors to consider all constructive feedback given and revise the paper to maximise its impact. This will be of particular help in reaching a wider audience than those with pre-existing experience with the methods this work builds on.",ICLR2021, +LyX4pb3lC,1576800000000.0,1576800000000.0,1,rkeYvaNKPr,rkeYvaNKPr,Paper Decision,Reject,"The paper considers a special case of decision making processes with +non-Markovian reward functions, where conditioned on an unobserved task-label +the reward function becomes Markovian. +A semi-supervised loss for learning trajectory embeddings is proposed. +The approach is tested on a multi-task grid-world environment and ablation +studies are performed. + +The reviewers mainly criticize the experiments in the paper. The environments +studied are quite simple, leaving it uncertain if the approach still works in +more complex settings. +Apart from ablation results, no baselines were presented although the setting is +similar to continual learning / multi-task learning (with unobserved task label) +where prior work does exist. +Furthermore, the writing was found to be partially lacking in clarity, although +the authors addressed this in the rebuttal. + +The paper is somewhat below acceptance threshold, judging from reviews and my own +reading, mostly due to lack of convincing experiments. Furthermore, the general setting +considered in this paper seems quite specific, and therefore of limited impact.",ICLR2020, +jc5U-H8IJxb,1610040000000.0,1610470000000.0,1,XJk19XzGq2J,XJk19XzGq2J,Final Decision,Accept (Spotlight),There was a consensus among reviewers that this paper should be accepted as the authors addressed reviewers' concerns in the discussion phase. This paper is well-written and easy to read. It provides a coherent story and investigation on two important hypotheses: that natural images have a lower intrinsic dimension than the extrinsic dimension (e.g. the number of pixels) and that a lower intrinsic dimension lowers the sample complexity of learning. These results appear to be novel and significant for the ICLR community as it provides justifications for numerous work on understanding and designing convolutional neural networks based on low-dimensional assumptions.,ICLR2021, +JeYavP_hKk,1642700000000.0,1642700000000.0,1,qynB_fAt5TQ,qynB_fAt5TQ,Paper Decision,Reject,"The submission aims to improve the quality of the bootstrap when the number of samples is small. It does so by gradient descent on the to approximate the ideal bootstrap in Wasserstein distance. The submission combines a nice set of methodologies, and aims to address an interesting statistical problem in a principled way. The reviewers were unanimous in their opinion that the submission falls below the threshold for acceptance to ICLR. It was revealed in post rebuttal discussion with reviewer y4AP that they wish to retain a reject recommendation due to a lack of clarity in the methodology even after author comments. 
The review details specific issues that can eventually be clarified in a revision for submission to another venue.",ICLR2022, +urEAkCWvbu4,1610040000000.0,1610470000000.0,1,ce6CFXBh30h,ce6CFXBh30h,Final Decision,Accept (Poster),"This work proposes the Federated Matching algorithm as a novel method to tackle the problems in federated learning. The paper is well-written and original, and it contributes to the state-of-the-art. ",ICLR2021, +gBsDznubBTN,1610040000000.0,1610470000000.0,1,WweBNiwWkZh,WweBNiwWkZh,Final Decision,Reject,"Three of the four reviewers recommend rejection; one additional reviewer considers the paper to be marginally above threshold for acceptance but is very uncertain and this is taken into account. The AC is in consensus with the first three reviewers that this paper is not ready yet for publication. + +There is concern from the reviewers that ICLR is not the right venue for this submission. The author response in ""Submission Update"" does not clarify this concern. Training a neural network to solve the problem does not automatically mean that ICLR or other ML conferences are necessarily the right venue. Regardless, due to the many other raised concerns e.g. limited experimental results and comparisons as well as clarity, the AC recommends rejection for this paper and resubmission at a more appropriate venue. ",ICLR2021, +B98rTfvafga,1642700000000.0,1642700000000.0,1,DYypjaRdph2,DYypjaRdph2,Paper Decision,Accept (Poster),"All three reviewers suggest acceptance of the paper. The authors study an interesting problem (understanding non-stationary and reactionary policies) and propose a solution to the problem which compares favorably to baselines in experiments. However, some of the reviewers also criticize unclarities in the presentation of the paper and the made assumptions. The authors clarified those points quite well in their rebuttal. Further concerns regarded design decisions and the comparison to failure cases of baselines. The authors addressed those in their rebuttal and promised to include corresponding material in their updated paper. Hence I am suggesting acceptance of the paper. Nevertheless, I would like to urge the authors to carefuly revise their problem presentation in the paper in order to improve clarity and add the promised additional insights to the final version of the paper.",ICLR2022, +ZQkRCxJ3HQ,1576800000000.0,1576800000000.0,1,Bkel1krKPS,Bkel1krKPS,Paper Decision,Reject,"This work proposes a new architecture for abstract visual reasoning called ""Attention Relation Network"" (ARNe), based on Transformer-style soft attention and relation networks, which the authors show to improve on the ""Wild Relation Network"" (WReN). The authors test their network on the PGM dataset, and demonstrate a non-trivial improvement over previously reported baselines. + +The paper is well written and makes an interesting contribution, but the reviewers expressed some criticisms, including technical novelty, unfinished experiments (and lack of experimental details), and somewhat weak experimental results, which suggest that the proposed ARNe model does not work well when training with weaker supervision without meta-targets. Even though the authors addressed some concerns in their revised version (namely, they added new experiments in the extrapolation split of PGM and experiments on the new RAVEN dataset), I feel the paper is not yet ready for publication at ICLR. 
+",ICLR2020, +k1YhW4b8EaR,1610040000000.0,1610470000000.0,1,3jjmdp7Hha,3jjmdp7Hha,Final Decision,Accept (Poster),"This paper proposes a meta-learning-based technique to learn how to back-translate (generate a synthetic source-language translation of an observed target-language sentence) for the purpose of better optimising a source-to-target translation model. + +The approach is an interesting novel angle to jointly training the translation model and the back-translation component. Compared to techniques like UNMT and DualNMT, the approach offers reduced training time and a simpler formulation with fewer trainable components (and fewer hyperparameters). + +During the discussion phase the authors provided additional insight, clarifications, and results that improved our perception of the paper. I would personally appreciate if the authors would update their paper with the clarifications they made to points raised by R2, R3, and R4, especially on the details about meta-validation, the discussion about memory footprint, and the additional results on UNMT (and variants). ",ICLR2021, +Bkgz51SWxN,1544800000000.0,1545350000000.0,1,BkxSHsC5FQ,BkxSHsC5FQ,meta-review,Reject,"The authors propose using a SVM, trained as a last layer of a neural network, to identify exemplars (support vectors) to save and use to prevent forgetting as the model is trained on further tasks. The method is effective on several supervised benchmarks and is compared to several other methods, including VCL, iCARL, and GEM. The reviewers had various objections to the initial paper that centered around comparisons to other methods and reporting of detailed performance numbers, which the authors resolved convincingly in their revised paper. However, the AC and 2 of the reviewers were unconvinced of the contribution of the approach. Although no one has used this particular strategy, of using support vectors to prevent forgetting, the approach is a simplistic composition of the NN and the SVM which is heuristic, at least in how the authors present it. Most importantly, the approach is limited to supervised classification problems, yet catastrophic forgetting is not commonly considered to be a problem for the supervised classifier setting; rather it is a problem for inherently sequential learning environments such as RL (MNIST and CIFAR are just commonly used in the literature for ease of evaluation).",ICLR2019,5: The area chair is absolutely certain +IxMzboLE0um,1610040000000.0,1610470000000.0,1,uV7hcsjqM-,uV7hcsjqM-,Final Decision,Reject,"This is a nice paper using contrastive learning for code representation. The idea is to generate variations on unlabeled source code (using domain knowledge) by creating equivalent version of code. Improvements over baselines on two multiple tasks are shown. While some of the reviewers liked the (and R4 should have responded), none of the reviewers found the paper exciting enough to strongly recommend its acceptance. ",ICLR2021, +ryllD7fgxV,1544720000000.0,1545350000000.0,1,B1gJOoRcYQ,B1gJOoRcYQ,good analysis of the proposed method; lacks novelty,Reject,"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +The paper +- tackles an interesting problem +- makes a concerted effort to provide qualititative results that give insight into the models behaviour. +- sufficiently cites related work. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. 
Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- The model architecture lacks novelty. +- There was also agreement that the contributions - (i) minor modifications of existing sequential attention-based models, and (ii) application to the RL domain - are minor. +- A lot of space in the paper (section 4.2) is devoted to exploring the use of this model for image classification and video action recognition. However the proposed model performed poorly compared to SOTA methods for this task and no motivation was given for why the proposed model would be useful for such tasks. + +All three points impacted the final decision. + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +There was high agreement between the reviewers on the main drawbacks of the paper, before and after the rebuttal. +The AC considered the rebuttals by the authors (in which they argued that there was sufficient contribution) but, in the end, agreed with the reviewers' assessments. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers reached a consensus that the paper should be rejected. +",ICLR2019,4: The area chair is confident but not absolutely certain +SJiZnfI_e,1486400000000.0,1486400000000.0,1,B1YfAfcgl,B1YfAfcgl,ICLR committee final decision,Accept (Poster),"This paper presents both an analysis of neural net optimization landscapes, and an optimization algorithm that encourages movement in directions of high entropy. The motivation is based on intuitions from physics. + + Pros + - the main idea is well-motivated, from a non-standard perspective. + - There are lots of side-experiments supporting the claims for the motivation. + Cons + - The propose method is very complicated, and it sounds like good performance depended on adding and annealing yet another hyperparameter, referred to as 'scoping'. + - The motivating intuition has been around for a long time in different forms. In particular, the proposed method is very closely related to stochastic variational inference, or MCMC methods. Appendix C makes it clear that the two methods aren't identical, but I wish the authors had simply run SVI with their proposed modification, instead of appearing to re-invent the idea of maximizing local volume from scratch. The intuition that good generalization comes from regions of high volume is also exactly what Bayes rule says. + + In summary, while there is improvement for the paper, the idea is well-motivated and the experimental results are sound.",ICLR2017, +7wVH9rFuno,1610040000000.0,1610470000000.0,1,vujTf_I8Kmc,vujTf_I8Kmc,Final Decision,Accept (Poster),"This paper proposes a meta-learning method that learns structured features based on constellation modules. Exploiting object parts and their relationships is a promising direction for few-shot learning as AnonReviewer3 described. The effectiveness of the proposed method is demonstrated with experiments using standard benchmark, and ablation study. 
",ICLR2021, +rkg1p7Ngx4,1544730000000.0,1545350000000.0,1,r1luCsCqFm,r1luCsCqFm,Meta-review,Reject,"This paper attempts to address a problem they dub ""inverse"" covariate shift where an improperly trained output layer can hamper learning. The idea is to use a form of curriculum learning. The reviewers found that the notion of inverse covariate shift was not formally or empirically well defined. Furthermore the baselines used were too weak: the authors should consider comparing against state-of-the-art curriculum learning methods.",ICLR2019,5: The area chair is absolutely certain +JU8_2fXEJ4,1642700000000.0,1642700000000.0,1,iMH1e5k7n3L,iMH1e5k7n3L,Paper Decision,Accept (Spotlight),"The authors propose a rank coding scheme for recurrent neural networks (RNNs) - inspired by spiking neural networks - in order to improve inference times at the classification of sequential data. The basic idea is to train the RNN to classify the sequence early - even before the full sequence has been observed. They also introduce a regularisation term that allows for a speed-accuracy trade-off. + +The method is tested on two toy-tasks as well as on temporal MNIST and Google Speech Commands. + +The results are very good, typically improving inference time with very little loss in accuracy. + +Furthermore, the idea seems novel and the paper is well written. + +An initial criticism was that experiments with spiking neural networks (SNNs) were missing. The authors added a proof of concept for SNNs, which satisfied the reviewer. + +The authors also added some control experiments in response to the initial reviews, which improved the manuscript. + +In summary, the manuscript presents a valuable novel idea with good experimental verification and interesting aspects both for ANNs and SNNs. The reviewers consistently vote for acceptance.",ICLR2022, +RfzUmNKyWKr,1642700000000.0,1642700000000.0,1,1zwleytEpYx,1zwleytEpYx,Paper Decision,Accept (Poster),"The paper provides theoretical bounds for imitation learning with rewards (algorithm from Wang et al. (2019)). The bounds/proofs are highly novel and a very interesting contribution to the community, even though they are a lot more conservative than what is observed in practice. All reviewers agree on this point. +It is laudable that the authors also additionally provide an experimental evaluation. After the revision and the discussion, quite a few of the reviewers are still not 100% convinced about them, on the one hand as they would have liked to see more tasks, and on the other hand due to concerns about the reward relaxation (i.e., doesn't match the assumptions in the theorems any longer) which is required for experiments on standard benchmarks. +In the final answer the authors provide evidence that there is no big discrepancy, which is good enough (given that there don't seem to be any alternatives to get around this issue, except removing the experimental section altogether, which would be undesirable). Please clearly point out those limitations of the experiments in the paper and also incorporate this evidence.",ICLR2022, +rkxMyIoex4,1544760000000.0,1545350000000.0,1,Ske7ToC5Km,Ske7ToC5Km,"Good paper, but there are some issues with the theory (either correctness or clarity) that need to be resolved.",Reject,"This was an extremely difficult case. 
There are many positive aspects of Graph2Seq, as detailed by all of the reviewers, however two of the reviewers have issue with the current theory, specifically the definition of k-local-gather and its relation to existing models. The authors and reviewers have had a detailed and discussion on the issue, however we do not seem to have come to a resolution. I will not wade into the specifics of the argument, however, ultimately, the onus is on the authors to convince the reviewers of the merits/correctness, and in this case two reviewers had the same issue, and their concerns have not been resolved. The best advice I can give is to consider the discussion so far and why this misunderstanding occurred, so that it might lead the best version of this paper possible.",ICLR2019,3: The area chair is somewhat confident +Gri2RAt5tN,1576800000000.0,1576800000000.0,1,BygpAp4Ywr,BygpAp4Ywr,Paper Decision,Reject,"The paper suggests a new way to defend against adversarial attacks on neural networks. Two of the reviewers were negative, one of them (the most experienced in the subarea) strongly negative. One reviewer is weakly positive. The main two concerns of the reviewers are insufficient comparisons with SOTA and lack of clarity. The authors' response, though detailed, has not convinced the reviewers and has not alleviated their concerns. +",ICLR2020, +PY1eG7FK7yf,1642700000000.0,1642700000000.0,1,_55bCXzj3D9,_55bCXzj3D9,Paper Decision,Reject,"The paper presents an empirical study of different strategies for fine tuning a large language model for the task of generating Java Unit tests *for a specific project*. + +As several reviewers pointed out, the setup itself is fairly impractical, requiring fine-tuning on an individual project, thus making it applicable only to the very tail-end of very large projects where the investment of doing this would make sense and where one could reasonably collect sufficient data for that project. + +On top of that, the paper contributes relatively little in terms of novel techniques. This in itself would be OK if the paper presented some extremely important empirical evidence. However, reviewers also raised some important concerns with the empirical evaluation itself. For example, as reviewer 1jM4 pointed out, there is prior research explicitly showing that the BLEU score is not a good measure for code evaluation. + +Overall, the meta-reviewer agrees with the reviewers that this paper is below the bar for publication.",ICLR2022, +OaIN6147qOa,1642700000000.0,1642700000000.0,1,WxBFVNbDUT6,WxBFVNbDUT6,Paper Decision,Reject,"The paper empirically benchmarks multiple sample selection strategies for offline RL based on the prioritized experience replay framework, including TD errors, N-step return, Generalized SIL, Pseudo-count, Uncertainty, and Likelihood. These are all benchmarked for the base algorithm TD3BC. The experiments study the performance and bootstrapping errors. Among other things, it is shown that non-uniform sampling strategies are also interesting in a batch RL setting. The authors show that non-uniform sampling can be helpful in offline RL compared to uniform sampling but they fail to avoid bootstrap error. They also found that there is no one outperforming metric for prioritized sampling in offline RL settings. + +The reviewers are in agreement that the question studied is a sensible and interesting one - Are PER strategies which are effective in online RL also useful for batch RL? The overall study conducted by the paper is clear and well presented. 
+ +While the study/benchmark and the results presented is clear, the reviewers point out the following shortcomings +1. The study is not comprehensive for this work to become a definitive exploration of this space of ideas. Only algorithm has been tested with these ideas. +2. The results of the study are unfortunately inconclusive - while there are benefits these are achieved via different strategies and as mentioned by the paper no clear conclusions can be drawn. + +Since the paper is targeted purely as a benchmark, the originality aspect of the paper is naturally low. For benchmark papers in that case the impact factor squarely falls on comprehensiveness of the study and the emergence of some clear conclusions to further research in that area. The reviewers unanimously believe the paper falls short in both respects and therefore the decision. + +Hopefully the authors can consider the feedback provided and incorporate it to improve the paper.",ICLR2022, +M7T22qu6yLJ,1642700000000.0,1642700000000.0,1,rN9tjzY9UD,rN9tjzY9UD,Paper Decision,Reject,"The paper considers the important problem of tensor network optimization. Unfortunately the authors did not respond to the reviewers comments. Hence, several concerns remain about the proposed greedy algorithm, including its relationship with prior work and the issue of the ALS method being stuck in local minima for important classes of problems. We strongly encourage the authors to carefully examine the reviewers points and revise their work accordingly.",ICLR2022, +B1MI41aSf,1517250000000.0,1517260000000.0,353,Hkfmn5n6W,Hkfmn5n6W,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper analyzes neural network with hidden layer of piecewise linear units, a single output, and a quadratic loss. The reviewers find the results incremental and not ""surprising"", and also complained about comparison with previous work. I think the topic is very pertinent, and definitely more relevant compared to studying multi-layer linear networks. Hence, I recommend the paper be presented in the workshop track.",ICLR2018, +DpykDwRyuYuD,1642700000000.0,1642700000000.0,1,scSheedMzl,scSheedMzl,Paper Decision,Reject,"The reviewers are largely in agreement that this proposal would benefit from more clarity and comparison to key papers/findings in this space. While one reviewer is leaning towards acceptance, and their points were considered by the other reviewers, there wasn't a consensus towards aligning towardsa an acceptance. Thus, I recommend that the authors take advantage of the reviewers' comments to further improve their manuscript.",ICLR2022, +wG6MSEQZa5,1610040000000.0,1610470000000.0,1,3uiR9bkbDjL,3uiR9bkbDjL,Final Decision,Reject,"The reviewers agree that the paper is addressing an interesting problem (cold-start for representation learning on dynamic graphs). However, the proposed methods can be improved by proposing more novel ideas. At the moment, the proposed methods is a combination of GCN model for node classification and GAE model for link prediction. In this case, some analysis or theoretical justification may make the paper more interesting. Furthermore, the reviewers think the experiments can be improved. For instance, results on more datasets, more comparison methods and a different setup will strengthen the paper. 
",ICLR2021, +Cs8sKLjBx14,1642700000000.0,1642700000000.0,1,vKMVrqvXbXu,vKMVrqvXbXu,Paper Decision,Reject,"The paper studies the effect of manifold geometry on the complexity of the function implemented by a random ReLU network, as measured through its decomposition into linear / affine regions. In particular, it provides bounds on a surrogate for the number of such regions and the distance of a fixed point to the boundary of its region. These bounds follow from an extension of an argument of Hanin and Rolnick for Euclidean space. The bounds hold at random initialization, and are complemented with experiments in which they remain valid through training. + +Initial reviews of the paper were mixed. All reviewers recognized the extension to structured / non-euclidean data as an important direction, and the results as extending the argument of Hanin and Rolnick to this setting. At the same time, there were questions about the novelty, clarity, and implications of the paper. One issue concerns the implications of the results and the amount of insight they offer into the data complexity - network complexity relationship. In particular, the paper would be stronger with a more explicit accounting for the constant C_{M,\kappa} and intuitive explanations of how manifold properties such as curvature and reach affect the number of linear regions. There were also concerns regarding the statement and proof of Theorem 3, the initial version of which only held for small \epsilon. The review also raised other smaller issues regarding the paper's clarity and implications. After considering the authors feedback and revisions, reviewers retained their mixed evaluation of the paper. This appears to be a promising direction, but a paper that could benefit from further refinement.",ICLR2022, +wO5-8nhb-nI,1642700000000.0,1642700000000.0,1,Ih7LAeOYIb0,Ih7LAeOYIb0,Paper Decision,Reject,"The paper proposes a new architecture named Iterative Memory Network (IMN) to encode long user behavior sequence for recommendations. Reviewers appreciate the clarity of the writing as well as practicality and the O(L) complexity of the proposed architecture, however do raise questions on novelty. Different design choices employed in the paper are not well explained. The rebuttal was not able to convince the reviewers to accept the work at this venue, but reviewers do feel the paper could fly in an application oriented venue.",ICLR2022, +hUUZqlfTNkD,1610040000000.0,1610470000000.0,1,Au1gNqq4brw,Au1gNqq4brw,Final Decision,Reject,"the authors demonstrated that vanilla RNN, GRU and LSTM compute at each timestep a hidden state which is the sum of the current input and the weighted sum of the previous hidden states (weights can be either unit or complicated functions), when sigmoid and tanh functions are replaced by their second-order taylor series each. they refer to the first term as token-level and the second term as sequence-level, and claim that the latter can be thought of as summing n-gram features in the case of GRU & LSTM due to the complicated weight matrices used for the weighted sum, largely arising from the gating mechanisms. + +the reviewers are largely unsure about the significance of the findings in this paper due to a couple of reasons with which i agree. first, it is unclear whether the proposed approximation scheme is enough to capture much of what happens within either GRU or LSTM. 
If we consider a single step, it's likely fine to ignore the $O(x^3)$ term arising from either sigmoid or tanh, but when unrolled over time, it's unclear whether these error terms will accumulate or cancel each other. Without either empirically or theoretically verifying the sanity of this approximation, it's difficult to judge whether the authors' findings are specific to this approximation scheme or do indeed reflect what happens within GRU/LSTM. + +Second, because the authors have used relatively simple benchmarks to demonstrate their points, it is difficult, if not impossible, to tell whether the authors' findings are about the datasets themselves (which are all well known to be easily solvable or solvable very well with n-gram classification models and n-gram language models) or about GRU/LSTM, which is related to the first weakness shared by the reviewer. The observation that n-gram models and simplified GRU/LSTM models work as well as the original GRU/LSTM models on these datasets might simply imply that these datasets don't require any complicated interaction among the tokens beyond counting n-grams, which would lead the original GRU/LSTM to be trained into simplified models (n-gram detectors). + +That said, I still believe this direction is important and is filled with many interesting observations to be made. I suggest the authors (1) verify the efficacy of their approximation scheme (probably empirical validation is enough), and (2) demonstrate their point with more sophisticated problems (carefully designed synthetic datasets are perfectly fine). + + +",ICLR2021, +CZ_w9ys_63,1576800000000.0,1576800000000.0,1,S1x522NFvS,S1x522NFvS,Paper Decision,Reject,"This paper makes a connection between one-class neural networks and the unsupervised approximation of the binary classifier risk under the hinge loss. An important contribution of the paper is the algorithm to train a binary classifier without supervision by using the class prior and the hypothesis that class-conditional classifier scores have a normal distribution. The technical contribution of the paper is novel and brings increased understanding of one-class neural networks. The equations and the modeling presented in the paper are sound and the paper is well-written. + +However, in its current form, as pointed out by the reviewers, the experimental section is rather weak and can be substantially improved by adding extra experiments as suggested by reviewers #1, #2. Since its submission the paper has not yet been updated to incorporate these comments. Thus, for now, I recommend rejection of this paper; however, with improvements I'm sure it can be a good contribution at other conferences.",ICLR2020, +cWbpDUk4XI3,1610040000000.0,1610470000000.0,1,Yz-XtK5RBxB,Yz-XtK5RBxB,Final Decision,Accept (Poster),"This paper is overall well written and clearly presented. The problem of ordered data clustering is relevant, and the proposed method is effective. + +During the discussion, all reviewers agreed on the strengths of this paper and shared a positive impression. The authors successfully addressed the reviewers' concerns through a careful author response, which I also acknowledge. +One of the reviewers raised a concern about the broader impacts, which is also well addressed in the author response. + +I therefore recommend acceptance of the paper.",ICLR2021, +HkgEj32geV,1544760000000.0,1545350000000.0,1,H1gupiC5KQ,H1gupiC5KQ,The paper can be improved,Reject,"The paper suggests using an ensemble of Q functions for Q-learning. 
This idea is related to bootstrapped DQN and more recent work on distributional RL and quantile regression in RL. Given the similarity, a comparison against these approaches (or a subset of them) is necessary. The experiments are limited to very simple environments (e.g. swing-up and cart-pole). The paper in its current form does not pass the bar for acceptance at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +X8r_de6U_b5,1610040000000.0,1610470000000.0,1,7Z29QbHxIL,7Z29QbHxIL,Final Decision,Reject,"Three reviewers have reviewed this manuscript, and they had severe reservations regarding the presentation quality and the lack of sufficient theoretical support behind empirical observations. Even after the rebuttal, the reviewers maintained that the above issues are not fully resolved. Unfortunately, this paper cannot be accepted in its current form.",ICLR2021, +SkgLq_Xll4,1544730000000.0,1545350000000.0,1,BklAEsR5t7,BklAEsR5t7,The paper can be improved,Reject,"The paper addresses the problem of large scale fine-grained classification by estimating pairwise potentials in a CRF model. The reviewers believe that the paper has some weaknesses, including (1) the motivation for approximate learning is not clear, (2) the approximate objective is not well studied, and (3) the experiments are not convincing. The authors did not submit a rebuttal. I encourage the authors to take the feedback into account to improve the paper. +",ICLR2019,4: The area chair is confident but not absolutely certain +7z-rZJ1PShe,1610040000000.0,1610470000000.0,1,kOA6rtPxyL,kOA6rtPxyL,Final Decision,Reject,"This paper presents a variant of MAML or Reptile, where the meta-update along the long trajectory of the inner-loop optimization is bypassed to reduce the computational overhead that appears in MAML. The main idea is to use the look-ahead optimizer with careful tuning of relevant hyperparameters, which is done by a teacher-student scheme. Lazy MAML/Reptile are presented and experiments demonstrated their validity. While the paper contains interesting ideas, most reviewers have a few concerns which were not resolved even after the author responses. First of all, a ResNet analogy with respect to the teacher update was claimed but it was never clearly shown in the paper. The method needs careful tuning of hyperparameters in the inner loop, but the study of the computation requirements is not convincing yet. Long inner loops are computationally feasible for both FOMAML and Reptile, so it's not clear in which way the proposed method is improving lengthy exploration in the inner loop other than the performance being better in the experiments. Improving the paper, taking these comments into account, will lead to a good piece of work in the near future. ",ICLR2021, +fzDEFA7Dx,1576800000000.0,1576800000000.0,1,HJx7uJStPH,HJx7uJStPH,Paper Decision,Reject,"The paper proposed a waveform-to-waveform music source separation system. Experimental justification shows the proposed model achieved the best SDR among all the existing waveform-to-waveform models, and obtained similar performance to spectrogram-based ones. The paper is clearly written and the experimental evaluation and ablation study are thorough. But the main concern is the limited novelty: it is an improvement over the existing Wave-U-Net that adds some changes to the existing model architecture for better modeling the waveform data and compares masking vs. synthesis for music source separation. 
",ICLR2020, +OjVvy80YsL1i,1642700000000.0,1642700000000.0,1,xCVJMsPv3RT,xCVJMsPv3RT,Paper Decision,Accept (Poster),"The manuscript describes a method for improving the computational efficiency of randomized ensemble double Q-learning for continuous action RL, by using a small ensemble of Q-functions equipped with dropout and layer normalization, achieving matched sample efficiency at considerably less computational cost. + +Reviewers praised the method's simplicity and achievement of its stated objective of reducing the computational cost of deploying ensemble Q functions. In general, the paper was found to be easy to understand and well written. Several expressed concern about the lack of interrogation of why this combination of dropout and layer norm worked so well and an overall lack of novelty. Other miscellaneous criticisms were well-addressed in rebuttal and extensive new analyses in the Appendix were noted by several reviewers as adding much to the work. + +In the AC's opinion, this is an example of a simple but non-obvious combination of well-known ideas that works very well. The review process has improved the level of empirical rigor that has gone into understanding the properties and trade-offs of this method. I'm happy to recommend acceptance, though would echo reviewers concerns that dubbing the method ""Dr.Q"" will lead to confusion and would strongly urge adopting another name for the camera ready.",ICLR2022, +zOP1lwF384,1576800000000.0,1576800000000.0,1,BygfrANKvB,BygfrANKvB,Paper Decision,Reject,"The authors present a new approach to improve performance for retro-synthesis using a seq2seq model, achieving significant improvement over the baseline. There are a number of lingering questions regarding the significance and impact of this work. Hence, my recommendation is to reject. ",ICLR2020, +oURz9rU3kg,1576800000000.0,1576800000000.0,1,Skl4LTEtDS,Skl4LTEtDS,Paper Decision,Reject,"This paper presents a novel approach to learning in problems which have large action spaces with natural hierarchies. The proposed approach involves learning from a curriculum of increasingly larger action spaces to accelerate learning. The method is demonstrated on both small continuous action domains, as well as a Starcraft domain. + +While this is indeed an interesting paper, there were two major concerns expressed by the reviewers. The first concerns the choice of baselines for comparison, and the second involves improving the discussion and intuition for why the hierarchical approach to growing action spaces will not lead to the agent missing viable solutions. The reviewers felt that neither of these were adequately addressed in the rebuttal, and as such it is to be rejected in its current form.",ICLR2020, +uJ7ROpbYgo,1576800000000.0,1576800000000.0,1,BJlSPRVFwS,BJlSPRVFwS,Paper Decision,Reject,"This paper considers an interesting theoretical question. However, it would add to the strength of the paper if it was able to meaningfully connect the considered model as well as derived methodology to the challenges and performance that arise in practice. ",ICLR2020, +ILsTzg_YH1BC,1642700000000.0,1642700000000.0,1,n0OeTdNRG0Q,n0OeTdNRG0Q,Paper Decision,Accept (Poster),"This paper focuses on improving the efficiency of sharpness-aware minimization method for training neural networks. The proposals are stochastic weight perturbation, namely selecting subset of the parameters at any step, and sharpness-sensitive data selection. 
The philosophy behind sounds quite interesting to me, namely, sharpness-aware minimizer can be approximated properly with fewer computations after analyzing the min-max procedure. This philosophy leads to a novel algorithm design I have never seen. + +The clarity and novelty are clearly above the bar of ICLR. While the reviewers had some concerns on the significance, the authors did a particularly good job in their rebuttal. Thus, all of us have agreed to accept this paper for publication! Please include the additional experimental results in the next version.",ICLR2022, +L6uhbfuLi,1576800000000.0,1576800000000.0,1,HklvmlrKPB,HklvmlrKPB,Paper Decision,Reject,The paper scores low on novelty. The experiments and model analysis are not very strong.,ICLR2020, +rkzQ2GLdl,1486400000000.0,1486400000000.0,1,Hkz6aNqle,Hkz6aNqle,ICLR committee final decision,Reject,The reviewers unanimously recommend rejecting this paper.,ICLR2017, +OBol8Q_JAR,1576800000000.0,1576800000000.0,1,ByeaXeBFvH,ByeaXeBFvH,Paper Decision,Reject,"This work introduces a simple and effective method for ensemble distillation. The method is a simple extension of earlier “prior networks”: it differs in which, instead of fitting a single network to mimic a distribution produced by the ensemble, this work suggests to use multi-head (one head per individual ensemble member) in order to better capture the ensemble diversity. This paper experimentally shows that multi-head architecture performs well on MNIST and CIFAR-10 (they added CIFAR-100 in the revised version) in terms of accuracy and uncertainty. + +While the method is effective and the experiments on CIFAR-100 (a harder task) improved the paper, the reviewers (myself included) pointed out in the discussion phase that the limited novelty remains a major weakness. The proposed method seems like a trivial extension of the prior work, and does not provide much additional insight. To remedy this shortcoming, I suggest the authors provide extensive experimental supports including various datasets and ablation studies. + +Another concern mentioned in the discussion is the fact that these small improvements are in spite of the fact that the proposed method ends up using many more parameters than the baselines. Including and comparing different model sizes in a full fledged experimental evaluation would better convey the trade-offs of the proposed approach. +",ICLR2020, +H1lCIqbbgN,1544780000000.0,1545350000000.0,1,HJxXynC9t7,HJxXynC9t7,meta-review,Reject,"The authors propose to define 'Expressiveness' in deep RL by the rank of a matrix comprising a number of feature vectors from propagating observations through the learnt representation, and show a correlation between higher rank and higher performance. They try 3 regularizers to increase rank and show that they improve the final score on Atari games compared to A3C or DQN. The AC and reviewers agree that the paper is interesting and novel and could have general significance for the RL field. Also, the authors were very responsive to the reviewers and added more details, plus several experiments and analyses to support their claims. However, the reviewers were concerned about a number of aspects and have recommended that the authors clean up their presentation and analysis a bit more. 
In particular, the fact that the regularization coefficient is tuned for each Atari game makes it very hard to compare to DQN/A3C which are very careful to keep the same hyperparameters across every game.",ICLR2019,4: The area chair is confident but not absolutely certain +H1ldR_A7e4,1544970000000.0,1545350000000.0,1,ryGkSo0qYm,ryGkSo0qYm,Probable accept based on majority vote.,Accept (Poster),"The paper is proposed as probable accept based on current ratings with a majority accept (7,7,5).",ICLR2019,4: The area chair is confident but not absolutely certain +Cqtd5aUKBs,1576800000000.0,1576800000000.0,1,SyxhVkrYvr,SyxhVkrYvr,Paper Decision,Accept (Poster),This paper deals with the under-sensitivity problem in natural language inference tasks. An interval bound propagation (IBP) approach is applied to predict the confidence of the model when a subsets of words from the input text are deleted. The paper is well written and easy to follow. The authors give detailed rebuttal and 3 of the 4 reviewers lean to accept the paper.,ICLR2020, +MR3Ps3PBhjy,1642700000000.0,1642700000000.0,1,AsyICRrQ7Lp,AsyICRrQ7Lp,Paper Decision,Reject,"I thank the authors for their submission and active participation in the discussion. The reviewers unanimously agree that this submission has significant issues, including comparison to baselines/ablations [BnLV,yX9d,PtA1], clarity [BnLV], justification of the method [nX4W]. Thus, I am recommending rejection of this paper.",ICLR2022, +PDhTga8OVez,1642700000000.0,1642700000000.0,1,Xb2YyVApEj6,Xb2YyVApEj6,Paper Decision,Reject,"The paper presents a masking strategy to introduce the locality bias into the vision transformers. The experiments show the effectiveness of considering such inductive bias. The reviewers agreed on the importance of the research question and the simplicity of the algorithm. MaiT also has a straight-forward sparse attention extension that performs on the complexity of $O(n)$ rather than $O(n^2)$. + +The reviewers also listed some common concerns of the paper: + +(1) The novelty of such a masking approach is relatively low. I don't think the ALS or the soft masking adding too much contribution to that. Similar ideas have been explored in a number of papers. + +(2) Reviewers also raise concerns about the experiments. Inductive biases often help more in small settings (fewer parameters and FGLOPs) and gain less in the large settings. When comparing with the STOA models, I think this is basically the trend shown in the paper as well. While I appreciate the authors’ efforts in including more comparisons, I have to say I really don’t think the performance gain is significant enough especially in the large settings. Needless to say that there are many other ways of encoding the same locality bias into the model. + +Based on the reviewers' judgements and my own opinion, I therefore recommend rejection of this paper.",ICLR2022, +nkPMc-_jQH1,1642700000000.0,1642700000000.0,1,O9DAoNnYVlM,O9DAoNnYVlM,Paper Decision,Reject,Reviewers raised several valid concerns about novelty of quantization idea and lack of discussions related to prior art (AISTATS 2020 paper). The rebuttal did not convince the reviewers to raise their score. 
We hope the authors will benefit from the feedback and improve the paper for future submission.,ICLR2022, +rkgSzHBxl4,1544730000000.0,1545350000000.0,1,S1xzyhR9Y7,S1xzyhR9Y7,Small but reasonable novel contribution,Reject,"This paper offers a new method for sentence representation learning, fitting loosely into the multi-view learning framework, with fairly strong results. The paper is clearly borderline, with one reviewer arguing for acceptance and another arguing for rejection. While it is a tough decision, I have to argue for rejection in this case. + +There was a robust discussion and the authors revised the paper, so none of the remaining technical issues strike me as fatal. My primary concern is simply that the reviewers could not reach a consensus in favor of the paper. In particular, two reviewers expressed concerns that this paper makes too small an advance in NLP to be of interest to non-NLP researchers. I think it should be possible to broaden the scope of the paper and resubmit it to another general ML venue, and (as one reviewer suggested explicitly), this paper may have a better chance at an NLP-specific venue. + +While neither of these factors was crucial in the decision, I'd encourage the authors (i) to put more effort into comparing properly with the Subramanian and Radford baselines, and (ii) to clarify the points about the human brain. For the second point: While none of the claims about the brain are false *or misleading*, as far as I know, the authors do not make a convincing case that the claims about the brain are actually relevant to the work being done here.",ICLR2019,2: The area chair is not sure +gW9KbIXgmG,1576800000000.0,1576800000000.0,1,Skgxcn4YDS,Skgxcn4YDS,Paper Decision,Accept (Poster),This paper proposes a new method for lifelong learning of language using language modeling. Their training scheme is designed so as to prevent catastrophic forgetting. The reviewers found the motivation clear and that the proposed method outperforms prior related work. Reviewers raised concerns about the title and the lack of some baselines which the authors have addressed in the rebuttal and their revision.,ICLR2020, +D6996H75Y,1576800000000.0,1576800000000.0,1,ryxK0JBtPr,ryxK0JBtPr,Paper Decision,Accept (Poster),Reviewers uniformly suggest acceptance. Please take their comments into account in the camera-ready. Congratulations!,ICLR2020, +HJfa8ypBM,1517250000000.0,1517260000000.0,880,BJvWjcgAZ,BJvWjcgAZ,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree the proposed idea is relatively incremental, and the paper itself does not do an exemplary job in other areas to make up for this.",ICLR2018, +EncQeBK51jH,1610040000000.0,1610470000000.0,1,oFp8Mx_V5FL,oFp8Mx_V5FL,Final Decision,Accept (Poster),"The paper suggests a procedure to efficiently adapting a learned neural compression model to a new test distribution. If this test distribution has low entropy (e.g., a video as a sequence of interrelated frames), large compression gains can be expected. To achieve these gains, the method adapts the decoder model to the new instance, transmitting not only the data but also a compressed model update. Experiments are carried out on compressing I-frames from videos, while comparisons comprise baseline approaches that finetune the latent representations of videos as opposed to the decoder. + +The paper’s main contribution is very timely and relevant. 
While it was well known in the classical compression literature that model updates could be sent along with the data (e.g., as already done in ""optimized JPEG""), this is the first time the idea was implemented in neural compression. The experiments are arguably the paper's weaker part and were originally a concern, but they have been significantly improved during the review period, such that all reviewers voted for acceptance. We encourage the authors to further strengthen their experimental results by adding more challenging baselines on well-established tasks (e.g., image compression). +",ICLR2021, +k_s4m9Dr5cz,1642700000000.0,1642700000000.0,1,M5hiCgL7qt,M5hiCgL7qt,Paper Decision,Reject,"The paper relies on the analytical tools afforded by NTK theory to propose an adversarial attack that uses the information of the model structure and training data, without the need to access the model under attack. While the reviewers found the problem interesting and well motivated, they feel that the theoretical analysis and the experimental results can be significantly improved. In particular, some of the points that the reviewers did not find convincing during the discussion include: (1) the technical novelty of the work, i.e., applying an adversarial attack on the NTK at inference time seems a trivial extension of the PGD attack; (2) the authors' argument that knowing the model is strictly stronger than knowing the original training data; (3) scalability and generalization of the proposed method to settings without a training and test set; and (4) comparison to existing SOTA transfer attacks in the same setting, like the no-box attack. Addressing the above points will significantly improve the manuscript.",ICLR2022, +rklwogBoyN,1544410000000.0,1545350000000.0,1,BygREjC9YQ,BygREjC9YQ,"a promising but heuristic framework for analyzing various optimizers, with little empirical justification",Reject,"The aim of this paper is to interpret various optimizers such as RMSprop, Adam, and NAG, as approximate Kalman filtering of the optimal parameters. These algorithms are derived as inference procedures in various dynamical systems. The main empirical result is that the algorithms achieve slightly better test accuracy on MNIST compared to an unregularized network trained with Adam or RMSprop. + +This was a controversial paper, and each of the reviewers had a significant back-and-forth with the authors. The controversy reflects that this is a pretty interesting and relevant topic: a proper Bayesian framework could provide significant guidance for developing better optimizers and regularizers. Unfortunately, I don't think this paper delivers on its promise of a unifying Bayesian framework for these various methods, and I don't think it's quite ready for publication at ICLR. + +There was some controversy about relationships to various recently published papers giving Bayesian interpretations of optimizers. The authors believe the added value of this submission is that it recovers features such as momentum and root-mean-square normalization. This would be a very interesting contribution beyond those works. But R2 and R3 feel like these particular features were derived using fairly ad-hoc assumptions or approximations almost designed to obtain existing algorithms, and from reading the paper I have to say I agree with the reviewers. + +There was a lot of back-and-forth about the correctness of various theoretical claims. 
But overall, my impression is that the theoretical arguments in this paper exceed the bar for a primarily practical/empirical paper, but aren't rigorous enough for the paper to stand purely on the theoretical contributions. + +Unfortunately, the empirical part of the paper is rather lacking. The only experiment reported is on MNIST, and the only result is improved test error. The baseline gets below 99% test accuracy, below the level achieved by the original LeNet, suggesting the baseline may be somehow broken. Simply measuring test error doesn't really get at the benefits of Bayesian approaches, as it doesn't distinguish them from the many other regularizers that have been proposed. Since the proposed method is nearly identical to things like Adam or NAG, I don't see any reason it can't be evaluated on more challenging problems (as reviewers have asked for). + +Overall, while I find the ideas promising, I think the paper needs considerable work before it is ready for publication at ICLR. +",ICLR2019,5: The area chair is absolutely certain +nFzLSbBuz,1576800000000.0,1576800000000.0,1,ryen_CEFwr,ryen_CEFwr,Paper Decision,Reject,"The paper proposes an approach for unsupervised learning of keypoint landmarks from images and videos by decomposing them into the foreground and static background. The technical approach builds upon related prior works such as Lorenz et al. 2019 and Jakab et al. 2018 by extending them with foreground/background separation. The proposed method works well for static backgrounds, achieving strong pose prediction results. The weaknesses of the paper are that (1) the proposed method is a fairly reasonable but incremental extension of existing techniques; (2) it relies on a strong assumption on the property of static backgrounds; (3) video prediction results are of limited significance and scope. In particular, the proposed method may work for simple data like KTH but is very limited for modeling videos, as it is not well-suited to handle moving backgrounds, interactions between objects (e.g., robot arm in the foreground and objects in the background), and stochasticity. ",ICLR2020, +GBTYCJtnZIV,1642700000000.0,1642700000000.0,1,GQjaI9mLet,GQjaI9mLet,Paper Decision,Accept (Spotlight),"This paper introduces a novel SE(3) equivariant graph matching network, along with a keypoint discovery and alignment approach, for the problem of protein-protein docking, with a novel loss based on optimal transport. 
The overall consensus is that this is an impactful solution to an important problem, whereby competitive results are achieved without the need for templates or refinement, and with substantially faster run times.",ICLR2022, +MR3Ps3PBhjy,1642700000000.0,1642700000000.0,1,AsyICRrQ7Lp,AsyICRrQ7Lp,Paper Decision,Reject,"I thank the authors for their submission and active participation in the discussion. The reviewers unanimously agree that this submission has significant issues, including comparison to baselines/ablations [BnLV,yX9d,PtA1], clarity [BnLV], and justification of the method [nX4W]. Thus, I am recommending rejection of this paper.",ICLR2022, +PDhTga8OVez,1642700000000.0,1642700000000.0,1,Xb2YyVApEj6,Xb2YyVApEj6,Paper Decision,Reject,"The paper presents a masking strategy to introduce the locality bias into vision transformers. The experiments show the effectiveness of considering such an inductive bias. The reviewers agreed on the importance of the research question and the simplicity of the algorithm. MaiT also has a straightforward sparse attention extension that runs with complexity $O(n)$ rather than $O(n^2)$. + +The reviewers also listed some common concerns about the paper: 
However, these problems are not the central problem of this submission, and naturally, they have not been discussed, reviewed, and evaluated thoroughly. They can be also addressed with non-binary VAEs and other forms of generative models which are not discussed in the paper. + +Given these concerns, we don't believe that this submission in its current form is ready for publication at ICLR.",ICLR2022, +phqTZI4Cd-a,1642700000000.0,1642700000000.0,1,DIsWHvtU7lF,DIsWHvtU7lF,Paper Decision,Reject,"Thank you for your submission to ICLR. There is some disagreement about this paper, and several of the reviews are of relatively low confidence. While I appreciate the effort that the authors have put into addressing the concerns of the reviewers, after going through the paper and the responses myself, I'm ultimately coming down on the side of the less positive reviews. My reasoning, honestly, is that I think the authors are vastly overestimating the knowledge that the ICLR audience will have about numerical methods for PDE solutions. Reading through the paper, I honestly have very little idea about how the actual numerical techniques are carried out, and it's unclear to me precisely where this method falls in between a traditional numerical solver an actual neural network. Reading through the reviews, even the more positive ones, I don't think I'm alone in this perception (and the authors will hopefully believe me that these reviewers _are_ indeed emblematic of the subgroup of ICLR that is most experience with differential equations). I really feel like either a substantial rewrite of the paper is needed, to make clear the full extent of the numerical methods being applied; or alternatively, the work may really be better suited for a numerical methods venue.",ICLR2022, +B1l8AKZggN,1544720000000.0,1545350000000.0,1,H1xk8jAqKQ,H1xk8jAqKQ,Interesting paper but not quite there yet,Reject," +-pros: +- good, sensible idea +- good evaluations on the domains considered +- good analysis + +-cons: +- novelty, broader evaluation + +I think this is a good and interesting paper and I appreciate the authors' engagment with the reviewers. I agree with the authors that it is not fair to compare their work to a blog post which hasn't been published and I have taken this into account. However, there is still concern among the reviewers about the strength of the technical contribution and the decision was made not to accept for ICLR this year. ",ICLR2019,3: The area chair is somewhat confident +ANNM3kncdg,1576800000000.0,1576800000000.0,1,HJeTo2VFwH,HJeTo2VFwH,Paper Decision,Accept (Spotlight),"This is a strong submission, and I recommend acceptance. The idea is an elegant one: sparsify a network at initialization using a distribution that achieves approximate orthogonality of the Jacobian for each layer. This is well motivated by dynamical isometry theory, and should imply good performance of the pruned network to the extent that the training dynamics are explainable in terms of a linearization around the initial weights. The paper is very well written, and all design decisions are clearly motivated. The experiments are careful, and cleanly demonstrate the effectiveness of the technique. The one shortcoming is that the experiments don't use state-of-the-art modern architectures, even thought that ought to have been easy to try. The architectures differ in ways that could impact the results, so it's not clear to what extent the same principles describe SOTA neural nets. 
Still, this is overall a very strong submission, and will be of interest to a lot of researchers at the conference. + +",ICLR2020, +P6dAL-ngr30,1642700000000.0,1642700000000.0,1,_l_QjPGN5ye,_l_QjPGN5ye,Paper Decision,Accept (Poster),"The paper presents a new approach to learning human behavior by observing a small number of interactions. To this end, it proposes a Bayesian learning framework where a Boltzmann-type prior over human policies, based on an available reward function, governs default behavior. The prior is updated by observing actual trajectories taken by (human) agents, in principle. In practice, a full-fledged implementation using Gaussian priors and features from a neural architecture is proposed and shown to be effective in practical benchmarks. + +The reviewers are all positive about the paper's contributions. One remaining concern is that the effect of the quality of the prior (Boltzmann vs. other type vs. features designed in a different way) on the learning process is not explored to a significant depth. Yet, the results and approach proposed in the paper are valuable to merit acceptance.",ICLR2022, +VbQvSSxr_c3,1610040000000.0,1610470000000.0,1,2NHl-ETnHxk,2NHl-ETnHxk,Final Decision,Reject,"The paper received diverging review feedback. While reviewers found merits in the work, they also raise serious concerns over experimental validation, comparison with the existing methods, and practicality of the proposed method. It appears that the paper can benefit from better writing and more experimental validations clarifying all these points. ",ICLR2021, +qg34Yr5Lvx,1576800000000.0,1576800000000.0,1,Skl1HCNKDr,Skl1HCNKDr,Paper Decision,Reject,"The majority of reviewers suggest that this paper is not yet ready for publication. The idea presented in the paper is interesting, but there are concerns about what experiments are done, what papers are cited, and how polished the paper is. This all suggests that the paper could benefit from a bit more time to thoughtfully go through some of the criticisms, and make sure that everything reviewers suggest is covered.",ICLR2020, +4-XV2PdBjLV,1610040000000.0,1610470000000.0,1,ghKbryXRRAB,ghKbryXRRAB,Final Decision,Reject,"This work addresses the problem of understanding how pre-trained language models are encoding semantic information, such as WordNet structure. This is evaluated by recreating the structure of WordNet from embeddings. The study also shows evidence about the limitations of current pre-trained language models, demonstrating that all of them have difficulties to encode specific concepts. + +pros: +- good idea to reveal how well the pre-training models encode the underlying knowledge graph +- detailed understanding on how language models incorporate semantic knowledge and where this knowledge might be located within the models +- experiments show that models coming from the same family are strongly correlated +- the paper shows how individual layers of the language models contribute to the underlying knowledge +- analysis of the different semantic factors (9 different factors, including number of senses, graph depth etc.) +- paper is clearly written and understandable and includes enough details to understand the implementation of the semantic probing classifier. + +cons: +- weakly connected goals, response from reviewers is string around 3 main topics, which is seen as many for a single scientific paper. 
It would be easier to focus only on one topic and make a clear conclusion, +- single word concepts while CE models are powerful in context, +- lack of a profound analysis of the experimental results + - hard to understand which semantic category the pre-trained methods work well or not well, + - clarification about the improvement of the semantic learning abilities based on these results. + +Several of the identified issues have been answered in the author's rebuttal, however, the paper would still need more work to be accepted. Note also that the bar a this year ICLR conference is high and we encourage the authors to submit their updated work again at the next conference.",ICLR2021, +B0wtUeK6uG,1642700000000.0,1642700000000.0,1,JmU7lyDxTpc,JmU7lyDxTpc,Paper Decision,Reject,"This paper examines the time-dependent generalization behavior of high-dimensional student-teacher linear regression models. It introduces a simple two-scale covariance model and examines the exact solutions for the dynamics, finding a tradeoff between the fast- and slow-learning features, leading to epoch-wise double descent. Qualitative comparisons are made with the SGD dynamics of ResNe18 on CIFAR-10. + +The reviewers offer split opinions on this work, with most reviewers finding strength in exhibiting the complex behavior of epoch-wise double descent in a simple and analytically-tractable setting. Weaknesses highlighted in the discussion include clarity, discussion of prior work, and rigor of the analyses. + +I believe a clear demonstration and analytical explanation for epoch-wise double-descent would certainly be of interest to the ICLR community, and I concur with the reviewers who emphasize these strengths of the paper. However, as one reviewer mentioned, this paper is primarily a theoretical work, and as such, the main theoretical advancements over prior work should be clear, and the novel results should be sufficiently rigorous. In this regard, the paper is lacking, as detailed below. + +First of all, the discussion of SGD is imprecise, with no explicit definition of the optimization method that is actually being performed. What is the batch size? How is the sampling performed? What is the learning rate/schedule? The formulas in Secs. 2.1-2.2 suggest that full-batch gradient descent is being performed. In Sec. 2.3, stochasticity from SGD is induced via a Gibbs distribution. However, contrary to the discussion, I don't think that this is a ""well-known"" **result** (though of course it is a well-known **model**), and in high-dimensions I am not sure it is even correct (see e.g. [1]). + +Second of all, even assuming the Gibbs distribution, the substitution on line (23) is only justified in words, whereas the cited results from Ali et al., 2020, only provide a bound. What is meant by ""$\approx$""? Some discussion is given about this step of the derivation, but more precise statements would really help make the argument convincing. + +Finally, the derivations seem to rely on the replica method from statistical physics, which is not rigorous. While I am generally supportive of such methods for technically challenging problems that do not readily admit alternative analyses, given the simplicity of the linear model setup here, I believe a more rigorous approach would not be prohibitively difficult. At the very least, some acknowledgement should be given about the lack of rigor in the derivation. 
+ +Overall, this paper presents a simple and analytically tractable model that sheds light on the importance phenomenon of epoch-wise double descent. Unfortunately, the presentation is not sufficiently clear and the derivations not sufficiently rigorous to merit publication at this time. + +[1] Paquette, Courtney et al. “SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality.” COLT (2021).",ICLR2022, +H1ikQJTBz,1517250000000.0,1517260000000.0,52,rkQkBnJAb,rkQkBnJAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This is another paper, similar in spirit to the Wasserstein GAN and Cramer GAN, which uses ideas from optimal transport theory to define a more stable GAN architecture. It combines both a primal representation (with Sinkhorn loss) with a minibatch-based energy distance between distributions. +The experiments show that the OT-GAN produces sharper samples than a regular GAN on various datasets. While more could probably be done to distinguish the model from WGANs and Cramer GANs, this paper seems like a worthwhile contribution to the GAN literature and merits publication. +",ICLR2018, +3zlthw9dO9C,1642700000000.0,1642700000000.0,1,Gx6Tvlm-hWW,Gx6Tvlm-hWW,Paper Decision,Reject,"This paper suggests that in multi-label classification (where there are multiple y that could be correct), the usual conformal prediction setup could be too conservative (too large a set with too many false positives), because it asks for ""full"" coverage. They propose to change the error metric from coverage to precision, to instead output a smaller set, with higher precision, potentially with the loss of coverage. The paper thus produces two variants of conformal prediction: guaranteeing that the expected number of false positives is at most k, and that the probability that the #false positives being > k is < \delta. The experimental study is interesting. + +I followed the extensive discussion thread, and appreciate the authors' and reviewers' willingness to engage. However, in the end, the (excellent) reviewers were somewhat still not enthusiastic about the paper, and with nobody willing to champion the paper, it ended up on the borderline, in the bottom half of my set. Nevertheless, I read through the paper in detail myself to make sure, but I find enough reasons that suggest that the paper is not ready for publication currently, and would benefit from a significant overhaul. + +It seems like the final algorithm is a slight variant of nested conformal prediction (a well known and oft-cited paper by Gupta et al, 2020 that the authors seem to miss) in the following sense. The usual conformal-for-classification framework would order the labels in terms of a score (like posterior probability) and then return a set of labels whose score is less than some threshold. (The same style of procedure can be used for the single-label and multi-label case also.) The nesting appears to be the same high level framework used in this paper, except that the threshold is chosen by a different rule from standard conformal prediction (ie same nesting, different threshold). Writing the algorithm more transparently will make for a tighter connection to the conformal literature --- at the moment, the authors claim a fair bit of novelty, but this is partially due to the omission of this reference and the cleaner broad (nested conformal) framework under which this work (as well as standard conformal) sit. 
+ +At the same time, I was not fully convinced of the central theoretical claim of the paper, in Proposition 4.6. The authors claim that since Theorem 4.3 is simultaneously valid for all j in B satisfying a condition (the filtered set), the theorem can also be invoked for a data-dependent index (chosen via (12)). This does not appear to the true, and at the very least requires careful justification. At a high level, dropping the conditioning for simplicity, Thm 4.3 reads as ""forall j in [B], E[A_j] <= c"", but this does not imply that for a data-dependent j-hat, we have E[A_{j-hat}] <= c. It would have been true, if the forall j appeared as a sup_j inside the expectation (or as a forall j inside the probability, rather than outside). The authors may want to clarify more carefully, if it is indeed true. + +Minor: I also continue to find typos in the main results and proofs. These are minor, but should be corrected. I believe that in the last line of Theorem 4.3, the X_i should be X_{n+1}. In the display after (25), that should be T_k, and not T_{k,\delta}. There appear to be some other potentially missing references as well. For example, while distribution-free conformal approaches are cited, distribution-free calibration approaches are not, within the related work section. At the same time, some very recent papers, such as by Bates et al, or Angelopoulos et al, appear to be overemphasized in the introduction. A more fair coverage of related work could be useful.",ICLR2022, +6dJF9t1rGj,1642700000000.0,1642700000000.0,1,BZbUtxOy3R,BZbUtxOy3R,Paper Decision,Reject,"The paper studies the problem of character generation using reinforcement learning for generation/parsing. All the reviewers recommended reject due to insufficient experimental investigation to support the ideas. The authors did not provide a rebuttal. Hence, the reviewers' opinion still remains the same. AC agrees with the reviewers and believes that the paper is not yet ready for publication.",ICLR2022, +r1gClIytkV,1544250000000.0,1545350000000.0,1,SyezvsC5tX,SyezvsC5tX,meta-review,Reject,"The paper proves that the locus of the global minima of an over-parameterized neural nets objective forms a low-dimensional manifold. The reviewers and AC note the following potential weaknesses: +--- it's not clear why the proved result is significant: it neither implies the SGD can find a global minimum, nor that the found solution can generalize. (Very likely, most of the global minima on the manifold cannot generalize.) +--- the results seem very intuitive and are a straightforward application of certain topological theorem. +",ICLR2019,5: The area chair is absolutely certain +ucM05O_9nW,1576800000000.0,1576800000000.0,1,HJg4qxSKPB,HJg4qxSKPB,Paper Decision,Reject,"This paper aims to study the effect of data augmentation of generalization performance. The authors put forth a measure of rugosity or ""roughness"" based on the tangent Hessian of the function reminiscent of a classic result by Donoho et. al. The authors show that this measure changes in tandem with how much data augmentation helps. The reviewers and I concur that the rugosity measure is interesting. However, as the reviewer mention the main draw back of this paper is that this measure of rugosity when made explicit does not improve generalization. This is the main draw back of the paper. I agree with the authors that this measure is interesting in itself. However, I think in its current form the paper is not ready for prime time and recommend rejection. 
That said, I believe this paper has a lot of potential and recommend the authors to rewrite and carry out more careful experiments for a future submission.",ICLR2020, +bjNDtvtLV0bA,1642700000000.0,1642700000000.0,1,nrGGfMbY_qK,nrGGfMbY_qK,Paper Decision,Accept (Poster),"The authors propose a new continual-learning setting with a few distinguishing features: 1) the task boundaries are blurry (in other words, past task samples can reappear); 2) training is online; and 3) evaluation using online accuracy (instead of average accuracy). The authors also propose a useful method for this scenario and benchmark it using four different datasets. + +The first round of review pointed to two main limitations of the manuscript. ++ The authors only provided small-scale experiments. The reviewers argued that for the setup and method to have an impact having good results using larger-scale data would go a long way. ++ Whether “task-free” and “class-incremental” were compatible. + +For the former, the authors were very reactive and provided results using a standard ""ImageNet for CL"" dataset. + +For the latter, I must thank the authors and also the reviewers for discussing this thoroughly. In the end, my understanding is that there was a reconciliation that both were in fact compatible, but the reviewer suggested that this be discussed very clearly by the authors. I second this suggestion. The CL field given its many slightly different settings might be partly to blame here (reviewer Vfw2 made a similar comment, and I also thank them for playing a role in resolving the issue). + + +A few additional thoughts: ++ I believe that more general setups in CL are worthwhile even in the absence of any immediate applications. This is especially true since some of the standard CL assumptions do not seem to be well motivated. However, I find that claiming that something is more realistic requires grounding (e.g. a set of examples from the ""real world"" or a specific domain/setting). I know the authors backed some of their claims with references, but different real-world problems will come with different limitations and I would be hesitant to use phrases such as ""most real-world"" settings without thorough justification. ++ While different from the core of your work, I believe the framework proposed in this other recent paper has similar goals (although the setup allows pre-training and is not online). Might be worth knowing about it in case you do not: +Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning, NeurIPS 2020 +https://papers.nips.cc/paper/2020/file/c0a271bc0ecb776a094786474322cb82-Paper.pdf + +All in all, this is a good contribution that proposes an interesting and rich setting along with a good baseline method for it. I strongly encourage the authors to follow through on their promise to provide the community with code, dataset splits, Kaggle leaderboard, etc., as a way to maximize the impact of their work.",ICLR2022, +81GaFRZbP4q,1610350000000.0,1610470000000.0,1,ALSupSRaBH,ALSupSRaBH,Final Decision,Reject,"This paper proposes a new clustering method that takes into account side information. The paper was reviewed by four expert reviewers who expressed concerns for novelty, empirical and theoretical depth, and unclear parts of the paper. 
The authors are encouraged to continue research, taking into consideration the detailed comments provided by the reviewers.",ICLR2021, +r1xd0pDxe4,1544740000000.0,1545350000000.0,1,Hyls7h05FQ,Hyls7h05FQ,word sense induction with Gumbel-Softmax,Reject," +Pros: + +* High quality evaluation across different benchmarks, plus human eval + +* The paper is well written (though one could quibble about the motivation for the method, see Cons) + +Cons: + +* The approach is incremental, the main contribution is replacing marginalization or RL with G-S. G-S has already been studied in the context of VAEs with categorical latent variables, i.e. very similar models. + +* The main technical novelty is varying amount of added noise (i.e. downscaling Gumbel noise). In principle, the Gumbel relaxation is not needed here as exact marginalization can be done (as) effectively. Unlike the standard strategy used to make discrete r.v. tractable in complex models, samples from G-S are not used in this work to weight input to the 'decoder' (thus avoiding expensive marginalization) but to weight terms corresponding to reconstruction from individual latent states (in constract, e.g., to SkimRNN of Seo et al (ICLR 2018)). Presumably adding noise to softmax helps to force sharpness on the posteriors (~ argmax in previous work) and stochasticity may also help exploration. + +(Given the above, ""to preserve differentiability and circumvent the difficulties in training with reinforcement learning, we apply the reparameterization trick with Gumbel softmax"" seems slightly misleading) + + +* With contextualized embeddings, which are sense-disambiguated given the context, learning discrete senses (which are anyway only coarse approximations of reality) is less practically important + +Two reviewers are somewhat lukewarm (weak accept) about the paper (limited novelty), whereas one reviewer is considerably more positive. I do not believe that the reviews diverge in any factual information though. + + + +",ICLR2019,5: The area chair is absolutely certain +#NAME?,1576800000000.0,1576800000000.0,1,SkgTR3VFvH,SkgTR3VFvH,Paper Decision,Reject,"The paper proposed a new pipelined training approach to better utilize the memory and computation power to speed up deep convolutional neural network training. The authors experimentally justified that the proposed pipeline training, using stale weights without weights stacking or micro-batching, is simpler and does converge on a few networks. + +The main concern for this paper is the missing of convergence analysis of the proposed method as requested by the reviewers. The authors brought up the concern of the limited space in the paper, which can be addressed by putting convergence analysis into appendix. From a reader perspective, knowing the convergence property of the methods is much more important than knowing it works for a few networks on a particular dataset. ",ICLR2020, +Sk947JaBM,1517250000000.0,1517260000000.0,121,H1uR4GZRZ,H1uR4GZRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This is a borderline paper. The reviewers are happy with the simplicity of the proposed method and the fact that it can be applied after training; but are concerned by the lack of theory explaining the results. 
I will recommend accepting, but I would ask the authors add the additional experiments they have promised, and would also suggest experiments on imagenet.",ICLR2018, +xnDRHE1izH,1576800000000.0,1576800000000.0,1,Bkg5LgrYwS,Bkg5LgrYwS,Paper Decision,Reject,"The present paper addresses the problem of imitation learning in multi-modal settings, combining vision, language and motion. The proposed approach learns an abstract task representation, and the goal is to use this as a basis for generalization. This paper was subject to considerable discussion, and the authors clarified several issues that reviewers raised during the rebuttal phase. Overall, the empirical study presented in the paper remains limited, for example in terms of ablations (which components of the proposed model have what effect on performance) and placement in the context of prior work. As a result, the depth of insights is not yet sufficient for publication.",ICLR2020, +mD_yhn82mk,1576800000000.0,1576800000000.0,1,BklXkCNYDB,BklXkCNYDB,Paper Decision,Reject,"While there was some interest in the ideas presented, this paper was on the borderline, and was ultimately not able to be accepted for publication at ICLR. + +Reviewers raised concerns as to the novelty, generality, and practicality of the approach, which could have been better demonstrated via experiments.",ICLR2020, +r7uQN3HAQg,1576800000000.0,1576800000000.0,1,Byx5R0NKPr,Byx5R0NKPr,Paper Decision,Reject,"The reviewers generally reached a consensus that the work is not quite ready for acceptance in its current form. The central concerns were about the potentially limited novelty of the method, and the fact that it was not quite clear how good the annotations needed to be (or how robust the method would be to imperfect annotations). This, combined with an evaluation scenario that is non-standard and requires some guesswork to understand its difficulty, leaves one with the impression that it is not quite clear from the experiments whether the method really works well. I would recommend for the authors to improve the evaluation in the next submission.",ICLR2020, +B1GBnzU_g,1486400000000.0,1486400000000.0,1,BkdpaH9ll,BkdpaH9ll,ICLR committee final decision,Reject,"The paper investigates several ways of conditioning an image captioning model on image attributes. While the results are good, the novelty is too limited for the paper to be accepted.",ICLR2017, +H1v3S1aHM,1517250000000.0,1517260000000.0,655,BkoCeqgR-,BkoCeqgR-,ICLR 2018 Conference Acceptance Decision,Reject,Three reviewers recommend rejection and there is no rebuttal.,ICLR2018, +SkQNX1THM,1517250000000.0,1517260000000.0,115,B1QgVti6Z,B1QgVti6Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Based on the positive reviews, I recommend acceptance. The paper analyzes when empirical risk is close to the population version, when empirical saddle points are close to the population version and empirical gradients are close to the population version.",ICLR2018, +mjW0hSFDmIm,1642700000000.0,1642700000000.0,1,gi4956J8g5,gi4956J8g5,Paper Decision,Reject,"This paper proposes a new two stage second-order unsupervised feature selection method via knowledge contrastive distillation. In the first stage, a sparse attention matrix that represents second order statistics is learned. In the second stage, a relational graph based on the learned attention matrix is constructed to perform graph segmentation for feature selection. 
+ +This proposed method contains some new and interesting ideas and is novel in the unsupervised feature selection setting, though some components, such as the second-order affinity matrix, are not totally new. The proposed method is technically sound. The authors compared their method with 10 methods, including several recent deep methods, on 12 datasets and demonstrated consistent improvements. + +However, there are some concerns from the reviewers, even after the discussion phase. 1) The computational efficiency of the proposed method seems to be low. Since one goal of feature selection is to speed up downstream tasks, the efficiency of feature selection itself should also be considered. I suggest the authors analyze the computational bottleneck of the proposed method and improve the efficiency. 2) More ablation studies can be added to illustrate how the proposed method removes the redundancy issues of the selected features. 3) Some metrics, such as supervised classification accuracy, can potentially be used. Though supervised classification is impossible in the unsupervised learning setting, running the experiments on some datasets that have labels, by pretending to have no labels, is one way to evaluate the method. + +Overall, the paper provides some new and interesting ideas. However, given the above concerns, the novelty and significance of the paper are diminished. Although we think the paper is not ready for ICLR in this round, we believe that the paper would be a strong one if the concerns can be well addressed.",ICLR2022,
+SkgmeCK9JE,1544360000000.0,1545350000000.0,1,BJGfCjA5FX,BJGfCjA5FX,Interesting idea that needs a bit more investigation,Reject," The paper proposes an augmented adversarial reconstruction loss for training a stochastic encoder-decoder architecture. It corresponds to a discriminator loss distinguishing between a pair of a sample from the data distribution and its augmentation and a pair containing the sample and its reconstruction. The introduction of the augmentation function is an interesting idea, intensively tested in a set of experiments, but, as two of the reviewers pointed out, the paper could be improved by a deeper investigation of the augmentation function and the way of choosing it, which would increase the significance of the contribution. ",ICLR2019,3: The area chair is somewhat confident
+EnTsTevQuy,1610040000000.0,1610470000000.0,1,3zaVN0M0BIb,3zaVN0M0BIb,Final Decision,Reject,"Motivated by the fact that the benefit of overparameterization in unsupervised learning is less well understood than in supervised learning, this paper analyzes normalizing flows (NF) when the underlying neural network is a one-hidden-layer overparameterized network, and proves that for a certain class of NFs, one can efficiently learn any reasonable data distribution under minimal assumptions. The paper is very well motivated. However, the main concerns from the reviewers include (1) the writing quality and presentation are poor, even after revision during the author's response; and (2) the analysis is limited to the neural tangent kernel (NTK) regime, which makes the results less significant. I agree with the reviewers' evaluation and I think the first concern can be addressed by a careful revision, while the second concern needs additional nontrivial effort. 
Thus, I recommend rejection.",ICLR2021, +GGvIZE6Jb1,1576800000000.0,1576800000000.0,1,HygwvC4tPH,HygwvC4tPH,Paper Decision,Reject,"The paper describes an approach for learning context dependent entity representations that encodes fine-grained entity types. The paper includes some good empirical results and observations, but the proposed approach is very simple but lacks technical novelty needed to top ML conference; the clarify of the presentation can also be improved. ",ICLR2020, +OGpXE-mYr2v,1642700000000.0,1642700000000.0,1,VrjOFfcnSV8,VrjOFfcnSV8,Paper Decision,Accept (Poster),"The three reviewers all felt the paper was above threshold for acceptance to ICLR. To improve the final version, they suggest some additional discussion and experiments may help the paper.",ICLR2022, +YSuTyjBlUG,1610040000000.0,1610470000000.0,1,wWK7yXkULyh,wWK7yXkULyh,Final Decision,Accept (Oral),"Thanks for your submission to ICLR. + +When the initial reviews were written, three of the four reviewers were positive about the paper. Everyone felt it was overall a solid contribution, but there were some concerns about the clarity and presentation, as well as some suggestions for additional experiments. During the rebuttal/response period, the authors did a very nice job in responding to the concerns of the reviewers. Ultimately, all of the reviews were in agreement after discussion that the paper is strong and ready for publication. I also like this paper a lot, and find it to be a nice way to combine LSH with NN training. I am happy to recommend this paper for publication.",ICLR2021, +s9jogC-UMjv,1610040000000.0,1610470000000.0,1,NqWY3s0SILo,NqWY3s0SILo,Final Decision,Reject,"This work was deemed interesting by the reviewers, but they highlighted the following weaknesses in this version of the paper: + +- Lack of comparison to other methods. + +- Lack of novelty compared to previous work. + +- Fundamental problem with training only on one dataset (MNIST), issue with possible overfitting.",ICLR2021, +Syf53zIOg,1486400000000.0,1486400000000.0,1,r1BJLw9ex,r1BJLw9ex,ICLR committee final decision,Reject,"This was a borderline paper. However, no reviewers were willing to champion the acceptance of the paper during the deliberation period. Furthermore, in practice, initialization itself is a hyperparameter that gets tuned automatically. To be a compelling empirical result, it would be useful for the paper to include a comparison between the proposed initialization and a tuned arbitrary initialization scale with various tuning budgets. Additionally, other issues with the empirical evaluation brought up by the reviewers were only partially resolved in the revisions. For these reasons, the paper has been recommended for rejection.",ICLR2017, +E5lxdTfRo2,1576800000000.0,1576800000000.0,1,B1lxV6NFPH,B1lxV6NFPH,Paper Decision,Reject,"This paper uses Bayesian optimization with neural networks for neural architecture search. +One of the contributions is a path-based encoding that enumerates every possible path through a cell search space. This encoding is shown to be surprisingly powerful, but it will not scale to large cell-based search spaces or non-cell-based search spaces. The availability of code, as well as the careful attention to reproducibility is much appreciated and a factor in favor of the paper. + +In the discussion, it surfaced that a comparison to existing Bayesian optimization approaches using neural networks would have been possible, while the authors initially did not think that this would be the case. 
The authors promised to include these comparisons in the final version, but, as was also discussed in the private discussion between reviewers and AC, this is problematic since it is not clear what these results will show. Therefore, the one reviewer who was debating about increasing their score did in the end not do so (but would be inclined to accept a future version with a clean and thorough comparison to baselines). + +All reviewers stuck with their score of ""weak reject"", leaning to borderline. I read the paper myself and concur with this judgement. I recommend rejection of the current version, with an encouragement to submit to another venue after including a comparison to BO methods based on neural networks.",ICLR2020, +BjauOd8eyMR,1642700000000.0,1642700000000.0,1,ExJ4lMbZcqa,ExJ4lMbZcqa,Paper Decision,Reject,"This paper investigates the dereverberation problem from the audio-visual perspective. The geometry of the environment is represented by RGB and depth images. The authors propose a so-called visually-informed dereverberation of audio (VIDA) model and also create a dataset consisting of both synthetic and real data to verify the effectiveness of the model. Experiments are conducted on speech enhancement, speech recognition and speaker identification tasks. The authors compare VIDA with audio only dereverberation as well as various established baseline systems in the community. + +The audio-visual way of coping with dereverberation using visual representation of the acoustic environment seems to be interesting. The authors' rebuttal has cleared most of the concerns raised by the reviewers but there are still numerous lingering concerns which affect its acceptance. First of all, most of the reviewers consider the novelty not overwhelmingly significant. Second, the contribution of the visual input seems to be only marginal compared to the audio-only dereverberation. Results on real data are also mixed. Some of the reported p-values are extremely small, which raises questions whether it is due to the size of the test set. Third, there are noticeable artifacts in some of the samples in the demo. Fourth, there are numerous issues in the paper that are worth further in-depth investigation. For instance, it would be helpful to show in which way exactly the RGB and depth images helps.",ICLR2022, +YC20Kd0g59w,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This paper studies how recurrent neural networks, and more specifically GRUs, store and access information. The authors analyze the solution obtained by gradient descent to the variable delay copy memory task for discrete sequences. They use concepts from dynamical systems, such as slow-manifold, to understand the behavior of the learned model. Finally, based on this analysis, the authors propose a synthetic solution to a simplified version of the delay copy memory task. + +Overall, while the scores for the paper are rather positive, I still have concerns about the paper, based on the reviews and discussion. I do not believe that these concerned were well addressed by the authors in their rebuttal. First, I tend to agree that the paper is somewhat lacking novelty and insightful findings (reviewers TN5R, afLb, MToe). For example, I think that tools from dynamical systems are mostly useful to analyze RNNs when the input is constant (Jordan et al., 2019). In the case of the copy task, this corresponds to the ""delay"" period, where in practice the hidden state is almost constant. 
This behavior is easily explained by the value of the update gate, close to 1. I thus agree that other hypotheses than slow manifold should be discussed to explain how GRUs store and access information, and that the benefits of using dynamical systems is not obvious. Moreover, I believe that previous solutions to the copy task (eg, from Henaff et al.) could be extended to the variable setting by adding a gating mechanism to these solutions. In particular, Henaff et al. claimed that LSTM could solve this task empirically, while the authors claim otherwise. + +Second, after reading the revised version a couple of times, I still find the paper hard to follow (MToe, afLb, TN5R). For example, I think that the concept of slow manifold is not introduced properly, and in particular, how it applies to the learned solution is not clear. More generally, I found the sections regarding how information is stored and accessed a bit confusing. Finally, I think that the studied task is simple, and probably does not provide strong insight about the working of recurrent networks. Specifically, LSTMs tend to perform similarly or slightly better than GRUs on many tasks, while the authors claim that this architecture cannot solve the studied task.",ICLR2022, +4dufPckHCTxS,1642700000000.0,1642700000000.0,1,wMpS-Z_AI_E,wMpS-Z_AI_E,Paper Decision,Accept (Poster),"The authors theoretically analyze learning of two-layer neural +networks by gradient descent with respect to a data distribution that +exposes how useful features are learned during training. + +Overall, the reviewers felt that the analysis yielded useful insight, +and was original. + +During the discussion period, a reviewer recommended that the authors +look at papers providing lower bounds on statistical query learning of two-layer networks, +and consider comparing the lower-bound technique of this paper with that earlier work.",ICLR2022, +_RbuvlBtnG,1576800000000.0,1576800000000.0,1,rkl2s34twS,rkl2s34twS,Paper Decision,Reject,"The authors proposed a new problem setting called Wildly UDA (WUDA) where the labels in the source domain are noisy. They then proposed the ""butterfly"" method, combining co-teaching with pseudo labeling and evaluated the method on a range of WUDA problem setup. In general, there is a concern that Butterfly as the combination between co-teaching and pseudo labeling is weak on the novelty side. In this case the value of the method can be assessed by strong empirical result. However as pointed out by Reviewer 3, a common setup (SVHN<-> MNIST) that appeared in many UDA paper was missing in the original draft. The author added the result for SVHN<-> MNIST as a response to review 3, however they only considered the UDA setting, not WUDA, hence the value of that experiment was limited. In addition, there are other UDA methods that achieve significantly better performance on SVHN<->MNIST that should be considered among the baselines. For example DIRT-T (Shu et al 2018) has a second phase where the decision boundary on the target domain is adjusted, and that could provide some robustness against a decision boundary affected by noise. + +Shu et al (2018) A DIRT-T Approach to Unsupervised Domain Adaptation. ICLR 2018. https://arxiv.org/abs/1802.08735 + +I suggest that the authors consider performing the full experiment with WUDA using SVHN<->MNIST, and also consider the use of stronger UDA methods among the baseline. 
",ICLR2020, +rkUc7kpSG,1517250000000.0,1517260000000.0,200,BJehNfW0-,BJehNfW0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"* presents a novel way analyzing GANs using the birthday paradox and provides a theoretical construction that shows bidirectional GANs cannot escape specific cases of mode collapse +* significant contribution to the discussion of whether GANs learn the target disctibution +* thorough justifications",ICLR2018, +9hObwT1yUB,1576800000000.0,1576800000000.0,1,rJgSk04tDH,rJgSk04tDH,Paper Decision,Reject,"This paper seeks to analyse the important question around why hierarchical reinforcement learning can be beneficial. The findings show that improved exploration is at the core of this improved performance. Based on these findings, the paper also proposes some simple exploration techniques which are shown to be competitive with hierarchical RL approaches. + +This is a really interesting paper that could serve to address an oft speculated about result of the relation between HRL and exploration. While the findings of the paper are intuitive, it was agreed by all reviewers that the claims are too general for the evidence presented. The paper should be extended with a wider range of experiments covering more domains and algorithms, and would also benefit from some theoretical results. + +As it stands this paper should not be accepted.",ICLR2020, +oHiqoQ8Jx-1F,1642700000000.0,1642700000000.0,1,5ALGcXpmFyC,5ALGcXpmFyC,Paper Decision,Reject,"This paper proposes a theory for double descent phenomena in denoting deep neural networks. There are two major concerns: (1) The assumption that the data lie in a low dimensional subspace is quite strong, and needs to be weaken or better justified. (2) The theory only works for r=1, where the rank is one. For general rank, how to apply the proposed analysis is hand wavy and not convincing. The paper can be significantly strengthen if these two issues could be addressed.",ICLR2022, +ByxNIg6xgV,1544770000000.0,1545350000000.0,1,HJg3rjA5tQ,HJg3rjA5tQ,Metareview,Reject,"This manuscript proposes spread divergences as a technique for extending f-divergences to distributions with different supports. This is achieved by convolving with a noise distribution. This is an important topic worth further study in the community, particularly as it related to training generative models. + +The reviewers and AC opinions were mixed, with reviewers either being unconvinced about the novelty of the proposed work, or expressing issues about the clarity of the presentation. Further improvement of the clarity, combined with additional convincing experiments would significantly strengthen this submission.",ICLR2019,4: The area chair is confident but not absolutely certain +WcEmjjSoMQv,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This paper explores geometric properties of image perturbations (e.g. frequency content and local consistency) and their impact on the adversarial response of networks. The reviewers feel that the paper is at times unclear about the meaning of terminology (e.g. “local consistency”) that is not clearly defined. Also, while the reviewers acknowledge that the paper contains a number of interesting ideas, it is not always clear how the paper’s discussions and contributions differ from existing papers (e.g. 
Dong 2019, Yin et al., 2019, Wang et al., 2020a, Tsuzuku and Sato 2019) that also discuss the frequency content and smoothness properties of adversarial perturbations.",ICLR2022, +584vPQmgou,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"This paper analyzes the universal approximation power of deep residual neural networks based on nonlinear control theory. Some concerns regarding the clarify, significance and connection to practice were raised, and partially addressed by the rebuttal. After discussions, all the reviewers feel positive about the contribution of the paper.",ICLR2021, +OdeyqPg81yU,1642700000000.0,1642700000000.0,1,rpxJc9j04U,rpxJc9j04U,Paper Decision,Accept (Poster),"This paper has potential impact in the theorem proving community, and demonstrated the possibility of using LMs for theorem proving in Lean, and is good enough to use ""in the real world"" through an interactive theorem proving tool. +The reviewers wish their data/models were public to address some concerns raised by the reviewers, but we think the community can benefit from this work.",ICLR2022, +w_JckE1MG,1576800000000.0,1576800000000.0,1,SkgpBJrtvS,SkgpBJrtvS,Paper Decision,Accept (Poster),"This paper presents a new distillation method with theoretical and empirical supports. + +Given reviewers' comments and AC's reading, the novelty/significance and application-scope shown in the paper can be arguably limited. However, the authors extensively verified and compared the proposed methods and existing ones by showing significant improvements under comprehensive experiments. As the distillation method can enjoy a broader usage, I think the propose method in this paper can be influential in the future works. + +Hence, I think this is a borderlines paper toward acceptance.",ICLR2020, +f2H7PUhkXqG,1642700000000.0,1642700000000.0,1,32OdIHsu1_,32OdIHsu1_,Paper Decision,Reject,"This paper trains a neural network to predict expert strategies (described in natural language) in the game of Angry Birds. While the reviewers agreed this was potentially interesting, there was also a consensus that the scope of the paper was too narrow, that the writing was imprecise, and that the evaluations too few and too qualitative. I agree the paper does not seem thorough enough for ICLR, and recommend rejection.",ICLR2022, +ZD27t7ja2Cn,1610040000000.0,1610470000000.0,1,aa0705s2Qc,aa0705s2Qc,Final Decision,Reject,"The paper proposes a modification to the DeepMind Control Suite to measure generalization with respect to visual variation. The authors run baseline experiments against their new benchmark and discover, unsurprisingly, that RL agents learning from visual observations overfit to spurious details of the observations. + +Reviewers generally found the work to be clearly written, and the experimental analysis to be thorough and well done, though concerns about the rather simple nature of the visual augmentations persisted even after updates and author rebuttals. There were also concerns that by focusing only on Soft Actor Critic in the experiments. + +3 of 4 reviewers felt the work met the acceptance bar, albeit only marginally. The dissenting reviewer's concerns centered on clarity (many specific issues appear to have been remedied), the relatively limited nature of the augmentations, and the fact that reviewers were not given access to the code. 
+ +While the submission has potential, improvements needed are not minor, and given the short process, we can only accept papers as is, rather than expecting certain changes. Please take the reviewers' comments into consideration as you revise and resubmit to a future venue.",ICLR2021, +XKoucEf9y7Q,1642700000000.0,1642700000000.0,1,7TFcl1Xkr7,7TFcl1Xkr7,Paper Decision,Reject,"The paper studies improving model for abductive natural language inference task. Specifically, they introduce information interaction layers and the joint softmax focal loss. + +On positive notes, their method shows persuasive empirical gains. However, reviewers found (1) the technical novelty of the approach to be limited (reviewer croc, 3Vwo, W1Sp), (2) approaches (especially focal loss) not well motivated (reviewer hk5y), (3) there are limited take-away from the paper (reviewer imYG, hk5y) and (4) claims not well supported and experimental details missing (reviewer hk5y). The reviewers further provided detailed comments that would be helpful for authors to improve the paper. Because of such limitations, in its current form, the paper is not ready for publication.",ICLR2022, +v5p2sBey0G,1610040000000.0,1610470000000.0,1,068E_JSq9O,068E_JSq9O,Final Decision,Accept (Poster),"The paper presents new contrastive based self-supervised objective based on Chi squared divergence that helps with mini batch sensitivity, training stability and improved downstream performance. +An accept.",ICLR2021, +NUGd66WjqT,1576800000000.0,1576800000000.0,1,Hygy01StvH,Hygy01StvH,Paper Decision,Reject,"The reviewers have pointed out several major deficiencies of the paper, which the authors decided not to address.",ICLR2020, +qRMn3KeXMXd,1642700000000.0,1642700000000.0,1,W5PbuwQFzZx,W5PbuwQFzZx,Paper Decision,Reject,"The paper suggests a non-random strategy for selecting minibatches of nodes for training graph neural networks. The main argument is that consecutive memory accesses are faster than random accesses, and thus they claim a 20x speedup per epoch by precomputing batches at a small cost to accuracy. + +There are a number of discussion points. One reviewer finds the results hard to believe because previous work has shown that runtime sampling can be fully pipelined. The authors agree but say their speedups are still better, which isn’t a fully satisfying response, and it calls into question the quality of the baseline implementation. Another concern is about the effect of deterministic minibatches. The authors argue that the empirical results speak for themselves, while the reviewer worries about robustness. There also are some concerns about methodology around hyperparameters and special-casing of preprocessing for one dataset, though those appear mostly resolved. + +On the whole, this is a borderline paper that lands just on the side of rejection. I’d encourage the authors to more thoroughly address the questions about quality of the baseline implementation and the reviewer’s concern about robustness of deterministic minibatches, and then resubmit to the next conference.",ICLR2022, +Syly99tGlN,1544880000000.0,1545350000000.0,1,SygK6sA5tX,SygK6sA5tX,Lack of theoretical novelty voiced as one of main issues.,Reject,"AR1 is concerned about the overlap of this paper with Gama et al., 2018 as well as lack of theoretical analysis and poor results on REDDIT-5k and REDDIT-5B datasets. AR2 reflects the same concerns (lack of clear cut novelty over Zou & Lerman, 2018, Game, 2018. AR3 also points the same issue re. lack of theoretical results. 
The austhors admit that Zou and Lerman, 2018, and Gama, 2018, focus on stability results while this submission offers empirical evaluations. + +Unfortunately, reviewers did not find these arguments convincing. Thus, at this point, the paper cannot be accepted for publication in ICLR. AC strongly encourages authors to develop their theoretical 'edge' over this crowded market of GCNs and scattering approaches.",ICLR2019,5: The area chair is absolutely certain +_P0dJOrvaUo,1610040000000.0,1610470000000.0,1,v8b3e5jN66j,v8b3e5jN66j,Final Decision,Accept (Poster),"This paper addresses the problem of how best to sample hard negatives during contrastive learning, a topic of importance for the recently resurgent field of metric learning / contrastive loss-based unsupervised representation learning. Backed by theoretical results for a new low-variance version of the NCE, the paper proposes an easy-to-implement ""Ring"" method for selecting negatives that are at just the right level of difficulty, neither too hard nor too easy. + +Happily, this is a paper that has improved significantly through the interactive peer review of a dedicated set of reviewers combined with prompt responses from the authors. Perhaps the result that tipped this paper over the line in my assessment: the new experimental results now show significant gains from applying the ""Ring"" approach for hard negative sampling to near-state-of-the-art implementations of the MoCo-v2 approach, which is among the leading unsupervised visual feature learning approaches. ",ICLR2021, +BJeZvvBel4,1544730000000.0,1545350000000.0,1,BkE8NjCqYm,BkE8NjCqYm,Interesting insights but heuristics in approach worrisome,Reject,"This paper examines a concept (also coined by the paper) of ""search discrepancies"" where the search algorithm behaves differently with large beam sizes. It then proposes heuristics to help prevent the model from performing worse when the size of the beam is increased. + +I think there are some interesting insights in this paper with respect to how search works in modern neural models, but most reviewers (and me) were concerned by the heuristic approach taken to fix these errors. I still think that within a search paper, a clear separation between modeling errors and search errors is useful, and adding heuristics on top has a potential to making things more complicated down the road when, for example, we change our model or we change our training algorithm. + +It would be nice if the nice insights in the paper could be turned into a more theoretically clean framework that could be re-submitted to a future conference.",ICLR2019,4: The area chair is confident but not absolutely certain +vBSXJT2chX5,1610040000000.0,1610470000000.0,1,o7YTArVXdEW,o7YTArVXdEW,Final Decision,Reject,"This paper addresses automatically learning the neighborhood size (they call adaptive neighbor support) for unsupervised representation learning with a VAE. The neighborhood size is determined based on z-scores from by estimating a normal distribution in the latent space. + +The paper is poorly written. There are several grammatical errors and typos that distracts from understanding the paper. In addition, the use of terminology is not precise, which adds to the confusion, as pointed out by the reviewers. + +AC-VAE is better than VAE+KNN in Table 1 but worse in SCAN with KNN in Table 3. Further analysis to understand why this is so is needed. + +Additional measures of cluster quality is recommended. 
+ +As pointed out by the reviewers, this paper is below the acceptance threshold for ICLR. The reviewers provided several constructive suggestions. Please refer to detailed reviewer comments to help you improve your paper. +",ICLR2021, +MWujQIPYpqt,1610040000000.0,1610470000000.0,1,5NsEIflpbSv,5NsEIflpbSv,Final Decision,Accept (Poster),The paper proposes LORAS (low-rank adaptive label smoothing) for training with soft targets with the goal of improving performance and calibration for neural networks. The authors derive PAC-Bayesian generalization bounds for label smoothing and show that the generalization error depends on choice of the noise (smoothing) distribution. Empirical results demonstrate the effectiveness of the approach. All reviewers recommend acceptance.,ICLR2021, +Sk1K3zLdx,1486400000000.0,1486400000000.0,1,B1s6xvqlx,B1s6xvqlx,ICLR committee final decision,Accept (Poster),"Quality, Clarity: + The paper is well written. Further revisions have been made upon the original. + + Originality, Significance: + The paper presents an action-conditional recurrent network that can predict frames in video games hundreds of steps in the future. This is done using a mix of (a) architectural modifications; (b) jumpy predictions; and (c) particular training schemes. The experimental validation is extensive, now including additional comparisons suggested by reviewers. + There is not complete consensus about the significance of the contributions, with one reviewer seeking additional technical novelty. Overall, the paper appears to provide interesting and very soundly-evaluated results, which likely promises to be the new standard for this type of prediction problem.",ICLR2017, +BJgIs-uJgE,1544680000000.0,1545350000000.0,1,r1xlvi0qYm,r1xlvi0qYm,High quality research on memory augmented neural networks,Accept (Oral),"Well-written paper that motivates through theoretical analysis new memory writing methods in memory augmented neural networks. Extensive experimental analysis support and demonstrate the advantages of the new solutions over other recurrent architectures. +Reviewers suggested extension and clarification of the analysis presented in the paper, for example, for different memory sizes. The paper was revised accordingly. Another important suggestion was considering ACT as a baseline. Authors explained clearly why it wasn't considered as a baseline, and updated the paper to include references and explanations in the paper as well.",ICLR2019,4: The area chair is confident but not absolutely certain +SygBu0DblN,1544810000000.0,1545350000000.0,1,Bylmkh05KX,Bylmkh05KX,Novel formulation with strong results,Accept (Poster),"This paper is about unsupervised learning for ASR, by matching the acoustic distribution, learned unsupervisedly, with a prior phone-lm distribution. Overall, the results look good on TIMIT. Reviewers agree that this is a well written paper and that it has interesting results. + +Strengths +- Novel formulation for unsupervised ASR, and a non-trivial extension to previously proposed unsupervised classification to segmental level. +- Well written, with strong results. Improved results and analysis based on review feedback. + +Weaknesses +- Results are on TIMIT -- a small phone recognition task. +- Unclear how it extends to large vocabulary ASR tasks, and tasks that have large scale training data, and RNNs that may learn implicit LMs. The authors propose to deal with this in future work. + +Overall, the reviewers agree that this is an excellent contribution with strong results. 
Therefore, it is recommended that the paper be accepted.",ICLR2019,5: The area chair is absolutely certain +B1gZR0nle4,1544770000000.0,1545350000000.0,1,rkgwuiA9F7,rkgwuiA9F7,"A nice closed-form kernel for WAE-MMD, but concerns about novelty.",Reject,"The reviewers in general like the idea of using the Cramer-Wold kernel, noting that its heavy tails and closed form solution are appealing properties that lead to increased stability and improved training. The main concern was novelty, as this paper can be seen as simply changing the kernel in WAE-MMD. One suggestion is to more heavily highlight the CW-distance, and in particular to find another useful application for it outside of WAE-MMD. + +The paper emphasizes frequently that the closed-form loss function is a critical feature of this approach, however I don’t see any experiments that optimize WAE-MMD under the CW-distance while sampling from the Gaussian. This is important to measure the degree to which any improvement is attributable to a closed-form solution, or to the distance measure itself. +",ICLR2019,3: The area chair is somewhat confident +HsbRzjQjP27,1610040000000.0,1610470000000.0,1,0fqoSxXBwI6,0fqoSxXBwI6,Final Decision,Reject,"The authors present a method for self-supervised learning of representations of 2D projections of 3D objects. By performing known 3D transformations of an object of interest, a encoder/decoder network is trained to estimate the applied transformation from a series of 2D projections. The proposed method is used as a regularizer and experiments are performed on supervised 3D object classification and retrieval. + +After seeing each others’ reviews, one of the main concerns from the reviewers was the relationship between the proposed method and Zhang et al., CVPR 2019 (i.e. AET). The two methods are conceptually very similar, and the consensus from the reviewers is that the authors did not acknowledge the overlap sufficiently and also did not provide a convincing argument as to why they think the approaches are different. + +In their rebuttal the authors provided some additional results on real data which is a valuable and welcome addition. However, there were still other concerns that the reviewers had e.g. R2 wanted to know why the model could not be applied directly to 3D shapes instead of 2D projections. + +Given the above concerns (specifically the relationship to AET), there is currently not enough support for accepting the paper in its current form. The authors have received detailed feedback and are encouraged to take it onboard when revising the paper in future. +",ICLR2021, +Bkxzmi6lx4,1544770000000.0,1545350000000.0,1,rJxNAjC5F7,rJxNAjC5F7,The paper needs improvement,Reject,"The paper proposes learning a hash function that maps high dimensional data to binary codes, and uses multi-index hashing for efficient retrieval. The paper discusses similar results to ""Similarity estimation techniques from rounding algorithms, M Charikar, 2002"" without citing this paper. The proposed learning idea is also similar to ""Binary Reconstructive Embedding, B. Kulis, T. Darrell, NIPS'09"" without citation. Please study the learning to hash literature and discuss the similarities and differences with your approach. + +Due to missing citations and lack of novelty, I believe the paper does not pass the bar for acceptance at ICLR. 
+ + +PS: PQ and its better variants (optimized PQ and cartesian k-means) are from a different family of quantization techniques as pointed out by R3 and multi-index hashing is not directly applicable to such techniques. Regardless, I am also surprised that your technique just using hamming distance is able to outperform PQ using lookup table distance. + +",ICLR2019,4: The area chair is confident but not absolutely certain +CflqCeKKIW,1576800000000.0,1576800000000.0,1,BJe4JJBYwS,BJe4JJBYwS,Paper Decision,Reject,"The paper addresses image translation by extending prior models, e.g. CycleGAN, to domain pairs that have significantly different shape variations. The main technical idea is to apply the translation directly on the deep feature maps (instead of on the pixel level). +While acknowledging that the proposed model is potentially useful, the reviewers raised several important concerns: +(1) ill-posed formulation of the problem and what is desirable, (2) using fine-tuned/pre-trained VGG features, (3) computational cost of the proposed approach, i.e. training a cascade of pairs of translators (one pair per layer). +AC can confirm that all three reviewers have read the author responses. AC suggests, in its current state the manuscript is not ready for a publication. We hope the reviews are useful for improving and revising the paper. +",ICLR2020, +S1PZ2f8de,1486400000000.0,1486400000000.0,1,SJBr9Mcxl,SJBr9Mcxl,ICLR committee final decision,Reject,"While this is interesting work, one major concern comes from Reviewer 2 regarding the attempt of characterizing the tuning properties, which has proven useless both in neuroscience and in machine learning. Currently, no attempt so far has lived up to the promises this line of research is aiming for. In summary, this work is explorative and incremental but worthwhile. We encourage the authors to further refine their research effort and resubmit.",ICLR2017, +BkxBs7p-xN,1544830000000.0,1545350000000.0,1,ByeNFoRcK7,ByeNFoRcK7,metareview,Reject,"The submission hypothesizes that in typical GAN training the discriminator is too strong, too fast, and thus suggests a modification by which they gradually increases the task difficulty of the discriminator. This is done by introducing (effectively) a new random variable -- which has an effect on the label -- and which prevents the discriminator from solving its task too quickly. + +There was a healthy amount of back-and-forth between the authors and the reviewers which allowed for a number of important clarifications to be made (esp. with regards to proofs, comparison with baselines, etc). My judgment of this paper is that it provides a neat way to overcome a particular difficulty of training GANs, but that there is a lot of confusion about the similarities (of lack thereof) with various potentially simpler alternatives such as input dropout, adding noise to the input etc. I was sometimes confused by the author response as well (they at once suggest that the proposed method reduces overfitting of the discriminator but also state that ""We believe our method does not even try to “regularize” the discriminator""). 
Because of all this, the significance of this work is unclear and thus I do not recommend acceptance.",ICLR2019,3: The area chair is somewhat confident +pkpoKc_jdKA,1610040000000.0,1610470000000.0,1,dyaIRud1zXg,dyaIRud1zXg,Final Decision,Accept (Spotlight),"This clearly written paper has been constructively evaluated by three expert reviewers who provided at least two very detailed and informative summaries. The authors have addressed the inquiries raised by the reviewers in a comprehensive fashion, and at least one reviewer has updated their score as a result of those detailed rebuttals. In spite of some outstanding limitations, including a somewhat limited view of the relation of the proposed approach to existing alternatives, the reviewers are consistent in suggesting that this work is sufficiently mature to be considered for the inclusion in the program of ICLR 2021. I concur with that and recommend accepting this paper.",ICLR2021, +zBN_Gg597z,1610040000000.0,1610470000000.0,1,tqc8n6oHCtZ,tqc8n6oHCtZ,Final Decision,Reject,"The paper attempts to reduce computational cost of Transformer models. In this regard, authors generalizer PoWER-BERT by proposing a variant of dropout that reduces training cost by randomly sampling a fraction of the length of a sequence to use at each layer. Further, a sandwich training method is used which trains a spectrum of randomly sampled model between the largest and the smallest size model. At test time, the best length configuration that balances the accuracy and latency tradeoff via evolutionary search is used. The reviewers found the general idea interesting, but raised a number of concerns. First, proper baselines should be used and related works be discussed. In particular, the method is built on top of Power-BERT, yet it does not directly compare with it, and there was no good response when pointed out by a reviewer. Second, as the paper employs many tricks (some new some from prior work), but does not do any ablation studies to show how each of those contributes to the final accuracy gains. Finally, to showcase benefit compared to prior works in terms of computational cost a proper evaluation methodology and actual speedups for batch size 1 inference should be provided. Thus, an improved evaluation would benefit the paper a lot and paper in its current form is not ready for publication. +",ICLR2021, +7HUOXMKov,1576800000000.0,1576800000000.0,1,SkeyppEFvS,SkeyppEFvS,Paper Decision,Accept (Spotlight),The reviewers are unanimous in their opinion that this paper offers a novel approach to learning naïve physics. I concur.,ICLR2020, +HkgyrAd7eE,1544950000000.0,1545350000000.0,1,BkeK-nRcFX,BkeK-nRcFX,Interesting direction but needs more work,Reject,"This paper proposes the NonLinearity Coefficient (NLC), a metric which aims to predicts test-time performance of neural networks at initialization. The idea is interesting and novel, and has clear practical implications. Reviewers unanimously agreed that the direction is a worthwhile one to pursue. However, several reviewers also raised concerns about how well-justified the method is: in particular, Reviewer 3 believes that a quantitative comparison to the related work is necessary, and takes issue with the motivation for being ad-hoc. Reviewer 2 also is concerned about the soundness of the coefficient in truly measuring nonlinearity. + +These concerns make it clear that the paper needs more work before it can be published. 
And, in particular, addressing the reviewers' concerns and providing proper comparison to related works will go a long way in that direction.",ICLR2019,4: The area chair is confident but not absolutely certain +GVE5FVawo,1576800000000.0,1576800000000.0,1,HJg3Rp4FwH,HJg3Rp4FwH,Paper Decision,Reject,"The main contribution of this work is introducing the uncertainty-aware value function prediction into model-based RL, which can be used to balance the risk and return empirically. + +The reviewers generally agree that this paper addresses an interesting problem, but there are some concerns that remain (see reviewer comments). + +I also want to highlight that in terms of empirical results, it is insufficient to present results for 3 different random seeds. To highlight any kind of robustness, I suggest *at least* 10-20 different random seeds; otherwise the findings can/will be misleading. ",ICLR2020, +ryxuXSjNlN,1545020000000.0,1545350000000.0,1,rJlnB3C5Ym,rJlnB3C5Ym,Empirical paper casting shade on pruning,Accept (Poster),"The paper presents a lot of empirical evidence that fine tuning pruned networks is inferior to training them from scratch. These results seem unsurprising in retrospect, but hindsight is 20-20. The reviewers raised a wide range of issues, some of which were addressed and some which were not. I recommend to the authors that they make sure that any claims they draw from their experiments are sufficiently prescribed. E.g., the lottery ticket experiments done by Anonymous in response to this paper show that the random initialization does poorer than restarting with the initial weights (other than in resnet, though this seems possibly due to the learning rate). There is something different in their setting, and so your claims should be properly circumscribed. I don't think the ""standard"" versus ""nonstandard"" terminology is appropriate until the actual boundary between these two behaviors is identified. I would recommend the authors make guarded claims here.",ICLR2019,3: The area chair is somewhat confident +_0EwDxHThxNL,1642700000000.0,1642700000000.0,1,CgV7NVOgDJZ,CgV7NVOgDJZ,Paper Decision,Reject,"The paper proposes using unlabelled speech data for TTS by decoupling parts of the model. +However, all reveiwers agree that the technique is already known and the experimental results are not strong enough to make advantage of training on more data. +A reject.",ICLR2022, +CR68E8Im953,1642700000000.0,1642700000000.0,1,oU3aTsmeRQV,oU3aTsmeRQV,Paper Decision,Accept (Poster),"This paper proposes Self-Ensemble Adversarial Training (SEAT) for yielding a robust classifier by averaging weights of history models. The solution is different from an ensemble of predictions of different adversarially trained models. The authors also provided theoretical and empirical evidence that the proposed self-ensemble method yields a smoother loss landscape and better robustness than both individual models and an ensemble of predictions from different classifiers. + +The paper receives a mixed rating of 8-6-6-5 (after private discussion; initially it was 8-6-5-3), and all reviewers actively engaged in discussion. From the three positive reviewers, it is in general consensus that this paper has a clear motivation, is easy to follow, and owns reasonable (not exceptional) novelty. The negative reviewer poses a number of concerns, citing the absence of adaptive attack evaluation, the unclear difference between vanilla EMA and SEAT, and the proof of Proposition 1. 
The authors provided detailed responses and the negative reviewer was partially convinced (not fully) after viewing other comments. + +AC carefully reads all discussions and feels this fall into a borderline case. The authors did solid work and there is no fatal concern as AC can see. The majority sentiment is that this is a good paper, just not an exciting one. Hence, the current recommendation is a borderline acceptance.",ICLR2022, +n2WNhczPSW,1576800000000.0,1576800000000.0,1,SJeD3CEFPH,SJeD3CEFPH,Paper Decision,Accept (Talk),"This paper’s contribution is twofold: 1) it proposes a new meta-RL method that leverages off-policy meta-learning by importance weighting, and 2) it demonstrates that current popular meta-RL benchmarks don’t necessarily require meta-learning, as a simple non-meta-learning algorithm (TD3) conditioned on a context variable of the trajectory is competitive with SoTA meta-learning approaches. + +The reviewers all agreed that the approach is interesting and the contributions are significant. I’d like to thank the reviewers for engaging in a spirited discussion about this paper, both with each other and with the authors. There was also a disagreement about the semantics of whether the approach can be classified as “meta-learning”, but in my opinion this argument is orthogonal to the practical contributions. After the revisions and rebuttal, reviewers agreed that the paper was improved and increased their ratings as a result, with all recommending accept. + +There’s a good chance this work will make an impactful contribution to the field of meta-reinforcement learning and therefore I recommend it for an oral presentation. +",ICLR2020, +8cT-gD6ic1,1576800000000.0,1576800000000.0,1,HkxARkrFwB,HkxARkrFwB,Paper Decision,Accept (Spotlight),"This paper proposes quantum-inspired methods for increasing the parametric efficiency of word embeddings. While a little heavy in terms of quantum jargon, and perhaps a little ignorant of loosely related work in this sub-field (e.g. see the work of Coecke and colleagues from 2008 onwards), the majority of reviewers were broadly convinced the work and results were of sufficient merit to be published.",ICLR2020, +M8Eje-Bmkv,1576800000000.0,1576800000000.0,1,Hkg-xgrYvH,Hkg-xgrYvH,Paper Decision,Accept (Poster),"Three reviewers have assessed this paper and they have scored it 6/6/6 after rebuttal. Nonetheless, the reviewers have raised a number of criticisms and the authors are encouraged to resolve them for the camera-ready submission.",ICLR2020, +ryeb3X8-lV,1544800000000.0,1545350000000.0,1,S1gBz2C9tX,S1gBz2C9tX,"Nice work with potential, but contributions need to be strengthened",Reject,"The paper proposes to use importance resampling (IR) as an alternative to the more popular importance sampling (IS) approach to off-policy RL. The hope is to reduce variance, as shown in experiments. However, there is no analysis why/when IR will be better than IS for variance reduction, and a few baselines were suggested by reviewers. While the authors rebuttal was helpful in clarifying several issues, the overall contribution does not seem strong enough for ICLR, on both theoretical and empirical sides. + +The high variance of IS is known, and the following work may be referenced for better 1st order updates when IS weights are used: Karampatziakis & Langford (UAI'11). + +In section 3, the paper says that most off-policy work uses d_mu, instead of d_pi, to weigh states. 
This is true, but in the current context (infinite-horizon RL), there are more recent works that should probably be referenced: + http://proceedings.mlr.press/v70/hallak17a.html + https://papers.nips.cc/paper/7781-breaking-the-curse-of-horizon-infinite-horizon-off-policy-estimation",ICLR2019,5: The area chair is absolutely certain +sJr0m389Yf,1642700000000.0,1642700000000.0,1,hjlXybdILM3,hjlXybdILM3,Paper Decision,Reject,"This paper proposes SimpleBits, for simplifying input images to remove irrelevant details but keep relevant details for classification. This idea can be applied during/after training. Authors have significantly revised the draft to address reviewer concerns, to improve the readability and clarify concerns on complexity analysis, for which reviewers have raised scores post-rebuttal. However, even with score changes, there are commonly expressed concerns, that manuscript still needs some more improvements to be ready for publication in their post-rebuttal comments: findings are not very nontrivial or significant (reviewer eYVm), still incomplete (RfmX) or optimization algorithm is yet to be found (reviewer agcx) .",ICLR2022, +8oAcB2VClFx,1610040000000.0,1610470000000.0,1,P5RQfyAmrU,P5RQfyAmrU,Final Decision,Reject,"The paper defines a ""local data matrix"" (inspired from local Fisher matrix) and uses it to obtain a foliation in the data space. This provides a lens to view the data space from model's perspective. While the idea is interesting, reviewers have two main concerns from the reviewers which are not fully addressed in the author response: +(i) The method works with partially trained model (1 epoch for MNIST) and it's not clear how the observations made in the paper extend to fully trained models, +(ii) The motivation and application of the proposed model-centric view of data space needs more work - it will be good to think of some applications where this view can help. + +I encourage the authors to consider the suggestions from the reviewers (e.g, R3 suggested label smoothing for (i)), and submit a revised version to a future venue. ",ICLR2021, +#NAME?,1576800000000.0,1576800000000.0,1,Syl-_aVtvH,Syl-_aVtvH,Paper Decision,Reject,"This manuscript personalization techniques to improve the scalability and privacy preservation of federated learning. Empirical results are provided which suggests improved performance. + +The reviewers and AC agree that the problem studied is timely and interesting, as the tradeoffs between personalization and performance are a known concern in federated learning. However, this manuscript also received quite divergent reviews, resulting from differences in opinion about the novelty and clarity of the conceptual and empirical results. Reviewers were also unconvinced by the provided empirical evaluation results. ",ICLR2020, +OBfJ7dQnH6i,1610040000000.0,1610470000000.0,1,OjUsDdCpR5,OjUsDdCpR5,Final Decision,Reject,"Authors extend the probabilistic PCA framework to multinomial-distributed data. Scalable estimation of principal components in the model is achieved using a multinomial variational autoencoder in combination with an isometric log-ratio (ILR) transform. +The reviewers did not agree on the degree of novelty of the paper to PC estimation. +The presentation of the paper can be improved. +The reviewers criticise that large changes have been made to the paper during the rebuttal phase. +Overall, the paper is borderline and due to the mentioned large changes I recommend a rejection (and re-review at a different venue). 
+",ICLR2021, +BuDufNvwG,1576800000000.0,1576800000000.0,1,SJeXJANFPr,SJeXJANFPr,Paper Decision,Reject,"This paper proposes a training approach that orthogonalizes gradients to enable better learning across multiple tasks. The idea is simple and intuitive. + +Given that there is past work following the same kind of ideas, it would be need to further: +(a) expand the experimental evaluation section with comparisons to prior work and, ideally, demonstrate stronger results. + (b) study in more depth the assumptions behind gradient orthogonality for transfer. This would increase impact on top of past literature by explaining, besides intuitions, why gradient orthogonality helps for transfer in the first place. +",ICLR2020, +WHnxe6UsF43,1610040000000.0,1610470000000.0,1,oh71uL93yay,oh71uL93yay,Final Decision,Reject,"Three of the reviewers are significantly concerned about this submission while R3 was positive during review. During discussion, R3 also agreed that there are concerns not only on experimental designs and results but also the proposed model. Thus a reject is recommended.",ICLR2021, +IUsYqUv9nW1,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"After the rebuttal phase, all scores are borderline (6) or negative (4). Among the most confident reviewers (confidence 5), one gives 6 and one gives 4. The reviewer with confidence 4 gives overall score 6 but states they cannot support the paper. There were several concerns about the novelty of the task and method, the challenge of the experimental settings, missing comparisons to recent prior work in the original paper, etc. While the reviewers see merit, the paper can benefit from another revision before being accepted, including to better position the novelty of its method and perhaps reduce claims of novelty of the task. ",ICLR2021, +rkCcN2RgOkk,1642700000000.0,1642700000000.0,1,t8O-4LKFVx,t8O-4LKFVx,Paper Decision,Accept (Spotlight),"In this paper, a new learning scheme for minimizing the confidence set by conformal prediction is proposed. Most of the reviewers agree that the idea is interesting and novel. This is an important contribution to trustworthy ML, with theoretically sound considerations and thorough experimental validation.",ICLR2022, +rJQwQy6SM,1517250000000.0,1517260000000.0,154,rkfOvGbCW,rkfOvGbCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"the proposed approach nicely incorporates various ideas from recent work into a single meta-learning (or domain adaptation or incremental learning or ...) framework. although better empirical comparison to existing (however recent they are) approaches would have made it stronger, the reviewers all found this submission to be worth publication, with which i agree.",ICLR2018, +BJQeTz8_x,1486400000000.0,1486400000000.0,1,HJGwcKclx,HJGwcKclx,ICLR committee final decision,Accept (Poster),"This paper provides a principled and practical formulation for weight-sharing and quantization, using a simple mixture of Guassians on the weights, and stochastic variational inference. The main idea and results are presented clearly, along with illustrative side-experiments showing the properties of this method in practice. Also, the method is illustrated on non-toy problems.",ICLR2017, +BJbD8ypBG,1517250000000.0,1517260000000.0,798,ryZERzWCZ,ryZERzWCZ,ICLR 2018 Conference Acceptance Decision,Reject,The paper provides a constrained mutual information objective function whose Lagrangian dual covers several existing generative models. 
However reviewers are not convinced of the significance or usefulness of the proposed unifying framework (at least from the way results are presented currently in the paper). Authors have not taken any steps towards revising the paper to address these concerns. Improving the presentation to bring out the significance/utility of the proposed unifying framework is needed.,ICLR2018, +yvA9qMrKEQd,1642700000000.0,1642700000000.0,1,f2lrIbGx3x7,f2lrIbGx3x7,Paper Decision,Accept (Poster),The paper formalizes the problem of gradient leakage through a Bayesian framework. They show that existing attacks can be viewed as approximations of a Bayesian optimal adversary. The empirical results show that heuristic defences are not good against stronger attacks and that the early part of the training is particularly vulnerable. There was a lively discussion in the reviews and rebuttal and the outstanding questions of the reviewers have been answered.,ICLR2022, +_-sfg1B4sO,1642700000000.0,1642700000000.0,1,KNfuensPHDU,KNfuensPHDU,Paper Decision,Reject,"The paper proposes a method to improve PROVEN, which gives a certification for probabilistic robustness. However, reviewers think the paper is below the acceptance bar due to unclear motivation and insufficient experiments. In particular, a clear use case of probabilistic robustness certification is crucial for the paper.",ICLR2022, +I1sEb64uZk2,1610040000000.0,1610470000000.0,1,CF-ZIuSMXRz,CF-ZIuSMXRz,Final Decision,Accept (Poster),"This paper studies extensions of the Scattering Graph Transform to spatio-temporal domains. By exploring several design choices for spatio-temporal wavelet filters, the authors provide a solid and broad study of such predefined represenatations, including stability analysis as well as extensive empirical evaluations. +Reviewers were generally favorable, and highlighted the importance of this method as providing a simple yet powerful baseline for spatio-temporal graph prediction tasks that requires no training. Despite some concerns about lack of analysis of the empirical results, the AC believes this work will provide a valuable baseline for future research and therefore recommends acceptance as a poster. ",ICLR2021, +8V8JZzJyQ4V,1610040000000.0,1610470000000.0,1,IhUeMfEmexK,IhUeMfEmexK,Final Decision,Reject,"This paper presents a knowledge distillation method for face recognition, by inheriting the teacher’s classifier as the student’s classifier and optimizing the student model with advanced loss functions. It received comments from three reviewers: 1 rated “Ok but not good enough - rejection”, 1 rated “Marginally below” and 1 rated “Marginally above”. The reviewers appreciate the simple yet clear methodology illustration and the well written paper. However, a number of major concerns are raised by the reviewers, including limited novelty, lack of comparison with more advanced knowledge distillation methods and their special case in face recognition. During the rebuttal, the authors made efforts to response to all reviewers’ comments. However, the rating were not changed. The ACs concur these major concerns and more comprehensive comparisons with the state of the art KD methods are necessary to better illustrate the contribution of this work. Therefore, this paper can not be accepted at its current state. 
+",ICLR2021, +rkbwH16Bf,1517250000000.0,1517260000000.0,580,H1srNebAZ,H1srNebAZ,ICLR 2018 Conference Acceptance Decision,Reject,"While one reviewer did upgrade their Rating from 6 to 7, the most negative reviewer maintains: ""Overall, I find this work interesting and current results surprising. However, I find it to be a preliminary work and not yet ready for publication. The paper still lacks a conclusion / a leading hypothesis / an explanation for the shown results. I find this conclusion indispensable even for a small scientific study to be published."" after the rebuttal. With scores of 7-5-4 it is just not possible for the AC to recommend acceptance.",ICLR2018, +46n0aRu7n6B,1642700000000.0,1642700000000.0,1,6vkzF28Hur8,6vkzF28Hur8,Paper Decision,Accept (Poster),"Description of paper content: + +The paper proposes a strategy to train a “transition policy” that can connect two pre-trained policies. The transition policy tries to reach state-action pairs that are within the occupancy distribution of the second policy using Inverse RL. The technique was evaluated on robot manipulation and locomotion problems. + +Summary of paper discussion: + +The discussion was not lengthy. The reviewers felt the paper was quite well-written, instructive, and novel, yet also implied the experimental results were less systematic than might be desired. All reviewers were weakly supportive of publication and made few critical comments. The salient ones concerned the experimental domains, the number of baselines, and the question of the generality of the approach.",ICLR2022, +ktl70XhYqRt,1610040000000.0,1610470000000.0,1,dx4b7lm8jMM,dx4b7lm8jMM,Final Decision,Accept (Poster),"The paper uses free algebras for sequential data representation, and two of the reviewers and the AC find this highly innovative. There were numerous small issues brought up by reviewers (and reviewers disagreed some on the presentation), in particular R3 asking about the experiments, some of which were addressed in the rebuttal. Overall, because the idea was unusual, it's a bit hard to place and judge this paper. In the end, in the opinion of this AC, the ideas are very creative and there is enough of a chance that this paper could become a very highly cited work, hence we recommend its acceptance.",ICLR2021, +Eqfr0CBhkSN,1610040000000.0,1610470000000.0,1,IJxaSrLIbkx,IJxaSrLIbkx,Final Decision,Reject,"The authors consider local 'why' or 'abductive' explanations for a model and a given class, which identify a minimal subset of features such that they're sufficient to imply that the model predicts the class; and 'why not' or 'contrastive' explanations, which identify a minimal subset s.t. they're sufficient to imply that the model predicts a different class. The two types of explanation are related using earlier work on minimal hitting sets going back to Reiter (1987). + +Reviewers were divided in their opinions. R4 was very positive but with little detail and only medium confidence, then did not participate in discussion. R2 was the only reviewer with high confidence, leaning against acceptance. The paper relies on FOL which was hard for reviewers to grapple with, and may make it challenging for readers. 
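(To make the two explanation types concrete without the FOL machinery, a brute-force toy sketch; every name here is illustrative, the model is any black-box classifier over finitely many feature values, and the enumeration is exponential, so this is purely expository:

    from itertools import combinations, product

    def minimal_sufficient_subsets(model, x, domain, target):
        # Minimal feature subsets S such that fixing x's values on S forces
        # model(z) == target for every completion z over the finite domain.
        n = len(x)
        def sufficient(S):
            free = [i for i in range(n) if i not in S]
            for vals in product(domain, repeat=len(free)):
                z = list(x)
                for i, v in zip(free, vals):
                    z[i] = v
                if model(z) != target:
                    return False
            return True
        found = []
        for k in range(n + 1):  # ascending size guarantees minimality below
            for S in combinations(range(n), k):
                S = set(S)
                if not any(T <= S for T in found) and sufficient(S):
                    found.append(S)
        return found

The 'why not' (contrastive) explanations are then computable as minimal hitting sets of this family, which is the Reiter-style duality the paper relies on.)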
The presentation could be improved by clearly linking to existing work and demonstrating why the new approach is important.",ICLR2021, +id_QDXuaKXt,1610040000000.0,1610470000000.0,1,OQ08SN70M1V,OQ08SN70M1V,Final Decision,Accept (Poster),"This paper introduces a pair of related regularization-oriented techniques for fine-tuning pretrained transformer models for NLP tasks, and shows that both are more efficient and more effective than prior work in thorough experiments on a wide range of tasks. The techniques are motivated by the idea of 'representational collapse', which is defined as drops in the ability of a linear model trained on an input representation to solve tasks _other than_ the one being trained on. + +Pros: +- The new method is demonstrated to be broadly efficient and effective on a wide range of tasks. + +Cons: +- It's not clear why 'representational collapse' warrants a new term, or whether it's desirable in general. +- The motivations for some of the precise technical decisions behind the new methods are unclear.",ICLR2021, +4E0bPZySHS,1576800000000.0,1576800000000.0,1,S1xRxgSFvH,S1xRxgSFvH,Paper Decision,Reject,"This submission proposes an interesting experiment/modification of CNNs. However, it looks like this contribution overlaps significantly with prior work (that the authors initially missed) and the comparison in the (revised) manuscript seem to not clearly delineate and acknowledge the similarities and differences. + +I suggest the authors improve this aspect and try submitting this work to next venue. ",ICLR2020, +mrNKqipRuKI,1642700000000.0,1642700000000.0,1,J4iSIR9fhY0,J4iSIR9fhY0,Paper Decision,Accept (Spotlight),"In this paper, the authors extend the FLAMBE to the infinite-horizon MDP and largely improved the sample complexity of the representation learning in FLAMBE. Meanwhile, the authors also consider the offline representation learning with the same framework. Although there is still some computational issue in MLE for the linear MDP, the paper completes a solid step towards making linear MDP for practice. The paper could be impactful for the RL community. + +As the reviewers suggested, there are still several minors to be addressed: + +- The extension of the proposed algorithm for finite-horizon MDP should be added. +- The directly comparison between the sample complexity of FLAMBE and the proposed algorithm in infinite-horizon MDP is not appropriate. The authors should clarify the difference here. +- The organization of the proof is not clear. As reviewer suggested, the one-step back trick should be emphasized for better significance of the submission.",ICLR2022, +m1uhNgSu4qN,1642700000000.0,1642700000000.0,1,mHu2vIds_-b,mHu2vIds_-b,Paper Decision,Accept (Spotlight),"This paper integrates model ensembles with randomized smoothing to improve the certified accuracy. The methodology is motivated theoretically by showing the effect of model ensemble on reducing the variance of smooth classifiers. Moreover, it proposes an adaptive sampling algorithm to reduce the computation required for certifying with randomized smoothing. Extensive experiments were conducted on CIFAR10 and ImageNet datasets. + +The strengths of the paper are as follows: ++ In terms of significance of the topic, the problem tackled in the paper is significant and highly relevant. ++ The motivation of using model ensemble is clearly illustrated via a figure and well justified with theoretical analysis. 
++ Algorithmically, the paper proposes Adaptive Sampling and K-consensus algorithms to reduce the computational cost, making the method more practical. ++ Experimentally, the paper exhibits competitive results against several frameworks for training smooth classifiers and on several datasets.",ICLR2022, +BJRi81TSG,1517250000000.0,1517260000000.0,863,ryk77mbRZ,ryk77mbRZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes a regularizer for recurrent neural networks, based on injecting random noise into the hidden unit activations. In general the reviewers thought that the paper was well written and easy to understand. However, the major concern among the reviewers was a lack of empirical evidence that the method works consistently. Essentially, the reviewers were not compelled by the presented experiments and demanded more rigorous empirical validation of the approach. + +Pros: +- Well written and easy to follow +- An interesting idea +- Regularizing RNNs is an interesting and active area of research in the community + +Cons: +- The experiments are not compelling and are questioned by all the reviewers +- The writing does not cite relevant related work +- The work seems underexplored (empirically and methodologically)",ICLR2018, +nuKKZ22RZN6,1610040000000.0,1610470000000.0,1,KcTBbZ1kM6K,KcTBbZ1kM6K,Final Decision,Reject,"The paper considers the question of identifying whether a model is bad from an OOD perspective or certifying that it is good. The reviews agree that there are interesting ideas in the paper, however, lack of sufficient experiments and presentation issues were pointed out which make the paper not ready for acceptance at this stage.",ICLR2021, +Tq-yoCdnRru,1610040000000.0,1610470000000.0,1,u15gHPQViL,u15gHPQViL,Final Decision,Reject,"This paper presents work on zero-shot learning. The reviewers appreciated the simplicity of the method and its clear exposition. However, concerns were raised over novelty, motivation, and empirical validation. After reading the authors' response, the reviewers remained of the opinion that these concerns have not yet been addressed sufficiently. Based on these points, the paper is not yet ready for publication.",ICLR2021, +TnFEHQmNZe,1610040000000.0,1610470000000.0,1,6jlNy83JUQ_,6jlNy83JUQ_,Final Decision,Reject,"This paper proposes a new approximate algorithm for Bayesian logistic regression in the online setting. The primary approximation involved in the algorithm is the use of a diagonal Gaussian approximation. (A probably more minor approximation is approximating the sigmoid with a Gaussian.) The main discussion focused on two issues: Firstly, there was some sentiment that the paper lacked theoretical guarantees. Second, there were concerns about the experimental results. I feel that it is not a serious flaw that the paper lacks a theoretical regret bound. Given the current state of algorithms for this problem practical algorithms remain very much of interest. However, the general sentiment of reviewers was that the experimental results were not as strong (or as numerous) as would be hoped. For an algorithm without a theoretical regret bound, I do agree that stronger empirical evidence would be expected. This was partially addressed in a revision but still I agree with the consensus that more extensive numerical evidence should be expected, and for that reason I am recommending rejection. + +Finally, I'll mention some other issues that I view as not counting substantially against the paper. 
Firstly is the dependence on the prior. Here I am in agreement with the authors that this is an aspect shared by all Bayesian methods. This issue is a (valid) argument about the value of all Bayesian methods, but not one I think we will resolve here. Second, there were suggestions from the reviewers about improvements that could be made to the baseline methods. Here I don't feel that it's fair that we ask the authors to make novel improvements to other algorithms, unless those improvements are very ""obvious"".",ICLR2021, +XnhpuNRCNS5,1610040000000.0,1610470000000.0,1,8xLkv08d70T,8xLkv08d70T,Final Decision,Accept (Poster),"I thank the authors for their submission and very active participation in the author response period. The paper is well written [R3,R4], tackles a hard problem [R4] in a novel way [R4] with interesting and convincing results [R2]. R3 noted that an empirical comparison to POET would be appropriate. However, in my view the authors addressed these concerns in a satisfactory manner. It seems that R3 has not updated their assessment nor confirmed their current score based on the author response. I am therefore discounting the only review voting for rejection and am siding with R1, R2 and R4. Thus, I recommend acceptance.",ICLR2021, +1YUguJ4jxH,1576800000000.0,1576800000000.0,1,BJglA3NKwS,BJglA3NKwS,Paper Decision,Reject,"The submission presents a Siamese attention operator that lowers the computational costs of attention operators for applications such as image recognition. The reviews are split. R1 posted significant concerns with the content of the submission. The concerns remain after the authors' responses and revision. One of the concerns is the apparent dual submission with ""Kronecker Attention Networks"". The AC agrees with these concerns and recommends rejecting the submission.",ICLR2020, +SkgX6N8pJE,1544540000000.0,1545350000000.0,1,SyfXKoRqFQ,SyfXKoRqFQ,Could be improved with better explanation of the insights and experiment design.,Reject,"This paper introduced an adaptive importance sampling strategy to select mini-batches to speed up the convergence of network training. The method is well motivated and easy to follow. + +The main concerns raised by the reviewers are limited novelty of the proposed simple idea compared to related recent work, and moderate empirical performance. + +The authors argue that the particular choice of the adaptive sampling method comes after trying various methods. I believe providing more detailed discussion and comparison with different methods together with the ""active bias"" paper would help the readers appreciate the insights conveyed in this paper. + +The authors provide some additional experiments in the revision. It would make the whole experiment section a lot stronger and convincing if the authors could run more thorough experiments on extra challenging datasets and include all the results int the main text. + +Additional experiment to clarify the merit of the proposed method on either faster convergence or lower asymptotic error would also improve the contribution of this paper.",ICLR2019,4: The area chair is confident but not absolutely certain +L2G_KbmtPFjI,1642700000000.0,1642700000000.0,1,ybsh6zEzIKA,ybsh6zEzIKA,Paper Decision,Reject,"This paper proposes a “Mixup” type of data augmentation for graphs that accounts for the difficulty of mixing graphs of different number of nodes. The authors show that the mixed graphs are invertible functions of the original graphs. 
+ +Reviewer d3Ri liked the simplicity and effectiveness of the technique. They called it a “healthy and useful contribution for the field”. Reviewer n1Dk thought that the paper explored an important problem and thought the paper was clear, though some of the math could have been simplified. This reviewer was concerned that a central claim of the paper, that the method avoids “Manifold Intrusion” was unsubstantiated. Specifically that it could not be deduced from the fact that edge connectivity could be recovered from the mixed graphs. The reviewer claimed that node features of the individual graphs were unrecoverable. The authors responded in detail to the reviewer’s criticism, adding two new lemmas which purportedly guaranteed node feature vectors could be uniquely recovered. The authors admitted to some conversion between “Manifold intrusion” and invertibility and added a Theorem and its proof that invertibility guarantees no manifold intrusion. The authors also responded to reviewer n1Dk’s concern about the significance of the reported improvements. Reviewer n1Dk responded to the author rebuttal with concerns about the strong and unrealistic assumption of linear independence of the feature matrix. They had further concerns that for the case of weighted edges the “Intrusion-free” property could not be enforced. Discussion ensued, with the authors arguing that the independence assumption was not as strong as the reviewer claimed and that the “Intrusion-free” property was only every for graphs with binary edge weights. + +Reviewer 7hBS and q8bs were both on the fence. 7hBS also raised some concern with the case of non-binary weighted edges. They also raised the same issue with respect to the connection b/w invertibility and the “Intrusion-free” property, which the authors addressed. Reviewer q8bs also thought the problem was interesting, the paper was clear, yet like n1Dk thought the performance improvement was marginal and had concerns with technical novelty of the work. + +This was a tough call, so I engaged the reviewers in further discussion. 7hBS agreed with n1Dk’s opinion that the central claim of the paper (the method being intrusion-free) was not presented with strong evidence. They also raised another concern, which was that the paper didn’t evaluate on node classification like most other graph mixup-style models. Q8bs agreed with n1Dk’s concerns and felt that post discussion the technical novelty of the work was limited. Without strong support from the reviewers, I think that this paper could use further development, either lightening the “intrusion-free” claim or presenting evidence for it in other settings.",ICLR2022, +evg4uPG2JQk,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"The authors introduce an approach for designing pseudo-labels in semi-supervised segmentation. +The approach combines the idea a refining pseudo-labels with self-attention grad-CAM (SGC) and a calibrated prediction fusion, and consistency training by enforcing pseudo labels to be robust to strongly-augmented data. + +The reviewers overall like idea and point out the good level of performance obtained by the method in the challenging semi-supervised context. However, they also point out the limited novelty of the approach, and the need for a better positioning with respect to related works. After rebuttal, reviewers were satisfied with authors' answers and paper modifications, and all recommend acceptance. 
\ +The AC considers that the submission is a nice combination of existing techniques and likes the simplicity of the one-stage approach, which reaches good performances. Therefore, the AC recommends acceptance.",ICLR2021, +lQl60QXqWMz,1610040000000.0,1610470000000.0,1,MY3WGKsXct_,MY3WGKsXct_,Final Decision,Reject,"The work focuses on detecting whether a certain data sample was used to train a deep network-based conditional image synthesis model. The key idea is not to rely on just reconstruction error but normalizing it via a proposed difficulty score. The reviewers found the problem statement important and the paper easy to follow. However, a common concern in the discussion was the approach is specific to image-to-image translation problems. Many other minor questions were answered by the authors in the rebuttal. Upon discussion post rebuttal, the reviewers decided to maintain their score. AC and reviewers believe that the paper will benefit from better analysis and description of difficulty score and its correlation with reconstruction error. It would be ideal to see how these ideas are useful for a broader problem than image translation. Please refer to the reviews for final feedback and suggestions to strengthen the submission.",ICLR2021, +CE7TShW_d4g,1610040000000.0,1610470000000.0,1,RGeQOjc58d,RGeQOjc58d,Final Decision,Reject,"The paper studies the robustness of binary neural networks (BNNS), showing how quantized models suffer from gradient vanishing. To solve this issue, the authors propose temperature scaling approaches that can overcome this masking, achieving near-perfect perfect success in crafting adversarial inputs for these models. The problem is interesting and important. However, the major concerns are that the technical novelty is limited raised by two Reviewers, small improvements for linear loss functions. The most related work is not compared in the experiment. +",ICLR2021, +uzhmydBe0i,1576800000000.0,1576800000000.0,1,rygfC0VKPS,rygfC0VKPS,Paper Decision,Reject,"All reviewers agree that the paper is to be rejected, provided strong claims that were not answered. In this form (especially with such a title) it could not be published (it is more of a technical/engineering interest).",ICLR2020, +MdDlIalKA,1576800000000.0,1576800000000.0,1,BJlPLlrFvH,BJlPLlrFvH,Paper Decision,Reject,"The author response and revisions to the manuscript motivated two reviewers to increase their scores to weak accept. While these revisions increased the quality of the work, the overall assessment is just shy of the threshold for inclusion.",ICLR2020, +rpuGy8Snh9,1576800000000.0,1576800000000.0,1,r1e7NgrYvH,r1e7NgrYvH,Paper Decision,Reject,"The idea of integrating causality into an auto-encoder is interesting and very timely. While the reviewers find this paper to contain some interesting ideas, the technical contributions and mathematical rigor, scope of the method, and the presentation of results would need to be significantly improved in order for this work to reach the quality bar of ICLR.",ICLR2020, +cCbzrDrqNXo,1642700000000.0,1642700000000.0,1,JV4tkMi4xg,JV4tkMi4xg,Paper Decision,Reject,"The paper considers the problem of black-box optimization and proposes a discrete MBO framework using piecewise-linear neural networks as surrogate models and mixed-integer linear programming. The reviewers generally agree that the paper suggests an interesting approach but they also raised several concerns in their initial reviews. 
The response from the authors addressed a number of these concerns, for instance regarding scalability and expressivity of the model. However, some of these concerns remained after the discussion period, including doubts about the usefulness for typical applications in discrete black-box optimization and some concerns about the balance between exploration and exploitation. 

Overall, the paper falls below the acceptance bar for now, but the direction taken by the authors has some potential. I encourage the authors to address the problems discussed in the reviews before resubmitting.",ICLR2022,
+HyR3OmGeoDp,1610040000000.0,1610470000000.0,1,qbH974jKUVy,qbH974jKUVy,Final Decision,Accept (Poster),"The paper seeks to empirically study and highlight how disentanglement of latent representations relates to combinatorial generalization. In particular, the main argument is to show that models fail to perform combinatorial generalization or extrapolation while succeeding in other ways. This is a borderline paper. For empirical studies it is also less agreed upon in general where one should draw the line on sufficient coverage of experiments, i.e., the burden of proof for primarily empirically derived insights. The initial submission clearly did not meet the necessary standard, as the analysis was based on a single dataset and studied only two methods (VAE and beta-VAE). The revised version of the manuscript now includes additional experiments (an additional dataset and two new methods), still offering a largely consistent pattern of observations, raising the paper to its current borderline status. Some questions remain about the new results (especially the decoder). 
",ICLR2021, +si5thvHfU3R,1642700000000.0,1642700000000.0,1,o9DnX55PEAo,o9DnX55PEAo,Paper Decision,Reject,"This paper presents a method for distilling pretrained models (such as BERT) into a different student architecture (CMOW), and extend the CMOW architecture with a bidirectional component. On a couple of datasets, results are comparable to DistilBERT a previous baseline. This paper is nice, but can be stronger with more empirical experiments on non-GLUE tasks (TriviaQA, Natural Questions, SQUAD for example). Furthermore, I agree with Reviewer M3tk that there are many empirical comparisons with baselines such as TinyBERT missing and the argument of not needing the teacher model to be super convincing.",ICLR2022, +tP2eKYz_cd,1642700000000.0,1642700000000.0,1,f3qFAV_MH-C,f3qFAV_MH-C,Paper Decision,Reject,"The manuscript proposes (TRansfer and Marginalize) TRAM method that integrates the privileged information into the learned network weight through weight sharing at training time and approximately marginalizes over the privileged information at test time. TRAM can also be combined with methods for dealing with noisy labels, distillation (Distilled-TRAM) and heteroscedastic output layers (Het-TRAM). Experiments are performed on both realistic and synthetic datasets including CIFAR-10H, re-labeled ImageNet, and Civil Comments Identities. + +Reviewers agreed on several positive aspects of the manuscript, including: +1. The proposed methods have simple architectures (not requiring specific modules, e.g., Gaussian dropout [Lambert et al., 2018], for the marginalization); +2. The proposed method can in principle be applied to any neural network model and has zero overhead at prediction time. + +Reviewers also highlighted several major concerns, including: +1. The analysis is performed on edge cases such as linear and non-linear sine models. There is no analysis for the classification case that this manuscript is targeted for. The simple cases are only true when the feature extraction network is kept unchanged during training; +2. Empirically, the experiments are conducted in a limited and counter-intuitive; +3. Lack empirical evidence suggesting that the representations learned with access to privileged information are more robust against label noise; +4. Lack quantitative (or even qualitative) evidence about how, how much, and what kind of privileged information is transferred through weight sharing in realistic deep neural network models. + +Several new experiments have been added to show, among others: representations learned with privileged information outperform representations learned without access to privileged information (using a linear classification model on ImageNet), better quantitatively and qualitatively understanding how and how much privileged information is transferred in realistic deep networks. + +Post-rebuttal, reviewers stayed with borderline ratings, and they have suggested further improvements: simulating with more annotators by using different checkpoints and/or different hyperparameters, collecting a real-world large-scale dataset such that the privileged information is insignificantly expensive to obtain along with the main annotations, and disentangling the effect of the pretraining model on the denoising method.",ICLR2022, +pYDZ-Lm0gmG,1642700000000.0,1642700000000.0,1,zq1iJkNk3uN,zq1iJkNk3uN,Paper Decision,Accept (Poster),"This paper aims at improving the data efficiency of pretraining in CLIP. This is a practically meaningful research direction. 
The proposed method is simple, even kind of straightforward and has limited innovations. It combines self-supervision within each modality, multi-view supervision across modalities, and nearest-neighbor supervision from other similar pairs. Such a combination showed strong empirical results: achieved better performance using seven times fewer data. The rebuttals resolved most critical concerns on experiments, such as fair comparisons with the original CLIP work.",ICLR2022, +pWo2QQ3uy-,1576800000000.0,1576800000000.0,1,BJe55gBtvH,BJe55gBtvH,Paper Decision,Accept (Spotlight),"The article is concerned with depth width tradeoffs in the representation of functions with neural networks. The article presents connections between expressivity of neural networks and dynamical systems, and obtains lower bounds on the width to represent periodic functions as a function of the depth. These are relevant advances and new perspectives for the theoretical study of neural networks. The reviewers were very positive about this article. The authors' responses also addressed comments from the initial reviews. ",ICLR2020, +B1gaeJHblE,1544800000000.0,1545350000000.0,1,rylxxhRctX,rylxxhRctX,Not enough for acceptance ,Reject,The overall view of the reviewers is that the paper is not quite good enough as it stands. The reviewers also appreciate the contributions so taking the comments into account and resubmit elsewhere is encouraged. ,ICLR2019,5: The area chair is absolutely certain +oL8pA7Wofzo,1642700000000.0,1642700000000.0,1,xtZXWpXVbiK,xtZXWpXVbiK,Paper Decision,Reject,"Summary: this is a difficult paper to meta-review, since it contains some insightful ideas and interesting experiments, while it also unfortunately contains omissions, confusions, and places where clarity is lacking (see below). One consistent theme is that the paper is too dismissive of prior work; the exposition is not as clear as it should be about what aspects of FORBES are present in previous papers, it uses too broad a brush to describe prior methods (resulting in too-general statements about what these methods can't do), and it skips important chunks of the extensive literature on POMDP belief representation and tracking. As a result, the paper doesn’t do a good job concisely and accurately stating its contribution; there is still reasonable concern about how significant this contribution is. On the other hand, the experimental results for FORBES are interesting; the new method seems to represent a better combination of techniques than at least many existing works, at least to the resolution of the experiments’ statistical power. So the end question is whether interesting experimental results and a new combination of techniques are enough to outweigh the problems outlined above. In the end we believe that the correct outcome is rejection; but we have every expectation that a future version of the paper will resolve the difficulties outlined here and will appear in a future conference. + +A brief note about the discussion: the original scores for this paper were lower. While some reviewers raised their score later in the discussion, a thorough reading of the discussion and the revised paper indicates that a substantial fraction of the issues leading to the lower scores still remain. + +More details: + +There is a lot of prior work on tracking belief states, which should be cited more thoroughly. The paper's intro makes it sound like diagonal Gaussians were the only previous alternative. 
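(For contrast, the classic non-Gaussian belief representation is a weighted particle set; a minimal bootstrap-filter update, assuming generic transition and likelihood callables, is just:

    import numpy as np

    def particle_filter_step(particles, weights, action, obs,
                             transition, likelihood, rng):
        # Propagate each particle through the stochastic transition model.
        particles = np.array([transition(p, action, rng) for p in particles])
        # Reweight by the observation likelihood and renormalize.
        weights = weights * np.array([likelihood(obs, p) for p in particles])
        weights = weights / weights.sum()
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx], np.full(len(particles), 1.0 / len(particles))

With a multimodal likelihood the resulting particle set is itself multimodal, which is exactly the representational capability under discussion.)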
At least, the intro should cite older work on MCMC methods like particle filters (e.g., Thrun’s book Probabilistic Robotics, or Arnaud Doucet’s work), and prior deep-net papers that attempt non-Gaussian representations, even if these don’t perform as well as hoped (see below for examples). It is also important to compare to RKHS representations of beliefs, such as Nishiyama, Boularias, Gretton, Fukumizu 2012; these handle multimodality, and can behave similarly to deep nets if they use the neural tangent kernel. Accurately comparing to prior work is one of the most important functions of a paper, so it doesn’t make sense to be unfairly critical of prior work or to skip it. + +The paper is also unclear about the effects of Gaussian distributions at different places in a variational approximation. Because of this lack of clarity, the criticisms it levels at previous variational methods seem to be true only of some of them. + +In particular, the introduction should distinguish between two uses of Gaussian approximations: first for the belief itself, and second for the distribution of observations given a belief. Some prior works make only one of these approximations. For example, a non-Gaussian distribution used as a belief state can predict multi-modal future behaviors, even if we approximate observations under a given belief as Gaussian. + +The introduction should also distinguish between two common places that a Gaussian could enter into a variational approximation: at the input or at the output of a network. A Gaussian latent at the input of a variational network (even if it has diagonal covariance) can result in a highly non-Gaussian output distribution, while Gaussian noise added at the end will (if it is the only noise) lead to a Gaussian output. Again, some of the statements in the intro apply only to the latter use of a Gaussian, while some prior work focuses on the former use. + +There is an important conceptual confusion in the paper about what it means to have a multimodal belief state: the paper presents the true belief as an inherent property of an environment, while in fact it is a property of an environment *model*. So, there can be two different equally accurate models of the same environment which differ in the belief representation; a simple example would be to use either a continuous state whose components are joint angles, or a discrete state obtained by finely discretizing this continuous one. In the first case the belief would be a distribution over the continuous space, while in the second it would be a categorical distribution (a point in a simplex). A consequence of such a difference is that beliefs can be multimodal in one representation and not another. + +The importance of this confusion is that, since we are asking our network to learn a belief state, the learning process could potentially favor representations that lead to unimodal beliefs — so it’s not clear theoretically that forcing a unimodal belief representation is necessarily a disadvantage. The paper presents the situation as if the disadvantage is forced by theory, while instead the argument should be based on experiments: e.g., one could try to show that unimodal representations, even if given a higher latent dimension to work with, aren’t empirically able to capture the same information. 
+ +Some interesting prior deep-net POMDP papers that might need better discussion: +* Han, Doya, and Tani ICLR 2020 (which isn’t cited here) puts the Gaussian latent variable as an input to the network for predicting beliefs (eq 2), resulting in a possibly highly non-Gaussian output representing the belief. +* Tschiatschek et al, 2018 (also not cited) uses a Gaussian *mixture* as the variational distribution to approximate beliefs, again allowing multimodality. +* Igl et al. 2018 (which is cited only late in the current paper, and basically dismissed) uses a deep version of particle filters to allow non-Gaussian distributions for both beliefs and observations. +* More work that is potentially relevant but not adequately compared (even if briefly cited): Gregor et al. (2019), DreamerV2, Ha & Schmidhuber’s World Models. Each of these makes at least some choices to try to handle at least some kinds of multimodality, so a clear explanation of differences that avoids the confusions mentioned above would be very helpful. +* In general, the results of the search “variational encoder POMDP” seem to include a number of papers not cited in the current paper; another useful search is “normalizing flow POMDP"" + +Finally, in the experiments section, the paper needs to correctly report the reliability of its conclusions. In some places (e.g., Fig. 5) there’s no mention of reliability or repeatability of conclusions; the paper just says that its evidence “support[s] the claim that FORBES can better capture the complex belief states”. In other places (e.g., Fig. 6, 7), the paper displays uncertainty representations based on only a few replications of an experiment (e.g., 3 seeds for Fig. 6, or 5 seeds when a reviewer requested extra experiments). The corresponding uncertainty estimates almost certainly are strongly biased too low (too certain); e.g., three runs would have less than a 50% chance of even seeing failure modes that happen with probability as high as 20% (0.8^3 = 0.512 > 0.5), meaning that the estimated standard deviation could be almost arbitrarily badly biased downward. To be clear, experiments with few replicates can still be highly useful and informative, and it’s true that some experiments are too expensive to run many times; but in such cases the paper should add appropriate caveats to its conclusions. For example, instead of reporting the sample standard deviation based on a normal model, the paper could report a confidence interval based on a more robust model or test, such as a Wilcoxon test. (To illustrate the difference, confidence intervals at typical significant levels like p=0.05 would be vacuous (infinitely wide) under Wilcoxon with 3 seeds, but much-weaker p-values would still yield non-vacuous intervals.) + +A few smaller questions: + +The authors added a nice ablation study to compare to Dreamer; this is great to see. It would be good to discuss the connection to earlier methods such as PlaNet and Dreamer at places where the current method is similar or different (e.g., different from Dreamer in the belief state representation in sec 2.2, but similar in the RL framework in section 3.2). These comparisons would aid in the reader’s understanding of what is new in FORBES. + +An unusual feature of FORBES is that the variational approximation to the belief at time t+1 is not a function of the belief at time t. 
Instead the belief inference network q_{\psi,\theta} takes as input the entire past trajectory, uses convolution and recurrence to reduce the variable-length input to fixed dimension, and passes this fixed-dimension representation through a normalizing flow mapping. It would be interesting to discuss the reason for this design decision. In particular, it seems like it would inhibit tracking — i.e., it could be hard to propagate information from one belief distribution to the immediate next one, particularly if there are a few unlikely observations scattered through a trajectory. + +A minor point for clarity: in Fig 1 it's unclear what distributions the white and gray triangles refer to. They don't seem to correspond to a natural belief state: instead maybe they incorporate three simultaneous observations from the same starting belief? Correctly intersecting beliefs is an important issue though, so at a high level the point that the figure is trying to make fits well. + +Another point for clarity: “there always exists a diffeomorphism that can turn one well-behaved distribution into another”: this is true for some definition of ”well-behaved”, but it’s misleading to say it this way. E.g., it is not true if the distributions in question can have atoms, or differ in dimension or topology; these exceptions are unfortunately important cases that do come up in practice.",ICLR2022, +rkNA4yaHM,1517250000000.0,1517260000000.0,464,r1AoGNlC-,r1AoGNlC-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper introduces a possibly useful new RL idea (though it's a incremental on Liang et al), but the evaluations don't say much about why it works (when it does), and we didn't find the target application convincing. +",ICLR2018, +R3ybHj6QLFs,1610040000000.0,1610470000000.0,1,Te1aZ2myPIu,Te1aZ2myPIu,Final Decision,Reject,"The paper considers an extension of randomized smoothing where the smoothing noise may differ for different points. The resulting method shows good performance experimentally. However, the reviewers raised a number of problems which, at the moment, precludes the acceptance of the paper, such as the following: + +- The paper analyzes the transductive setting, where all the test points are available to fine-tune the smoothing parameters of the predictor. It is not clear how this setting corresponds to a real adversarial threat model, and whether the final tuning needs to use the perturbed or unperturbed points. In the first case, the resulting certified radius is different from what is normally used in the literature, while in the latter it is not clear how the method would be useful to mitigate any real adversarial attack. +- A related comment is that the paper should explain (and state) properly how the results of Cohen et al. (2019) are applicable to compute the certified radius, which would also provide a proper explanation why partitioning is used. +- The training cost of the procedure seems very high, and this is not discussed. +- The clarity of the presentation should be improved. + + + +",ICLR2021, +tlATYHK8aq,1576800000000.0,1576800000000.0,1,rylVTTVtvH,rylVTTVtvH,Paper Decision,Reject,"The paper proposes a tensor-based extension to graph convolutional networks for prediction over dynamic graphs. + +The proposed model is reasonable and achieves promising empirical results. 
After discussion, it is agreed that while the problem of handling dynamic graphs is interesting and challenging, the proposed tensor method lacks novelty, the theoretical analysis is artificial, and the empirical study does not cover enough benchmarks. + +The current version of the paper is not ready for publication. Addressing the issues above could lead to a strong publication in the future. ",ICLR2020, +VuXSDsQMx4Z,1642700000000.0,1642700000000.0,1,ItkxLQU01lD,ItkxLQU01lD,Paper Decision,Accept (Poster),"The paper got four accepts (after the reviewers changed their scores), all with high confidences. The theories are complete and the experiments are solid. The AC found no reason to overturn reviewers' recommendations. However, the AC deemed that all the pieces are just routine, thus only recommended poster.",ICLR2022, +u9smupYS6dq,1610040000000.0,1610470000000.0,1,piek7LGx7j,piek7LGx7j,Final Decision,Reject,"This paper presents a two-step approach to achieve disentangled representation and good reconstruction at the same time in deep generative models: the first step focuses on good disentanglement (e.g., with beta-TCVAE) while possibly sacrificing reconstruction reconstruction, while the second step focuses on high-quality reconstruction, conditioned on the low-quality reconstruction from disentangled representation. In this paper, each step uses an existing method: beta-TCVAE is used for the first step and AdaIN is used for the second, so the paper presents an intuitive combination of two existing methods to achieve both goals. Some useful ablation studies are provided to empirically justify the specific method choices. The concern is whether the two step approach is necessary to achieve both; the authors' argument is that models learning only one set of latent variables may not have the capacity to achieve both goals, and methods jointly learning two set of variables (""disentangled"" and ""correlated"" variables) can not guarantee they represent disjoint structures of data. Some of these statements seem somewhat handwavy (including the d-separation argument, I am not very sure if it applies when the variables are learned separately), and shall be made rigorous and justified (theoretically and/or empirically). + +The reviewers rate this paper to be borderline.",ICLR2021, +3ooTLWxORs,1610040000000.0,1610470000000.0,1,uUlGTEbBRL,uUlGTEbBRL,Final Decision,Reject,"This paper presents a theoretical analysis of CNN compression using tensor methods. None of the three reviewers have strong opinion; there scores are 5, 6, and 5. + +The attempt to understand the mechanism of how tensor decomposition compresses CNNs is meaningful and interesting. However, the main contribution of this work is not sufficiently distinct compared to the existing approaches and the analysis and proof is conduected only for simplified models as mentioned by reviewers. The practical benefit of this paper is not clear and the experimental validation is weak because only a small number of model architectures were tested on a few small datasets. + +This is a borderline paper. However, this paper needs to extend its contribution by performing more comprehensive analysis for general CNNs. ",ICLR2021, +BJeOZeCll4,1544770000000.0,1545350000000.0,1,ByetGn0cYX,ByetGn0cYX,Promising approach and text should be clarified on some points of active discussion,Accept (Poster),"This paper presents a new approach for posing control as inference that leverages Sequential Monte Carlo and Bayesian smoothing. 
There is significant interest from the reviewers into this method, and also an active discussion about this paper, particularly with respect to the optimism bias issue. The paper is borderline and the authors are encouraged to address the desired clarifications and changes from the reviewers. +",ICLR2019,3: The area chair is somewhat confident +EmbcEIvTsg,1576800000000.0,1576800000000.0,1,ByxXZpVtPB,ByxXZpVtPB,Paper Decision,Reject,"The authors propose a framework for incorporating homogeneous linear inequality constraints on neural network activations into neural network architectures. The authors show that this enables training neural networks that are guaranteed to satisfy non-trivial constraints on the neurons in a manner that is significantly more scalable than prior work, and demonstrate this experimentally on a generative modelling task. + +The problem considered in the paper is certainly significant (training neural networks that are guaranteed to satisfy constraints arises in many applications) and the authors make some interesting contributions. However, the reviewers found the following issues that make it difficult to accept the paper in its present form: +1) The setting of homogeneous linear equality constraints is not well-motivated and the significance of being able to impose such constraints is not clearly articulated in the paper. The authors would do well to prepare a future revision documenting use-cases motivated by practical applications and add these to the paper. +2) The experimental evaluation is not sufficiently thorough: the authors evaluate their method on an artificial constraint involving a ""checkerboard pattern"" on MNIST. Even in this case, the training method proposed by the authors seems to suffer from some issues, and more thorough experiments need to be conducted to confirm that the training method can perform well across a variety of datasets and constraints. + +Given these issues, I recommend rejection. However, I encourage the authors to revise their work on this important topic and prepare a future version including practical examples of the constraints and experiments on a variety of prediction tasks. + +",ICLR2020, +kdmSXxHAIK,1576800000000.0,1576800000000.0,1,HkgXteBYPB,HkgXteBYPB,Paper Decision,Reject,"The paper presents a timely method for intuitive physics simulations that expand on the HTRN model, and tested in several physicals systems with rigid and deformable objects as well as other results later in the review. + +Reviewer 3 was positive about the paper, and suggested improving the exposition to make it more self-contained. Reviewer 1 raised questions about the complexity of tasks and a concerns of limited advancement provided by the paper. Reviewer 2, had a similar concerns about limited clarity as to how the changes contribute to the results, and missing baselines. The authors provided detailed responses in all cases, providing some additional results with various other videos. After discussion and reviewing the additional results, the role of the stochastic elements of the model and its contributions to performance remained and the reviewers chose not to adjust their ratings. + +The paper is interesting, timely and addresses important questions, but questions remain. We hope the review has provided useful information for their ongoing research. 
",ICLR2020, +BmljCBMJ8,1576800000000.0,1576800000000.0,1,SJldu6EtDS,SJldu6EtDS,Paper Decision,Reject,"This article proposes a regularisation scheme to learn classifiers that take into account similarity of labels, and presents a series of experiments. The reviewers found the approach plausible, the paper well written, and the experiments sufficient. At the same time, they expressed concerns, mentioning that the technical contribution is limited (in particular, the Wasserstein distance has been used before in estimation of conditional distributions and in multi-label learning), and that it would be important to put more efforts into learning the metric. The author responses clarified a few points and agreed that learning the metric is an interesting problem. There were also concerns about the competitiveness of the approach, which were addressed in part in the authors' responses, albeit not fully convincing all of the reviewers. This article proposes an interesting technique for a relevant type of problems, and demonstrates that it can be competitive with extensive experiments. ``Although this is a reasonably good article, it is not good enough, given the very high acceptance bar for this year's ICLR. +",ICLR2020, +43Sd8Bm0VK,1610040000000.0,1610470000000.0,1,KIS8jqLp4fQ,KIS8jqLp4fQ,Final Decision,Reject,"This work proposes algorithms for solving ERM with continuous losses satisfying the PL condition. The first algorithm achieves that by using a chainging noise variance and thus the paper frames the contribution in terms of the advantages of non-constant noise rate. + +The problem is a well-studied one and the result is a nice if relatively modest improvement over Wang et al. However, as pointed out in reviews, in the context of convex optimization the same rate has already been established (Feldman,Koren,Talwar STOC 2020). This work is cited and briefly discussed but the discussion only includes one of the algorithms in the paper (that does have an additional log N factor). The overall assumptions in this paper are not comparable (weaker in some ways and stronger since they only require PL instead of strong convexity) but still the overall the contribution appears to be incremental.",ICLR2021, +T4dwXQec9t5,1610040000000.0,1610470000000.0,1,Ov_sMNau-PF,Ov_sMNau-PF,Final Decision,Accept (Poster)," +This paper described a model that improves the performance of LM-based pre-trained sentence representation on semantic text similarity tasks (STS). The proposed approach is motivated by the observation that top-layers in transformer-based LMs are quite poor at this task per se. This paper proposes Contrastive Tension, a self-supervised objective that drags representations of same sentence together, and pulls away representations of different sentences. The proposed method only relies on unlabelled data, and is relatively simple to implement. The paper demonstrates consistently strong results empirical results on the unsupervised semantic textual similarity (STS) task. Moreover, the paper provides reasonably good analysis. + +On a negative side, the reviewers noted that the paper lacks a bit of analysis about the objective. The connection between the observation on layer-wise performance of BERT on STS and the proposed contrastive training method is not clear. Second, while the result is interesting, its applicability is limited to STS. + +Taking into account all the above, the reviews constitue a case for a solid weak-accept. 
Therefore I recommend acceptance as a Poster contribution.",ICLR2021, +Nhsilu0__FtS,1642700000000.0,1642700000000.0,1,aBAgwom5pTn,aBAgwom5pTn,Paper Decision,Reject,"This paper presents a new method for performing Bayesian optimization for hyperparameter tuning that uses learning curve trajectories to reason about how long to train a model for (thus ""grey box"" optimization) and whether to continue training a model. The reviewers seem to find the paper clear, well-motivated and the presented methodology sensible. However, the reviews were quite mixed and leaning towards reject with 3, 6, 5, 3, 6. A challenge for the authors is that there is already significant related literature on the subject of multi-fidelity optimization and even specific formulations for hyperparameter optimization that reason about learning curves. A common criticism raised by the reviewers is that while there are extensive experiments, they don't seem to be the right choice of experiments to help understand the advantages of this method (e.g. epochs instead of wall-clock on the x-axis, choice of baselines, demonstration that early results are used to forecast later success, etc.). Unfortunately, because there is significant related literature, the bar is raised somewhat in terms of empirical evidence (although theoretical evidence of the performance of this method would also help). It seems clear that some of the reviewers are not convinced by the experiments that were presented. Thus the recommendation is to reject the paper but encourage the authors to submit to a future venue. It looks like the authors have gone a long way to address these concerns in their author responses. Incorporating these new results and the reviewer feedback would go a long way to improving the paper for a future submission.",ICLR2022, +ByeE9y0zeV,1544900000000.0,1545350000000.0,1,BygIV2CcKm,BygIV2CcKm,"Interesting contribution, but not fully developed yet. ",Reject,"This paper proposes and end-to-end trainable architecture for data augmentation, by defining a parametric model for data augmentation (using spatial transformers and GANs) and optimizing validation classification error through the notion of influence functions. Experiments are reported on MNIST and CIfar-10. + +This is a borderline submission. Reviewers found the theoretical framework and problem setup to be solid and promising, but were also concerned about the experimental setup and the lack of clarity in the manuscript. In particular, one would like to evaluate this model against similar baselines (e.g. Ratner et al) on a large-scale classification problem. The AC, after taking these comments into account and making his/her own assessment, recommends rejection at this time, encouraging the authors to address the above comments and resubmit this promising work in the next conference cycle. ",ICLR2019,4: The area chair is confident but not absolutely certain +SyefuLZlgE,1544720000000.0,1545350000000.0,1,HJl1ujCct7,HJl1ujCct7,"Many reviewer concerns, no author rebuttals.",Reject,"The authors propose a GAN-based anomaly detection method based on simulating anomalies (low density regions of the data space) in order to train an anomaly classifier. + +While the paper addresses an interesting take on an important problem, there are many concerns raised by reviewers including novelty, clarity, attribution, reproducibility, the use of exclusively proprietary data, and a multitude of textual mistakes. 
Overall, the paper shows promise but does not seem to be a mature and polished piece of work. As there has been no rebuttal or update to the paper I have no choice but to concur with the reviewers' initial assessments and reject.",ICLR2019,5: The area chair is absolutely certain +0Z9wue9a68I,1610040000000.0,1610470000000.0,1,Wj4ODo0uyCF,Wj4ODo0uyCF,Final Decision,Accept (Oral),"This paper proposes a conditional language-specific routing (CLSR) mechanism for multilingual NMT, which also considers the trade-off between language specificity and generality. + +All of the reviewers think this paper is interesting for both idea and empirical findings. Therefore, it is a clear acceptance.",ICLR2021, +lDXI0VGDVg,1576800000000.0,1576800000000.0,1,SyeMblBtwr,SyeMblBtwr,Paper Decision,Reject,"This is certainly a boarderline paper. The reviewers agreed this paper provides a good explanation and empirical justification of why popular normalization schemes don't help in DRL. The paper then proposes a simple scheme and demonstrates how it improves learning in several domains. The main concerns are the nature of these gains and how broadly useful the new approach is. In many cases there appear to be somewhat clear wins in the middle of the learning curves, but by the end of each experiment the errorbars overlap. The most clear results are those with TD3. There are some oddities here: using half SD error bars and smoothing---both underline the concern about significance. + +The reviewers requested more experiments and the authors provided three more domains: two in which their method appears better. These are not widely used benchmarks and it was hard to compare the performance of the baselines with fan et al (different setup) to evaluate the claims. The paper nicely provides lots of insight and empirical wisdom in the appendix, explaining how they got the algorithms to perform well. +",ICLR2020, +G29xmWVkKwo,1610040000000.0,1610470000000.0,1,1OCwJdJSnSA,1OCwJdJSnSA,Final Decision,Reject,"This paper investigates the problem of unsupervised domain adaptation and proposes a framework based on a specific type of disentangled representations learning. The paper is well written and the proposed method seems plausible. However, according to Reviewers #3 and #4, the proposed framework does not seems to be sufficiently different from existing ones, and the empirical results do not seem convincing enough. + +Please also double check in C3, whether T and S should be marginally independent or conditionally independent conditioning on X.",ICLR2021, +BJV42fUul,1486400000000.0,1486400000000.0,1,r1Ue8Hcxg,r1Ue8Hcxg,ICLR committee final decision,Accept (Oral),"This was one of the most highly rated papers submitted to the conference. The reviewers all enjoyed the idea and found the experiments rigorous, interesting and compelling. Especially interesting was the empirical result that the model found architectures that outperform those that are used extensively in the literature (i.e. LSTMs).",ICLR2017, +mkLQJxJSMNV,1642700000000.0,1642700000000.0,1,_Vn-mKDipa1,_Vn-mKDipa1,Paper Decision,Reject,"The paper proposes a method for time series forecasting based on a hierarchical deep learning approach. Three reviewers submitted reviews, with two marginally accept and one marginally reject. 
The paper was therefore borderline, but the issues raised by the marginally negative reviewer, namely the justification for choosing a deep latent model and the experimental setup, appear worth addressing in a revision resubmitted to another conference.",ICLR2022,
Overall, the quality of this submission meets the bar for ICLR acceptance, though the AC has concerns about the complicated settings and the marginal performance improvement over existing long-tailed methods.",ICLR2021,
+As it is, the paper does not explain well its contributions, especially compared to the rate-distortion balance discussion in ""Fixing a Broken ELBo"" by Alemi et al. (2018) (see [reviews sh3z](https://openreview.net/forum?id=-0LuSWi6j4¬eId=D52ninjThn1), [7Pio](https://openreview.net/forum?id=-0LuSWi6j4¬eId=9qMQNUGk6bx), and [LBJj](https://openreview.net/forum?id=-0LuSWi6j4¬eId=gyG86hghxsU)), and lacks the experiments to back up its claim (see [LBJj](https://openreview.net/forum?id=-0LuSWi6j4¬eId=gyG86hghxsU), and [KKon](https://openreview.net/forum?id=-0LuSWi6j4¬eId=zeFApaHliSv)). While the authors have made a more precise statement about their contributions in their rebuttal, the writing remains unclear. +I recommend this submission for rejection.",ICLR2022, +rznihtVIyjk,1610040000000.0,1610470000000.0,1,Ew0zR07CYRd,Ew0zR07CYRd,Final Decision,Reject,"Most reviewers are positive about this work, though they believe it is somewhat incremental, and its theoretical contributions are minor. None of the reviewers are very excited about this work. Overall, the PC believes this is a borderline paper. + +Minor note: During the discussions, the paper by Xiao et al., ""Characterizing Attacks on Deep Reinforcement Learning"" (2019) was brought up. The authors claimed that they did not compare with that paper because the best attack there (obs-fgsm-wb) had already been studied. In a later stage of discussions, one of the reviewers stated that the method obs-nn-wb in that paper performed better in some domains. Even though this is not a major issue, it is advisable to the authors to make sure that this is indeed the case, and if it is, provide proper comparison with that paper. + +We encourage the authors to consider the reviewers' comments to improve the paper and resubmit to a future venue.",ICLR2021, +kMOSb1tw49p,1610040000000.0,1610470000000.0,1,aI8VuzSvCPn,aI8VuzSvCPn,Final Decision,Reject,The paper uses adversarial data to improve generalization in Programming By Example (PBE). The reviews were somewhat mixed with some people finding this useful and interesting while others finding it straightforward and unsurprsing. The reviewers were not convinced of the ultimate usefulness of the approach since it is evaluated on toy or synthetic datasets. The clarity of the presentation could also be improved.,ICLR2021, +ByltV_RJeV,1544710000000.0,1545350000000.0,1,SkNSehA9FQ,SkNSehA9FQ,Results do not justify method complexity,Reject,"This paper introduces fairly complex methods for dealing with OOV words in graphs representing source code, and aims to show that these improve over existing methods. The chief and valid concern raised by the reviewers was that the experiments had been changed so as to not allow proper comparison to prior work, or where comparison can be made. It is essential that a new method such as this be properly evaluated against existing benchmarks, under the same experimental conditions as presented in related literature. It seems that while the method is interesting, the empirical section of this paper needs reworking in order to be suitable for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +w2-qUXOQ5bC,1610040000000.0,1610470000000.0,1,daLIpc7vQ2q,daLIpc7vQ2q,Final Decision,Reject,"This paper introduces a bag of techniques to improve contrastive divergence training of energy-based models (EBMs), particularly a KL divergence term, data augmentation, multi-scale energy functions, and reservoir sampling. 
Overall, the paper is well written and clearly presented.
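For readers less familiar with the reservoir-sampling component, a minimal sketch of how such a buffer can store past MCMC samples for contrastive divergence is given below (my own illustrative simplification with assumed names, not the authors' code):

```python
import random

class ReservoirBuffer:
    # Algorithm R: keeps a uniform random subset of every sample ever seen,
    # so early MCMC chains remain represented in the negative-sample pool.
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.num_seen = 0

    def add(self, sample):
        self.num_seen += 1
        if len(self.storage) < self.capacity:
            self.storage.append(sample)
        else:
            j = random.randrange(self.num_seen)
            if j < self.capacity:
                self.storage[j] = sample

    def draw(self, batch_size):
        # Starting points for fresh Langevin chains.
        return random.sample(self.storage, min(batch_size, len(self.storage)))
```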
Specifically, the authors estimate the differential entropy (DE) of a density of initial conditions as it evolves over time (what they call ""uncertainty""), and they show that (a) as long as the density of states assigns sufficient mass to all strategies, its DE will increase; and (b) the density of states will get arbitrarily close to the boundary of the state space infinitely often (i.e., at least one pure strategy will be employed with arbitrarily small probability infinitely often). The authors also apply these results to a population-like model of learning as well as an optimistic variant of the MWU protocol (the latter in the supplement). + +The paper was extensively discussed during the review/rebuttal phase. While the reviewers appreciated the conceptual contributions of the paper, they also identified certain technical shortcomings that were only partially addressed by the authors. One of these issues concerned the possibility that the density of initial conditions may exhibit singularities, in which case the DE may fail to be well-defined. As a result, one of the reviewers indicated an intent to downgrade their score from ""8"" to ""3"" due to concerns on the correctness of the results presented in the paper. + +After discussing with both the authors and the reviewers, my view is that the merits of the paper outweigh its flaws, so I am making an ""accept"" recommendation. At the same time, there is a number of revisions that the authors will have to undertake in the camera-ready version of their paper: + +1. The authors need to be more careful with their assumptions and notation. The reviewers already indicated a number of glitches, most of them easily fixable (so they are not of particular concern). On the other hand, the issue of whether the initial density of states becomes singular or not is more subtle and led one of the reviewers to drastically change their evaluation of the paper. + + The problem here is that the authors are not being precise in their assumptions for $g^1$ and its support, and this confusion remained throughout the discussion: the authors are looking at distributions that are ""smooth with bounded support"", but this does not exclude singularities. The counterexample given to the authors was a random variable $X$ supported on $\mathcal{X} = (0,1)$ with density $g(x) = 1/(2\sqrt{x})$; this density has bounded support and it is smooth on its support, but it is not itself bounded. [There is an ambiguity here in whether the authors are considering the support to be closed or not.] + + The issue for the initial density can be trivially fixed by asking that $g^1$ be itself bounded (or smooth over the closed support, or any other similar statement). However, even if this is assumed for $g^1$, the density at some later time $t$ could, a priori, become singular (incidentally, this is a problem that arises frequently in the study of densities that evolve over time, e.g., as in optimal transport). Thus, even an explicit assumption for $g^1$ does not suffice to ensure that $g^t$ does not develop singularities in future stages. [Incidentally, the authors' reply that the singularity has measure zero and therefore does not contribute to the integral misses the heart of the matter (and raises concerns about the authors' overall treatment of this question): the function $g(x) = (\log 2) \big/ (x \log^2x)$ has infinite differential entropy over $(0,1/2)$ even though it is a smooth density over $(0,1/2)$.] 
+ + To be clear, I do not believe that blow-ups actuall occur in the authors' model, but there is still something that needs to be shown here. However, since it is impossible to check an argument or proof at this stage (and I do not think it would be fair to let this stand in the way of accepting the paper), the authors should instead revise their paper to add as an **explicit** assumption that $g^t$ has bounded support and is bounded over its support (or clarify whether they take the support to be closed or not). + +1. Another concern revolves around the use of the word ""uncertainty"" to describe the basic premise of the paper. In the authors' model, this does not refer to uncertainty among the learners (all their observations are perfectly certain and deterministic), so it is not used in the sense that is standard in game theory and learning (cf. the classic works of Bertsekas, Dekel, Fudenberg, Tsitsiklis, and many others). Instead, the authors' use of the word seems to refer to some ""outside spectator"" who can only partially guess the players' initial conditions, and tries to guess the evolution of the players' mixed strategies (but still has full information about the learning model that players use, its parameters, etc.). However, this model is not fleshed out in sufficient detail by the authors, so the term ""uncertainty"" does not seem appropriate here. + + During the rebuttal phase, the authors argued that the goal of their paper is ""bringing the notion of DE to machine learning audience's attention as a measure of uncertainty, explaining how the change of DE is related to the Jacobian of the underlying dynamical systems"" and they asked ""that [the paper's] title remains as is"". While I am sympathetic to the authors' request, the fact remains that the current title (and part of the discussion in the abstract) is not representative of the paper. + + Given the authors' stated objective, the simplest solution would be to frame the paper as the ""evolution of differential entropy under..."" or the ""evolution of spectator/observer uncertainty"" or something of the sort. Both titles carry more information and, based on the authors' input, are more appropriate for the range of ideas the authors wish to convey – but simply saying ""uncertainty"" goes against the established terminology of the field. + +Overall, I would urge the authors to avoid vague/ambiguous terminology and statements, and focus instead on exact mathematical definitions that are not open to interpretation. The ideas presented in the paper are interesting and fresh, so they deserve a likewise sharp and precise treatment.",ICLR2022, +aJtXl1ZqJZ,1576800000000.0,1576800000000.0,1,SkgKO0EtvS,SkgKO0EtvS,Paper Decision,Accept (Poster),It seems to be an interesting contribution to the area. I suggest acceptance.,ICLR2020, +xbpqULHUzW,1576800000000.0,1576800000000.0,1,Skg9aAEKwH,Skg9aAEKwH,Paper Decision,Reject,"This paper proposes a technique for training embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator. The model is trained to play this game from scratch without any prior knowledge of its visual world, and experiments and visualizations show that a representation of other agents automatically emerges in the learned representation. Results suggest that, although agent weaknesses make the learning problem more challenging, they also cause useful features to emerge in the representation. 
+ +While reviewers found the paper explores an interesting direction, concerns were raised that many claims are unjustified. For example, in the discussion phase a reviewer asked how can one infer ""hider learns to first turn away from the seeker then run away"" from a single transition frequency? Or, the rebuttal mentions ""The agent with visibility reward does not get the chance to learn features of self-visibility because of the limited speed hence the model received samples with significantly less variation of its self-visibility, which makes learning to discriminate self-visibility difficult"". What is the justification for this? There could be more details in the paper and I'd also like to know if these findings were reached purely by looking at the histograms or by combining visual analysis with the histograms. + +I suggest authors address these concerns and provide quantitative results for all of the claims in an improved iteration of this paper. ",ICLR2020, +AGxF8VwgaRd,1642700000000.0,1642700000000.0,1,xNOVfCCvDpM,xNOVfCCvDpM,Paper Decision,Accept (Poster),"This paper demonstrates that current post-hoc methods to explain black-box models are not robust to spurious signals based on three metrics especially when the spurious signals are implicit or unknown. Technical novelty is limited because the paper presents primarily empirical results instead of novel machine learning techniques. However, the problem is very important and timely, and significance to the field and potential impact of the presented results to advance the field are high as reviewers emphasized. There are ways to further improve the paper, including the clarity of presentation, although the authors improved in the revised manuscript. Overall, this paper deserves borderline acceptance.",ICLR2022, +8q0p0rM_Kt,1642700000000.0,1642700000000.0,1,0no8Motr-zO,0no8Motr-zO,Paper Decision,Accept (Poster),"In this paper, the authors introduce an exploration method for RL according to experimental design perspective via designing an acquisition function, which quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process, and the state-action that maximizes such acquisition function will be used for sampling for policy update. The empirical evidences show the proposed method is promising. + +Since most of the reviewers support the paper, I recommend acceptance of this submission. + +However, besides the questions raised by the reviewers, e.g., computation cost and planning quality from CEM, there is a major issue need to be clarified in the paper: + +>The algorithm designed for RL with generative model, which makes the state-action reset can be conducted (this is sometimes impossible in practice where the agent must start from initial state). This is different from the common RL setting, and thus reduce the complexity of RL. This should be emphasized in the paper. Meanwhile, for a fair comparison, this should be explicitly specified in experiment setting.",ICLR2022, +8Gsy5D145Rp,1642700000000.0,1642700000000.0,1,RRGVCN8kjim,RRGVCN8kjim,Paper Decision,Accept (Poster),"This paper proposes to modify DETR, a recent Transformer-based architecture for object detection. +More precisely, they propose to sparsify input feature maps by learning an extra classifier to select which input features (few of them) will be used in the attention module. The supervision of this classifier is guided by second extra module coming from the Transformer decoder attention weights. 
+The resulting framework, called Sparse DETR, is an efficient end-to-end object detection architecture that overcomes the main computational bottleneck of DETR. Sparse DETR can use as few as 10%-50% of the original encoder queries while achieving results comparable to DETR.
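As a rough illustration of the sparsification step (a minimal sketch under names of my own choosing, not the authors' implementation), the learned scoring network keeps only the top fraction of encoder tokens:

```python
import torch

def select_salient_tokens(features, scoring_net, keep_ratio=0.1):
    # features: (batch, num_tokens, dim); scoring_net maps dim -> 1.
    scores = scoring_net(features).squeeze(-1)           # (batch, num_tokens)
    num_keep = max(1, int(keep_ratio * features.size(1)))
    top = scores.topk(num_keep, dim=1).indices           # (batch, num_keep)
    index = top.unsqueeze(-1).expand(-1, -1, features.size(-1))
    sparse = features.gather(1, index)
    return sparse, top   # only these tokens participate in encoder self-attention
```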
2017 have shown us that an endemic issue throughout language modelling (and certainly also other evaluation areas) is that complex model improvements are offered without comparison against properly tuned baselines and benchmarks, failing to offer assurances that the baselines would not match performance of the proposed model with proper regularisation. As some of the reviewers, the scope of comparison to prior art in this paper is extremely limited, as is the bibliography, which opens up this concern I've just outlined that it's difficult to take the results with the confidence they require. In short, my assessment, on the basis of reading the paper and reviews, is that the main failing of this paper is the lack of breadth and depth of evaluation, not that it is incremental (as many good ideas are). I'm afraid this paper is not ready for publication at this time, and am sorry the authors will have had a sub-par review process, but I believe it's in the best interest of this work to encourage the authors to further evaluate their approach before publishing it in conference proceedings.",ICLR2019,2: The area chair is not sure +cOtUbdqMZO,1576800000000.0,1576800000000.0,1,rke3TJrtPS,rke3TJrtPS,Paper Decision,Accept (Poster),"The paper proposes a new algorithm for solving constrained MDPs called Projection Based Constrained Policy Optimization. Compared to CPO, it projects the solution back to the feasible region after each step, which results in improvements on some of the tasks considered. + +The problem addressed is relevant, as many tasks could have important constraints e.g. concerning fairness or safety. + +The method is supported through theory and empirical results. It is great to have theoretical bounds on the policy improvement and constraint violation of the algorithm, although they only apply to the intractable version of the algorithm (another approximate algorithm is proposed that is used in practice). The experimental evidence is a bit mixed, with the best of the proposed projections (based on the KL approach) sometimes beating CPO but also sometimes being beaten by it, both on the obtained reward and on constraint satisfaction. + +The method only considers a single constraint. I'm not sure how trivial it would be to add more than one constraint. The reviewers also mention that the paper does not implement TRPO as in the original paper, as in the original paper the step size in the direction of the natural gradient is refined with a line search if the original step size (calculated using the quadratic expansion of the expected KL) does violate the original constraints. (Line search on the constraint as mentioned by the authors would be a different issue). Futhermore, the quadratic expansion of the KL is symmetric around the current policy in parameter space. This means that starting from a feasible solution the trust region should always overlap with the constraint set when feasibility is maintained, going somewhat agains the argument for PCPO as opposed to CPO brought up by the authors in the discussion with R2. I would also show this symmetry in illustrations such as Fig 1 to aid understanding. + + +",ICLR2020, +sfFS3bZE19,1610040000000.0,1610470000000.0,1,wiSgdeJ29ee,wiSgdeJ29ee,Final Decision,Reject,"This paper proposes a method for offline reinforcement learning methods with model-based policy optimization where they first learn a model of the environment to learn the transition dynamics, a critic and the policy in an offline manner. 
They learn the model by training an ensemble of probabilistic dynamics models, represented by neural networks that output a diagonal Gaussian distribution over the next state and reward. They then use the covariance of the probabilistic dynamics model as an uncertainty measure that they incorporate into the reward when training the policy with AWAC.
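To make that mechanism concrete, one plausible instantiation of the penalty (a minimal sketch with assumed names and an assumed max-over-ensemble penalty; the paper's exact choice may differ) is:

```python
import torch

def penalized_reward(reward, next_state_dists, lam=1.0):
    # next_state_dists: one diagonal Gaussian per ensemble member,
    # each with .stddev of shape (batch, state_dim).
    stds = torch.stack([d.stddev for d in next_state_dists])  # (E, B, D)
    disagreement = stds.norm(dim=-1).max(dim=0).values        # (B,)
    return reward - lam * disagreement                        # pessimistic reward
```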
I agree with the concerns and recommend rejection.",ICLR2021,
The evaluation is reported on toy datasets, and the significance is limited.",ICLR2020, +Hkx7pRnAyV,1544630000000.0,1545350000000.0,1,BkgiM20cYX,BkgiM20cYX,"nice, but unripe",Reject,"The paper proposes a novel approach to interfacing robots with humans, or rather vv: by mapping instructions to goals, and goals to robot actions. A possibly nice idea, and possibly good for more efficient learning. + +But the technical realisation is less strong than the initial idea. The original idea merits a good evaluation, and the authors are strongly encouraged to follow up on this idea and realise it, towards a stronger publication. + +It be noted that the authors refrained from using the rebuttal phase.",ICLR2019,5: The area chair is absolutely certain +Wss718Ki1yx,1610040000000.0,1610470000000.0,1,VVdmjgu7pKM,VVdmjgu7pKM,Final Decision,Accept (Poster),"This paper proposes a modular RNN architecture called SCOFF. The work was inspired by cognitive science(object file and schema) and was built upon previous work RIMs. The method is validated on tasks having multiple objects of the same type. + +Pros: +- It addresses an important problem in DNN -- systematic generalization. +- The proposal makes sense and is more flexible than RIM. +- Experimental results outperform baselines. + +Cons before rebuttal: +- The presentation of the algorithm is not very clear due to some confusing notations and missing details of algorithm steps. +- The comparison with baselines might not be fair due to extra parameters. +- The novelty is limited, because the only difference from RIM is weight sharing. + +The reviewers raised concerns listed in Cons. The authors successfully addressed concerns: they indicated that the comparison was fair with the same input to both; SCOFF is more flexible than RIM, and there is spatial attention to input. +The authors added the missing details in the revised version. + +All reviewers agree that the problem is important and the idea is interesting. Since the authors' rebuttal was very helpful in clarifying the questions raised, I recommend accept. +",ICLR2021, +E4PCaaW86Q,1642700000000.0,1642700000000.0,1,Q42O1Qaho5N,Q42O1Qaho5N,Paper Decision,Reject,"This paper presents a generative model for geometric graphs. The main contribution is to separate the representation and generation of geometry from that of graph structure and features. Based on this idea the authors assembled a set of existing ideas and built an auto-encoder style generative model for geometric graphs. + +This paper sits on the borderline, with reviewers split on both sides. I appreciate the clarifications from the authors during the rebuttal and the interactions with the reviewers. The main concern is the novelty of this approach, as the main contribution is the idea of separating geometry from graph structure, and most other components of the pipeline already exist in the literature. Because of this I think the paper can probably devote a bit more to this ablation study. In particular the paper currently lacks detail about whether the size of the models were controlled when doing the ablation, which could be a confounding factor that explains why the joint model with both geometry and graph structure works better. Also the different architecture choices may also factor into the difference, it would be more convincing if for example the same combination of multi-head attention blocks and GINE networks are used for the ablated graph encoder (you can simply concatenate the features from both on all layers, or even at the end). 
+ +Based on this I would recommend rejection at this time but encourage the authors to improve the paper and send it to the next venue.",ICLR2022, +UHlarmyV1KD,1642700000000.0,1642700000000.0,1,rl8jF3GENq,rl8jF3GENq,Paper Decision,Reject,"This paper received 5 quality reviewers, where 3 of them rated 5 and 2 rated 3. While this work has merits, many concerns are raised by various reviewers. The AC agrees with the reviewers that this paper is not ready for publication at its current form.",ICLR2022, +Y1Me_gVCgE,1576800000000.0,1576800000000.0,1,HJxnM1rFvr,HJxnM1rFvr,Paper Decision,Reject,"The paper introduces additional layers on top BERT type models for disentangling of semantic and positional information. The paper demonstrates (small) performance gains in transfer learning compared to pure BERT baseline. + +Both reviewers and authors have engaged in a constructive discussion of the merits of the proposed method. Although the reviewers appreciate the ideas and parts of the paper the consensus among the reviewers is that the evaluation of the method is not clearcut enough to warrant publication. + +Rejection is therefore recommended. Given the good ideas presented in the paper and the promising results the authors are encouraged to take the feedback into account and submit to the next ML conference. ",ICLR2020, +BJvxVyTBz,1517250000000.0,1517260000000.0,280,SJi9WOeRb,SJi9WOeRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper presents the Stein gradient estimator, a kernelized direct estimate of the score function for implicitly defined models. The authors demonstrate the estimator for GANs, meta-learning for approx. inference in Bayesian NNs, and approximating gradient-free MCMC. The reviewers found the method interesting and principled. The GAN experiments are somewhat toy-ish as far as I am concerned, so I'd encourage the authors to try out larger-scale models if possible, but otherwise this should be an interesting addition to ICLR.",ICLR2018, +ckoVaKEFI15,1642700000000.0,1642700000000.0,1,66kgCIYQW3,66kgCIYQW3,Paper Decision,Reject,"The authors consider the task of interpretable video classification. First, a set of binary “concepts'' is predicted, and these concept features are then used for classifying a video. The set itself is automatically generated from natural language descriptions, instead of relying on expert annotations. The authors collect two datasets to validate the proposed approach and show that the model can match the performance of a standard video classification model, while being interpretable. + +The reviewers felt that the paper was well written and that the method and empirical results were clearly outlined. They also appreciated the empirical results whereby interpretability doesn’t necessarily come at the expense of accuracy and consider interpretability as a desirable property. The main reason for the borderline results is the heuristic nature of the proposed automatic concept labeling and the empirical evaluation against alternative baselines. In particular, one needs to **show that the proposed method generalises to other datasets**. Secondly, one of the main contributions, namely the automatic **concept extraction, still ends up requiring human annotation in the form of narrations**, and this cost should be quantified and contextualised. 
+ +I suggest the authors address these points and resubmit.",ICLR2022, +4m7aVq5CABZ,1642700000000.0,1642700000000.0,1,3eIrli0TwQ,3eIrli0TwQ,Paper Decision,Accept (Poster),"This paper proposes a technique to improve membership inference attacks by +carefully applying ""difficulty calibration"" to improve the attack success +rate. The reviewers are split on this paper. They all generally agree on +the facts: the paper introduces a (somewhat) new technique and performs a +solid evaluation, but the novelty on top of prior work isn't all that high. + +On the whole I believe this paper should be accepted. This paper has identified +a very clear problem with existing attacks (poor performance at low false +positive rate) and has carefully developed a way to improve on this metric. A +thorough evaluation has convinced the reviewers that this paper does what it +set out to do. + +It comes down to a question of novelty then. And here the question is this: +does someone who reads this paper learn something new that wasn't obvious +before? Part of this can be novelty in the method---and I agree with the +reviewers that this paper lacks novely in the method. However the paper does +not lack novelty in the ideas on the whole. While Long et al., Sablayrolles et +al., and Carlini et al. do all use some kind of low false positive rate +evaluation and calibrate for low loss, none of these papers actually go out +of their way to evaluate this fact explicitly. And so even if this paper +had no technical contribution at all, the simple measurement study in and +of itself would be a useful insight. + +Machine learning research at present focuses fairly heavily on novelty +of the techniques. While this is good, it's also important to go back and +actually evaluate what we have. That's what this paper does, and it does +it well enough to be worth accepting. + +The paper would definitely be improved by following some of the advice of +the reviewers and including comparisons to prior work (e.g., especially +clarifying the relationship to Sablayrolles et al. and if it is true that +this attack is a simplification of this prior one and is thus less effective) +and I hope the authors will take the opportunity to do this.",ICLR2022, +Bkgq8tC-kN,1543790000000.0,1545350000000.0,1,SJzYdsAqY7,SJzYdsAqY7,Reject,Reject,"Reviewer scores straddle the decision boundary but overall this does work does not meet the bar yet. Even after discussion with the authors, the reviewers reconfirmed there 'reject' recommendation and the area chair agrees with that assessment.",ICLR2019,4: The area chair is confident but not absolutely certain +SJeS2P7teN,1545320000000.0,1545350000000.0,1,H1ltQ3R9KQ,H1ltQ3R9KQ,metareview ,Reject,"The reviewers raised a number of concerns including insufficiently demonstrated benefits of the proposed methodology, lack of explanations, and the lack of thorough and convincing experimental evaluation. The authors’ rebuttal failed to alleviate these concerns fully. I agree with the main concerns raised and, although I also believe that the work can result eventually in a very interesting paper, I cannot suggest it at this stage for presentation at ICLR.",ICLR2019,5: The area chair is absolutely certain +nbd1DQgQ5aM,1610040000000.0,1610470000000.0,1,uUX49ez8P06,uUX49ez8P06,Final Decision,Reject,"The authors propose a network-expandable approach to tackle NAS in the continual learning setting. 
More specifically, they use an RNN controller to decide which neurons to reuse for a new task and how much additional capacity is required (i.e., the number of new neurons to add). This work can be viewed as an extension of RCL and as such suffers from a large runtime, which was a concern for most reviewers. While reviewers highlighted the gains in the experiments conducted, several questions remained regarding the efficiency of the proposed approach and how it compares to other strategies. The practical relevance of the proposed approach was also a concern, as its application requires restricting it to models of modest size.
Even though this is more of a proof-of-concept paper, it indeed has novel and solid contributions, and should be accepted for publication.",ICLR2021,
The AC therefore recommends acceptance.",ICLR2022, +0kRf-11Av,1576800000000.0,1576800000000.0,1,Skltqh4KvB,Skltqh4KvB,Paper Decision,Reject,"This paper conducted a number of empirical studies to find whether units in object-classification CNN can be used as object detectors. The claimed conclusion is that there are no units that are sufficient powerful to be considered as object detectors. Three reviewers have split reviews. While reviewer #1 is positive about this work, the review is quite brief. In contrast, Reviewer #2 and #3 both rate weak reject, with similar major concerns. That is, the conclusion seems non-conclusive and not surprising as well. What would be the contribution of this type of conclusion to the ICLR community? In particular, Reviewer #2 provided detailed and well elaborated comments. The authors made efforts to response to all reviewers’ comments. However, the major concerns remain, and the rating were not changed. The ACs concur the major concerns and agree that the paper can not be accepted at its current state.",ICLR2020, +HkVzQ1pBG,1517250000000.0,1517260000000.0,87,S1XolQbRW,S1XolQbRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The submission proposes a method for quantization. The approach is reasonably straightforward, and is summarized in Algorithm 1. It is the analysis which is more interesting, showing the relationship between quantization and adding Gaussian noise (Appendix B) - motivating quantization as regularization. + +The submission has a reasonable mix of empirical and theoretical results, motivating a simple-to-implement algorithm. All three reviewers recommended acceptance.",ICLR2018, +S1l3VC9rl4,1545080000000.0,1545350000000.0,1,HJxeWnCcF7,HJxeWnCcF7,Novel framework for learning non-euclidean embeddings,Accept (Poster),"This paper proposes a novel framework for tractably learning non-eucliean embeddings that are product spaces formed by hyperbolic, spherical, and Euclidean components, providing a heterogenous mix of curvature properties. On several datasets, these product space embeddings outperform single Euclidean or hyperbolic spaces. The reviewers unanimously recommend acceptance.",ICLR2019,5: The area chair is absolutely certain +rJxndhWZlV,1544790000000.0,1545350000000.0,1,HkeILsRqFQ,HkeILsRqFQ,An interesting question but there are issues with the presentation and analysis,Reject,"Dear authors, + +The reviewers all appreciated the question you are asking and the study of the impact of each layer is definitely an interesting one. + +They were however uncertain about the actual metrics you used to emphasize your points. Further, as you noted, there were quite a few presentation issues that led to skepticism of the reviewers, despite them spending quite a bit of time reading the paper and engaging in discussion. + +Hence, I regret to inform you that your work is not yet ready for publication. A more focused analysis would be a great addition to the questions you raise.",ICLR2019,3: The area chair is somewhat confident +T1IB0wwzex,1576800000000.0,1576800000000.0,1,rJxwDTVFDB,rJxwDTVFDB,Paper Decision,Reject,"The reviewers have uniformly had significant reservations for the paper. Given that the authors did not even try to address them, this suggests the paper should be rejected.",ICLR2020, +U62l4vJvvv,1576800000000.0,1576800000000.0,1,ByeAK1BKPB,ByeAK1BKPB,Paper Decision,Reject,"The paper proposes a tensor decomposition method that interpolates between Tucker and CP decompositions. 
The authors also propose an optimization algorithm (AdaImp) and argue that it has superior performance against AdaGrad in this tensor decomposition task. The approach is evaluated on some NLP tasks.
+The reviewers raised some concerns related to clarity, novelty, and strength of experiments. As part of addressing reviewers' concerns, the authors reported their own results on MurP and Tucker (instead of quoting results from reference papers). While the reviewers greatly appreciated these experiments as well as the authors' response to their questions and feedback, the concerns largely remained unresolved. In particular, R2 found the gain achieved by AdaImp not significantly large compared to AdaGrad. In addition, R2 found very limited evaluation on how AdaImp outperforms AdaGrad (thus little evidence to support that claim). Finally, AdaImp lacks any theoretical analysis (unlike AdaGrad).",ICLR2020,
+ArfAd6UI48,1576800000000.0,1576800000000.0,1,rylnK6VtDH,rylnK6VtDH,Paper Decision,Accept (Poster),"This paper provides a unifying perspective regarding a variety of popular DNN architectures in terms of the inclusion of multiplicative interaction layers. Such layers increase the representational power of conventional linear layers, which the paper argues can induce a useful inductive bias in practical scenarios such as when multiple streams of information are fused. Empirical support is provided to validate these claims and showcase the potential of multiplicative interactions in occupying broader practical roles.
+
+All reviewers agreed to accept this paper, although some concerns were raised in terms of novelty, clarity, and the relationship with state-of-the-art models. However, the author rebuttal and updated revision are adequate, and I believe that this paper should be accepted.",ICLR2020,
+fCtEhLyEV7_,1610040000000.0,1610470000000.0,1,U4XLJhqwNF1,U4XLJhqwNF1,Final Decision,Accept (Poster),"This paper presents a simple yet effective approach to improve self-supervised contrastive approaches like MoCo. There are concerns with respect to novelty/simplicity and low improvements over MoCov2. The AC believes that simplicity is good and that, while the gains might not be as huge, they still show the usefulness of the new loss. It might also provide insights for future papers on self-supervised learning. Overall, the sentiment is that the paper is above the bar.",ICLR2021,
+H1eyie3leV,1544760000000.0,1545350000000.0,1,HyxnZh0ct7,HyxnZh0ct7,"A closed-form solver for the base learner is new in the meta-learning literature, and the experiments are sufficiently carried out to show its effectiveness.",Accept (Poster),"The reviewers disagree strongly on this paper. Reviewer 2 was the most positive, believing it to be an interesting contribution with strong results. Reviewer 3, however, was underwhelmed by the results. Reviewer 1 does not believe that the contribution is sufficiently novel, seeing it as too close to existing multi-task learning approaches.
+
+After considering all of the discussion so far, I have to agree with Reviewer 2 on their assessment. Much of the meta-learning literature involves changing the base learner *for a fixed architecture* and seeing how it affects performance. There is a temptation to chase performance by changing the architecture, adding new regularizers, etc., and while this is important for practical reasons, it does not help to shed light on the underlying fundamentals. This is best done by considering carefully controlled and well understood experimental settings. 
Even still, the performance is quite good relative to popular base learners.
+
+Regarding novelty, I agree it is a simple change to the base learner, using a technique that has been tried before in other settings (linear regression as opposed to classification); however, its use in a meta-learning setup is novel in my opinion, and the new experimental comparison with regression on top of pre-trained CNN features helps to demonstrate the utility of its use in meta-learning settings.
+
+While the novelty can certainly be debated, I want to highlight two reasons why I am opting to accept this paper: 1) simple and effective ideas are often some of the most impactful; 2) sometimes taking ideas from one area (e.g., multi-task learning) and demonstrating that they can be effective in other settings (e.g., meta-learning) can itself be a valuable contribution. I believe that the meta-learning community would benefit from reading this paper.
+",ICLR2019,4: The area chair is confident but not absolutely certain
+Bke1ae84lV,1545000000000.0,1545350000000.0,1,ByGuynAct7,ByGuynAct7,"Good paper, but needs some discussion",Accept (Poster),"This paper proposes factorized prior distributions for CNN weights by using explicit and implicit parameterization for the prior. The paper suggests a few tractable methods to learn the prior and the model jointly. The paper, overall, is interesting.
+
+The reviewers have had some disagreement regarding the effectiveness of the method. The factorized prior may not be the most informative prior, and using extra machinery to estimate it might deteriorate the performance. On the other hand, estimating a more informative prior might be difficult. It is extremely important to discuss this trade-off in the paper. I strongly recommend that the authors discuss the pros and cons of using priors that are weakly informative vs strongly informative.
+
+The idea of using a hierarchical model has been around; e.g., see the paper on ""Hierarchical variational models"" and more recently ""Semi-implicit Variational Inference"". Please include a discussion of such related work. Please discuss why your proposed method is better than these existing methods.
+
+Conditioned on the two discussions added to the paper, we can accept it.
+",ICLR2019,5: The area chair is absolutely certain
+sZhsHapd3Az,1642700000000.0,1642700000000.0,1,844kbKgwDL,844kbKgwDL,Paper Decision,Reject,"ICLR is selective and reviewers are not sufficiently enthusiastic about this paper. In particular, they point out closely related methods that should be cited and compared to as baselines. The reviews are of good quality, and the authors did not respond.",ICLR2022,
+LjNmZTALroP,1610040000000.0,1610470000000.0,1,luGQiBeRMxd,luGQiBeRMxd,Final Decision,Reject,"The paper presents a new Bayesian optimization method based on the Gaussian process bandits framework for black-box adversarial attacks. The method achieves good performance in the experiments, which was appreciated by all the reviewers.
+
+At the same time, the presentation of the method is quite confusing, which currently precludes acceptance of the paper. In particular, during the discussion phase the reviewers were not able to decipher the algorithm based on the description presented in the paper. 
It is not clear how the problem is modeled as a bandit problem, which loss function $\ell$ is minimized, and why minimizing it makes sense (assuming, e.g., that $\ell$ is the hinge loss as suggested and the initial prediction is good with a large margin, that is, the loss is zero, equation 6 never changes $x_t$ when the procedure is started from $x$). This connection, since it is the fundamental contribution of the paper, should be much better explained. Once the problem is set up to estimate (maximize?) the reward, it is changed to calculating the difference in the minimization (cf. equation 11), which is again unmotivated. (Other standard aspects of the algorithm should also be explained properly, e.g., the stopping condition of Algorithm 1.)
+
+Unfortunately, the paper is written in a mathematically very imprecise manner. As an example, consider equation (6), where $B_p$ and the projection operator are not defined, and while these can be guessed, a projection of the argmin seems to be missing as well in the end (otherwise nothing guarantees that $x_T$, which is the final outcome of the algorithm, remains in the $L_p$ ball). Another example is the $Discrete\ Approximate\ CorrAttack_{Flip}$ paragraph, which requires that every coordinate of $x$ should be changed by $\pm\epsilon$. It is also not clear what ""dividing the image into several blocks"" means in Section 4.1 (e.g., are these overlapping, do they cover the whole image, etc., not to mention that previously $x$ was a general input, not necessarily an image). It is also unlikely that the stopping condition in Algorithm 1 would use the exact same $\epsilon$ for the acquisition function as the perturbation radius for adversarial examples, etc. While some of these inaccuracies and unclear definitions are also mentioned in the reviews, unfortunately there are more in the paper.
+
+The authors are encouraged to resubmit the paper to the next venue after significantly improving and cleaning up the presentation.
+
+",ICLR2021,
+qHheFbwrabS,1610040000000.0,1610470000000.0,1,6SXNhWc5HFe,6SXNhWc5HFe,Final Decision,Reject,"This is interesting work, but not yet sufficiently mature for publication. Although the authors propose a novel algorithm and provide an analysis, the reviewers raised several criticisms about the comparison to previous work, the lack of any empirical evaluation, and the strength and unnaturalness of the assumptions used to establish convergence. After discussion, the reviewers remained largely unsatisfied with the author responses to these questions, and none recommended accepting this paper.",ICLR2021,
+Yz9_uC48Kb9,1642700000000.0,1642700000000.0,1,J7V_4aauV6B,J7V_4aauV6B,Paper Decision,Reject,"This paper analyzes the effects of the weight decay hyperparameter and, based on this analysis, proposes methods to schedule the weight decay. Overall, while I'm glad that more work is being done on understanding the effects of weight decay, I don't think this submission is of sufficient quality for ICLR.
+
+Theorem 1 is simply re-expressing the well-known fact that if the regularization version of weight decay is used, then (simply because it's based on a single objective function) the stationary points are invariant to the choice of learning rate. This may not be apparent due to the misused terminology: ""invariant"" is referred to as ""stable"", but ""stable stationary point"" has a technical meaning very different from the one used here. 
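+
+(To make Theorem 1's point concrete, in standard notation: with the regularized objective $L_\lambda(w) = L(w) + \frac{\lambda}{2}\|w\|^2$, stationary points satisfy $\nabla L(w) + \lambda w = 0$, an equation in which no learning rate appears, so invariance of the stationary points to the learning rate is immediate. The notation here is mine, added for illustration.)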
+ +Corollary 2 essentially shows that the optimum of the regularized loss is different from the optimum of the unregularized loss. The authors conclude from this that the optimal value of lambda is 0 from the perspective of test error, which is unwarranted. + +Overall, the paper centers around the interaction between learning rates and the weight decay parameter. However, as various reviewers point out, this interaction has been analyzed in detail for networks with normalization layers, and normalization completely changes the nature of the interaction. So any analysis would either need to take this into account or limit the scope to networks without normalization. + +I encourage the authors to take the reviewers' feedback into account and improve the paper for the next submission cycle.",ICLR2022, +NZQ_JFEeRAf3,1642700000000.0,1642700000000.0,1,9n9c8sf0xm,9n9c8sf0xm,Paper Decision,Accept (Poster),"This paper takes on (in my view) one of the most important questions in the lottery ticket literature today: how small are the smallest lottery tickets that exist in our neural networks? Many methods have been proposed for finding weak lottery tickets (those that require training to reach full accuracy) and strong lottery tickets (those that do not), but we have no idea how close they come to finding the smallest lottery tickets. Moreover, in many cases, we only know how to find lottery ticket subnetworks early in training rather than at initialization. Is this a fundamental limitation on the existence of lottery tickets, or is this simply a limitation of our methods for finding them? I am personally very involved in lottery ticket conversations in the literature, and I believe I can speak with some authority when I say that these are vital questions where any progress is important. + +Moreover, these are exceedingly difficult research questions, and (again, in my view) the authors should be commended for taking them on. A naive approach to these questions would involve brute force search over all possible subnetworks, which is infeasible even on the smallest of toy examples, let alone the meaningful computer vision tasks where lottery ticket work typically focuses. + +I am sharing all of this information to provide background for my confident recommendation to accept this paper over the many legitimate concerns expressed by reviewers and those that I saw when reading the paper in detail. Those include that: +* This paper does not solve any of these research problems in their entirety. +* It focuses on toy networks smaller than those traditionally studied in the lottery ticket literature, and it is well known that lottery ticket behavior changes in character at larger scales. +* Planting good subnetworks may be an unrealistic proxy for the kinds of subnetworks that actually emerge naturally. +* There may be multiple good subnetworks in a network, not just the one that was planted. +* The graphs are a bit hard to read. +* I find the mix of pruning methods studied, which were designed with very different goals (pruning after training, pruning before training, finding strong lottery tickets), a bit confusing. + +**The bottom line:** With all of that said, in my view, the paper asks good questions and provides an initial foothold that other researchers will be able to build on as we seek more general answers. 
This is similar to the contributions made by Zhou et al., which started the conversation on strong lottery tickets, and potentially even Frankle & Carbin, which kicked off the lottery ticket discussion but got many things wrong. Both papers were good first attempts at solving big problems, and both were highly influential despite their flaws. Similarly, even if this submission isn't perfect in every way, this is among the most important kinds of contributions that a paper can make. For that reason, I strongly recommend acceptance under the belief that this paper will help to foster a valuable conversation in the literature.
+
+P.S. I really, truly, strongly beg the authors to redo their graphs following the style of some of the more user-friendly lottery ticket or pruning papers they have cited (e.g., Frankle et al., 2021). The graphs in this paper were really hard to parse. Really really really hard to parse. They're too small, the y-axis is often squished, gridlines would be helpful, the lines are overlapping in ways that are difficult to distinguish because the colors blend, etc. etc.",ICLR2022,
+6GvOkLoZcLd,1642700000000.0,1642700000000.0,1,2sDQwC_hmnM,2sDQwC_hmnM,Paper Decision,Accept (Poster),"This paper considers the problem of on-device training for federated learning. This is an important problem since, in real-world settings, the clients have limited compute and memory, and local training needs to be efficient. The paper shows that the standard sparsity-based speed-up techniques that consider top-K weights/activations during the forward and/or backward pass do not work well in the federated setting, and proposes several solutions to mitigate this issue. The proposed solutions are demonstrated to work well on several datasets.
+
+In their initial assessment, given that this is largely an empirical-insights-driven paper, the reviewers mainly expressed concerns about the experimental evaluation (e.g., only one dataset, CIFAR-10, and one architecture, ResNet-18) and the lack of more baselines (e.g., Federated Dropout). The authors responded in detail to the reviews and also conducted additional experiments, and the reviewers and authors engaged in discussion. As the discussion converged, the reviewers agreed that the revised manuscript addresses their key concerns, and their assessments, on average, are now leaning largely towards a borderline accept.
+
+I also read the reviews, the discussion, and the paper. I think the paper is a good initial attempt at providing a general approach to enable on-device federated learning when the clients are lightweight devices (e.g., edge devices). Even though the study is somewhat preliminary, the current manuscript, after the revision during the discussion phase, is a significantly improved version of the original submission and does address the key concerns from the reviewers. Overall, I would rate the paper as a borderline acceptance.",ICLR2022,
+TrPULW2s1OS,1610040000000.0,1610470000000.0,1,Oi-Kh379U0,Oi-Kh379U0,Final Decision,Reject,"This paper proposes a new and general formulation for supernets, which encodes the supernet with a tensor network (TN). The idea is interesting and well motivated. However, the paper is not well presented and its clarity needs to be further improved. The effectiveness of the algorithm is not well justified, and the experimental results are less convincing even after the additional results provided in the revision. 
Most importantly, it is not clear that the 'TENSORIZING' method can solve the current ineffectiveness problem of NAS. It is confirmed that the referenced ICLR 2021 paper was not used for the decision on this paper. ",ICLR2021,
+WYgCpx_-oRn,1642700000000.0,1642700000000.0,1,l9tb1bKyfMn,l9tb1bKyfMn,Paper Decision,Reject,"This paper proposes a simple change to the Transformer architecture to improve efficiency. While the reviewers appreciate the writing, all the reviewers agree that the novelty and contributions of the paper are limited, both in the problem being solved by the paper and in the level of experiments in it. The authors did not respond to the reviewers' comments. Hence I recommend rejection.",ICLR2022,
+SylUS03elV,1544760000000.0,1545950000000.0,1,S1lKSjRcY7,S1lKSjRcY7,"Sensible ideas scattered throughout, but does not engage with similar earlier work.",Reject,"Strengths: This paper provides a useful review of some of the recent work on gradient estimators for discrete variables, and proposes both a computationally more efficient variant of one, and a new estimator based on piecewise linear functions.
+
+Weaknesses: Many new ideas are scattered throughout the paper. The notation is a bit dense. Comparisons to RELAX, which had better results than REBAR, are missing. Finally, it seems that REBAR was trained with a fixed temperature, instead of optimizing it during training, which is one of the main benefits of the method.
+
+Points of contention: Only R1 mentioned the omission of REBAR and RELAX. A discussion and a few comparisons to REBAR were added to the paper, but only in a few experiments.
+
+Consensus: This paper is borderline. I agree with R1: quality 6, clarity 8, originality 6, significance 4. All reviewers agreed that this was a decent paper, but I think that R2 and R3 were relatively unfamiliar with the existing literature.
+
+Update for clarification:
+=====================
+
+This section has been added to clarify the reasons for rejection. The abstract of the paper states:
+
+""We show that the commonly used Gumbel-Softmax estimator is biased and propose a simple method to reduce it. We also derive a simpler piece-wise linear continuous relaxation that also possesses reduced bias. We demonstrate empirically that reduced bias leads to a better performance in variational inference and on binary optimization tasks.""
+
+The fact that Gumbel-Softmax is biased is well known. Reducing its bias was the motivation for developing the _exactly_ unbiased REBAR method, which already has similar asymptotic complexity. A major side benefit of using an exactly unbiased estimator is that the estimator's hyperparameters can be automatically tuned to reduce variance, as in REBAR and RELAX.
+
+This paper focuses on methods for reducing bias and variance, but hardly discusses related methods that already achieved its stated aims. This is a major weakness of the paper. The experiments only compared with REBAR, and did not even tune the temperature to reduce variance (removing one of its major advantages).
+
+This reject decision is not made mainly on lack of experiments or state-of-the-art results. It's because the idea of reducing the bias of continuous-relaxation-based gradient estimators has already been fruitfully explored, and zero-bias CR estimators have been developed, but this work mostly ignores them. 
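+
+For readers who want to see what ""directly measuring bias"" can look like here, a minimal self-contained sketch follows (the toy objective and all constants are my own choices, not the authors'):
+
+```python
+import numpy as np
+
+# Toy problem: b ~ Bernoulli(theta), f(b) = (b - 0.45)**2.
+# Exact gradient: d/dtheta E[f(b)] = f(1) - f(0) = 0.55**2 - 0.45**2 = 0.1.
+theta, tau, n = 0.3, 0.5, 1_000_000
+rng = np.random.default_rng(0)
+
+u = rng.uniform(size=n)
+noise = np.log(u) - np.log1p(-u)          # Logistic(0, 1) noise
+z = (np.log(theta / (1 - theta)) + noise) / tau
+s = 1 / (1 + np.exp(-z))                  # Gumbel-Softmax (binary Concrete) relaxation of b
+
+# Pathwise gradient of f(s) w.r.t. theta via the chain rule.
+ds_dtheta = s * (1 - s) / (tau * theta * (1 - theta))
+grad = 2 * (s - 0.45) * ds_dtheta
+
+print('estimate:', grad.mean(), 'bias:', grad.mean() - 0.1, 'std:', grad.std())
+```
+
+Lowering tau shrinks the bias and inflates the variance, which is the trade-off at issue; such a measurement is cheap, but it is only a toy.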
However, thorough experiments are always going to be necessary for a paper proposing biased estimators, because there are already many such estimators, and little theory to say which ones will work well in which situations.
+
+Suggestions to improve the paper: Run experiments on all methods that directly measure bias and variance. Incorporate discussion of REBAR throughout, not just in an appendix. Run comparisons against REBAR and RELAX without crippling their ability to reduce variance. Do more to characterize when different estimators will be expected to be effective.",ICLR2019,3: The area chair is somewhat confident
+VsUGSijHry,1642700000000.0,1642700000000.0,1,aPOpXlnV1T,aPOpXlnV1T,Paper Decision,Accept (Poster),"The paper examines the approach of modeling aleatoric uncertainty by fitting a neural network that estimates the mean and variance of a heteroscedastic Gaussian distribution, based on log-likelihood maximization. The authors identify the problem that gradient-based training on the negative log likelihood (NLL) may result in suboptimal solutions where a high predicted variance compensates for the predicted mean being far from the true mean. To solve this problem, the authors suggest adjusting the log-likelihood objective by weighting the log likelihood of each single data point by the corresponding beta-exponentiated variance estimate. This adjusted objective is referred to as beta-NLL.
+
+All reviewers agreed that the identified problem and the proposed solution are interesting, that the paper is well written and organized, and that the contributions are significant and somewhat new. The main criticism was on the side of the empirical evaluation. It was criticized that the empirical analysis did not compare the proposed method to other approaches to solving the same problem, that the identified problem and the proposed method should also be analyzed on high-dimensional data, that the results on the synthetic experiments could be improved by investigating more than a single run and by incorporating the MSE in the corresponding Figure 1, and that standard errors were not reported.
+
+Based on the reviews, the authors added several new experiments and investigations in the revised version of their manuscript to improve their empirical analysis: 1) new experiments on high-dimensional datasets were conducted, applying variational autoencoders on MNIST and Fashion-MNIST and performing depth-map prediction from images on the NYUv2 dataset. 2) For comparison, several baseline methods were added to the experiments on the UCI and the dynamics datasets. 3) Three more UCI datasets (carbon, superconductivity, wine-white) were included in the empirical analysis. 4) An evaluation of the calibration of the predictive variance for the heteroscedastic sine dataset was added. 5) Several more repetitions of the experiment represented in Figure 1 were conducted. 6) An analysis of undersampling behavior on FetchPickAndPlace was added. Moreover, the authors reported two errors in their previous experiments that they discovered and corrected.
+
+All reviewers were satisfied with the changes in the revised version and the answers to their specific questions and increased their scores accordingly, now commonly voting for acceptance. 
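+
+(For concreteness, the adjusted objective as I understand it, in my notation rather than necessarily the authors': $\mathcal{L}_\beta = \frac{1}{N}\sum_i \mathrm{sg}[\hat\sigma^2(x_i)]^{\beta} \big( \frac{(y_i - \hat\mu(x_i))^2}{2\hat\sigma^2(x_i)} + \frac{1}{2}\log \hat\sigma^2(x_i) \big)$, where $\mathrm{sg}[\cdot]$ is a stop-gradient. $\beta = 0$ recovers the standard NLL, while $\beta = 1$ makes the gradient with respect to the mean match that of plain MSE training.)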
The paper should therefore be accepted.",ICLR2022,
+HJXImypBM,1517250000000.0,1517260000000.0,140,SygwwGbRW,SygwwGbRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Important problem (navigation in unseen 3D environments, Doom in this case), interesting hybrid approach (mixing neural networks and path-planning). Initially, there were concerns about evaluation (proper baselines, ambiguous environments, etc.). The authors have responded with updated experiments that are convincing to the reviewers. R1 did not participate in the discussion and their review has been ignored. I am supportive of this paper. ",ICLR2018,
+88fXgi26t_w,1610040000000.0,1610470000000.0,1,h0de3QWtGG,h0de3QWtGG,Final Decision,Accept (Poster),"This paper presents a counterfactual approach to interpreting aspects within a sequential decision-making setup. The reviewers have reacted to each other's comments as well as the authors' response to their views. I am recommending acceptance of this paper, as it targets an interesting problem and presents an intriguing approach. I think the community would appreciate further discussing this paper at the conference.",ICLR2021,
+7W_1EdrSJU,1610040000000.0,1610470000000.0,1,9t0CV2iD5gE,9t0CV2iD5gE,Final Decision,Reject,"This paper proposes to automatically determine when the SGD step size should be decreased, by running two ""threads"" of SGD for a number of iterations, dividing those into windows, and then looking at the average inner product of the gradients in the two threads in each window. If the inner product tends to be high, that indicates that there is still ""signal"" in the gradient and the step size should not be decreased. If it is low, that indicates that the gradient is mostly ""noise"". In the latter case, the learning rate is decreased by a factor of gamma and the length of the next phase is increased by gamma.
+
+Theorem 3.1 essentially assumes smoothness, a bounded fourth moment for the stochastic gradient, and that the stochastic gradient error is not too far from isotropic. Then it shows that if the step size is set small enough, the standard deviation of the diagnostic (Q_i) can be upper-bounded in terms of the expected value of Q_i. It follows that the probability of Q_i being negative cannot be too large (bounded in terms of the step size eta and the length of the windows l).
+
+Theorem 3.2 adds the assumption of strong convexity and weakens the assumption on the gradient to a bounded second moment. Then it upper-bounds the expected value of the diagnostic in terms of its standard deviation.
+
+Proposition 3.4 gives a proof of convergence. As far as I can tell, the proof is essentially that the learning rate decay can't be much worse than what would happen if the diagnostic *always* chose to decrease. In particular: (1) It's impossible for the learning rate to decay too quickly, since the length of each phase is increased by gamma whenever the learning rate is decreased by gamma. (This is a ""non-adaptive"" result.) (2) The learning rate will eventually decay with probability 1.
+
+Various concerns were brought up by the reviewers. Perhaps the most strongly voiced concern was that the proposed method is a heuristic rather than a method with a rigorous guarantee. For my part, I am in agreement with the authors and other reviewers that heuristic methods for decreasing the learning rate are worthy of study given the large practical importance of this problem. 
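+
+To make the proposed diagnostic concrete, here is a minimal sketch of my reading of the procedure on a toy problem (all names and constants are mine, not the authors' code):
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+def noisy_grad(w):
+    # Gradient of 0.5 * ||w||^2 plus independent noise per call.
+    return w + rng.normal(scale=0.5, size=w.shape)
+
+eta, gamma, phase_len, window = 0.1, 2.0, 200, 10
+w1, w2 = np.ones(20), np.ones(20)          # two SGD 'threads'
+for phase in range(5):
+    Q = []
+    for _ in range(phase_len):
+        g1, g2 = noisy_grad(w1), noisy_grad(w2)
+        w1, w2 = w1 - eta * g1, w2 - eta * g2
+        Q.append(g1 @ g2)                  # inner product of the two threads' gradients
+    Qw = np.mean(np.reshape(Q, (-1, window)), axis=1)   # window averages, i.e. the Q_i
+    if np.mean(Qw > 0) < 0.5:              # my stand-in for the paper's test on Q_i
+        eta /= gamma                       # mostly 'noise': decay the step size ...
+        phase_len = int(gamma * phase_len) # ... and lengthen the next phase
+    print(phase, eta, phase_len)
+```
+
+The two threads share the objective but draw independent gradient noise, so the inner product separates shared ""signal"" from ""noise"", which, as I read it, is the intuition behind the bound on Q_i in Theorem 3.1.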
+ +I concur with the concern raised by some reviewers that the theoretical component of the paper may not have little explanatory value for the results that are given. The assumption of strong convexity is not a major concern to me. (Though not true it can still give intuition.) More concerning is that theory essentially takes a fixed step-size scheme (repeatedly decrease the step size by gamma and increasing the length of a phase by gamma) and then shows that the diagnostic can’t be too much worse. This isn’t in keeping with the motivation of being adaptive. + +The reviewers were also concerned about the explanation of better results due to less overfitting. This may be true, but the theory makes no mention of overfitting. + +There was a consensus that the experimental results were promising, though some minor issues were raised. + +While the direction explored in the paper has value, there are enough open questions about the relationship of the theory to the experimental results to warrant another round of review. + +Small thoughts, not significant to acceptance: + +The current heuristic runs two separate threads and looks at the inner-product of those gradients. An alternative to this would be to run a single thread along with a ""ghost"" thread that computes a different gradient at each iteration. It would be great to comment on the difference and why one might be superior to the other. A more radical alternative would be to run a single thread, but then compute the diagnostic on each half of the minibatch. A more radical alternative still would be to analytically do that splitting many times and average the results. This seems like it might simultaneously reduce the variance of the diagnostic and also reduce the computational cost. + +2. The current heuristic runs two threads. Is there a tradeoff if you run more? + +3. The statement of theorems could be more user-friendly. To understand Thm 3.1, I needed to search o find the definitions of: eta, l, i, w, Q_i. With a small amount of effort this could be re-written to remind the reader that w is the number of windows, l is the length, eta is the stepsize, etc. It is particularly unfortunate that sd() is never formally defined (only by reading the appendix did I discover that this was the standard deviation.) + +4. The fact that the length of threads is always increased by a factor of gamma whenever the step size is reduced by gamma seems contrary to the spirit of the proposed diagnostic. After all, this ""bakes in"" a kind of ""fastest possible"" decay schedule. If the diagnostic were fully reliable, shouldn't this not be necessary? The decision to add this doe not get nearly enough discussion in the paper in my view. + +5. I think it might be clearer to re-state theorem 3.1 including the Chebyshev result after it.",ICLR2021, +xTVs_Gu3jo,1576800000000.0,1576800000000.0,1,r1l-5pEtDr,r1l-5pEtDr,Paper Decision,Reject,"This paper analyzes the non-convergence issue in Adam in a simple non-convex case. The authors propose a new adaptive gradient descent algorithm based on exponential long term memory, and analyze its convergence in both convex and non-convex settings. The major weakness of this paper pointed out by many reviewers is its experimental evaluation, ranging from experimental design to missing comparison with strong baseline algorithms. 
I agree with the reviewers’ evaluation and thus recommend reject.",ICLR2020, +iQfshnKM9j3,1610040000000.0,1610470000000.0,1,lf7st0bJIA5,lf7st0bJIA5,Final Decision,Accept (Poster),"The paper proposes to do unsupervised discovery of 3D physical objects. The core idea is to decompose the scene into primitives that contain: (a) a segment; (b) 3D position and dynamics; and (c) appearance. These are combined with a physics model and renderer to discover objects/primitives by watching videos; the core supervisory signal used is that one should be able to reconstruct future scenes and that objects/primitives ought to be physically consistent. The system is tested on synthetic data as well as real videos of blocks. + +The reviewers were positive about many aspects but, at the time of submission had a number of concerns. These were, in view of of many of the four reviewers, largely addressed. These are as follows: +- One overarching concern (R3, R4) was the experiments that the paper’s title and motivation focused heavily on 3D but the experiments lacked a 3D experiment of any variety. The authors addressed this by adding 3D IOU and recall. While numbers are low for IOU, this is a challenging area and the AC appreciates this as did R3 and R4. +- Another concern is the data itself (R4,R1). R4 in particular cites the synthetic nature of it as a stumbling block; R1 is similarly concerned about the difficulty of the backgrounds (and the rigidity of the objects). The AC thinks that the data is sufficient for this paper given the overall paper focus, methodological contributions, and particular set of claims. However, the AC is highly sympathetic to R4’s arguments and thinks more realistic real data (beyond the additional data of towers of blocks in front of a white sheet) would substantially improve the impact of the paper and the direction of research. +- The last content-focused concern was disagreement that the system is unsupervised (R2,R4). The authors have addressed this with experiments using a hard-coded system that uses a heuristic based on the bottom coordinate, which obtains good results as well. All reviewers with this concern seem satisfied although the AC would note this assumes a single ground plane, which ties into concerns about the data (although this is a small nitpick). +- R2 had substantial concerns about the legibility and reproducibility of the paper. These have been largely addressed in the revision, as far as the AC can tell. + +The paper is an good contribution on a challenging and important problem. While the AC shares some of R4’s concerns about the data (and indeed how data difficulty and method interact), the AC finds the revised paper compelling and recommends acceptance.",ICLR2021, +S1p_Vy6rf,1517250000000.0,1517260000000.0,390,rJrTwxbCb,rJrTwxbCb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Pros: ++ Builds in important ways on the work of Sagun et al., 2016. + +Cons: +- The reviewers were very concerned that the assumption in the paper that the second term of Equation (6) is negligible was insufficiently supported, and this concern remained after the discussion and the revision. +- The paper needs to be more precise in its language about the Hessian, particularly in distinguishing between ill conditioning and degeneracy. +- The reviewers did not find the experiment very convincing because it relied on initializing the small-batch optimization from the end point of the large-batch optimization. 
Again, this concern remained following the discussion and revision. + +The area chair agrees with the authors' comments in their OpenReview post of 08 Jan. 2018 ""A remark on relative evaluation,"" and has discounted the reviewers' comments about the relative novelty of the work. It is important not to penalize authors for submitting their papers to conferences with an open review process, especially when that process is still being refined. + +However, even discounting the remarks about novelty, there are key issues in the paper that need to be addressed to strengthen it (the 3 ""cons"" above), so this paper does not quite meet the threshold for ICLR Conference acceptance. + +However, because it raises really interesting questions and is likely to provoke useful discussions in the community, it might be a good workshop track paper. +",ICLR2018, +ryl0czu-lE,1544810000000.0,1545350000000.0,1,S1gARiAcFm,S1gARiAcFm,"Rejection, concerns not addressed by authors",Reject,"The paper tackles an interesting problem, which is effectively modeling biological time-series data. The advantages of deep neural networks over structured models like HMMs are their ability to learn features from the data, whereas probabilistic graphical models suffer from ""model mismatch"", where the available data must be carefully processed in order to fit the assumptions of the PGM. Any work advancing this topic would be extremely welcome in the world of machine learning in biology. + +However, the reviewers each raised individual concerns about the paper regarding its clarity and quality, and the authors did not respond. Thus, the reviewers scores remain unchanged, and the rough consensus is a rejection.",ICLR2019,5: The area chair is absolutely certain +G3beuUu-swX,1610040000000.0,1610470000000.0,1,Twf5rUVeU-I,Twf5rUVeU-I,Final Decision,Reject,"The authors provide a homotopy framework for SGD in order to exploit structures that arise by construction, such as PL. I very much liked the delineated homotopy analysis which is general (i.e., as opposed to simply adding a quadratic, the authors consider a homotopy mapping). While the algorithm should not be considered new, it is still a good proposal to consider in the SGD applications setting. Unfortunately, I cannot recommend acceptance because of several issues that the reviewers raised in detail: Strength of the assumptions, unclear performance improvement in practice, applicability of the locally PL condition, among others. ",ICLR2021, +Cxxy7oUT1n,1610040000000.0,1610470000000.0,1,awnQ2qTLSwn,awnQ2qTLSwn,Final Decision,Reject,"Although there was some initial disagreement on this paper, the majority of reviewers agree that this work is not ready for publication and can be improved in various manners. After the discussion phase there is also serious concern that the experiments need more work (statistically), to verify if they hold up. More comparisons with baselines are required as well. The paper could also be better put in context with the SOTA and related work. The paper does contain interesting ideas and the authors are encouraged to deepen the work and resubmit to another major ML venue.",ICLR2021, +UyZubnFHZ7,1576800000000.0,1576800000000.0,1,HJx0U64FwS,HJx0U64FwS,Paper Decision,Reject,"This paper analyzes a mechanism of the implicit regularization caused by nonlinearity of ReLU activation, and suggests that the learned DNNs interpolate almost linearly between data points, which leads to the low complexity solutions in the over-parameterized regime. 
The main objections include (1) some claims in this paper are not appropriate; (2) lack of proper comparison with prior work; and many other issues in the presentation. I agree with the reviewers’ evaluation and encourage the authors to improve this paper and resubmit to future conference. +",ICLR2020, +AKVb60TfqP,1576800000000.0,1576800000000.0,1,HklRwaEKwB,HklRwaEKwB,Paper Decision,Accept (Spotlight),"The paper studies theoretical properties of ridge regression, and in particular how to correct for the bias of the estimator. + +The reviewers appreciated the contribution and the fact that you updated the manuscript to make it clearer. + +I however advise the authors to think about the best way to maximize impact for the ICLR audience, perhaps by providing relevant examples from the ML literature.",ICLR2020, +WqbKpPMPUf,1576800000000.0,1576800000000.0,1,HygXkJHtvB,HygXkJHtvB,Paper Decision,Reject,"There has been significant discussion in the literature on the effect of the properties of the curvature of minima on generalization in deep learning. This paper aims to shed some light on that discussion through the lens of theoretical analysis and the use of a Bayesian Jeffrey's prior. It seems clear that the reviewers appreciated the work and found the analysis insightful. However, a major issue cited by the reviewers is a lack of compelling empirical evidence that the claims of the paper are true. The authors run experiments on very small networks and reviewers felt that the results of these experiments were unlikely to extrapolate to large scale modern models and problems. One reviewer was concerned about the quality of the exposition in terms of the writing and language and care in terminology. Unfortunately, this paper falls below the bar for acceptance, but it seems likely that stronger empirical results and a careful treatment of the writing would make this a much stronger paper for future submission.",ICLR2020, +ryWpnG8ul,1486400000000.0,1486400000000.0,1,B1M8JF9xx,B1M8JF9xx,ICLR committee final decision,Accept (Poster),"This paper describes a method to estimate likelihood scores for a range of models defined by a decoder. + + This work has some issues. The paper mainly applies existing ideas. As discussed on openreview, the isotropic Gaussian noise model used to create a model with a likelihood is questionable, and it's unclear how useful likelihoods are when models are obviously wrong. However, the results, lead to some interesting conclusions, and on balance this is a good paper.",ICLR2017, +BkrqUy6BG,1517250000000.0,1517260000000.0,842,H1BO9M-0Z,H1BO9M-0Z,ICLR 2018 Conference Acceptance Decision,Reject,"While the problem of learning word embeddings for a new domain is important, the proposed method was found to be unclearly presented and missing a number of important baselines. The reviewers found the technical contribution to be of only limited value.",ICLR2018, +B4flWHJ0C-G,1642700000000.0,1642700000000.0,1,neqU3HWDgE,neqU3HWDgE,Paper Decision,Accept (Poster),"Three reviewers had a positive impression of this paper, two of them were willing to champion it. The main positive aspects mentioned by these reviewers were clarity, methodological strength, novelty and convincing experimental evaluation. On the other hand, the was one clearly negative vote, raising issues about the proposed concept of 'entropy of entanglement' and about the use of tensor products. It seems that after the rebuttal, this reviewer was still not fully convinced. 
In my opinion, however, the rebuttal addressed most of these points of criticism in a clear and transparent way, so I recommend acceptance.",ICLR2022, +KocvS4Xe9E,1642700000000.0,1642700000000.0,1,cw-EmNq5zfD,cw-EmNq5zfD,Paper Decision,Accept (Poster),"The paper proposes a new pipeline-parallel training method called WPipe. WPipe works (on a very high level) by replacing the two-buffer structure of PipeDream-2BW with a two-partition-group structure, allowing resources to be shared in a similar way to PipeDream-2BW but with less memory use and less delays in weight update propagation across stages. The 1.4x speedup it achieves over PipeDream-2BW is impressive. + +In discussion, the reviewers agreed that the problem WPipe tries to tackle is important and that the approach is novel and interesting. But there was significant disagreement among the reviewers as to score. A reviewer expressed concern about the work being incremental and difficult to follow. And while these were valid concerns, and the authors should take note of them when revising their paper, I do not think they should present a bar to publication, both based on my own read of the work and also in light of the fact that other reviewers with higher confidence scores did not find novelty to be a disqualifying concern. As a result, I plan to follow the majority reviewer opinion and recommend acceptance here.",ICLR2022, +_Sv6_XaOeOX,1642700000000.0,1642700000000.0,1,0m4c9ZfDrDt,0m4c9ZfDrDt,Paper Decision,Reject,"The paper presents an actor critic type of method consisting of two types of features -- dynamics and tasks, in the multi-task continuous control setting. While the topic of the research is interesting and relevant to ICLR, the reviewers have concerns with the novelty and technical significance of the work. Specifically, the proposed method is very similar to several other works leading to an incremental novelty. In addition, the method is evaluated only on simple environments. The concerns remain after the discussion period. + +In the next version of the manuscript, the authors are encouraged to pursue more difficult settings and modify the method to work on those problems. That would make the paper stronger, and lead to a more novel method evaluated on harder problems.",ICLR2022, +R13Yq1QW_MJM,1642700000000.0,1642700000000.0,1,fvLLcIYmXb,fvLLcIYmXb,Paper Decision,Accept (Poster),"The paper proposes a MLP-based architecture that makes extensive use of the shift operation on the feature maps. The model performs well on several vision tasks and datasets. + +The reviews are mixed even after the authors' response. Main pros are that the proposed architecture is elegant and reasonable, and the experimental evaluation is thorough and strong. The main con is that the novelty is somewhat limited to some prior papers. + +Overall, I recommend acceptance. The reviewers point out that the architecture is good and the results are strong. Similarities to prior works do not seem serious enough to warrant rejection - even an author of arguably the most related (concurrent) works - S2-MLP and S2-MLPv2 - confirms that there is sufficient difference. Moreover, this is one of the first papers to show very strong results on detection and segmentation.",ICLR2022, +M4kOAv88NR,1610040000000.0,1610470000000.0,1,QoWatN-b8T,QoWatN-b8T,Final Decision,Accept (Poster),"The paper proposes a variant of Kanerva Machine Wu et al. (2018) by introducing a spatial transformer to index the memory storage and Temporal Shift Module Lin et al., (2019). 
The KM++ model learns to encode an exchangeable sequence locally via the spatial transformer. The proposed method is evaluated on conditional image generation tasks. The empirical results demonstrated the nearby keys in the memory encoded related and similar images. Several issues of clarity and the correctness of the main theoretical result were addressed during the rebuttal period in a way that satisfied the reviewers. The basic ideas in the paper are interesting to both the machine learning and the wider cognitive science communities. However, additional experiments should be included in Table 1 to complete the ""DKM w/TSM (our impl)"" row on Fashion MNIST, CIFAR-10, and DMLab in the final revision for completeness. ",ICLR2021, +rJY6LypSf,1517250000000.0,1517260000000.0,886,rJIgf7bAZ,rJIgf7bAZ,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers are unanimous that this is an interesting paper, but that ultimately the empirical results are not sufficiently promising to warrant the added complexity.",ICLR2018, +zQ-WZvGFpS,1576800000000.0,1576800000000.0,1,SJxDDpEKvH,SJxDDpEKvH,Paper Decision,Accept (Poster),"This paper provides a fresh application of tools from causality theory to investigate modularity and disentanglement in learned deep generative models. It also goes one step further towards making these models more transparent by studying their internal components. While there is still margin for improving the experiments, I believe this paper is a timely contribution to the ICLR/ML community. +This paper has high-variance in the reviewer scores. But I believe the authors did a good job with the revision and rebuttal. I recommend acceptance.",ICLR2020, +HkZUU16Bz,1517250000000.0,1517260000000.0,784,SJdCUMZAW,SJdCUMZAW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers were quite unanimous in their assessment of this paper. + +PROS: +1. The paper is relatively clear and the approach makes sense +2. The paper presents and evaluates a collection of approaches to speed learning of policies for manipulation tasks. +3. Improving the data efficiency of learning algorithms and enabling learning across multiple robots is important for practical use in robot manipulation. +4. The multi-stage structure of manipulation is nicely exploited in reward shaping and distribution of starting states for training. + +CONS +1. Lack of novelty e.g. wrt to Finn et al. in ""Deep Spatial Autoencoders for Visuomotor Learning"" +2. The techniques of asynchronous update and multiple replay steps may have limited novelty, building closely on previous work and applying it to this new problem. +3. The contribution on reward shaping would benefit from a more detailed description and investigation. +4. There is concern that results may be specific to the chosen task. +5. Experiments using real robots are needed for practical evaluation.",ICLR2018, +bQJTEcHvxOG,1642700000000.0,1642700000000.0,1,rbPg0zkHGi,rbPg0zkHGi,Paper Decision,Reject,"The submission considers a new acquisition function for active learning. The method considers the sensitivity of the prediction for a given datapoint with respect to parameter perturbations. Points with the largest variance under these perturbations are selected for labelling. The method is simple and the empirical results are reasonable. Some weaknesses are the clarity of writing, and somewhat limited experimental comparisons. + +The discussion was useful and helped improve the clarity. 
Additional experiments also helped improve the paper, although some reviewers still felt the experimental comparisons were lacking, including using entropy as a baseline acquisition function. Despite these improved scores, the overall average score remains below threshold I'm afraid. + +I feel this is a useful paper, but perhaps needs a little more polishing in the writing and some additional experiments. As such, it just falls short of the acceptance threshold.",ICLR2022, +djWj-xaoqq3,1642700000000.0,1642700000000.0,1,POxF-LEqnF,POxF-LEqnF,Paper Decision,Accept (Poster),The manuscript brings up an important issue: that current methods and datasets don't generally highlight interactions when it comes to trajectory prediction. This is despite the fact that it would seem that current methods incorporate agent interactions and that datasets appear to require reasoning about agent interactions. This qualitative and quantitative observation should lead to better datasets in the future as well as more refined metrics pushing the field forward. Reviewers were in agreement that this is a strong submission. The authors responded with substantive new experiments that cleared up any lingering issues.,ICLR2022, +hA7jyFYSW,1576800000000.0,1576800000000.0,1,SJlEs1HKDr,SJlEs1HKDr,Paper Decision,Reject,"This manuscript outlines a method to improve the described under-fitting issues of sequential neural processes. The primary contribution is an attention mechanism depending on a context generated through an RNN network. Empirical evaluation indicates empirical results on some benchmark tasks. + +In reviews and discussion, the reviewers and AC agreed that the results look promising, albeit on somewhat simplified tasks. It was also brought up in reviews and discussions that the technical contributions seem to be incremental. This combined with limited empirical evaluation suggests that this work might be preliminary for conference publication. Overall, the manuscript in its current state is borderline and would be significantly improved wither by additional conceptual contributions, or by a more thorough empirical evaluation.",ICLR2020, +rJR7mJpBG,1517250000000.0,1517260000000.0,111,BkwHObbRZ,BkwHObbRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"I recommend acceptance based on the reviews. The paper makes novel contributions to learning one-hidden layer neural networks and designing new objective function with no bad local optima. + + There is one point that the paper is missing. It only mentions Janzamin et al in the passing. Janzamin et al propose using score function framework for designing alternative objective function. For the case of Gaussian input that this paper considers, the score function reduces to Hermite polynomials. Lack of discussion about this connection is weird. There should be proper acknowledgement of prior work. Also missing are some of the key papers on tensor decomposition and its analysis + +I think there are enough contributions in the paper for acceptance irrespective of the above aspect. ",ICLR2018, +N8WJtfb0F0a,1610040000000.0,1610470000000.0,1,Wis-_MNpr4,Wis-_MNpr4,Final Decision,Reject,"While reviewers appreciated the simple approach of this work, the biggest concern reviewers had was with the security guarantee of the method. R4 argued that in a certain case recovering an original image x_1 amounted to guessing 2 coefficients. 
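+
+(To fix notation for the exchange below, here is my reconstruction of the setup, not the paper's exact notation: the scheme releases the mixtures y_1 = a_{1,1} x_1 + a_{1,2} x_2 and y_2 = a_{2,1} x_1 + a_{2,2} x_2 for secret coefficients a_{i,j}. A short self-contained check that two derived numbers suffice to recover x_1 exactly:)
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+x1, x2 = rng.normal(size=1000), rng.normal(size=1000)  # two flattened 'images'
+A = rng.normal(size=(2, 2))                            # secret mixing coefficients
+y1 = A[0, 0] * x1 + A[0, 1] * x2                       # released outputs
+y2 = A[1, 0] * x1 + A[1, 1] * x2
+
+det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
+b1, b2 = A[1, 1] / det, A[0, 1] / det    # the only two unknowns an attacker must guess
+assert np.allclose(b1 * y1 - b2 * y2, x1)  # exact recovery of x_1
+```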
In the discussion phase the authors argued that security amounts to the adversary guessing 4 floating point numbers, not 2, which requires 100s of millions of years to decode an image correctly. However, R4 is correct that only 2 floating point numbers are necessary. This is because, as described by R4 when one sees outputs x_1 * a_{2,2} and x_2 * a_{2,1}, they can reconstruct x_1 as: + +x_1 = (x_1 * a_{2,2} - x_2 * a_{2,1}) / (a_{1,1} * a_{2,2} - a_{1,2} * a_{2,1}) + +Now define: + +b_1 := a_{2,2} / (a_{1,1} * a_{2,2} - a_{1,2} * a_{2,1}) +b_2 := a_{2,1} / ((a_{1,1} * a_{2,2} - a_{1,2} * a_{2,1}) + +Thus the above equation can be written as: + +x_1 = x_1 * b_1 - x_2*b_2 + +So an adversary needs to guess 2 floating point numbers. Further, R4 points out that an adversary can obtain x_1 up to a scale factor by simply guessing the relative ratio of the the 2 unknown floating point numbers, i.e., if our guess is c: + +x_1/c = x_1 * (b_1/c) - x_2 * (b_2/c) + +This is a single floating point number, and not all floating point numbers need to be checked. For many images, information can be leaked even if the true scale of the image is not known. + +For this reason I would urge the authors to strengthen the security guarantee of their approach. One way to do this would be to adapt the method so to make the resulting guarantee be a more standard one (e.g., differential privacy, standard cryptographic hardness guarantees). This would eliminate the main reviewer concerns and greatly strengthen the paper.",ICLR2021, +_B8vEgJA0h,1576800000000.0,1576800000000.0,1,BkggGREKvS,BkggGREKvS,Paper Decision,Reject,"After reading the reviews and discussing this paper with the reviewers, I believe that this paper is not quite ready for publication at this time. While there was some enthusiasm from the reviewers about the paper, there were also major concerns raised about the comparisons and experimental evaluation, as well as some concerns about novelty. The major concerns about experimental evaluation center around the experiments being restricted to continuous action settings where there is a limited set of baselines (see R3). While I see the authors' point that the method is not restricted to this setting, showing more experiments with more baselines would be important: the demonstrated experiments do strike me as somewhat simplistic, and the standardized comparisons are limited. + +This might not by itself be that large of an issue, if it wasn't for the other problem: the contribution strikes me as somewhat ad-hoc. While I can see the intuition behind why these two auxiliary objectives might work well, since there is only intuition, then the burden in terms of showing that this is a good idea falls entirely on the experiments. And this is where in my opinion the work comes up short: if we are going to judge the efficacy of the method entirely on the experimental evaluation without any theoretical motivation, then the experimental evaluation does not seem to me to be sufficient. + +This issue could be addressed either with more extensive and complete experiments and comparisons, or a more convincing conceptual or theoretical argument explaining why we should expect these two particular auxiliary objectives to make a big difference.",ICLR2020, +vjvsq9gA7,1576800000000.0,1576800000000.0,1,HJedXaEtvS,HJedXaEtvS,Paper Decision,Accept (Poster),"This paper proposes a method which patches/edits a pre-trained neural network's predictions on problematic data points. 
They do this without the need for retraining the network on the entire data, by only using a few steps of stochastic gradient descent, and thereby avoiding influencing model behaviour on other samples. The post patching training can encourage reliability, locality and efficiency by using a loss function which incorporates these three criteria weighted by hyperparameters. Experiments are done on CIFAR-10 toy experiments, large-scale image classification with adversarial examples, and machine translation. The reviews are generally positive, with significant author response, a new improved version of the paper, and further discussion. This is a well written paper with convincing results, and it addresses a serious problem for production models, I therefore recommend that it is accepted. ",ICLR2020, +giKeDQy9-n,1576800000000.0,1576800000000.0,1,rylwJxrYDS,rylwJxrYDS,Paper Decision,Accept (Poster),"This paper proposes a new self-supervised pre-trained speech model that improves speech recognition performance. + The idea combines an earlier pre-training approach (wav2vec) with discretization followed by BERT-style masked reconstruction. The result is a fairly complex approach, with not too much novelty but with a good amount of engineering and analysis, and ultimately very good performance. The reviewers agree that the work deserves publication at ICLR, and the authors have addressed some of the reviewer concerns in their revision. The complexity of the approach may mean that it is not immediately widely adopted by others, but it is a good proof of concept and may well inspire other related work. I believe the ICLR community will find this work interesting.",ICLR2020, +H1g9yakkx4,1544650000000.0,1545350000000.0,1,B1xeyhCctQ,B1xeyhCctQ,"Method to visualize spatial bias distribution in network. Reviewers unanimous reject, no rebuttal from authors. ",Reject,"The work presents a method to back propagate and visualize bias distribution in network as a form of explainability of network decisions. Reviewers unanimous reject, no rebuttal from authors. ",ICLR2019,4: The area chair is confident but not absolutely certain +ZDVjQehB4JE,1610040000000.0,1610470000000.0,1,4AWko4A35ss,4AWko4A35ss,Final Decision,Reject,"This paper presents a new idea and approach for self-supervised video representation learning. + +The reviewers' opinions diverge. R1 suggests that the paper is (marginally) above the threshold. R2 supports the paper, saying that he/she likes the idea behind the paper. R3 explicitly mentioned that he/she would like to provide a borderline rating (but cannot due to the system). R4 is not in favor of the paper, even after the rebuttal. + +The AC’s opinion is more aligned with R3 and R4, who are the more senior reviewers among the four. There are two main concerns with the paper: technical contribution and experimental comparison. In terms of the contribution, both the reviewers find that the paper is lacking: ""What I'm not entirely sure about is how much this method manages to push the boundary of SSL."" (by R3). ""Overall, the paper presents yet another method to design the pretext task for SSVRL. But my major concern is it lacks enough insights for inspiring future research for this topic."" (by R4). 
The authors point to the difficulty of doing this research in academia with limited computation resources and argue that the reviewers should focus more on novelty and contribution, but even after the rebuttal, R4 is not convinced and R3 is still mildly concerned about whether the proposed approach really brings something new to the field, as the paper fails to show ""clear superiority over existing methods"".

In addition, as pointed out by the reviewers, there are several state-of-the-art self-supervised video representation learning works that the paper fails to cite or compare against. In addition to Pace and SpeedNet, which R3 mentioned, below are approaches reporting results on UCF101 and HMDB with the standard self-supervised classification task setting (Table 1):

AVTS 89.0, 61.6
CVRL 92.1, 65.4
ELo 93.8, 67.4
XDC 94.2, 67.4
GDT 95.2, 72.8

We note that all these results are much superior to the best results reported by the proposed approach, 79.5 and 50.9 on UCF101 and HMDB. The authors mention in the rebuttal that these superior approaches use stronger backbones (and are thus omitted from the paper), but we believe a more academically proper attitude is to include all these numbers and explicitly describe why the proposed approach is not performing better, instead of completely omitting their results.

The AC also questions whether the R2D3D-34 backbone used in this paper really is computationally lighter than the backbones used in previous approaches like R(2+1)D-18, which alternates 2D residual modules and (2+1)D residual modules (using much fewer parameters and compute than 3D modules) and also has fewer layers. XDC, using an R(2+1)D-18 backbone, reports 86.8/52.6 (UCF/HMDB) accuracies with Kinetics-400 unlabeled data. AVTS also reports 84.1/52.5 (UCF/HMDB) accuracies using an MC3-18 backbone. Similarly, GDT uses an R(2+1)D-18 backbone and reports 89.3/60.0 (UCF/HMDB) accuracies using unlabeled Kinetics-400. Even MemDPC reports 86.1/54.5 using an R-2D3D backbone when the optical flow feature is added. All these are far superior to the results being reported in the paper.

Overall, we view the experimental section of this paper as incomplete, and we cannot convince ourselves that the paper reaches the quality of ICLR.

[AVTS] B. Korbar, D. Tran, and L. Torresani. Cooperative learning of audio and video models from self-supervised synchronization. In NeurIPS, 2018.

[ELo] A. Piergiovanni, A. Angelova, and M. S. Ryoo. Evolving losses for unsupervised video representation learning. In CVPR, 2020.

[XDC] H. Alwassel, D. Mahajan, L. Torresani, B. Ghanem, and D. Tran. Self-supervised learning by cross-modal audio-video clustering. arXiv preprint arXiv:1911.12667, 2019.

[GDT] M. Patrick, Y. M. Asano, R. Fong, J. F. Henriques, G. Zweig, and A. Vedaldi. Multi-modal self-supervision from generalized data transformations. arXiv preprint arXiv:2003.04298, 2020.

[CVRL] R. Qian, T. Meng, B. Gong, M.-H. Yang, H. Wang, S. Belongie, and Y. Cui. Spatiotemporal contrastive video representation learning. arXiv preprint arXiv:2008.03800, 2020.",ICLR2021,
KPW86bmYDl-,1642700000000.0,1642700000000.0,1,8MN_GH4Ckp4,8MN_GH4Ckp4,Paper Decision,Reject,"The authors developed a reparameterization scheme using QR decomposition to reveal symmetries in networks with radial activations. While I welcome new ideas and formalisms from other fields, the ideas presented in this manuscript are fairly straightforward under the radial activation assumption. 
Although the authors claim that the results may generalize, no evidence was provided. The practical contributions are marginal, and the paper lacks comparisons with related DNN compression schemes. Through the review process, the paper has been greatly improved, but unfortunately, this interesting paper does not meet ICLR's standard as is.",ICLR2022,
#NAME?,1610040000000.0,1610470000000.0,1,3Aoft6NWFej,3Aoft6NWFej,Final Decision,Accept (Spotlight),"This paper describes a new and experimentally useful way to propose masked spans for MLM pretraining, by masking spans of text that co-occur more often than would be expected given their components, i.e., spans that are statistically likely to be non-compositional phrases.

The authors should make some attempt to connect their PMI heuristic with prior methods for statistical phrase-finding and term recognition, e.g., https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-439.pdf or https://link.springer.com/chapter/10.1007/978-3-540-85287-2_24, in the final paper.",ICLR2021,
n2AhQYbKKO,1610040000000.0,1610470000000.0,1,SQfqNwVoWu,SQfqNwVoWu,Final Decision,Reject,"This paper proposes a method for conditional inference with arbitrary conditioning by creating composed flows. The paper provides a hardness result for arbitrary conditional queries. Motivated by the fact that conditional inference is hard, the paper suggests a novel relaxation in which the *conditioning* itself is relaxed.

There were various concerns from the reviewers regarding notation, comparison algorithms, and how the hardness result motivates the smoothing operation introduced. After careful study of the paper and all the comments, I find that I am most concerned about the hardness result and how it motivates the smoothing operation that is done. Novel computational complexity results *as such* are not really in the scope of ICLR. There's nothing wrong with having such a result in a paper, of course, but a paper like this should be evaluated on the basis of the algorithm proposed.

Like R4, I do not follow how this hardness result is meant to motivate the smoothing that's applied. The paper is unambiguous that the goal is to do conditional inference. A hardness result is presented for conditional inference, and so a relaxed surrogate is presented. This has a minor problem: it's not clear that the relaxed problem avoids the complexity boundary of the original one. There's a larger problem, though. The hardness result has not been sidestepped! The goal is still to solve conditional inference. The algorithm that's presented is still an approximate algorithm for conditional inference. R4 suggests that other approximation algorithms should be compared against. The authors responded to this point, but I am not able to understand the response. For the same reason, I think it is valid to ask for comparison to other approximate inference algorithms (e.g., without smoothing).

None of the above is to say that the smoothing approach is bad. It may very well be. However, I think that either the existing argument should be clarified or a different argument should be given.

Finally, here are two minor points (these weren't raised by reviewers and aren't significant for acceptance of the paper; I'm just bringing them up in case they are useful):

Is Eq. 3 (proof in Appendix B.1) not just an example of the invariance of the KL-divergence under diffeomorphisms?

The proof in Appendix B.2 appears to be just a special case of the standard chain rule of the KL-divergence (e.g., 
as covered in Cover and Thomas)",ICLR2021,
tGM0QjoTrY,1610040000000.0,1610470000000.0,1,kKwFlM32HV5,kKwFlM32HV5,Final Decision,Reject,"The authors address the problem of fine-grained image classification. They propose a batch-based regularizer, called the batch confusion norm (BCN), to encourage less overconfident predictions. They also tackle the problem of class imbalance during training by adaptively weighting the BCN loss at the class level to take the imbalances in the underlying label distributions into account. Results are presented on four different fine-grained datasets.

Overall, while the reviewers had some positive comments, there was not broad support for the paper. There are questions that need to be resolved related to the evaluation, e.g., the best-performing model uses GASPP; however, there is no reported GASPP variant for the PC baseline. Similarly, it would be valuable to know how much PC would benefit from an additional class imbalance term in the iNaturalist2018 results. Given that the proposed regularizer builds on PC (Dubey et al.), it is very important that the authors provide a like-for-like comparison so that readers can better understand the merits of the proposed method.

There were also concerns with the presentation of the paper, e.g., several typos (which can be easily fixed), issues with the clarity of the text (which require more work), and uninformative figures (e.g. Fig. 2 should be revised to more clearly illustrate the differences between the three methods shown). The authors are encouraged to revise the text to resolve these problems.

While the paper has some strengths (e.g. the empirical performance on some of the tasks is promising and the method is conceptually simple), there are still a number of concerns from the reviewers, e.g., a lack of a clear motivation as to why the proposed method works, and why it is conceptually better than existing alternatives (e.g. PC). Given this lack of support, it is not possible to recommend the paper in its current form.",ICLR2021,
HJxNUC4iJN,1544400000000.0,1545350000000.0,1,BJeem3C9F7,BJeem3C9F7,metareview,Reject,"This paper proposes an approach for learning to generate 3D views, using a surfel-based representation, trained entirely from 2D images. After the discussion phase, reviewers rate the paper close to the acceptance threshold.

AnonReviewer3, who initially stated ""My second concern is the results are all on synthetic data, and most shapes are very simple"", remains concerned after the rebuttal, stating ""all results are on synthetic, simple scenes. In particular, these synthetic scenes don't have lighting, material, and texture variations, making them considerably easier than any types of real images.""

The AC agrees with the concerns raised by AnonReviewer3, and believes that more extensive experimentation, either on more complex synthetic scenes or on real images, is needed to back the claims of the paper. Particularly relevant is the criticism that ""While the paper is called 'pix2scene', it's really about 'pix2object' or 'pix2shape'.""",ICLR2019,4: The area chair is confident but not absolutely certain
S1m82zUue,1486400000000.0,1486400000000.0,1,HJGODLqgx,HJGODLqgx,ICLR committee final decision,Accept (Poster),"This paper is a by-the-numbers extension of the hidden semi-Markov model to include nonlinear observations, and neural network-based inference. The paper is fairly clear, although the English isn't great. The experiments are thorough. 
+
+ Where this paper really falls down is on originality. In particular, in the last two years there have been related works that aren't cited (and unfortunately weren't mentioned by the reviewers) that produce similar models. Notably, Johnson et al.'s 2016 NIPS paper develops almost the same inference strategy in almost the same model class.
+
+ http://stat.columbia.edu/~cunningham/pdf/GaoNIPS2016.pdf
+ https://arxiv.org/abs/1511.05121
+ https://arxiv.org/abs/1603.06277
+
+ This paper is borderline, but I think it makes the cut by virtue of having experiments on real datasets, and by addressing a timely problem (how to have interpretable structure in neural network latent variable models).",ICLR2017,
wwtJbs4awtZ,1610040000000.0,1610470000000.0,1,RuUdMAU-XbI,RuUdMAU-XbI,Final Decision,Reject,"The idea presented in the paper is interesting and has caught the attention of the reviewers. However, there seems to be only tepid support for acceptance, with a reviewer championing rejection.
+There is little novelty in the approach, but the empirical validation shows results that consistently improve over selected baselines. I am afraid that more evaluations would be needed at this stage to consider this work for acceptance.",ICLR2021,
HygN3VQJx4,1544660000000.0,1545350000000.0,1,rJxMM2C5K7,rJxMM2C5K7,Metareview,Reject,The reviewers found that the paper needs more compelling empirical study.,ICLR2019,5: The area chair is absolutely certain
SkRvIkarf,1517250000000.0,1517260000000.0,809,rJhR_pxCZ,rJhR_pxCZ,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a new model called the differential decision tree, which captures the benefits of decision trees and VAEs. They evaluate the method only on the MNIST dataset. The reviewers thus rightly complain that the evaluation is insufficient, and one also questions its technical novelty.",ICLR2018,
seyZGYaV98P,1642700000000.0,1642700000000.0,1,QguFu30t0d,QguFu30t0d,Paper Decision,Reject,"This paper proposes a knowledge distillation strategy to enable the use of a large server-side model in federated learning while satisfying the computation constraints of resource-limited clients. The problem is relevant and well-motivated, and the paper presents compelling experimental results to support the proposed strategy. However, reviewers had the following major comments/suggestions:
1) The theoretical analysis section needs improvement in terms of technical depth and rigor.
2) Better explanation of how the proposed strategy compares with previous works/baselines.
3) Consideration of the privacy and scalability properties of the proposed strategy.

The paper generated lots of constructive post-rebuttal discussions between the authors and the reviewers, and I believe the authors received several ideas to improve the work and appreciated the reviews. One of the reviewers increased their score. However, based on the current scores, I still recommend rejection. I do think the paper has promise, and with improvements, the revised version will make an excellent contribution.",ICLR2022,
ohzLoj9sP9F,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"All four reviewers were against accepting the paper. A major point shared by everyone was lack of clarity: this included the overall writing, the discussion of prior work, and imprecise math used to explain the ideas. The paper did improve quite a bit over its revisions. Whether this clarified all of the reviewers' understanding of the paper remains unclear. 
The work may ultimately need another cycle of reviews to assess its quality. + +Another shared point are a number of recommended ablations in the experiments, as well as going through more comprehensively in the set of studied datasets (R3), effect of AE choices (R2), and alternatives to the geodesic (R1, R2).",ICLR2021, +qzdQuQ5BKvU,1642700000000.0,1642700000000.0,1,kR1hC6j48Tp,kR1hC6j48Tp,Paper Decision,Accept (Poster),"The paper explores the application of generative adversarial networks as posterior models in simulation-based inference. A new method is proposed, and its connections with related work are studied. The proposed method is empirically evaluated on joint inference of up to 784 parameters. + +The reviews are borderline, with one weak reject, two weak accepts, and one strong accept. Overall, the paper is well-written and well-executed. Its main strength is the promising performance of the proposed method in high-dimensional parameter spaces, which are out-of-reach for many existing approaches. The main weakness of the paper is its lack of novelty: the proposed method is only marginally different from already existing ones, while the paper could have explored the differences to a greater extent. + +On balance, I'm leaning towards recommending the paper for acceptance. Despite the lack of novelty, the paper is well executed with potential impact in high-dimensional simulation-based inference.",ICLR2022, +0RDHYAzwB4,1576800000000.0,1576800000000.0,1,HylfPgHYvr,HylfPgHYvr,Paper Decision,Reject,"The paper studies the problem of modeling inter-object dynamics with occlusions. It provides proof-of-concept demonstrations on toy 3d scenes that occlusions can be handled by structured representations using object-level segmentation masks and depth information. However, the technical novelty is not high and the requirement of such structured information seems impractical real-world applications which thus limits the significance of the proposed method.",ICLR2020, +sVI4d9Twb-l,1642700000000.0,1642700000000.0,1,fkjO_FKVzw,fkjO_FKVzw,Paper Decision,Reject,"The paper aims to scale transformers to large graphs. In this regard, authors propose to first obtain a ""coarse"" version of the large graph using existing algorithms. With reduced number of nodes in the coarse graph, we can employ the transformer efficiently to capture the global information. To capture the local information, GNNs are employed. Finally, authors carry out extensive experiments on a range of graph datasets. Also, reviewers do appreciate reporting the confidence intervals. We thank the reviewers and authors for engaging in an active discussion. Unfortunately, the reviewers are in a consensus that novelty of the proposed method is limited: it is combination of existing techniques and similar ideas have been widely used in the literature. Also, the empirical results are not very significant. Thus, unfortunately I cannot recommend an acceptance of the paper in its current form.",ICLR2022, +USgzb-h2MLA,1642700000000.0,1642700000000.0,1,ChKNCDB0oYj,ChKNCDB0oYj,Paper Decision,Reject,"The paper received 3,3,1 as reviews. All reviewers have the consensus on the weaknesses, i.e. limited technical novelty and weak boost in performance in datasets that may not be the state of the art anymore. The authors have submitted a rebuttal however the rebuttal did not improve the score of the reviewers. 
Following the reviewers' recommendation, the AC recommends rejection.",ICLR2022,
c7km52vbUdQ,1642700000000.0,1642700000000.0,1,nwKXyFvaUm,nwKXyFvaUm,Paper Decision,Accept (Poster),"The paper proposes a novel method for (diverse) client selection at each round of a federated learning procedure with the aim of improving performance in terms of convergence, learning efficiency and fairness. The main idea is to introduce a facility location objective to quantify how representative/informative the gradient information of a given set of clients is, and then choose a subset that maximizes this objective. Given the monotonicity and submodularity of the proposed facility location objective, the authors have been able to provide theoretical guarantees. Experimental results on two data sets (FEMNIST and CelebA) show the effectiveness of the proposed approach and algorithm.

The reviewers had a number of concerns, most of which were addressed in the authors' response. The reviewers believe that the theoretical results of the paper are incremental given the prior work (see the reviews for more details); however, the reviewers (as well as myself) agree that the proposed method is novel and can provide a significant practical advantage. Utilizing submodular objectives for diverse selection is a well-known (and effective) approach, but I am seeing it in the context of federated learning for the first time.

My suggestions to the authors: (i) Improve the experimental section by adding a few more common data sets (such as CIFAR when data is distributed in a heterogeneous manner). CelebA and FEMNIST are not really the best data sets to try in FL (although they are commonly used). (ii) One of the reviewers had several critical comments about the theoretical results; please address those in the updated version. (iii) Please clarify in more detail how the theoretical and algorithmic contributions of the paper go beyond the recent work of (Mirzasoleiman et al., 2020). (iv) It seems to me that the paper is missing some references on client selection in federated learning. Please revise the related work accordingly.",ICLR2022,
vhXajHhCvt4,1610040000000.0,1610470000000.0,1,KTEde38blNB,KTEde38blNB,Final Decision,Reject,"Nominally, the scores on this paper were pretty split.
+In reality, I concur with the 2 and the 3.
+The 6 acknowledges being unfamiliar w/ the GAN literature, and I think the 7 is being too permissive about the baselines.
+
+The empirical evaluation here is simply not up to par for a major machine learning conference.
+As reviewers have mentioned, the baselines are out of date, and even then the improvements are marginal.
+It's totally fine to have a marginal improvement if the proposed technique is very new and interesting and the baselines
+are taken seriously, but unfortunately I don't believe that's the case here. 
+Thus, I recommend rejection.",ICLR2021,
YZFOPces7iS,1642700000000.0,1642700000000.0,1,hJk11f5yfy,hJk11f5yfy,Paper Decision,Reject,"The paper studies subpopulation shift in object recognition when classes obey a hierarchy. It proposes an architecture, a relevant metric and a dataset (a subset of ImageNet). The problem of classification in hierarchical label spaces is important and of great interest, and the effect on domain shift is interesting. Naturally, this problem has been studied quite intensively over the years.

Reviewers were concerned that the current proposal was not placed well enough in the context of previous literature, both in terms of the method and in terms of experimental results. Also, the paper would be strengthened if it provided more theoretical analysis of how the hierarchy helps with the domain shift. The authors addressed some of these issues in the rebuttal, adding references and highlighting the differences from previous methods, but the paper would need more time to make the proper experimental comparisons with previous work and the subsequent analysis. As a result, the paper is still not ready for acceptance to ICLR in its current form.",ICLR2022,
SkxJTYISlE,1545070000000.0,1545350000000.0,1,BJgolhR9Km,BJgolhR9Km,reject,Reject,"The paper presents a novel unit making the networks intrinsically more robust to gradient-based adversarial attacks. The authors have addressed some concerns of the reviewers (e.g. regarding pseudo-gradient attacks), but the experimental section could benefit from a larger-scale evaluation (e.g. ImageNet).",ICLR2019,5: The area chair is absolutely certain
vEfea0kwF2p,1642700000000.0,1642700000000.0,1,gLqnSGXVJ6l,gLqnSGXVJ6l,Paper Decision,Reject,"The reviewers unanimously think the paper lacks novelty, its contributions are quite limited, and it is not ready for publication.",ICLR2022,
8brN0BwSuZw,1610040000000.0,1610470000000.0,1,XwATtbX3oCz,XwATtbX3oCz,Final Decision,Reject,"This paper received three recommendations of accept and one recommendation of reject. The paper is mixed. The results presented are both compelling and will have impact on the community. The AC does not agree with R2's view that the paper requires the proposal of a novel method for acceptance. At the same time, the AC also does not agree with the views of the other reviewers that the current experiments alone are enough to carry the paper without more conclusive statements. As hinted by R3, simply pointing out the problems without proposing how to adjust our models and experimentation protocols in the future is insufficient. 
,ICLR2021, +Hy6j2GIOl,1486400000000.0,1486400000000.0,1,rJ6DhP5xe,rJ6DhP5xe,ICLR committee final decision,Invite to Workshop Track,"A summary of strengths and weaknesses brought up in the reviews: + + Strengths + -Paper presents a novel way to evaluate representations on generalizability to out-of-domain data (R2) + -Experimental results are encouraging (R2) + -Writing is clear (R1, R2) + + Weaknesses + -More careful controls are needed to ascertain generalization (R2) + -Experimental analysis is preliminary and lack of detailed analysis (R1, R2, R3) + -Novelty and discussion of past related work (R3) + + The reviewers are in consensus that the idea is exciting and at least of moderate novelty, however the paper is just too preliminary for acceptance as-is. The authors did not provide a response. This is surprising because specific feedback was given to improve the paper and it seems that the paper was just under the bar. Therefore I have decided to align with the 3 reviewers in consensus and encourage the authors to revise the paper to respond to the fairly consistent suggestions for improvement and re-submit. Mentime, I'd like to invite the authors to present this work at the workshop track.",ICLR2017, +rklf-FNgeN,1544730000000.0,1545350000000.0,1,rJl8viCqKQ,rJl8viCqKQ,Incremental contribution,Reject,"The paper proposes improvements on the area of neural network inference with homomorphically encrypted data. Existing applications typically have high computational cost, and this paper provides some solutions to these problems. Some of the improvements are due to better ""engineering"" (the use of the faster SAEL 3.2.1 over CryptoNet). The idea of using pre-trained AlexNet features is new, but pretty standard practice. The presentation has been greatly improved in the updated version, however the paper could benefit from additional discussions and experiments. For example, when a practitioner wants to solve a new problem with some design need (e.g. accuracy, latency vs. bandwidth trade-off), what network modules should be used and how should they be represented? To summarize, the problem considered is important, however, as pointed out by the reviewers, both the empirical and the theoretical results appear to be incremental with respect to the existing literature. +",ICLR2019,3: The area chair is somewhat confident +UuAtqjBE4,1576800000000.0,1576800000000.0,1,BJxkOlSYDH,BJxkOlSYDH,Paper Decision,Accept (Poster),"This paper presents a sampling-based approach for generating compact CNNs by pruning redundant filters. One advantage of the proposed method is a bound for the final pruning error. + +One of the major concerns during review is the experiment design. The original paper lacks the results on real work dataset like ImageNet. Furthermore, the presentation is a little misleading. The authors addressed most of these problems in the revision. + +Model compression and purring is a very important field for real world application, hence I choose to accept the paper. +",ICLR2020, +sJgoQVGNHD6,1642700000000.0,1642700000000.0,1,b-ny3x071E5,b-ny3x071E5,Paper Decision,Accept (Oral),"This paper addresses a meta-learning method which involves bilevel optimization. It is claimed that two limitations (myopia of MG and restricted consideration of geometry of search space) that most of existing methods have can be resolved by the MBG with a properly chosen pseudo-metric. 
The algorithm first bootstraps a target from the meta-learner, then optimizes the meta-learner by minimizing the distance to that target under a chosen pseudo-metric. The authors also establish conditions that guarantee performance improvements and show that the metric can be used to control meta-optimization. All the reviewers agree that the idea is interesting and that the experiments support it well. The authors did a good job in the rebuttal phase, resolving most of the concerns raised by reviewers, leading two of the reviewers to raise their scores. While the current theoretical results are limited to the simple case where L=1, the method is attractive for the meta-learning community. All reviewers agree to champion this paper. Congratulations on a nice piece of work.",ICLR2022,
SJgWUZDxeV,1544740000000.0,1545350000000.0,1,rJMcdsA5FX,rJMcdsA5FX,"Valuable goal, but execution is somewhat suspect.",Reject,"This paper conducts experiments evaluating several different metrics for evaluating GAN-based language generation models. This is a worthy pursuit, and some of the evaluation is interesting.

However, as noted by Reviewer 2, there are a number of concerns with the execution of the paper: the evaluation of metrics with respect to human judgement is insufficient, the diversity of the text samples is not evaluated, and there are clarity issues.

I feel that with a major re-write and tighter experiments this paper could potentially become something nice, but in its current form it seems below the ICLR quality threshold.",ICLR2019,3: The area chair is somewhat confident
OnGRdBQk4G,1576800000000.0,1576800000000.0,1,ryex8CEKPr,ryex8CEKPr,Paper Decision,Reject,"This manuscript proposes feature selection inspired by knockoffs, where the generative models are implemented using modern deep generative techniques. The resulting procedure is evaluated in a variety of empirical settings and shown to improve performance.

The reviewers and AC agree that the problem studied is timely and interesting, as knockoffs combined with generative models have recently shown promise for inferential problems. However, the reviewers were unconvinced about the motivation of the work, and the strength of the empirical evaluation results. In the opinion of the AC, this work might be improved by focusing (both conceptually and empirically) on applications where inferential variable selection is most relevant, e.g., causal settings, healthcare applications, and so on.",ICLR2020,
rkekvieWeN,1544780000000.0,1545350000000.0,1,ByGVui0ctm,ByGVui0ctm,meta-review,Reject,"The authors have proposed 3 continual learning variants which are all based on MNIST and which vary in terms of whether task ids are given and what the classification task is, and they have proposed a method which incorporates a symmetric VAE for generative replay with a class discriminator. The proposed method does work well on the continual learning scenarios, and the incorporation of the generative model with the classifier is more efficient than keeping them separate. The discussion of the different CL scenarios and of related work is nice to read. However, the authors imply that these scenarios cover the space of important CL variants, yet they do not consider many other settings, such as when tasks continually change rather than having sharp boundaries. The authors have also only focused on the catastrophic forgetting aspect of continual learning, without considering scenarios where, e.g., strong forward transfer (or backwards transfer) is very important. 
Regarding the proposed architecture that combines a VAE with a softmax classifier for efficiency, the reviewers all felt that this was not novel enough to recommend publication.",ICLR2019,5: The area chair is absolutely certain +YV6Zo3uGmR,1576800000000.0,1576800000000.0,1,rklEj2EFvB,rklEj2EFvB,Paper Decision,Accept (Spotlight),"The authors derive a novel, unbiased gradient estimator for discrete random variables based on sampling without replacement. They relate their estimator to existing multi-sample estimators and motivate why we would expect reduced variance. Finally, they evaluate their estimator across several tasks and show that is performs well in all of them. + +The reviewers agree that the revised paper is well-written and well-executed. There was some concern about that effectiveness of the estimator, however, the authors clarified that ""it is the only estimator that performs well across different settings (high and low entropy). Therefore it is more robust and a strict improvement to any of these estimators which only have good performance in either high or low entropy settings."" Reviewer 2 was still not convinced about the strength of the analysis of the estimator, and this is indeed quantifying the variance reduction theoretically would be an improvement. + +Overall, the paper is a nice addition to the set of tools for computing gradients of expectations of discrete random variables. I recommend acceptance. + +",ICLR2020, +n2ZID6uFSe,1610040000000.0,1610470000000.0,1,txC1ObHJ0wB,txC1ObHJ0wB,Final Decision,Reject,"This paper presents an analysis of different tricks for training the super-network in NAS. While all reviewers see value in some of the many experiments, all reviewers also have substantial criticisms of the paper, and all reviewers gave weak rejection scores. + +Looking at the paper myself, I agree with this assessment. Several of the experiments are valuable, but there are also several substantial issues. + +One question that confused two reviewers and myself is about using sparse Kendall's tau as a metric that the authors in the rebuttal again state can be computed during super-net training to evaluate the quality, just like super-net accuracy. I don't see how that is possible. Kendall's tau measures the correlation between the ranks of the performances of the stand-alone architectures and the ordering implied by the super-net. Computing this requires access to the performance ranks of the stand-alone architectures. For tabular benchmarks this is of course available, but not in practical NAS applications. + +I would also like to echo the concern of AnonReviewer2 that too little information is given to fully understand what is shown in Figure 10. + +Some reviewers also questioned inhowfar the results generalize to the setting of the Once-for-all-network or BigNAS. This was not a deciding factor for me, since insights based on NAS-Bench101, 201 and a DARTS-like search spaces are already very useful. + +I agree with the reviewers that the authors' use of ""proxy"" is highly misleading. It is standard to refer to the low-fidelity model used for training as the proxy model. In contrast, the authors use it for the final evaluation model. + +Concerning the authors' five final take-aways: +1) I don't see how sparse Kendall's tau is actionable. 
+2) The batch normalization part is interesting, and I agree with the authors that it is useful to spell this out and analyze it, rather than just having one sentence in the paper as NB-201 and TuNAS, but the attribution that this has been done before is broken. ""In contrast to X"", rather than ""Like X"" +3) This is interesting, although I agree with AnonReviewer3 that I'm lacking intuition why a smaller learning rate should be useful for a less smooth space +4) The experiment on low fidelity estimates is very misleading. The proxy settings used during training are already low fidelity evaluations -- for the final evaluation, you would increase the number of channels, number of layers and number of epochs. Stating that the use of low fidelities is not useful is highly misleading. The authors' experiments only shows that the proxy model is already well chosen, and that if you reduce #layers or #channels and proportionally increase #epochs, performance gets worse. I encourage the authors to try searching without this proxy model, and I'm sure they will find that (which correlations might increase) the search process will be far too slow. +5) The insight on dynamic channeling appears very useful to me. + +In summary, I recommend rejection and encourage the authors to address the points raised by the reviewers and in this meta-review.",ICLR2021, +maXNWVPu8_,1610040000000.0,1610470000000.0,1,trj4iYJpIvy,trj4iYJpIvy,Final Decision,Reject,"The paper proposes three algorithms for the sparse PCA problem, where one imposes the additional constraint that the vectors have a small number of non-zero entries. The proposed algorithms run in polynomial time and achieve provable approximation guarantees on the accuracy and sparsity. The reviewers identified the following strengths of the contributions: the algorithms are simple and have different strengths; the theoretical results are sound and perhaps even surprising; the presentation is clear. The reviewers identified the following weaknesses of the contributions: the running times of the proposed algorithms are high and they may not scale to large datasets, which significantly limits their application to machine learning datasets; the experimental evaluation is insufficient and it does not compare with some of the state of the art algorithms; the algorithmic novelty is limited. After weighing these strengths and weaknesses as well as evaluating the paper relative to other ICLR submissions, I recommend reject.",ICLR2021, +H16uIyTHG,1517250000000.0,1517260000000.0,822,H1BHbmWCZ,H1BHbmWCZ,ICLR 2018 Conference Acceptance Decision,Reject,"Reviewers unanimous on rejection. +Authors don't maintain anonymity. +No rebuttal from authors. +Poorly written",ICLR2018, +BkChnGLug,1486400000000.0,1486400000000.0,1,BkIqod5ll,BkIqod5ll,ICLR committee final decision,Reject,"This work studies the problem of generalizing a convolutional neural network to data lacking grid-structure. + + The authors consider the Random Walk Normalized Laplacian and its finite powers to define a convolutional layer in a general graph. Experiments in Merck molecular discovery and mnist are reported. + + The reviewers all agreed that this paper, while presenting an interesting and important problem, lacks novelty relative to existing approaches. In particular, the AC would like to point out that important references seem to be missing from the current version. 
+ + The proposed approach is closely related to 'Convolutional neural networks on graphs with fast localized spectral filtering', Defferrand et al. NIPS'16 , which considers Chevyshev polynomials of the Laplacian and learns the polynomial coefficients in an efficient manner. Since the Laplacian and the Random Walk Normalized Laplacian are similar operators (have same eigenvectors), the resulting model is essentially equivalent. Another related model that precedes all the cited works and is deeply related to the current submission is the Graph Neural Network from Scarselli et al.; see 'Geometric Deep Learning: going beyond Euclidean Data', Bronstein et al, https://arxiv.org/pdf/1611.08097v1.pdf' and references therein for more detailed comparisons between the models.",ICLR2017, +HyPg7k6Bz,1517250000000.0,1517260000000.0,62,SyqShMZRb,SyqShMZRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents a more complex version of the grammar-VAE, which can be used to generate structured discrete objects for which a grammar is known, by adding a second 'attribute grammar', inspired by Knuth. + +Overall, the idea is a bit incremental, but the space is wide open and I think that structured encoder/decoders is an important direction. The experiments seem to have been done carefully (with some help from the reviewers) and the results are convincing.",ICLR2018, +TLsRXk8Tym,1576800000000.0,1576800000000.0,1,H1efEp4Yvr,H1efEp4Yvr,Paper Decision,Reject,"The authors develop theoretical results showing that policy gradient methods converge to the globally optimal policy for a class of MDPs arising in econometrics. The authors show empirically that their methods perform on a standard benchmark. + +The paper contains interesting theoretical results. However, the reviewers were concerned about some aspects: +1) The paper does not explain to a general ML audience the significance of the models considered in the paper - where do these arise in practical applications? Further, the experiments are also limited to a small MDP - while this may be a standard benchmark in econometrics, it would be good to study the algorithm's scaling properties to larger models as is standard practice in RL. + +2) The implications of the assumptions made in the paper are not explained clearly, nor are the relative improvements of the authors' work relative to prior work. In particular, one reviewer was concerned that the assumptions could be trivially satisfied and the authors' rebuttal did not clarify this sufficiently. + +Thus, I recommend rejection but am unsure since none of the reviewers nor I am an expert in this area.",ICLR2020, +EGdrL5lcgr,1576800000000.0,1576800000000.0,1,BJe-_CNKPH,BJe-_CNKPH,Paper Decision,Reject,"This paper investigates the degree to which we might view attention weights as explanatory across NLP tasks and architectures. Notably, the authors distinguish between single and ""pair"" sequence tasks, the latter including NLI, and generation tasks (e.g., translation). The argument here is that attention weights do not provide explanatory power for single sequence tasks like classification, but do for NLI and generation. Another notable distinction from most (although not all; see the references below) prior work on the explainability of attention mechanisms in NLP is the inclusion of transformer/self-attentive architectures. 
+ +Unfortunately, the paper needs work in presentation (in particular, in Section 3) before it is ready to be published.",ICLR2020, +4h7b0CpDeLQ,1642700000000.0,1642700000000.0,1,KFUWHgRYEDF,KFUWHgRYEDF,Paper Decision,Reject,The submission considers a method involving adversarial training to speed up the fine-tuning of large pre-trained transformer language models. Reviewers consider it to be a borderline paper. Many suggestions are made by the reviewers which will help improve the presentation and substance and make it more useful for the community.,ICLR2022, +LyXwrtrXM2,1576800000000.0,1576800000000.0,1,SyeKGgStDB,SyeKGgStDB,Paper Decision,Reject,Paper is withdrawn by authors.,ICLR2020, +QBdxGa4A-3,1610040000000.0,1610470000000.0,1,sxZvLS2ZPfH,sxZvLS2ZPfH,Final Decision,Reject,Three reviewers agreed to reject and the other reviewer also suggested it is below the threshold.,ICLR2021, +eA2dOePHyve,1642700000000.0,1642700000000.0,1,mqIeP6qPvta,mqIeP6qPvta,Paper Decision,Reject,"This paper introduces an architecture that uses pooling regions and +eye movements to sequentially build up an object representation. A +confidence threshold is used to allow recognition in less time for +easier images. + +There was a lot of disagreement on this paper. Those in favor argued +that it is a worthy endeavor to explore new biologically motivated +architectures and foveated eye movements are an important aspect of +human vision that is worth exploring for computer vision. Another pro +was the improved robustness to some adversarial attacks. Those +arguing for not accepting the paper, argued that classification +performance is not improved over SOTA and that more ablation studies +should be done to better understand the role and importance of the +various aspects of the model and how they differ from other +architectural designs with dilated convolutions instead of the +foveation module. + +I agree that more ablation studies would be useful to better +understand the role of the different model components. While I +feel that this novel sequential processing algorithm is worth publishing to +increase activity in this area, I feel it would be best received after further +studies help clarify the importance of different aspects of the model. +I recommend resubmission after further analysis.",ICLR2022, +-1XP4RgX0Y4,1610040000000.0,1610470000000.0,1,6puUoArESGp,6puUoArESGp,Final Decision,Accept (Poster),"The paper has merits on providing a particular way of understanding a prediction model based on auxiliary data (concepts). I have a generally more positive view of it, aligned with the higher-scoring reviews. However, I feel a bit uncomfortable of framing it as ""causal"" in the sense it does not aim to provide any causal predictions, but it is more of a smoothing method for capturing signal contaminated with ""uninteresting"" latent sources - this is more akin to regression with measurement error (see e.g. Carroll, Ruppert and Stefanski's ""Nonlinear regression with measurement error"") where, like in this paper, different definitions of ""instrumental variables"" also exist and are different from the causal inference definition. I can see though why we may want to provide a causal interpretation in order to justify particular assumptions, not unlike interesting lines of work from Scholkopf's take on causality. The paper can be strengthened by some further discussion on the assumptions made about additivity on equations (2) and (3), which feel strong and not particularly welcome in many applications. 
+ +The proposed title is still a bit clunky, I feel that the two-stage approach is less important than the structural assumptions made, perhaps a title emphasizing the latter rather than the former would be more promising.",ICLR2021, +bpgYUcvJa_u,1610040000000.0,1610470000000.0,1,1dm_j4ciZp,1dm_j4ciZp,Final Decision,Reject,"This paper was referred to the ICLR 2021 Ethics Review Committee based on concerns about a potential violation of the ICLR 2021 Code of Ethics (https://iclr.cc/public/CodeOfEthics) raised by reviewers. The paper was carefully reviewed by two committe members, who provided a binding decision. The decision is ""Significant concerns (Do not publish)"". Details are provided in the Ethics Meta Review. As a result, the paper is rejected based Ethics Review Committee's decision . + +The technical review and meta reviewing process moved proceeded independently of the ethics review. The result is as follows: + +This paper studies the problem of evaluating optimiser's performance, which is important to show whether real progress in research has been made. It proposes several evaluation protocols, and used Hyperband (Li et al. 2017) to automate the tuning of each optimiser in the bench-marking study. Evaluations have been conducted on a wide range of deep learning tasks, and the paper reaches to a conclusion that none of the recently proposed optimisers in evaluation can uniformly out-perform Adam in all the tasks in consideration. + +Reviewers agreed that the evaluations are extensive, however there are some shared concerns among reviewers. The paper argues that manual hyper-parameter tuning by humans is the right behavior to target for, which is the motivation to use Hyperband as an automating tool, and there is a human study to demonstrate that Hyperband tuning resembles human tuning behaviour. Some reviewers questioned about this desiderata choice that favours human tuning behaviour, also concerns on how the human study is conducted (and to what extend the human study itself is reflective enough for the human tuning behaviour in general). + +Personally I welcome any empirical study that aims at understanding the real progress of a research topic, and I agree it is important to make rigorous automation tools in order to enable such a large scale study. Therefore, while the presented results are extensive, I would encourage the authors to incorporate the feedback from the reviewers to better examine their assumptions. ",ICLR2021, +zIrgeChvN4,1576800000000.0,1576800000000.0,1,Bke7MANKvS,Bke7MANKvS,Paper Decision,Reject,"This is an interesting paper that aims to redefine generalization based on the difference between the training error and the inference error (measured on the empirical sample set), rather than the test error. The authors propose to improve generalization in image classification by augmenting the input with encodings of the image using a source code, and learn this encoding using the compression distance, an approximation of the Kolmogorov complexity. They show that training in this fashion leads to performance that is more robust to corruption and adversarial perturbations that exist in the empirical sample set. + +Reviewers agree on the importance of this topic and the novelty of the approach, but there continue to exist sharp disagreement in the ratings. Most have concerns about the formalism and clarity in the presentation. Especially given that the paper is 10 pages, it should be evaluated against a more rigorous standard, which doesn't appear to be met. 
I encourage the authors to consider a rewrite with a goal towards clarity for a more general ML audience and resubmit for a future conference. +",ICLR2020, +aqGf8e_0ax,1576800000000.0,1576800000000.0,1,rJgJDAVKvB,rJgJDAVKvB,Paper Decision,Accept (Spotlight),All reviewers unanimously accept the paper.,ICLR2020, +wvxD7G-G4N,1576800000000.0,1576800000000.0,1,HJgEe1SKPr,HJgEe1SKPr,Paper Decision,Reject,"This paper proposes to use GMM as the latent prior distribution of GAN. The reviewers unanimously agree that the paper is not well motivated, explanations are lacking and writing needs to be substantially improved. ",ICLR2020, +P-rfT5kPalR,1610040000000.0,1610470000000.0,1,30SS5VjvhrZ,30SS5VjvhrZ,Final Decision,Reject,"This paper proposes an approach to estimating uncertainty in deep neural network models that avoids the need to make multiple forward passes through a network or through multiple individual models in a posterior ensemble. In terms of strengths, this is an important and timely topic that is of significant interest. The paper is clearly written for the most part. In terms of weaknesses, the significance of the work is low. As the reviewers note, there are multiple questions around the experimental evaluation that remain unresolved following the author feedback and discussion. In particular, the authors do not compare to baseline MCMC methods like HMC/SGHMC that can yield gold standard estimates of posterior predictive uncertainty. While not feasible for large-scale models, MCMC methods provides crucial sanity checks for uncertainty estimation on small-scale (e.g., MNIST-scale) models. Posterior distillation methods like Bayesian Dark Knowledge are also not considered in the evaluation and should be compared to where the distillation computation is feasible. There are also foundational technical correctness issues with respect to uncertainty quantification due to the fact that the paper is approximating the measure of uncertainty produced by MC Dropout, which itself only approximates the true Bayesian posterior predictive distribution under additional assumptions. This makes empirical comparissons to MCMC methods all the more important. Following the discussion, the reviewers agree that the paper is not yet ready for publication.",ICLR2021, +H1vRIJpSf,1517250000000.0,1517260000000.0,897,SJD8YjCpW,SJD8YjCpW,ICLR 2018 Conference Acceptance Decision,Reject,"An empirical study of weight sharing for neural networks is interesting, but all of the reviewers found the experiments insufficient without enough baseline comparisons.",ICLR2018, +VIOBObL07F,1576800000000.0,1576800000000.0,1,r1eOnh4YPB,r1eOnh4YPB,Paper Decision,Reject,"This paper seeks to understand the effect of learning rate decay in neural net training. This is an important question in the field and this paper also proposes to show why previous explanations were not correct. However, the reviewers found that the paper did not explain the experimental setup enough to be reproducible. Furthermore, there are significant problems with the novelty of the work due to its overlap with works such as (Nakiran et al., 2019), (Li et al. 2019) or (Jastrzębski et al. 2017).",ICLR2020, +rkeXvvDrxE,1545070000000.0,1545350000000.0,1,BylRVjC9K7,BylRVjC9K7,reject,Reject,The reviewers have agreed this paper is not ready for publication at ICLR. 
,ICLR2019,5: The area chair is absolutely certain +ryo0NkpHf,1517250000000.0,1517260000000.0,469,BJ4prNx0W,BJ4prNx0W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper is novel, but relatively incremental and relatively niche; the reviewers (despite discussion) are still unsure why this approach is needed.",ICLR2018, +yW-Hfu6mlup,1610040000000.0,1610470000000.0,1,gwnoVHIES05,gwnoVHIES05,Final Decision,Accept (Poster),"While much of generative modeling is tasked with the goal of generating content within the data distribution, the motivation of this work is to examine whether ML techniques can generate creative content. This work has 2 core contributions: + +1) Two new datasets of creative sketches: birds and creatures, that have part annotations (size ~ 10K samples for each set). The way the datasets are structured with the body part annotations will facilitate the creativity aspect of the approach later described in the paper. + +2) This paper propose a GAN model that is part-based, which they call DoodlerGAN. It is inspired partly by the human's creative process of sequential drawing. Here, the trained model determines the appropriate order of parts to generate, which makes the model well suited for human-in-the-loop interactive interfaces in creative applications where it can make suggestions based on user drawn partial sketches. + +They show that the proposed model, trained on the part-annotated datasets, are able to generate unseen compositions of birds and creatures with novel body part configurations for creative sketches. They conduct human evaluation and also quantitative metrics to show the superiority of their approach (for human preference, and also FID score). + +Many reviewers, including R1 and myself observe that the datasets provided, along with the parts-based labeling and modeling approach are a clear advantage over existing datasets and methodology. With ever growing importance of generative models used in real world applications, including the creative industry, I believe this paper provides a much needed fresh take on creative ways of using our generative models besides making them larger, or achieving better log-likelihood scores. Many reviewers, including R3, would think that this work is indeed a ""Delightful, well written paper! I have concerns about its fit here."" I strongly believe such works in fact definitely *do* belong at ICLR, and I think this work has the potential to get researchers in the generative modeling community to rethink what they are really optimizing for. + +I believe this paper will be a great addition to ICLR2021, and I look forward to see their presentation to the community to spark more creativity in our research endeavors. For this reason, I'm strongly recommending an acceptance (Poster).",ICLR2021, +sf9UOn3L3k,1610040000000.0,1610470000000.0,1,kE3vd639uRW,kE3vd639uRW,Final Decision,Accept (Poster),"The paper presents a bidirectional pooling layer inspired by the classical Lifting scheme from signal processing. LiftDownPool is able to preserve structure and details in different sub-bands, whereas LiftUpPool is able to generate a refined up sampled feature map using the detail sub-bands. This is very useful for image to image translation tasks and all tasks that involve up scaling. +This is a solid contribution with extensive and thorough experiments and direct practical usage, clear accept. 
+",ICLR2021, +hxOF_5_TyHY,1610040000000.0,1610470000000.0,1,ahAUv8TI2Mz,ahAUv8TI2Mz,Final Decision,Accept (Poster),"The submission combines meta-learning and attention mechanism for generalised zero-shot learning. The image-guided attention on the semantic space helps to adapt the better class specific semantic information while separate experts operate on the seen and unseen classes. The unseen class expert is trained with the pseudo negative samples with pseudo negative labels. Meta-learning based training adapts the model to few-shot learning scenario. The submission has received two accept, two weak accept and one weak reject reviews. All reviewers found the methodology interesting but they found it moderately novel. The experimental evaluation has been found strong. The rebuttal addressed all the reviewers' concerns and during the discussion phase all reviewers recommended acceptance. The meta reviewer follows the consensus of all the reviewers and recommends acceptance.",ICLR2021, +srobydp_2oh,1610040000000.0,1610470000000.0,1,6puCSjH3hwA,6puCSjH3hwA,Final Decision,Accept (Spotlight),"All the reviewers are positive about the paper; R2 and R3 voted for clear accept. Overall, all the reviewers feel that evolution is comprehensive and the results are decent. There is a novel objective formulation that controls for motion diversity, disentanglement and content matching, outperforming existing methods across multiple datasets. High-res videos at 1024x1024 are generated and there is cross-domain video generation. Many good questions were raised by the reviewers, and they were addressed in details in the rebuttal. In particular, the question about subtle motion and short video sequences was raised (which was the concern that the AC had). The AC agrees with the reviewers that the paper warrants a publication. Please address the questions raised by the reviewers in the final version. +",ICLR2021, +u0GJlWsq_,1576800000000.0,1576800000000.0,1,Hye_V0NKwr,Hye_V0NKwr,Paper Decision,Accept (Poster),"This paper investigates the role of locality (ability to encode only information specific to locations of interest) and compositionality (ability to be expressed as a combination of simpler parts) in Zero-Shot Learning (ZSL). Main contributions of the paper are (i) compared to previous ZSL frameworks, the proposed approach is that the model is not allowed to be pretrained on another dataset (ii) a thorough evaluation of existing methods. + +Following discussions, weaknesses are (i) the proposed method (CMDIM) isn't sufficiently different or interesting compared to existing methods (ii) the paper does not do an in-depth discussion of locality and compositionality. The empirical evaluation being extensive, the accept decision is chosen. +",ICLR2020, +19ZjyItEumci,1642700000000.0,1642700000000.0,1,kQMXLDF_z20,kQMXLDF_z20,Paper Decision,Reject,"This paper introduces a new layer for graph neural networks that aims to reduce the oversmoothing issue common to this model type. The reviewers find the paper well organized and easy to follow, and they recognize the importance of the problem that is addressed. However, they also identify critical errors in the mathematical derivations: the authors did not provide a response to the reviews, and hence these errors remain unaddressed. In addition, multiple reviewers indicate they find the experimental evaluation insufficient. 
For these reasons I'm recommending rejecting this paper.",ICLR2022, +IbZNoCnNdd,1610040000000.0,1610470000000.0,1,N3zUDGN5lO,N3zUDGN5lO,Final Decision,Accept (Poster),"The paper shows that using graph neural networks to address multi-task control problems with incompatible environments does not provide benefits to the learning process. The authors instead propose to use Transformers as a simpler mechanism that is able to learn and discover the helpful morphological distinctions between agents in order to better solve multi-task reinforcement learning problems.

The paper is well written and the analysis of the literature has been appreciated. The contribution is original and relevant to the community.

All the reviewers agree that this paper deserves acceptance. We invite the authors to modify the paper by following the suggestions provided by the reviewers. In particular:
- improve the analysis of the empirical results
- update the plots
- add the suggested references",ICLR2021, +FFSNqtyup1,1576800000000.0,1576800000000.0,1,rkeIIkHKvS,rkeIIkHKvS,Paper Decision,Accept (Poster),Two reviewers are positive about this paper while the other reviewer is negative. The low-scoring reviewer did not respond to discussions. I also read the paper and found it interesting. Thus an accept is recommended.,ICLR2020, +mKPxBqJzVJt,1642700000000.0,1642700000000.0,1,c8AvdRAyVkz,c8AvdRAyVkz,Paper Decision,Reject,"The paper focuses on the catastrophic overfitting problem of adversarial training with FGSM. One reviewer gave a score of 6 and the other three reviewers gave negative scores. The authors failed to address or clarify (no rebuttal was provided) how the perturbation distribution and robustness are linked (all four reviewers agree on this). Other issues include unclear motivation, limited experimental validation, and lack of theoretical analysis. Thus, the current version of the paper cannot be accepted to ICLR.",ICLR2022, +yG8mtYohKDUd,1642700000000.0,1642700000000.0,1,gdWQMQVJST,gdWQMQVJST,Paper Decision,Reject,"This paper proposes a novel Federated Learning (FL) framework that leverages the Neural Tangent Kernel (NTK) to replace the gradient-descent algorithm for optimization. Specifically, the workers upload the labels and the Jacobian matrices to the server, and the server uses tools from the NTK to obtain a trained neural network. However, since this could lead to increased communication cost and compromise of data privacy, the authors propose data sampling and random projection techniques to alleviate the problem. The authors provide a theoretical analysis showing that the proposed scheme has faster convergence than FedAvg under specific assumptions, and experimentally validate that it significantly outperforms previous FL algorithms, achieving similar test accuracy to ideal centralized cases.

Pros
- The idea of using the NTK for model optimization without gradient descent, and its use in the FL setting, is both interesting and novel.
- The paper properly discusses and tackles the new challenges posed by the introduction of the new method.
- The paper is well-organized and clearly written, with sufficient discussion of related work and background.

Cons
- The proposed method puts a heavy computational burden on the server side.
- The method violates the privacy-preserving feature of FL by its nature, and while the proposed compression shuffling alleviates the concern, more discussion is necessary.
- Missing comparison against popular baselines such as FedProx and SCAFFOLD. 
+
- The faster convergence of the proposed method in comparison to FedAvg depends on the learning rate and is not always true.
- There is a gap between theory and practice, which makes the practicality of the algorithm still questionable.

Although the reviewers found the idea novel, the proposed techniques for alleviating communication cost and privacy concerns convincing, and both the theoretical analysis and experimental validation thorough, all reviewers leaned toward rejection due to critical concerns that remained unanswered. During the discussion period, the authors alleviated many of the minor concerns from the reviewers, but there were still remaining concerns about the gap between theory and practice in its convergence behavior, insufficient discussion of the privacy-preserving feature of the proposed method, as well as the shifting of the computational burden to the server. Thus, the reviewers reached a consensus that the paper is not yet ready for publication.

Despite the low average score, the novelty of the idea and the quality of the paper are much higher than those of the accepted papers in my batch, and I strongly believe that this will become a high-impact paper if the remaining concerns from the reviewers are properly resolved.",ICLR2022, +HJxOCYLexV,1544740000000.0,1545350000000.0,1,r1xwKoR9Y7,r1xwKoR9Y7,A great starting point for ML-assisted ITP,Accept (Poster),"This paper provides an RL environment defined over Coq, allowing for RL agents and other such systems to be trained to propose tactics during the running of an ITP. I really like this general line of work, and the reviewers broadly speaking did as well. The one holdout is reviewer 3, who raises important concerns about the need for further evaluation. I understand and appreciate their points, and I think the authors should be careful to incorporate their feedback not only in final revisions to the paper, but in deciding what follow-on work to focus on. Nonetheless, and with all due respect to reviewer 3, who provided a review of acceptable quality, I am unsure the substance of their review merits a score as low as they have given. Considering the support the other reviews offer for the paper, I recommend acceptance for what the majority of reviewers believe is a good first step towards one day proving substantial new theorems using ITP-ML hybrids.",ICLR2019,4: The area chair is confident but not absolutely certain +rk1bQypBM,1517250000000.0,1517260000000.0,69,SyZipzbCb,SyZipzbCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"As identified by most reviewers, this paper does a very thorough empirical evaluation of a relatively straightforward combination of known techniques for distributed RL. The work also builds on ""Distributed prioritized experience replay"", which could be noted more prominently in the introduction.",ICLR2018, +EnRpKaETA,1576800000000.0,1576800000000.0,1,rkgAGAVKPr,rkgAGAVKPr,Paper Decision,Accept (Poster),While the reviewers have some outstanding issues regarding the organization and clarity of the paper, the overall consensus is that the proposed evaluation method is a useful improvement over current standards for meta-learning.,ICLR2020, +HkeDLTW-lV,1544790000000.0,1545350000000.0,1,SkGH2oRcYX,SkGH2oRcYX,novelty not well justified,Reject,"The paper presents an action-conditioned video prediction method that combines previous losses from the literature, such as perceptual, adversarial and InfoGAN-type losses. 
The reviewers point out the lack of novelty in the formulation, as well as the lack of experiments that would verify its usefulness in model-based RL. There is no rebuttal, and thus no ground for discussion or acceptance.",ICLR2019,5: The area chair is absolutely certain +BJxQk8Y4gN,1545010000000.0,1545350000000.0,1,H1xAH2RqK7,H1xAH2RqK7,Not ready for publication at ICLR,Reject,"While there was some support for the ideas presented, the majority of the reviewers did not think the submission was ready for presentation at ICLR. Concerns raised included that the experiments needed more work, and that the paper needed to do a better job of distinguishing its contributions from those of past work.",ICLR2019,5: The area chair is absolutely certain +bES_ulzIbCIv,1642700000000.0,1642700000000.0,1,RbVp8ieInU7,RbVp8ieInU7,Paper Decision,Accept (Poster),"An interesting paper on combining NeRFs with StyleGAN to get high-quality and high-resolution 3D-aware generative models. The results are very good visually and also allow interactive speeds. The technique is natural, and concurrent papers are proposing variations.

The reviewers identified a few limitations, including that the NeRF does not have a viewing direction and also seems limited to aligned objects with a common structure, like faces. Still, the results are very interesting and suitable for publication.",ICLR2022, +r1x060syx4,1544700000000.0,1545350000000.0,1,rye4g3AqFm,rye4g3AqFm,An interesting addition to the deep learning theory literature,Accept (Poster),"Dear authors,

There was some disagreement among reviewers on the significance of your results, in particular because of the limited experimental section.

Despite this issue, which is not minor, your work adds yet another piece of the generalization puzzle. However, I would encourage the authors to make sure they do not oversell their results, either in the title or in their text, for the final version.",ICLR2019,4: The area chair is confident but not absolutely certain +XL4BWh5z89R,1642700000000.0,1642700000000.0,1,wwDg3bbYBIq,wwDg3bbYBIq,Paper Decision,Accept (Poster),The paper presents a neural architecture based on neural memory modules to model spatiotemporal traffic data. The reviewers think this is an important application of deep learning and thus fits the topic of ICLR. The writing and the novelty of the proposed method need improvement.,ICLR2022, +hX-TI2zrBjt,1642700000000.0,1642700000000.0,1,lY0-7bj0Vfz,lY0-7bj0Vfz,Paper Decision,Accept (Poster),"This paper uses prototype memories for learning generative models. Inspired by the finding that there is sparse activity and complex selectivity in the supragranular layers of every cortical region, even primary visual cortex, the authors propose to use prototype memories at each level of the hierarchy, which marks their work as novel. They show superior performance in few-shot image generation tasks.

The reviewers' scores were borderline (5,5,8), making this a case that required some AC consideration. 
The reviewers point out that there is a lot of engineering steps in the object proposal stage, which takes into account background subtraction to propose objects. In its current form, the writing of the paper is not clear enough on the object instantiation part, which is also the novel part over Zhu et al., potentially due to the complexity of using motion to guide object proposals. A limitation of the proposed formulation is that it works for moving cameras but only in 2d environments. Experiments on 3D environments would make this paper a much stronger submission. ",ICLR2019,5: The area chair is absolutely certain +IEvzkBmrU,1576800000000.0,1576800000000.0,1,Hye1kTVFDS,Hye1kTVFDS,Paper Decision,Accept (Poster),"Existing implementation of information bottleneck need access to privileged information which goes against the idea of compression. The authors propose variational bandwidth bottleneck which estimates the value of the privileged information and then stochastically decided whether to access this information or not. They provide a suitable approximation and show that their method improves generalisation in RL while reducing access to expensive information. + +These paper received only two reviews. However, both the reviews were favourable. During discussions with the AC the reviewers acknowledged that most of their concerns were addressed. R2 is still concerned that VBB does not result in improvement in terms of sample efficiency. I request the authors to adequately address this in the final version. Having said that, the paper does make other interesting contributions, hence I recommend that this paper should be accepted.",ICLR2020, +9nnmo0TBcZt,1642700000000.0,1642700000000.0,1,iUuzzTMUw9K,iUuzzTMUw9K,Paper Decision,Accept (Poster),"An interesting paper on combining NerFs with StyleGAN to get high-quality and high-resolution 3d aware generative models. The results are very good visually and also allow interactive speeds. The technique is natural and concurrent papers are proposing variations + +The reviewers identified a few limitations including that the nerf does not have a viewing direction and also seems limited to aligned objects with a common structure, like faces. Still the results are very interesting and suitable for publication.",ICLR2022, +r1x060syx4,1544700000000.0,1545350000000.0,1,rye4g3AqFm,rye4g3AqFm,An interesting addition to the deep learning theory literature,Accept (Poster),"Dear authors, + +There was some disagreement among reviewers on the significance of your results, in particular because of the limited experimental section. + +Despite this issues, which is not minor, your work adds yet another piece of the generalization puzzle. However, I would encourage the authors to make sure they do not oversell their results, either in the title or in their text, for the final version.",ICLR2019,4: The area chair is confident but not absolutely certain +rXRu5wiPXVF,1610040000000.0,1610470000000.0,1,bMzj6hXL2VJ,bMzj6hXL2VJ,Final Decision,Reject,"In this paper, the authors propose an RL-based method for learning DAGs based on searching over causal orders instead of graphs. Order search for learning DAGs is a well-studied problem, and it is well-known that this can relieve some of the burden of searching through the space of DAGs. 
Several reviewers raised legitimate concerns regarding the experiments, and without identifiability or theoretical results to advance the state of the art, the contribution of this work is limited.",ICLR2021, +H1zuHyTHM,1517250000000.0,1517260000000.0,594,rkWN3g-AZ,rkWN3g-AZ,ICLR 2018 Conference Acceptance Decision,Reject,This paper was reviewed by 3 expert reviewers. All three recommend rejection citing significant concerns (e.g. missing baselines).,ICLR2018, +SttNyKHM11,1576800000000.0,1576800000000.0,1,H1lma24tPB,H1lma24tPB,Paper Decision,Accept (Talk),"All the reviewers agreed that this was a sensible application of mostly existing ideas from standard neural net initialization to the setting of hypernetworks. The main criticism was that this method was used to improve existing applications of hypernets, instead of extending their limits of applicability.",ICLR2020, +rkY6EkarG,1517250000000.0,1517260000000.0,454,SkfNU2e0Z,SkfNU2e0Z,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a toolbox for the exploration of layerwise-parallel deep neural networks. The reviewers were consistent in their analysis of this paper: it provided an interesting class of models which warranted further investigation, and that the toolbox would be useful to those who are interested in exploring further. However, there was a lack of convincing examples, and also some concern that Theano (no longer maintained) was the only supported backend. The authors responded to say that they had subsequently incorporated TensorFlow support, they were not able to provide any more examples due to several reasons: “time, pending IP concerns, open technical details, sufficient presentation quality, page restriction.” I agree with the consensus reached by the reviewers.",ICLR2018, +SJeP3vKEl4,1545010000000.0,1545350000000.0,1,rkzUYjCcFm,rkzUYjCcFm,meta-review,Reject,"This paper was reviewed by three experts. After the author response, R2 and R3 recommend rejecting this paper citing concerns of novelty and experimental evaluation. R1 assigns it a score of ""6"" but in comments agrees that the manuscript is not ready for ICLR. The AC finds no basis for accepting this paper in this state. +",ICLR2019,4: The area chair is confident but not absolutely certain +BkcgazIul,1486400000000.0,1486400000000.0,1,SkwSJ99ex,SkwSJ99ex,ICLR committee final decision,Reject,The proposed method doesn't have enough novelty to be accepted to ICLR.,ICLR2017, +#NAME?,1610040000000.0,1610470000000.0,1,xiwHM0l55c3,xiwHM0l55c3,Final Decision,Reject,"Thank you for your submission to ICLR. The reviewers unanimously felt that there were substantial issues with this work, owing to the fact that both the techniques and applications have been considered in a great deal of previous work. Furthermore, the manuscript itself needs substantial amounts of revision before being suitable for publication. As there was no response to these points during the rebuttal period, it seems clear that the paper can't be accepted in its current form.",ICLR2021, +SkgvVInfgV,1544890000000.0,1545350000000.0,1,BJxmXhRcK7,BJxmXhRcK7,Numerous concerns.,Reject,"AR1 is concerned about the poor organisation of this paper. AR2 is concerned about the similarity between TRL and TR. The authors show some empirical results to support their intuition, however, no theoretical guarantees are provided regarding TRL superiority. Moreover, experiments for the Taskonomy dataset as well as on RNN have not been demonstrated, thus AR2 did not increase his/her score. 
AR3 is the most critical and finds the clarity and explanations not ready for publication. + +AC agrees with the reviewers in that the proposed idea has some merits, e.g. the reduction in the number of parameters seem a good point of this idea. However, reviewer urges the authors to seek non-trivial theoretical analysis for this method. Otherwise, it indeed is just an intelligent application paper and, as such, it cannot be accepted to ICLR.",ICLR2019,5: The area chair is absolutely certain +Bf1Hf3JXT7,1610040000000.0,1610470000000.0,1,giit4HdDNa,giit4HdDNa,Final Decision,Accept (Poster),"This paper introduces a few variants of neural ODE architectures to improve their expressivity. The motivation and method make sense, but are fairly incremental. The tasks are also fairly low dimensional and as one reviewer pointed out, reconstruction isn't a good benchmark task. + +However, the paper seems well-executed, and the rebuttals answered the expert rewiewers' concerns.",ICLR2021, +Syel9-Y7gV,1544950000000.0,1545350000000.0,1,BkfbpsAcF7,BkfbpsAcF7,An interesting angle with some issues in terms of execution,Accept (Poster),"This paper studies the roots of the existence of adversarial perspective from a new perspective. This perspective is quite interesting and thought-provoking. However, some of the contributions rely on fairly restrictive assumptions and/or are not properly evaluated. + +Still, overall, this paper should be a valuable addition to the program. ",ICLR2019,4: The area chair is confident but not absolutely certain +EmXk-ig5Ogt,1642700000000.0,1642700000000.0,1,NuzF7PHTKRw,NuzF7PHTKRw,Paper Decision,Reject,"I thank the authors for their submission and active participation in the discussions. This papers is borderline with three reviewers leaning towards acceptance [3c96,7T33,Zhvq] and one leaning towards rejection [o38w]. Reviewer o38w's main concerns are around the lack of details about how the baselines were tuned and missing training details (specifically the connectivity test used to reject candidate environments). During discussion both, reviewers Zhvq and 7T33, agree that the paper requires substantial restructuring/rewriting to properly address the reviewer's feedback which is currently mostly addressed in the appendix. Based on the discussion with reviewers, my assessment is that this paper is not ready for publication at this point and that it will benefit greatly from another iteration. I want to very strongly encourage the authors to further improve their paper based on the reviewer feedback.",ICLR2022, +Jdhi3-kYzN,1642700000000.0,1642700000000.0,1,YZHES8wIdE,YZHES8wIdE,Paper Decision,Accept (Spotlight),"This paper proposes an alternative approach to epsilon-greedy exploration by instead generating multi-step plans from an RNN, and then stochastically determining whether to continue with the plan or re-plan. The reviewers agreed that this idea is novel and interesting, that the paper is well-written, and that the evaluations are convincing, showing large improvements over epsilon-greedy exploration and more consistently strong performance than other baselines. While the original reviews contained some questions around discussion of related work and the simplicity of the evaluation environments, the reviewers felt these concerns were adequately addressed by the rebuttal. 
I agree the paper explores a very interesting idea and convincingly demonstrates its potential, and should be of wide interest to the deep RL community especially as it touches on many different subfields of RL: MBRL/planning, exploration, HRL, navigation, etc. I recommend acceptance as a spotlight presentation.",ICLR2022, +OnMe7ECmD,1576800000000.0,1576800000000.0,1,BJxYUaVtPB,BJxYUaVtPB,Paper Decision,Reject,"This paper investigates neural networks for group comparison -- i.e., deciding if one group of objects would be preferred over another. The paper received 4 reviews (we requested an emergency review because of a late review that eventually did arrive). R1 recommends Weak Reject, based primarily on unclear presentation, missing details, and concerns about experiments. R2 recommends Reject, also based on concerns about writing, unclear notation, weak baselines, and unclear technical details. In a short review, R3 recommends Weak Accept and suggests some additional experiments, but also indicates that their familiarity with this area is not strong. R4 also recommends Weak Accept and suggests some clarifications in the writing (e.g. additional motivation future work). The authors submitted a response and revision that addresses many of these concerns. Given the split decision, the AC also read the paper; while we see that it has significant merit, we agree with R1 and R2's concerns, and feel the paper needs another round of peer review to address the remaining concerns.",ICLR2020, +Lz1zmGO95NI,1610040000000.0,1610470000000.0,1,HfnQjEN_ZC,HfnQjEN_ZC,Final Decision,Reject,"This paper initially received three negative reviews: 4,4,4. The main concerns of the reviewers included limited methodological novelty and an oversimplistic experimental setup. The authors did not submit their responses. +As a result, the final recommendation is reject.",ICLR2021, +1-dp4_055r,1642700000000.0,1642700000000.0,1,dmq_-R2LhQk,dmq_-R2LhQk,Paper Decision,Reject,This paper studies the following hypothesis that gradient-based explanations are more meaningful the more they are aligned with the tangent space of the data manifold. The reviews are negative overall. The general feeling is that the paper reads like a set of subjective observations about the meaningfulness of explanation and relationship with data manifold + tangential theory. There isn’t a coherent story.,ICLR2022, +XL4BWh5z89R,1642700000000.0,1642700000000.0,1,wwDg3bbYBIq,wwDg3bbYBIq,Paper Decision,Accept (Poster),The paper presents a neural architecture based on neural memory modules to model the spatiotemporal traffic data. The reviewers think this is an important application of deep learning and thus fits the topic of ICLR. The writing and the novelty of the proposed method need improvement.,ICLR2022, +hX-TI2zrBjt,1642700000000.0,1642700000000.0,1,lY0-7bj0Vfz,lY0-7bj0Vfz,Paper Decision,Accept (Poster),"This paper uses prototype memories for learning generative models. Inspired by the finding that there is sparse activity and complex selectivity in the supragranular layers of every cortical region, even primary visual cortex, the authors propose to use prototype memories at each level of the hierarchy, which marks their work as novel. They show superior performance in few shot image generation tasks. + +The reviewers' scores were borderline (5,5,8), making this a case that required some AC consideration. 
The reviewers generally agreed that the paper was relevant and interesting, though the two more negative reviewers had some concerns about (1) the tests used, (2) the interpretation relative to neuroscience data, and (3) the novelty. After reading through the paper, the reviews, and the rebuttal's, the AC felt that the authors had made a decent attempt at addressing items (1) and (2), and item (3) was ultimately a subjective question. The authors were reasonably clear about what marks their work as novel, and it is certainly not *exactly* the same as previous work. Altogether, given these considerations, the AC felt that this paper deserved to be accepted, given the reasonable attempts from the authors to respond to the reviewers' concerns and an average score above acceptance threshold (though the scores did not change post-rebuttal, it should be noted).",ICLR2022, +vzvObgEQ7V,1576800000000.0,1576800000000.0,1,rklr9kHFDB,rklr9kHFDB,Paper Decision,Accept (Talk),This paper is enthusiastically supported by all three reviewers. Thus an accept is recommended.,ICLR2020, +wMXTh4unAH,1576800000000.0,1576800000000.0,1,S1g6xeSKDS,S1g6xeSKDS,Paper Decision,Accept (Poster),"This paper studies generalizations of Variational Autoencoders to Non-Euclidean domains, modeled as products of constant curvature Riemannian manifolds. The framework allows to simultaneously learn the latent representations as well as the curvature of the latent domain. + +Reviewers were unanimous at highlighting the significance of this work at developing non-Euclidean tools for generative modeling. Despite the somewhat preliminary nature of the empirical evaluation, there was consensus that the paper puts forward interesting tools that might spark future research in this direction. Given those positive assessments, the AC recommends acceptance. +",ICLR2020, +vnwXnzlYok4t,1642700000000.0,1642700000000.0,1,_XNtisL32jv,_XNtisL32jv,Paper Decision,Accept (Poster),"The paper proposes a new loss function for the training of spiking neural networks leading to significant improvements in generalization performance across a variety of datasets and network architectures. While conceptually simple, the approach leads to substantial performance gains, and some intuition is provided to explain its success. + +The reviewers are split on the issue of significance of the paper, in part due to the simplicity of the proposed loss function. Still, good results speak for themselves, and the effectiveness of the technique has been demonstrated thoroughly.",ICLR2022, +By7GmkTSM,1517250000000.0,1517260000000.0,86,rypT3fb0b,rypT3fb0b,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper proposes to regularize via a family of structured sparsity norms on the weights of a deep network. A proximal algorithm is employed for optimization, and results are shown on synthetic data, MNIST, and CIFAR10. + +Pros: the regularization scheme is reasonably general, the optimization is principled, the presentation is reasonable, and all three reviewers recommend acceptance. + +Cons: the regularization is conceptually not terribly different from other kinds of regularization proposed in the literature. The experiments are limited to quite simple data sets.",ICLR2018, +9KlaxlvdVG,1642700000000.0,1642700000000.0,1,RDlLMjLJXdq,RDlLMjLJXdq,Paper Decision,Accept (Poster),"This paper proposes two new sets of conditions under which we can identify temporally causal latent processes. 
In this sense, this work makes valuable contributions to the theories of identifiability in this topic. The authors also propose LEAP, extending the VAE, to estimate temporally causal latent processes. + +The reviewers had many constructive comments, and the authors strived to address them. In the end, the reviewers were satisfied with the final version of the paper. + +Given that the theoretical identifiability theorems are major parts of the paper, I encourage the authors to elaborate more on the two sets of assumptions. They should discuss when these assumptions will hold and provide examples in which they will be violated.",ICLR2022, +_UYc9KyXZ,1576800000000.0,1576800000000.0,1,S1eZOeBKDS,S1eZOeBKDS,Paper Decision,Reject,"The paper presents a model for learning spiking representations. The basic model is a a deep autoencoder trained end-to-end with a biophysical generative model and results are presented on EMG and sEMG data, with the aim to motivate further research in self-supervised learning. + +The reviewers raised several points about the paper. Reviewer 1 raised concerns about lack of context on surrounding work, clarity of the model itself and motivating the loss. Reviewer 2 pointed out strengths of the paper in its simplicity and the importance of this problem, but also raised concerns about the papers clarity, again motivations on the loss function and sensibility of design choices. The authors responded to the feedback from reviewer 1, but overall the reviewer did not think their scores should be changed. + +The paper in its current form is not yet ready for acceptance, and we hope there has been useful feedback from the reviewing process for their future research.",ICLR2020, +Osllv1e61ex,1642700000000.0,1642700000000.0,1,2NqIV8dzR7N,2NqIV8dzR7N,Paper Decision,Reject,"In this paper, the stopping condition of Bayesian Optimization (BO) is discussed. This problem is very important when BO is applied to the Hyper-parameter optimization (HPO) task. All the reviewers agree that the proposed approach based on high-probability confidence bound on the regret is interesting and reasonable. An important issue raised by a reviewer is that many existing BO works discussed how to achieve efficiency and saving budget in BO although they did not explicitly mention the stopping condition. Due to the lack of discussion regarding the relationship with these highly related studies, we have to conclude that the paper cannot be accepted in its current form.",ICLR2022, +r1lckxsNgE,1545020000000.0,1545350000000.0,1,HyeFAsRctQ,HyeFAsRctQ,Interesting contribution to understanding NNs,Accept (Poster),"This paper proposes verification algorithms for a class of convex-relaxable specifications to evaluate the robustness of neural networks under adversarial examples. + +The reviewers were unanimous in their vote to accept the paper. Note: the remaining score of 5 belongs to a reviewer who agreed to acceptance in the discussion.",ICLR2019,5: The area chair is absolutely certain +x9olEt-65Pj,1642700000000.0,1642700000000.0,1,a34GrNaYEcS,a34GrNaYEcS,Paper Decision,Accept (Poster),"The paper builds upon parametric distributionally robust optimization (PDRO) and proposes ratio PDRO (R-PDRO) where the ratio of the worst case distribution and training distribution is parameterized by a discriminative network. This has a benefit over PDRO which needs to do generative modeling of worst case distribution. The paper empirically demonstrates R-PDRO improves over existing methods on group robustness problems. 
Reviewer are overall positive about the paper, and have appreciated the significance of the problem, writing clarity, and thorough empirical evaluation. There were some minor questions which have been adequately addressed by the authors.",ICLR2022, +ryljBAtllV,1544750000000.0,1545350000000.0,1,rJeQYjRqYX,rJeQYjRqYX,meta-review,Reject,"The paper presents an approach to estimate the ""effective path"" of examples +in a network to reach a decision, and consider this to analyze if examples +might be adversarial. Reviewers think the paper lacks some clarity and +experiments. They point to a confusion between interpretability and adversarial +attacks, they ask questions about computational complexity, and point to some +unsubstanciated claims. Authors have not responded to reviewers. Overall, I +concur with the reviewers to reject the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +IyIJZ08Qol,1576800000000.0,1576800000000.0,1,SkxBUpEKwH,SkxBUpEKwH,Paper Decision,Accept (Poster),"This paper proposes to extract a character from a video, manually control the character, and render into the background in real time. The rendered video can have arbitrary background and capture both the dynamics and appearance of the person. All three reviewers praises the visual quality of the synthesized video and the paper is well written with extensive details. Some concerns are raised. For example, despite an excellent engineering effort, there is few things the reader would scientifically learn from this paper. Additional ablation study on each component would also help the better understanding of the approach. Given the level of efforts, the quality of the results and the reviewers’ comments, the ACs recommend acceptance as a poster.",ICLR2020, +4T_7jb1BIA,1610040000000.0,1610470000000.0,1,OcTUl1kc_00,OcTUl1kc_00,Final Decision,Reject,"R4 of this submission was slightly positive on this submission while all other reviewers expressed quite significant concerns in their reviews. R4 also agreed that the originality and experimental results as presented in this submission are not sufficient during discussion, although he/she pointed out the incorporation of long-range structural information is novel. Given the above recommendations and discussions, a reject is recommended. + +",ICLR2021, +BJE2QJ6Hf,1517250000000.0,1517260000000.0,219,HJC2SzZCW,HJC2SzZCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Reviewers always find problems in papers like this. + +AnonReviewer1 would have preferred to have seen a study of traditional architectures, rather than fully connected ones, which are now less frequently used. They thought the paper was too long, the figures too cluttered, and were not convinced by the discussion around linear v. elliptical trajectories. + +I appreciate the need for a parametrizable architecture, although it may not be justified to translate these insights to other architectures, and then the fact that fully connected architectures are less common undermines the impact of the work. I don't find the length a problem, and I don't find the figures a problem. + +After the back and forth, AnonReviewer3 believes that there are data compatibility issues associated with the studied transformations and that non-linear transformations would have been more informative. I find the reviewers response to be convincing. + +AnonReviewer2 is strongly in favor of acceptance, finding the work exhaustive, interesting, and of high quality. I'm inclined to agree. 
+ +",ICLR2018, +Eh3etdI-Y5E,1610040000000.0,1610470000000.0,1,hcCao_UYd6O,hcCao_UYd6O,Final Decision,Reject,"This paper proposes Adversarial Feature Desensitization (AFD) as a defense against adversarial examples. Specifically, following the spirit of GAN and Adversarial Domain Adaptation, an adversarial discriminator is introduced to distinguish clean and perturbed inputs at the representational level. + +This paper receives 3 reject and 1 accept recommendations. On one hand, though the proposed method shares some similarity with the Feature Scattering method at a high level, most of the reviewers still find the proposed method is interesting. The AC also agrees that the paper's organization and typos does not warrant a rejection. + +On the other hand, the reviewers have also raised a few concerns. (i) A more careful discussion on the scalability of the proposed method is needed. (ii) Experiments are mostly focused on small datasets, while results on ImageNet is lacking, which makes the paper less convincing. The authors claim that they are trying to at least run Tiny-ImageNet experiments; however, this set of results are not provided by the end. (iii) A more detailed analysis and visualization on the learned difference between the distributions of benign and adversary representation is needed, since a discriminator is learned here. + +The rebuttal unfortunately did not fully address the reviewers' main concerns. On balance, the AC regrets that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere. ",ICLR2021, +rkSynGL_e,1486400000000.0,1486400000000.0,1,Hy8X3aKee,Hy8X3aKee,ICLR committee final decision,Reject,"The main strengths and weaknesses pointed out by the reviewers were: + + Strengths + -Domain is interesting, problem is important (R2, R1) + -Discretization of continuous domain may enable leveraging of advanced tools developed for discrete domains, e.g. NLP (R1) + + Weaknesses + -Issues with experiments: models under-capacity, omission of obvious baselines (R1, R3) + -Unclear conclusion: is quantization and embedding superior to working with the raw data? (R1) + -Fair amount of relevant work omitted (R1, R3) + + While the authors engaged in the pre-review discussion, they did not respond to the official reviews. Therefore I have decided to align with the reviewers who are in consensus that the paper does not meet the acceptance bar.",ICLR2017, +YRjw_z79vfh,1610040000000.0,1610470000000.0,1,7Yhok3vJpU,7Yhok3vJpU,Final Decision,Reject,"This submission got 1 reject and 3 marginally below the threshold. The concerns in the original reviews include (1) lack of theoretical justification. The motivation and claim are from empirical observation; (2) the performance improvement is minor compared with the existing methods; (3) some experiment settings and details are not explained clearly. Though the authors provide some additional experiments to the questions about the experiments, reviewers still keep their ratings. The rebuttal did not address their questions. AC has read the paper and all the reviews/discussions. AC has the same recommendation as the reviewers. The major concerns are (1) the theoretical justification is not clear. The additional explanation given by the authors in their rebuttal, i.e., the prediction becomes sharper and thus the model generalization ability can be improved, is not justified. 
(2) the experiments are not very convincing and can be further improved in the following two aspects: (1) the motivation experiments should be conducted in a consistent manner, instead of using simplified EL in some cases; (2) the effectiveness of EL should be more significant otherwise it is not clear whether the claim is true or not. At the current status of this submission, AC cannot recommend acceptance for the submission.",ICLR2021, +BJLq8yaSz,1517250000000.0,1517260000000.0,843,BJB7fkWR-,BJB7fkWR-,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers have found that while the task of visual domain adaptation is meaningful to explore and improve, the proposed method is not sufficiently well-motivated, explained or empirically tested. ",ICLR2018, +By7YSyTrG,1517250000000.0,1517260000000.0,609,Hk2MHt-3-,Hk2MHt-3-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper studies end-to-end training of a multi-branch convolutional network. This appears to lead to strong accuracies on the CIFAR and SVHN datasets, but it remains unclear whether or not this results transfers to ImageNet. The proposed approach is hardly novel, and lacks a systematic comparison with ""regular"" ensembling methods and with related mixture-of-experts approaches (for instance: S. Gross et al. Hard Mixtures of Experts for Large Scale Weakly Supervised Vision, 2017; Shazeer et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017).",ICLR2018, +NLETEQwD6OQ,1610040000000.0,1610470000000.0,1,T1XmO8ScKim,T1XmO8ScKim,Final Decision,Accept (Poster),"This is a fairly technical paper bridging deep learning with uncertainty propagation in computations (i.e. probabilistic numerics). It is well structured, but it could benefit from further improvements in readability given that there are only very few researchers that are experts in all sub-domains associated with this work. Given the above, as well as low overall confidence by the reviewers, I attempted a more thorough reading of the paper (even if not an expert myself), and I was also happy to see that the discussion clarified important points. Overall, the idea is novel, convincing and seems well executed, with good results. The technical advancements needed to make the idea work are fairly complicated and are appreciated as contributions, because they are expected to be useful in other applications too (beyond irregular sampled data) where uncertainty propagation matters.",ICLR2021, +b6pw1JqQCU4,1642700000000.0,1642700000000.0,1,nZOUYEN6Wvy,nZOUYEN6Wvy,Paper Decision,Accept (Poster),"The AC and reviewers all agree that the paper proposes a very interesting framework to extend Granger Causality to DAG structured dynamical systems with important applications. + +The submission was the object of extensive discussion, and the AC and reviewers all agree that the author feedback satisfactorily addresses the vast majority of their concerns. We strongly urge the authors to incorporate all the points and revisions mentioned in their feedback. + +We certainly hope that the author will pursue this line of work and consider scaling their approach to tackle larger applications such as those related to social networks.",ICLR2022, +MKn75xxkem,1576800000000.0,1576800000000.0,1,BJg15lrKvS,BJg15lrKvS,Paper Decision,Reject,"The authors propose to understand spectral bias during training of neural networks from the perspective of the NTK. 
While reviewers appreciated aspects of the work, the general consensus was that the current version is not ready for publication; some concerns stem from whether the the NTK model and finite neural networks are sufficiently similar that we should be able to gain real practical insights into the behaviour of finite models. This is partly an empirical question, and stronger experiments are required to have a better sense of the answer. Nonetheless, the authors are encouraged to persist with this work, taking into account reviewer comments in future revisions. + +",ICLR2020, +nD75ftYq1bg,1610040000000.0,1610470000000.0,1,A5VV3UyIQz,A5VV3UyIQz,Final Decision,Accept (Poster),"The paper touches upon explainable anomaly detection. To that extend, it modified hypersphere classifier towards fully convolutional data description (FCDD). This is, as also pointed out by two of the reviewers a direct application of a fully convolutional network within the hyperspherical classifier. However, the paper also shows how to then upsample the receptive field using a strided transposed convolution with a fixed Gaussian kernel. Both together with tackling explainable anomaly detection is important. Moreover, the empirical evaluation is quite exhaustive and shows several benefits compared to state-of-the-art. So, yes, incremental, but incremental for a very interesting an important case. ",ICLR2021, +KO_Iv7zfaH,1576800000000.0,1576800000000.0,1,B1lTqgSFDH,B1lTqgSFDH,Paper Decision,Reject,"The reviewers initially gave scores of 1,1,3 citing primarily weak empirical results and a lack of theoretical justification. The experiments are presented on synthetic examples, which is a great start but the reviewers found that this doesn't give strong enough evidence that the methods developed in the paper would work well in practice. The authors did not submit an author response to the reviewers and as such the scores did not change during discussion. This paper would be significantly strengthened with the addition of experiments on actual problems e.g. related to drug discovery which is the motivation in the paper.",ICLR2020, +rJMiU16HM,1517250000000.0,1517260000000.0,853,Bk6qQGWRb,Bk6qQGWRb,ICLR 2018 Conference Acceptance Decision,Reject,"This work develops a methodology for exploration in deep Q-learning through Thompson sampling to learn to play Atari games. The major innovation is to perform a Bayesian linear regression on the last layer of the deep neural network mapping from frames to Q-values. This Bayesian linear regression allows for efficiently drawing (approximate) samples from the network. A careful methodology is presented that achieves impressive results on a subset of Atari games. + +The initial reviews all indicated that the results were impressive but questioned the rigor of the empirical analysis and the implementation of the baselines. The authors have since improved the baselines and demonstrated impressive results across more games but questions over the empirical analysis remain (by AnonReviewer3 for instance) and the results still span only a small subset of the Atari suite. The reviewers took issue with the treatment of related work, placing the contributions of this paper in relation to previous literature. + +In general, this paper shows tremendous promise, but is just below borderline. It is very close to a strong and impressive paper, but requires more careful empirical work and a better treatment of related work. 
Hopefully the reviews and the discussion process will help make the paper much stronger for a future submission. + +Pros: +- Very impressive results on a subset of Atari games +- A simple and elegant solution to achieving approximate samples from the Q-network +- The paper is well written and the methodology is clearly explained + +Cons: +- Questions remain about the rigor of the empirical analysis (comparison to baselines) +- Requires more thoughtful comparison in the manuscript to related literature +- The theoretical justification for the proposed methods is not strong",ICLR2018, +HkeuUJTBf,1517250000000.0,1517260000000.0,811,HJcjQTJ0W,HJcjQTJ0W,ICLR 2018 Conference Acceptance Decision,Reject,"Reviews are marginal. +I concur with the two less-favorable reviews that the metrics for privacy protection are not sufficiently strong for preserving privacy. ",ICLR2018, +HkjWQk6BM,1517250000000.0,1517260000000.0,79,S1sqHMZCb,S1sqHMZCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"An interesting application of graph neural networks to robotics. The body of a robot is represented as a graph, and the agent’s policy is defined using a graph neural network (GNNs/GCNs) over the graph structure. + +The GNN-based policy network perform on par with best methods on traditional benchmarks, but shown to be very effective for transfer scenarios: changing robot size or disabling its components. I believe that the reviewers' concern that the original experiments focused solely on centepedes and snakes were (at least partially) addressed in the author response: they showed that their GNN-based model outperforms MLPs on a dataset of 2D walkers. + +Overall: +-- an interesting application +-- modeling robot morphology is an under-explored direction +-- the paper is well written +-- experiments are sufficiently convincing (esp. after addressing the concerns re diversity and robustness). + +",ICLR2018, +6ygjR6QOg,1576800000000.0,1576800000000.0,1,B1lnbRNtwr,B1lnbRNtwr,Paper Decision,Accept (Poster),"The paper investigates hybrid NN architectures to represent programs, involving both local (RNN, Transformer) and global (Gated Graph NN) structures, with the goal of exploiting the program structure while permitting the fast flow of information through the whole program. + +The proof of concept for the quality of the representation is the performance on the VarMisuse task (identifying where a variable was replaced by another one, and which variable was the correct one). Other criteria regard the computational cost of training and number of parameters. + +Varied architectures, involving fast and local transmission with and without attention mechanisms, are investigated, comparing full graphs and compressed (leaves-only) graphs. The lessons learned concern the trade-off between the architecture of the model, the computational time and the learning curve. It is suggested that the Transformer learns from scratch to connect the tokens as appropriate; and that interleaving RNN and GNN allows for more effective processing, with less message passes and less parameters with improved accuracy. + +A first issue raised by the reviewers concerns the computational time (ca 100 hours on P100 GPUs); the authors focus on the performance gain w.r.t. GGNN in terms of computational time (significant) and in terms of epochs. Another concern raised by the reviewers is the moderate originality of the proposed architecture. 
I strongly recommend that the authors make their architecture public; this is imo the best way to evidence the originality of the proposed solution. + +The authors did a good job in answering the other concerns, in particular concerning the computational time and the choice of the samples. I thus recommend acceptance. ",ICLR2020, +TAkwGdspNhQ,1642700000000.0,1642700000000.0,1,e2Lle5cij9D,e2Lle5cij9D,Paper Decision,Accept (Poster),"The authors provide a convexification for the GAN training via integral probability metrics induced by two-layer neural networks. The exposition relies on the convexification tools recently proposed by the Pilanci et al., and provides interesting insights to follow up in the future.",ICLR2022, +0MJxGt3Yv2,1576800000000.0,1576800000000.0,1,rkxoh24FPH,rkxoh24FPH,Paper Decision,Accept (Poster),"This paper exams the role of mutual information (MI) estimation in representation learning. Through experiments, they show that the large MI is not predictive of downstream performance, and the empirical success of methods like InfoMax may be more attributed to the inductive bias in the choice of architectures of discriminators, rather than accurate MI estimation. The work is well appreciated by the reviewers. It forms a strong contribution and may motivate subsequent works in the field. +",ICLR2020, +9zYwbCAcfz,1576800000000.0,1576800000000.0,1,HkxjqxBYDB,HkxjqxBYDB,Paper Decision,Accept (Poster),"The submission tackles the problem of data efficiency in RL by building a graph on top of the replay memory and propagate values based on this representation of states and transitions. The method is evaluated on Atari games and is shown to outperform other episodic RL methods. + +The reviews were mixed initially but have been brought up by the revisions to the paper and the authors' rebuttal. In particular, there was a concern about theoretical support and the authors added a proof of convergence. They have also added additional experiments and explanations. Given the positive reviews and discussion, the recommendation is to accept this paper.",ICLR2020, +gJz0dFnS2A,1610040000000.0,1610470000000.0,1,IFqrg1p5Bc,IFqrg1p5Bc,Final Decision,Accept (Poster),"This paper proposes constraints to be applied to the weights of a deep neural model during training. These constraints, motivated by an analysis of Rademacher complexity, are compared with other constraints and penalty approaches in transfer learning. The authors were able to build on the reviewers feedback to improve their paper on several points during the discussion phase, leading to a consensus for acceptance among reviewers. They also agreed to conduct experiments targeting stronger experimental results to compare all methods in the situation where they provide state-of-the-art results. This will make a useful contribution to the ICRL audience, and I recommend acceptance. +",ICLR2021, +AQ0KFjyV_5G,1642700000000.0,1642700000000.0,1,zXne1klXIQ,zXne1klXIQ,Paper Decision,Reject,"The manuscript focuses on model robustness under distribution shift, specifically domain shifts and subpopulation shifts. Domain shift is where the test domain and train domain are disjoint. Subpopulation shift is where test distribution has different mixture proportion than train distribution. The assumption is that domain identification spuriously correlates with labels. 
The proposed framework learns an invariant representation by using mixup strategies and interpolates samples either with the same labels but different domains or with the same domain but different labels to. Experiments are performed on a variety of domain shift and subpopulation shift benchmarks, and results showed that the proposed framework is better than empirical risk minimization (ERM) and alternative data augmentation methods. Theoretical analysis is also provided and it is shown that, under certain conditions, the proposed framework has asymptotically smaller worst case classification errors than ERM and vanilla mixup. + +Reviewers agreed on several positive aspects of the manuscript, including: +1. The manuscripts addresses a critical point that prevent models from generalization, namely spurious correlation; +2. The proposed method is simple and easy to implement, and the empirical results are within expectation. + +Reviewers also highlighted several major concerns, including: +1. Different recent approaches introduce methods that use some sort of mixup across domains in similar settings; +2. Ablation study on datasets without spurious correlations are missing; +3. Evaluation of domain invariance representations and prediction-level invariance needs clarifications; + +Authors clarified different motivations of the two selection strategies in relation to spurious correlation between domains and labels, and provided an ablation study on datasets with no spurious correlation. Post-rebuttal, reviewers stayed with borderline ratings, and they have suggested further improvements: improving results analysis and the conclusion that “existing domain information may not fully reflect the spurious correlation”, understanding the implication and the reasons that invariance is achieved at the prediction level instead of at the representation level despite the original goal is to learn an invariant representation, and improving presentation of the manuscript including settings and assumptions.",ICLR2022, +HJgD3ML_e,1486400000000.0,1486400000000.0,1,r1Bjj8qge,r1Bjj8qge,ICLR committee final decision,Reject,"Dear authors, in general the reviewers found that the paper was interesting and has potential but needs additional work in the presentation and experiments. Unfortunately, even if all reviews had been a weak accept (i.e. all 6s) it would not have met the very competitive standard for this year. + + A general concern among the reviewers was the presentation of the research, the paper and the experiments. Too much of the text was dedicated to the explanation of concepts which should be considered to be general knowledge to the ICLR audience (for example the justification for and description of generative models). That text could be replaced with further analysis and justification. + + The choice of baseline comparisons and benchmarks did not seem appropriate given the presented model and text. Specifically, it is difficult to determine how good of a generative model it is if the authors don't compare it to other generative models in terms of data likelihood under the model. Similarly, it's difficult to place it in the literature as a model for representation learning if it isn't compared to the state-of-the-art for RL on standard benchmarks. + + The clarifications of the authors and revisions to the manuscript are greatly appreciated. 
Hopefully this will help the authors to improve the manuscript and submit to another conference in the near future.",ICLR2017, +SylNfxHJxN,1544670000000.0,1545350000000.0,1,BkGiPoC5FX,BkGiPoC5FX,"Great explanation of prior work, but limited applicability and no insight or analysis",Reject,"This paper proposes a training algorithm for ConvNet architectures in which the final few layers are fully connected. The main idea is to use direct feedback alignment with carefully chosen binarized (±1) weights to train the fully connected layers and backpropagation to train the convolutional layers. The binarization reduces the memory footprint and computational cost of direct feedback alignment, while the careful selection of feedback weights improves convergence. Experiments on CIFAR-10, CIFAR-100, and an object tracking task are provided to show that the proposed algorithm outperforms backpropagation, especially when the amount of training data is small. The reviewers felt that the paper does a terrific job of introducing the various training algorithms --- backpropagation, feedback alignment, and direct feedback alignment --- and that the paper clearly explained what the novel contributions were. However, the reviewers felt the paper had limited novelty because it combines ideas that were already known, that it has limited applicability because it will not work with fully convolutional architectures, that the baselines in the experiments were somewhat weak, and that the paper provided no insights on why the proposed algorithm might be better than backpropagation in some cases. Regrettably, only one reviewer (R2) participated in the discussion, though this was the reviewer who provided the most constructive review. The AC read the revised paper, and agrees with R2's concerns about the limited applicability of the proposed algorithm and lack of insight or analysis explaining why the proposed training algorithm would improve over backpropagation.",ICLR2019,5: The area chair is absolutely certain +HJlpWINWgV,1544800000000.0,1545350000000.0,1,S1gDCiCqtQ,S1gDCiCqtQ,Meta-review,Reject,"Pros: +- good results on Montezuma + +Cons: +- moderate novelty +- questionable generalization +- lack of ablations and analysis +- lack of stronger baselines +- no rebuttal + + +The reviewers agree that the paper should be rejected in its current form, and the authors have not bothered revising it to take into account the detailed reviews.",ICLR2019,5: The area chair is absolutely certain +RQ3ExW8T5In,1610040000000.0,1610470000000.0,1,XvOH0v2hsph,XvOH0v2hsph,Final Decision,Reject,"The paper proposes to use the sum of training losses during training, or a variant where the sum of training losses begins to be computed after the first E epochs, to estimate the generalization performance of the corresponding network. Although the results seem promising for query-based NAS strategies, the reviewers agree that as the paper proposes something that is fundamentally opposite to the common practice, it requires more careful and thorough analysis. Besides, while the connection made by authors to the Bayesian marginal likelihood is interesting, it's not a rigorous argument that convinces the audience about the applicability of the proposed method. I strongly encourage the authors to add more analysis and discussion to the revised version to strengthen their claim and clarify its scope. 
",ICLR2021, +PyfLO-MWu,1576800000000.0,1576800000000.0,1,SJxIm0VtwH,SJxIm0VtwH,Paper Decision,Accept (Poster),"This work proposes a new adaptive method for solving certain min-max problems.

The reviewers all appreciated the work and most of their concerns were addressed in the rebuttal. 
Given the current interest in both adaptive methods and min-max problems, this work is suited for publication at ICLR.",ICLR2020, +KL0AP8h-fr,1576800000000.0,1576800000000.0,1,HklmoRVYvr,HklmoRVYvr,Paper Decision,Reject,"The paper proposes a new recurrent unit which incorporates long history states to learn longer range dependencies for improved video prediction. This history term corresponds to a linear combination of previous hidden states selected through a soft-attention mechanism and can be directly added to ConvLSTM equations that compute the IFO gates and the new state. The authors perform empirical validation on the challenging KTH and BAIR Push datasets and show that their architecture outperforms existing work in terms of SSIM, PSNR, and VIF. +The main issue raised by the reviewers is the incremental nature of the work and issues in the empirical evaluation which do not support the main claims in the paper. After the rebuttal and discussion phase the reviewers agree that these issues were not adequately resolved and the work doesn’t meet the acceptance bar. I will hence recommend the rejection of this paper. Nevertheless, we encourage the authors improve the manuscript by addressing the remaining issues in the empirical evaluation.",ICLR2020, +H1j5_z_qfo,1576800000000.0,1576800000000.0,1,BJlrZyrKDB,BJlrZyrKDB,Paper Decision,Reject,"This submission proposes a statistically consistent saliency estimation method for visual model explainability. + +Strengths: +-The method is novel, interesting, and passes some recently proposed sanity checks for these methods. + +Weaknesses: +-The evaluation was flawed in several aspects. +-The readability needed improvement. + +After the author feedback period remaining issues were: +-A discussion of two points is missing: (i) why are these models so sensitive to the resolution of the saliency map? How does the performance of LEG change with the resolution (e.g. does it degrade for higher resolution?)? (ii) Figure 6 suggests that SHAP performs best at identifying ""pixels that are crucial for the predictions"". However, the authors use Figure 7 to argue that LEG is better at identifying salient ""pixels that are more likely to be relevant for the prediction"". These two observations are contradictory and should be resolved. +-The evaluation is still missing some key details for interpreting the results. For example, how representative are the 3 images chosen in Figure 7? Also, in section 5.1 the authors don't describe how many images are included in their sanity check analysis or how those images were chosen. +-The new discussion section is not actually a discussion section but a conclusion/summary section. + +Because of these issues, AC believes that the work is theoretically interesting but has not been sufficiently validated experimentally and does not give the reader sufficient insight into how it works and how it compares to other methods. Note also that the submission is also now more than 9 pages long, which requires that it be held to a higher standard of acceptance. + +Reviewers largely agreed with the stated shortcomings but were divided on their significance. +AC shares the recommendation to reject.",ICLR2020, +tnU6lKllj0a,1642700000000.0,1642700000000.0,1,Yn4CPz_LRKO,Yn4CPz_LRKO,Paper Decision,Reject,"The paper proposes a conditional generative adversarial network with an auxiliary discriminative classifier for conditional generative modeling. 
The auxiliary discriminative classifier can provide the discrepancy between the joint distribution of the real data and labels and that of the generated data and labels to the generator by discriminatively predicting the labels of the real and generated data. Experimental results are provided to demonstrate the effectiveness of the proposed idea. The paper received mixed ratings after the rebuttal (5, 6, 5, 8). Except for one reviewer (Reviewer uPwH), who championed the paper with a score of 8, the concerns of the other three reviewers remain. To be specific, even though Reviewer ebJs assigned a score of 6, he/she does not champion the paper because the additional experiments requested were not provided by the authors, including (i) training on more datasets or at higher resolutions, (ii) visualizing the feature norm and gradient norm as done in ReACGAN, and (iii) experiments on ADC-GAN without the unconditional GAN loss. Reviewer DPgR pointed out that the paper might have a novelty issue because it bears some similarities to other works, yet the revision lacks a discussion of them. Additionally, Reviewer mZT7 pointed out that the authors did not provide a revised paper during the rebuttal, making it difficult to assess the quality of the final paper. As a result, the AC thinks that the paper is not ready for publication at the current stage and recommends rejection. The AC urges the authors to revise their paper according to the comments provided by the reviewers, and to resubmit their work to a future venue.",ICLR2022, +3Vr8QfRC-9G,1610040000000.0,1610470000000.0,1,NlrFDOgRRH,NlrFDOgRRH,Final Decision,Reject,"After carefully reading the reviews and the rebuttal, and after going over the paper itself, I'm not sure the paper is ready for ICLR. I do believe there is a lot of useful content in the current manuscript, and I urge the authors to keep working on the manuscript and resubmit it in due time.

 My concerns are as follows:
 (a) there is a lot of discussion about *relational information retrieval* -- however, there is a lack of any formalization of what this term means. I don't mind relational reasoning being used as motivation, but when it is used to determine which baselines are valid and which are not, I feel compelled to understand what exactly it means. Why is *self-attention* retrieval not *relational*? Besides the task being seemingly relational in spirit, how do we test whether the retrieval mechanism carries any relational information whatsoever? I think the community had a learning lesson here with the CLEVR dataset, which arguably does not require as much relational reasoning as it seemed. So I agree with Rev5 that there is a decent probability that the tasks we are using do not require relational information retrieval. While I understand that some of these systems are Transformer-inspired, I feel the Transformer should be a baseline.
 (b) I also feel the paper should take one of two paths.
 - Either embrace larger-scale tasks and baselines outside of the relational reasoning literature (like the Transformer), particularly settings where self-attention will potentially struggle due to the quadratic term or where such models tend to be hard to train due to the difficulty of doing credit assignment through the attention mechanism;
 - Or provide more careful ablation studies and formalize the claims a bit more, regarding, e.g., the discussion of a single larger memory vs. multiple memory blocks. 
One of the main difference comes from the attention over which memory block to use in the proposed approach, which due to softmax has a unimodal behavior. So is the reason why it works better this potential hiding of part of the memory representation (so a better way of reading a subset of the memory entry). This could potentially be done differently (e.g. multiplicative interaction in the same style, for e.g. that they were used in WaveNet). This is just a random thought on this particular aspect. I have similar questions about the self-supervised loss. + + I find the paper focusing on improving performance (unfortunately on toy domains) rather than ablation studies and an understanding and careful understanding of how things works. I realize there is some such analysis in the appendix. But I feel more of it should be in the main text. The paper is either proposing something that scales and works well at scale (and then understanding why is less important as it has direct application) or explores a very specific phenomena and then is fine to stay on toy tasks but there should be a bit of clarity in the claims, and an investigation whether the hypothesis (or intuition) put forward initially is the reason why the model works. +",ICLR2021, +Fk9_bMcyWmy,1610040000000.0,1610470000000.0,1,Py4VjN6V2JX,Py4VjN6V2JX,Final Decision,Reject,"The paper received mixed reviews. While AnonReviewer1 and AnonReviewer2 liked the idea of jointly learning global-local representations, the other reviewers were concerned about the technical novelty. Reviewers also raised various questions about the experiments and ablation studies. AC found that the rebuttal well addressed the reviewers' questions about the experiments, but it failed to elaborate on the ""why"" of combining global and local self-supervised representations. AC agreed with AnonReviewer3 and AnonReviewer4's concerns on technical novelty. Considering the reviews, we regret that the paper cannot be recommended for acceptance at this time. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2021, +ryxnoNTCJE,1544640000000.0,1545350000000.0,1,BylBr3C9K7,BylBr3C9K7,Good paper on minimizing energy cost in neural networks.,Accept (Poster),"All of the reviewers agree that this is a well-written paper with the novel perspective of minimizing energy consumption in neural networks, as opposed to maximizing sparsity, which does not always correlate with energy cost. There are a number of promised clarifications and additional results that have emerged from the discussion that should be put into the final draft. Namely, describing the overhead of converting from sparse to dense representations, adding the Imagenet sparsity results, and adding the time taken to run the projection step.",ICLR2019,5: The area chair is absolutely certain +HyKjhGIux,1486400000000.0,1486400000000.0,1,HJF3iD9xe,HJF3iD9xe,ICLR committee final decision,Invite to Workshop Track,"This paper studies neural models that can be applied to set-structured inputs and thus require permutation invariance or equivariance. After a first section that introduces necessary and sufficient conditions for permutation invariance/equivariance, the authors present experiments in supervised and semi-supervised learning on point-cloud data as well as cosmology data. + + The reviewers agreed that this is a very promising line of work and acknowledged the effort of the authors to improve their paper after the initial discussion phase. 
However, they also agree that the work appears to be missing more convincing numerical experiments and insights on the choice of neural architectures in the class of permutation-covariant. + + In light of these reviews, the AC invites their work to the workshop track. + Also, I would like to emphasize an aspect of this work that I think should be addressed in the subsequent revision. + + As the authors rightfully show (thm 2.1), permutation equivariance puts very strong constraints in the class of 1-layer networks. This theorem, while rigorous, reflects a simple algebraic property of matrices that commute with permutation matrices. It is therefore not very surprising, and the resulting architecture relatively obvious. So much so that it already exists in the literature. In fact, it is a particular instance of the graph neural network model of Scarselli et al. '09 (http://ieeexplore.ieee.org/abstract/document/4700287/) when you consider a complete graph, which has been used in the setup of full set equivariance for example in 'Learning Multiagent communication with backpropagation', Sukhbaatar et al NIPS'16; see also 'Order Matters: sequence to sequence for sets', Vinyals et al. https://arxiv.org/abs/1511.06391. + The general question of how to model point-cloud data, or more generally data defined over graphs, with neural networks is progressing rapidly; see for example https://arxiv.org/abs/1611.08097 for a recent survey. + + The question then is what is the contribution of the present work relative to this line of work. The authors should answer this question explicitly in the revised manuscript, either with a new application of the model, or with theory that advances our understanding of these models, or with new numerical applications.",ICLR2017, +q2raPE_2e8e,1610040000000.0,1610470000000.0,1,6xHJ37MVxxp,6xHJ37MVxxp,Final Decision,Accept (Poster),"All three reviewers recommend acceptance after the rebuttal stage, and the AC found no reason to disagree with them. The proposed method is simple and effective, and the concerns raised about experimental validation and novelty seem well addressed in the rebuttal. ",ICLR2021, +po9bOwnJ9jn,1610040000000.0,1610470000000.0,1,SVsLxTfHa1,SVsLxTfHa1,Final Decision,Reject,"This paper investigates how to align word senses across languages. This has not been studied much as past work has primarily considered aligning word (embeddings) across languages. The paper is well written and well motivated. Unfortunately the empirical results are not very strong. The baselines are somewhat low and the gains are modest (the excuse that it is difficult to train BERT-sized models in academia is acknowledged). Overall, there is not enough support for acceptance at such a competitive venue as ICLR. +",ICLR2021, +rJg9H3BvgV,1545190000000.0,1545350000000.0,1,Bylnx209YX,Bylnx209YX,A novel meta-learning based approach for testing robustness of grap neural nets,Accept (Poster)," The paper proposes an method for investigating robustness of graph neural nets for node classification problem; training-time attacks for perturbing graph structure are generated using meta-learning approach. Reviewers agree that the contribution is novel and empirical results support the validity of the approach. 
+ ",ICLR2019,4: The area chair is confident but not absolutely certain +SkNg8J6BM,1517250000000.0,1517260000000.0,708,SybqeKgA-,SybqeKgA-,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers generally thought the proposed algorithm was a straightforward extension of Yin et al., 2017, and not enough for a new paper. They also objected to a lack of test results (to show generalization), but the authors did provide these in their revision. + +Pros: ++ Adaptive batch sizing is useful, especially if the larger batches license parallelization. + +Cons: +- Small, incremental change to the algorithm from Yin et al., 2017 +- Test performance did not improve over well-tuned momentum optimization, which limits the appeal of the method. +",ICLR2018, +Sk5B4kprf,1517250000000.0,1517260000000.0,346,Hy1d-ebAb,Hy1d-ebAb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Predicting graphs is an interesting and important direction, and there exist essentially no (effective) general-purpose techniques for this problem. The idea of predicting nodes one by one, though not entirely surprising, is interesting and the approach makes sense. Unfortunately, I (and some of reviewers) less convinced by evaluation: + +- For example, evaluation on syntactic parsing of natural language is very weak. First of all, the used metric -- perplexity and exact match are non-standard and problematic (e.g., optimizing exact match would largely correspond to ignoring longer sentences where predicting the entire tree is unrealistic). Also the exact match scores are very low (~30% whereas 45+ were achieve by models back in 2010). + +- A reviewer had, I believe, valid concerns about comparison with GrammarVAE, which were not fully addressed. + +Overall, I believe that it is interesting work, which regretfully cannot be published as a conference paper in its current form. + ++ important / under-explored problem ++ a reasonable (though maybe not entirely surprising / original) approach +- issues with evaluation + +",ICLR2018, +T0zXbWdQLZ,1642700000000.0,1642700000000.0,1,f9D-5WNG4Nv,f9D-5WNG4Nv,Paper Decision,Accept (Poster),"The authors propose three strategies for coreset selection in the context of continual learning. In particular, the authors consider class-imbalance and noisy scenarios. The authors run extensive benchmarks and ablation showing that the approach can be effective in practice. All reviewers were positive about this work, but found that the methodological contributions were relatively modest. The clarifications provided by the authors were highly appreciated. I would encourage the authors to revise the paper to incorporate these additional details as there were a number of concepts that reviewers found were not sufficiently documented/explained and lacked clarity. I would also highly encourage the authors to explain their use of ""online continual learning"" as this reads like a tautology. + +Finally, I would like to ask the authors to reflect on their insistance with the reviewers; while we would all want engaging and long discussions about our work, the reality is that reviewing papers and discussing them is time consuming and taxing, especially in the middle of continued pandemic. 
The authors should be grateful of the time reviewers have spent reading their work and providing feedback, and it is not in the authors' interest to ask for a revision of the scores.",ICLR2022, +VbyjgPgNR1C,1642700000000.0,1642700000000.0,1,Eot1M5o2Zy,Eot1M5o2Zy,Paper Decision,Reject,"The paper proposes a new neural network, the aestheticNet, for a bias-free facial beauty prediction. +All the reviewers agree that the work is not suitable for publication as it raised some serious ethic concerns: +* Prediction of beauty (aesthetic scores) is a potential harmful application. Well-intended as it may be, a research along these lines might be harmful. +* non-anonoymity issue: writing reveals/implies authors identity with reference to previous work +* Research integrity issues (e.g., plagiarism, dual submission), a figure is copied from previous work. + +There is also a concern that the work is not novel and not interesting as such. +The authors did not respond to the concerns. + +I suggest rejection.",ICLR2022, +BTM5e8k0Pe,1610040000000.0,1610470000000.0,1,QFYnKlBJYR,QFYnKlBJYR,Final Decision,Accept (Poster),"This paper considers the RL problems where actions and observations may be delayed randomly. The proposed solution is based on generating on-policy sub-trajectories from off-policy samples. The benefits of this approach over standard RL algorithms is clearly demonstrated on MuJoCO problems. The paper also provides theoretical guarantees. +This paper is well-written overall and technically strong. The majority of the reviewers find that this paper would constitute a valuable contribution to the ICLR program. ",ICLR2021, +0H0lt9rD_0,1576800000000.0,1576800000000.0,1,SklOUpEYvB,SklOUpEYvB,Paper Decision,Accept (Poster),"Main content: + +Blind review #1 summarizes it well: + +This paper is about learning an identifiable generative model, iFlow, that builds upon a recent result on nonlinear ICA. The key idea is providing side information to identify the latent representation, i.e., essentially a prior conditioned on extra information such as labels and restricting the mapping to flows for being able to compute the likelihood. As the loglikelihood of a flow model is readily available, a direct approach can be used for learning that optimizes both the prior and the observation model. + +-- + +Discussion: + +Reviewer questions were mostly about clarification, which the authors addressed during the rebuttal period. + +-- + +Recommendation and justification: + +All reviewers agree the paper is a weak accept based on degree of depth, novelty, and impact.",ICLR2020, +S1e10J2xl4,1544760000000.0,1545350000000.0,1,Hy4R2oRqKQ,Hy4R2oRqKQ,Metareview,Reject,"This manuscript proposes an implicit generative modeling approach for the non-linear CCA problem. One contribution is the proposal of Conditional Mutual Information (CMI) as a criterion to capture nonlinear dependency, resulting in an objective that can be solved using implicit distributions. The work seems to be well motivated and of interest to the community. + +The reviewers and AC opinions were mixed, and the rebuttal did not completely address the concerns. In particular, a reviewer pointed out an issue with a derivation in the paper, and the issue was not satisfactorily resolved by the authors. 
Some additional reading suggests that the misunderstanding may be partially due to incomplete notation and other issues with clarity of writing.",ICLR2019,4: The area chair is confident but not absolutely certain +2o2qXJ7fDce,1610040000000.0,1610470000000.0,1,uVnhiRaW3J,uVnhiRaW3J,Final Decision,Reject,"The reviewers appreciate the importance of enforcing safety in RL, and the technical directions considered in the paper related to incorporating cost in advantage estimation. However, they express several concerns about the formulation of the problem considered and the consistency of the approach, as well as the somewhat incremental contribution w.r.t. CPO. Three reviewers recommend rejection.",ICLR2021, +1v_JYk_sUlrq,1642700000000.0,1642700000000.0,1,wClmeg9u7G,wClmeg9u7G,Paper Decision,Reject,"In this paper, the authors consider two algorithms for solving (strongly) monotone variational inequalities with compressed communication guarantees, MASHA1 and MASHA2. MASHA1 is a variant of a recent algorithm proposed by Alacaoglu and Malitsky, while MASHA2 is a variant of MASHA1 that relies on contractive compressors (by contrast, MASHA1 only involves unbiased compressors). The authors then show that +- MASHA1 converges at a linear rate (in terms of distance to a solution squared), and at a $1/k$ rate when taking its ergodic averge (in terms of the standard VI gap function). +- MASHA2 converges at a linear rate (in terms of distance to a solution squared). + +Even though the paper's premise is interesting, the reviewers raised several concerns which were only partially addressed by the authors' rebuttal. One such concern is that the improvement over existing methods is a multiplicative factor of the order of $\mathcal{O}(\sqrt{1/q + 1/M})$ in terms of communication complexity (number of transmitted bits) for the RandK compressor, which was not deemed sufficiently substantive in a VI setting (relative to e.g., wall-clock time, which is not discussed). + +After the discussion with the reviewers during the rebuttal phase, the paper was not championed and it was decided to make a borderline ""reject"" recommendation. At the same time, I would strongly urge the authors to resubmit a properly revised version of their paper at the next opportunity (describing in more detail the innovations from the template method of Alacaoglu and Malitsky, as well as including a more comprehensive cost-benefit discussion of the stated improvements for the RandK/TopK compressors).",ICLR2022, +2syAgx1zX1l,1642700000000.0,1642700000000.0,1,eiwpbi3iwr,eiwpbi3iwr,Paper Decision,Reject,"This paper received 2 marginally below and 1 marginally above ratings. We discussed the paper with the reviewers and there was broad consensus that 1) the paper lacked clarity; 2) multiple modeling choices were debatable (e.g., ordering or embedding of neurons and convolution over neurons!!) and not sufficiently justified (and these choices will critically impact the conclusions drawn from the analysis); 3) we were not convinced by the relevance of the synthetic data to reflect a meaningful biological process; 4) we did not see any meaningful knowledge gained for biology from this whole analysis. 
My recommendation is thus to reject this paper.",ICLR2022, +jk4zX-LIT0,1610040000000.0,1610470000000.0,1,TaYhv-q1Xit,TaYhv-q1Xit,Final Decision,Accept (Poster),"The paper presents an analysis of the spectral impact of non-linearities in a neural network, using harmonic distortion analysis as a means to quantify the effect they have in the spectral domain, linking a blue-shift phenomenon to architectural choices. This is an interesting analysis, that could be strengthened by a more thorough exploration of how this analysis relates to other properties, such as generalization, as well as through the impact of the blueshift effect through the training process.",ICLR2021, +EGb5NlF92,1576800000000.0,1576800000000.0,1,SkezP1HYvS,SkezP1HYvS,Paper Decision,Reject,All three reviewers are consistently negative on this paper. Thus a reject is recommended.,ICLR2020, +YoU-IV7c5,1576800000000.0,1576800000000.0,1,HJgK0h4Ywr,HJgK0h4Ywr,Paper Decision,Accept (Poster),"This manuscript proposes and evaluates new metrics for measuring the quality of disentangled representations for both supervised and unsupervised settings. The contributions include conceptual definitions and empirical evaluation. + +In reviews and discussion, the reviewers and AC note missing or inadequate empirical evaluation with many available methods for learning disentangled representations. On the writing, reviewers mentioned that the conciseness of the manuscript could be improved. The reviewers also mentioned incomplete references and discussion of prior work, which should be improved.",ICLR2020, +dMJVR_MZxW9M,1642700000000.0,1642700000000.0,1,MAYipnUpHHD,MAYipnUpHHD,Paper Decision,Reject,"This work formulates the Adaptive Mesh Refinement (AMR) problem used in solving Finite Element Method (FEM) as an MDP, and suggests an RL-based solution for it. Most reviewers agree that this is a novel problem and the solution is promising. There are, however, several issues raised by our reviewers, who have expertise ranging from ML to computational methods to solve PDEs. Some of the concerns are: + +- As this is not a theoretical work, the burden of proof is on the empirical evaluations. Some reviewers found the experiments very small and not convincing enough. +- The paper does not compare with the state of the art AMR methods. +- The detail of how the problem is formulated as an MDP can be improved. + +Given that four out of five reviewers are on the negative side, unfortunately I cannot recommend acceptance of this paper in its current form. Nevertheless, I believe this is a promising application of RL. I'd like to encourage the authors to consider the reviews in order to improve their work, and resubmit it to another venue.",ICLR2022, +kmwhBwak52p,1642700000000.0,1642700000000.0,1,YDqIYJBQTQs,YDqIYJBQTQs,Paper Decision,Reject,"This paper studies the challenging problem of object-centric generation of visual scenes. While the paper has some novel ideas that make it interesting, its (quantitative and qualitative) comparison with existing methods is currently premature to allow drawing conclusions with sufficient evidence. + +Instead of claiming that existing models cannot do well for the more realistic datasets mentioned by reviewer dAqW, it would be more convincing to conduct a comprehensive experimental study by comparing the proposed method with existing methods on a range of datasets, from simple ones to more realistic ones. The synthetic Fishbowl dataset introduced in this paper can be one of them. 
+ +Moreover, the clarity of the paper could be improved to make it appeal better to the readers. + +All three reviewers engaged actively in discussions (both including and not including the authors). Although one reviewer recommends 6 (weak accept), the reviewer also shares some of the concerns of the other reviewers. As it stands, the paper is not ready for acceptance. If the comments and suggestions are incorporated to revise the paper, it will have potential to be a good paper for future submission.",ICLR2022, +SJgWDhVBeE,1545060000000.0,1545350000000.0,1,HyGIdiRqtm,HyGIdiRqtm,"Important problem, solid contribution",Accept (Poster)," +The paper investigates mixed-integer linear programming methods for neural net robustness verification in presence of adversarial attckas. The paper addresses and important problem, is well-written, presents a novel approach and demonstrates empirical improvements; all reviewers agree that this is a solid contribution to the field.",ICLR2019,4: The area chair is confident but not absolutely certain +PHqFZVwDj_1,1610040000000.0,1610470000000.0,1,9WlOIHve8dU,9WlOIHve8dU,Final Decision,Reject,"The main problem as flagged by reviewers is the lack of formal evidence that the approach is a right one to carry out. Decision tree induction has early been the subject of formal studies in ML, whether in statistics (Friedman et al.) or ML (Kearns et al.). It is a bit sad that a new approach that relies on a much different standpoint on the problem and modelling of tree classification (Section 3, R2), with experimental results recognized by reviewers (R3, R4) is not accompanied by formal analyses on par with SOTA for related approaches (R3, R1). I would strongly suggest the authors fit in a few more Lemmata, either to follow up on specific problems (R1). The paper would tremendously benefit from extensive connections with the existing theory, be it from the generalization and overfitting standpoint (R2, remark #6) or the choice of the appropriate best contender using the boosting literature. Decision was taken not to accept the paper but I would very strongly encourage the authors to revise the draft. +",ICLR2021, +P3SptFT8uw,1576800000000.0,1576800000000.0,1,BJeguTEKDB,BJeguTEKDB,Paper Decision,Reject,"The paper proposes a new objective function called ICE for metric learning. + +There was a substantial discussion with the authors about this paper. The two reviewers most experienced in the field found the novelty compared to the vast existing literature lacking, and remained unconvinced after the discussion. Some reviewers also found the technical presentation and interpretations to need improvement, and this was partially addressed by a new revision. + +Based on this discussion, I recommend a rejection at this time, but encourage the authors to incorporate the feedback and in particular place the work in context more fully, and resubmit to another venue.",ICLR2020, +Sklrd8Drx4,1545070000000.0,1545350000000.0,1,ryxepo0cFX,ryxepo0cFX,Accept,Accept (Poster),"The paper presents a novel idea with a compelling experimental study. Good paper, accept.",ICLR2019,5: The area chair is absolutely certain +hNxs3V5f0Rd,1642700000000.0,1642700000000.0,1,uYLFoz1vlAC,uYLFoz1vlAC,Paper Decision,Accept (Oral),"All reviewers agreed this was a very strong submission: it was clearly written, was theoretically and experimentally interesting, and had excellent motivation. A clear accept. 
Authors: you've already indicated that you've updated the submission to respond to reviewer changes, if you could double check their comments for any recommendation you may have missed on accident that would be great! The paper will make a great contribution to the conference!!",ICLR2022, +BkxyHrNelE,1544730000000.0,1545350000000.0,1,ryeyti0qKX,ryeyti0qKX,Important topic but more work is required,Reject,"The paper considers an important problem of investigating the effects different statistical characteristics of representations (hidden unit activations) , such as sparsity, low correlation, etc, have on the neural network performance; while all reviewers agree that this is clearly a very important topic, there is also a consensus that perhaps the authors must strengthen and emphasize their contribution more clearly. + ",ICLR2019,4: The area chair is confident but not absolutely certain +yXPlEh04HBX,1610040000000.0,1610470000000.0,1,nlWgE3A-iS,nlWgE3A-iS,Final Decision,Reject,"This paper explores losses and other training details to produce a model-based agent for pixel-input continuous control problems. The authors present a rainbow-like approach that combines various separate innovations into a single system. They show an improvement over a previous baseline on this class of problem, and break down the contributions of the various components. + +Though the paper was seen as clearly written, fundamentally, the reviewers did not feel they gained insight through the presentation of the experiments. For example, one quirk brought up by multiple reviewers is that some combinations of methods show worse performance, but then adding yet another method makes things improved relative to baseline (the authors clarified that this was with the same hyperparameters). Reviewers found this a bit confusing and insufficiently explored (i.e. was this just hyperparameter tuning or does just the right selection tricks actually need to be combined). This confusion around method combinations is perhaps relatively minor by itself but indicative of how this paper did not build intuition for the reviewers. Moreover, none of the reviewers were impressed by the magnitude of improvement over the baseline dreamer agent. While it was acknowledged the the set of methods improved things, the reviewers felt that each innovation had already been independently validated as likely to improve sample efficiency, so the fact that they did so together was not especially insightful. + +I'd like to clarify for the authors that I believe this work was, in many respects, technically well executed. Ultimately, based on the reviews and my own assessment, I don't think the scope was sufficiently ambitious considering the competitiveness of this conference. While it is useful to occasionally produce summary works which pool a set of separate innovations, such papers must be insightful to readers, aggregate a sufficiently large number of innovations, and/or show striking performance gains. The final reviewer scores are 4, 5, 6, 4. + + +",ICLR2021, +Sb_vGTHCs1,1576800000000.0,1576800000000.0,1,H1lTQ1rFvS,H1lTQ1rFvS,Paper Decision,Reject,"This paper proposes a very interesting alternative to feed-forward network layers, based on Quaternion methods and Hamilton products, which has the benefit of reducing the number of parameters in the neural network (more than 50% smaller) without sacrificing performance. They conducted extensive experiments on language tasks (NMT and NLI, among others) using transformers and LSTMs. 
+ +The paper appears to be clearly presented and have extensive results on a variety of tasks. However all reviewers pointed out that there is a lack of in-depth analysis and thus insight into why this approach works, as well as questions on the specific effects of regularization. These concerns were not addressed in the rebuttal period, instead leaving it to future work. My assessment is that, with further analysis, ablation studies, and comparison to alternative methods for reducing model size (quantization, etc), this paper has the potential to be quite impactful, and I look forward to future versions of this work. As it currently stands, however, I don’t believe it’s suitable for publication at ICLR. +",ICLR2020, +HJZgPJ6Sz,1517250000000.0,1517260000000.0,920,SJd0EAy0b,SJd0EAy0b,ICLR 2018 Conference Acceptance Decision,Reject,"This paper does not meet the acceptance bar this year, and thus I must recommend it for rejection.",ICLR2018, +uO5eDujn7dG,1642700000000.0,1642700000000.0,1,oLYTo-pL0Be,oLYTo-pL0Be,Paper Decision,Reject,"This work describes an interesting approach of using a reinforcement learning algorithm for federated learning. The paper is well organized and the use-case of performing federated learning while preserving patient privacy is also important. However, the paper has room for improvement. Important baselines used for client selection are missing and so the deep reinforcement learning approach is not well-motivated. Many important technical details are missing such as hyperparameters and distributions for MNIST and CIFAR. The approach is also lacking novelty, DRL has been used for neural scheduling before and the authors do not suggest improvements to that. Finally, the experiments showing robustness to backdoor attacks is unconvincing and can benefit from more analysis.",ICLR2022, +Byl535JLeN,1545100000000.0,1545350000000.0,1,H1ecDoR5Y7,H1ecDoR5Y7,Revise and resubmit,Reject,"All three reviewers expressed concerns about the assumptions made for the local stability analysis. The AC thus recommends ""revise and resubmit"".",ICLR2019,4: The area chair is confident but not absolutely certain +p7W3jI7jlmo,1642700000000.0,1642700000000.0,1,qqdXHUGec9h,qqdXHUGec9h,Paper Decision,Accept (Poster),"This paper considers the so-called partial-label learning problem and proposes a class activation map that is better at making accurate predictions than the model itself on selecting the true label from candidate labels. The authors investigate the approach in experimental results on four benchmark image datasets. + +The reviewers appreciated the simplicity of the approach and its effectiveness in practice. The reviewers raised questions how to apply the approach to another weakly supervised learning problem such as semi-supervised learning and whether the approach is an identification-based strategy. The reviewers also raised several questions asking for more details. + +The authors submitted responses to the reviewers' comments. After reading the response, updating the reviews, and discussion, the reviewers who took part in the discussion considered that their “questions have been well addressed” and that the “authors’ responses basically provided the answers to the questions”. + +The feedback provided was already fruitful and the final version should be already improved. + +Accept. Poster.",ICLR2022, +kZKaemIQbg,1610040000000.0,1610470000000.0,1,PH5PH9ZO_4,PH5PH9ZO_4,Final Decision,Accept (Poster),"This is a nice paper on generating adversarial programs. 
The approach is to carefully use program obfuscators. After discussion and improvements, reviewers were generally satisfied with the approach and evaluation. The problem domain was also found to be of interest.",ICLR2021, +S1pvX1pSf,1517250000000.0,1517260000000.0,164,S1jBcueAb,S1jBcueAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Paper explore depth-wise separable convolutions for sequence to sequence models with convolutions encoders. +R1 and R3 liked the paper and the results. R3 thought the presentation of the convolutional space was nice, but the experiments were hurried. Other reviewers thought the paper as a whole had dense parts and need cleaning up, but the authors seem to have only done this partially. +From the reviewers comments, I'm giving this a borderline accept. I would have been feeling much more comfortable with the decision if the authors had incorporated the reviewers' suggestions more thoroughly..",ICLR2018, +dDZf10LQSfC,1642700000000.0,1642700000000.0,1,JHXjK94yH-y,JHXjK94yH-y,Paper Decision,Reject,"The idea of having two policies with opposing strategies, one aiming to maximize a notion of surprise whereas the other tries to minimize it, is an interesting one. However, even after the author rebuttal, all reviewers have lingering concerns about the evaluation protocol. In addition, there are remaining questions about the bonuses used; there are concerns that these only work for very specific domains. For these reasons, I'm recommending rejection. I encourage the authors to carefully read the concerns of the reviewers about evaluation and consider using a different evaluation protocol for a future version of this work.",ICLR2022, +H16FLJarG,1517250000000.0,1517260000000.0,834,r1kj4ACp-,r1kj4ACp-,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers are in agreement, that the paper is a big hard to follow and incorrect in places, including some claims not supported by experiments. ",ICLR2018, +qa9zi55xK,1576800000000.0,1576800000000.0,1,r1xZAkrFPr,r1xZAkrFPr,Paper Decision,Reject,"Paper https://arxiv.org/abs/1802.10026 (Garipov et. al, NeurIPS 2018) shows that one can find curves between two independently trained solutions along which the loss is relatively constant. The authors of this ICLR submission claim as a key contribution that they show the weights along the path correspond to different models that make different predictions (""Note that prior work on loss landscapes has focused on mode-connectivity and low-loss tunnels, but has not explicitly focused on how diverse the functions from different modes are, beyond an initial exploration in Fort & Jastrzebski (2019)""). Much of the disagreement between two of the reviewers and the authors is whether this point had already been shown in 1802.10026. + +It is in fact very clear that 1802.10026 shows that different points on the curve correspond to diverse functions. Figure 2 (right) of this paper shows the test error of an _ensemble_ of predictions made by the network for the parameters at one end of the curve, and the network described by \phi_\theta(t) at some point t along the curve: since the error goes down and changes significantly as t varies, the functions corresponding to different parameter settings along these curves must be diverse. This functional diversity is also made explicit multiple times in 1802.10026, which clearly says that this result shows that the curves contain meaningfully different representations. 
+ +In response to R3, the authors incorrectly claim that ""Figure 2 in Garipov et al. only plots loss and accuracy, and does not measure function space similarity, between different initializations, or along the tunnel at all. Just by looking at accuracy and loss values, there is no way to infer how similar the predictions of the two functions are."" But Figure 2 (right) is actually showing the test error of an average of predictions of networks with parameters at different points along the curve, how it changes as one moves along the curve, and the improved accuracy of the ensemble over using one of the endpoints. If the functions associated with different parameters along the curve were the same, averaging their predictions would not help performance. + +Moreover, Figure 6 (bottom left, dashed lines) in the appendix of 1802.10026 shows the improvement in performance in ensembling points along the curve over ensembling independently trained networks. Section A6 (Appendix) also describes ensembling along the curve in some detail, with several quantitative results. There is no sense in ensembling models along the curve if they were the same model. + +These results unequivocally demonstrate that the points on the curve have functional diversity, and this connection is made explicit multiple times in 1802.10026 with the claim of meaningfully different representations: “This result also demonstrates that these curves do not exist only due to degenerate parametrizations of the network (such as rescaling on either side of a ReLU); instead, points along the curve correspond to meaningfully different representations of the data that can be ensembled for improved performance.” Additionally, other published work has built on this observation, such as 1907.07504 (UAI 2019), which performs Bayesian model averaging over the mode connecting subspace, relying on diversity of functions in this space; that work also visualizes the different functions arising in this space. + +It is incorrect to attribute these findings to Fort & Jastrzebski (2019) or the current submission. It is a positive contribution to build on prior work, but what is prior work and what is new should be accurately characterized, and currently is not, even after the discussion phase where multiple reviewers raised the same concern. Reviewers appreciated the broader investigation of diversity and its effect on ensembling, and the more detailed study regarding connecting curves. In addition to the concerns about inaccurate claims regarding prior work and novelty (which included aspects of the mode connectivity work but also other works), several reviewers also felt that the time-accuracy trade-offs of deep ensembles relative to standard approaches were not clearly presented, and comparisons were lacking. It would be simple and informative to do an experiment showing a runtime-accuracy trade-off curve for deep ensembles alongside FGE and various Bayesian deep learning methods and mc-dropout. It's also possible to use for example parallel MCMC chains to explore multiple quite different modes like deep ensembles but for Bayesian deep learning. For the paper to be accepted, it would need significant revisions, correcting the accuracy of claims, and providing such experiments.",ICLR2020, +Y2Nt5XFIBpl,1642700000000.0,1642700000000.0,1,UF5cHSBycOt,UF5cHSBycOt,Paper Decision,Reject,"This paper focuses on the extrapolation ability of graph neural networks and proposes a new pooling function based on vector norm. 
The proposed method can be applied to replace commonly used pooling functions like max/mean/sum, and is proven capable of extrapolation in a simple example. +

Overall, all reviewers tend to reject this submission for the following reasons: +- The contribution of this paper is incremental. It builds on top of the well-known L-p norm pooling function and extends it to allow negative values of p and an additional learnable parameter q. +- However, this simple extension is not a well-behaved function for gradient-based optimization, which leads to unconvincing experiments, i.e., inconsistent performance compared with min/max. +- More recent baselines should be compared against, and it would be better to see how GNP works with state-of-the-art model architectures in real-world applications.",ICLR2022, +ziWdi8U6ER,1642700000000.0,1642700000000.0,1,gijKplIZ2Y-,gijKplIZ2Y-,Paper Decision,Reject,"Most of the reviewers thought this paper had issues that could be improved. There was a range of concerns. Most importantly, several reviewers felt that the novelty of the paper was unclear and that the experimental evaluations required more detail.",ICLR2022, +2pU6UgZqyn5,1610040000000.0,1610470000000.0,1,Mf4ZSXMZP7,Mf4ZSXMZP7,Final Decision,Reject,"This paper received mixed reviews, 3 positive (7, 6, 6) and 2 negative (4, 4). Due to the divergence of the reviews, I carefully read the paper and made my best effort to understand the paper and the review comments. This paper proposes to learn a quantized network using a small calibration set, given a network trained at full precision. The combination of AdaQuant, integer programming, and batch-norm tuning makes sense, although they do not have substantial novelty. The three components are reasonably tightly coupled and comprise a complete algorithm. However, sequential AdaQuant distracts from the main claim of this work significantly. This was probably added during the review process, but it looks ad hoc to me. Sequential AdaQuant seems to be effective in improving accuracy, but it cannot be applied before the bit allocation is set, which means it no longer requires integer programming. Because of this issue, the overall presentation becomes confusing and the argument sometimes sounds unfair (please refer to the last posting by R5). + +In addition, the presentation of this paper could be improved, especially the details of the integer programming formulation. It is not clear how some variables are defined mathematically. A discussion of the size of the calibration set, together with the overfitting issue, is lacking, and rigorous discussion and analysis would make the paper much stronger. The reviewers are not convinced of the novelty of this paper, and they rather believe that this is an engineering-oriented work. Considering this fact, the evaluation of this paper is not very comprehensive. The ablation study with respect to the size of the calibration set should be conducted more intensively. The experiments fail to show the benefit of mixed-precision quantization effectively, and they are limited to presenting the compression ratio in Figure 3. The authors used a small calibration set taken from the training dataset, which looks odd because they claim at the beginning of the abstract that post-training quantization requires only a small ""unlabeled"" calibration set; it would be more desirable to use arbitrary examples from the same domain. 
+ +Despite the interesting aspects, I believe that this paper needs a focus and substantial improvement for publication, and, consequently, recommend rejection.",ICLR2021, +CjJQWzzPS,1576800000000.0,1576800000000.0,1,Skg3104FDS,Skg3104FDS,Paper Decision,Reject,"This paper has been assessed by three reviewers who scored it as 3/3/3, and they did not increase their scores after the rebuttal. The main criticism lies in novelty of the paper, lack of justification for MM^T formulation, speed compared to gradient descent (i.e. theoretical analysis plus timing). Other concerns point to overlaps with Baydin et al. 2015 and the question about the validity of Theorem 1. On balance, this paper requires further work and it cannot be accepted to ICLR2020.",ICLR2020, +DT8ETThmUux,1610040000000.0,1610470000000.0,1,rI3RMgDkZqJ,rI3RMgDkZqJ,Final Decision,Reject,"All reviewers appreciated the main result in the paper, which gives global optimality guarantee for constrained policy optimization for both tabular setting and NTK setting. However, there were a number of unclear parts of the paper reported by several reviewers (assumptions, hyperparameter tuning, complexity dependence on the number of neurons, experimental setups). On top of it, the AC also echoes with R1’s concern about the novelty of this work as it basically stacks existing results (TD by Dalal et al., Neural TD by Cai et al. (2019), NPG by Agarwal et al, CSA algorithm by Lan & Zhou). +These concerns made me reticent to recommend acceptance at this point. I strongly encourage the authors to continue their interesting work in considering the reviewer comments and strengthen the numerical experiments. +",ICLR2021, +5rY8JCUngbI,1642700000000.0,1642700000000.0,1,dHd6pU-8_fF,dHd6pU-8_fF,Paper Decision,Reject,"This paper presents an adaptive gradient method for neural net training inspired by L-BFGS. All of the reviewers recommend rejection. They raise concerns about the amount of novelty, the clarity of the writing, and the experimental comparisons. I encourage the authors to take the reviewers' comments into account and improve the submission for the next cycle.",ICLR2022, +HWOYPBiZI4,1576800000000.0,1576800000000.0,1,rJxcBpNKPr,rJxcBpNKPr,Paper Decision,Reject,"This paper is board-line but in the end below the standards for ICLR. Firstly this paper could use significant polishing. The text has significant grammar and style issues: incorrect words, phrases and tenses; incomplete sentences; entire sections of the paper containing only lists, etc. The paper is in need of significant editing. + +This of course is not enough to merit rejection, but there are concerns about the contribution of the new method, experiment details, and the topic of study. The results are reported from either a single run or unknown number of runs of the learning system, which is not acceptable even if the we suspect the variance is low. The proposed approach relies on pre-training a feature extractor which in many ways side-steps the forgetting/interference problem rather than what we really need: new algorithms that processes the training data in ways the mitigate interference by learning representations. In general the reviewers found it very difficult to access the fairness of the comparisons dues do differences between how different methods make use of stored data and pre-training. 
The reviewers highlighted the similarity between the propose approach and recent work in angle of generative modeling / out of distribution (OOD) detection which suggests that the proposed approach has limited utility (as detailed by R1) and that OOD baselines were missing. Finally, the CL problem formulation explored here, where task identifiers are available during training and data is i.i.d, is of limited utility. Its hard to imagine how approaches that learn individual networks for each task could scale to more realistic problem formulations. + +All reviewers agreed the paper's experiments were borderline and the paper has substantial issues. There are too many revisions to be done.",ICLR2020, +HJWO7kTBf,1517250000000.0,1517260000000.0,167,BJJLHbb0-,BJJLHbb0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster)," + Empirically convincing and clearly explained application: a novel deep learning architecture and approach is shown to significantly outperform state-of-the-art in unsupervised anomaly detection. + - No clear theoretical foundation and justification is provided for the approach + - Connexion and differentiation from prior work on simulataneous learning representation and fitting a Gaussian mixture to it would deserve a much more thorough discussion / treatment. +",ICLR2018, +4DDFyn0A5m,1576800000000.0,1576800000000.0,1,Hyxfs1SYwH,Hyxfs1SYwH,Paper Decision,Reject,"Maintaining the privacy of membership information contained within the data used to train machine learning models is paramount across many application domains. Moreover, this risk can be more acute when the model is used to make predictions using out-of-sample data. This paper applies a causal learning framework to mitigate this problem, motivated by the fact that causal models can be invariant to the training distribution and therefore potentially more resistant to certain privacy attacks. Both theoretical and empirical results are provided in support of this application of causal modeling. + +Overall, during the rebuttal period there was no strong support for this paper, and one reviewer in particular mentioned lingering unresolved yet non-trivial concerns. For example, to avoid counter-examples raised the reviewer, a deterministic labeling function must be introduced, which trivializes the distribution p(Y|X) and leads to a problematic training and testing scenario from a practical standpoint. Similarly the theoretical treatment involving Markov blankets was deemed confusing and/or misleading even after careful inspection of all author response details. At the very least, this suggests that another round of review is required to clarify these issues before publication, and hence the decision to reject at this time.",ICLR2020, +Hy7TsfUug,1486400000000.0,1486400000000.0,1,rJEgeXFex,rJEgeXFex,ICLR committee final decision,Accept (Poster),"This paper applies RNNs to predict medications from billing costs. While this paper does not have technical novelty, it is well done and well organized. It demonstrates a creative use of recent models in a very important domain, and I think many people in our community are interested and inspired by well-done applications that branch to socially important domains. 
Moreover, I think an advantage of accepting it at ICLR is that it gives our ""expert"" stamp of approval -- I see a lot of questionable / badly applied / antiquated machine learning methods in domain conferences, so I think it would be helpful for those domains to have examples of application papers that are considered sound.",ICLR2017,
Jey-ZZZ7rV,1576800000000.0,1576800000000.0,1,HJlAUaVYvH,HJlAUaVYvH,Paper Decision,Reject,"The authors propose a novel method to estimate the Lipschitz constant of a neural network, and use this estimate to derive architectures that will have improved adversarial robustness. While the paper contains interesting ideas, the reviewers felt it was not ready for publication due to the following factors:

1) The novelty and significance of the bound derived by the authors is unclear. In particular, the bound used is coarse and likely to be loose, and hence is not likely to be useful in general.

2) The bound on adversarial risk seems of limited significance, since in practice, this can be estimated accurately based on the adversarial risk measured on the test set.

3) The paper is poorly organized with several typos and is hard to read in its present form.

The reviewers were in consensus and the authors did not respond during the rebuttal phase.

Therefore, I recommend rejection. However, all the reviewers found interesting ideas in the paper. Hence, I encourage the authors to consider the reviewers' feedback and submit a revised version to a future venue.",ICLR2020,
BEJpmn_TRTJ,1610040000000.0,1610470000000.0,1,37Fh1MiR5Ze,37Fh1MiR5Ze,Final Decision,Reject,"The Authors study the learning dynamics of deep neural networks through the lens of chaos theory.

The key weakness of the paper boils down to a lack of clarity and precision. Chaos theory seems to be mostly used to compute eigenvalues but is not used to derive meaningful insights about the learning dynamics. R2 noted, ""Chaos theory provides a way of computing eigenvalues but does not give much understanding on the neural network optimization."". R4 noted, ""The authors use an insight from chaos theory to derive an efficient method of estimating the largest and smallest eigenvalues of the loss Hessian wrt the weight"". Hence, statements such as ""the rigorous theory developed to study chaotic systems can be useful to understand SGD"" seem unsubstantiated.

Reduced to its essence, the key contributions are (1) a method to compute the top and the smallest eigenvalues, (2) the observation that the spectral norm of the Hessian along the SGD optimization trajectory is related to the inverse of the learning rate, and (3) a method to automatically tune the learning rate.

Let me discuss these three contributions:

* The significance of the first contribution is unclear, as pointed out by R2. Indeed there are other methods (e.g. power method, Lanczos) for computing these quantities that should achieve either a similar speed or similar stability. Given the rich history of developing estimators of these quantities, a much more detailed evaluation is warranted to substantiate this claim (a minimal sketch of such a baseline estimator is given below).

* The core insight that the top eigenvalue of the Hessian in SGD is related to the inverse of the learning rate in the training of deep neural networks is nontrivial but is not fully novel. Closely related observations were also shown in the literature. 
This precise statement itself, however, had not been stated in the literature. 
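(For concreteness, here is a minimal sketch of the standard power-method baseline mentioned in the first point, estimating the top Hessian eigenvalue from Hessian-vector products. This is an illustration for the reader assuming a PyTorch-style autograd API, not the submission's code; 'loss' and 'params' are placeholders for the user's own objective and parameter list.)

import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = flat_grad.new_zeros(())
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = v @ hv  # Rayleigh quotient with the unit vector v
        v = hv / (hv.norm() + 1e-12)
    return eig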
This contribution could be a basis for acceptance, but the paper is not sufficiently focused on it, and the evaluation of this claim is a bit narrow in scope. + +* Finally, there is a range array of methods to tune the learning rate. As noted for example by R3, ""There are numerous ideas for proposing new optimization and without careful, through comparison to baseline, well-known methods"", the evaluation is too limited to treat this as a core contribution. + +Based on the above, I have to recommend the rejection of the paper. At the same time, I would like to thank the Authors for submitting the work for consideration to ICLR. I hope the feedback will be useful for improving the work.",ICLR2021, +#NAME?,1610040000000.0,1610470000000.0,1,j0p8ASp9Br,j0p8ASp9Br,Final Decision,Reject,"This paper aims to do efficient epistemic uncertainty quantification for model-based learning for control. It does so by augmenting the dataset with synthetic data around the true data points, and trying to classify whether a point is close to the training set or not. I agree with many of the criticisms that R3 and R5 brought fourth. Namely, it's not clear why a kernel density estimate couldn't be used instead (runtime complexity is cited as the reason, but could be addressed through approximations, inducing points etc). It is not clear how to set the sampling distribution for X_epi. Also, since efficiency is a motivation for the work, I suggest that the authors look at and cite: + +https://arxiv.org/abs/2002.06715 + +I think at the moment the paper is not ready for publication, but the idea is interesting. Aside from comparing with the work above, what would improve this paper is an automatic way to select the distribution, or at least the covariance, of X_epi. ",ICLR2021, +RVGDs2q10i,1642700000000.0,1642700000000.0,1,hRVZd5g-z7,hRVZd5g-z7,Paper Decision,Reject,"The paper eventually got 5 ""marginally above the threshold"" after rebuttal. Such scores testify to that the paper is a borderline one. By reading the post-rebuttal comments, it is evident that most of the reviewers still deemed that the novelty is incremental. One of the reviewer (vUb9) raised the score simply to ""encourage the authors to think more important problems"", rather than acknowledging the merits of the paper. The AC also read through the paper and had the following opinions: +1. The paper is actually about DNN compression, based on the ""new finding"" that the weights across layers are low-rank. However, the authors would not write the paper in the way of DNN compression, but put more emphasis on the ""new finding"", which has no theoretical support at all (only some heuristic reasoning). The AC would deem that the ""new finding"" is only an assumption. +2. Actually the ""new finding"" is not new at all. For example, + +[*] Zhong et al., ADA-Tucker: Compressing Deep Neural Networks via Adaptive Dimension Adjustment Tucker Decomposition, Neural Networks, 2019, + +used a shared core tensor (which could be regarded as the common dictionary) across all layers for higher compression rates. More recent references that use tensors and consider shared information across layers for compression can be easily found as well. 
+ +So the AC thanked the authors for preparing the rebuttals carefully, but regretfully the paper is not good enough for ICLR.",ICLR2022, +SJgdqXmQxN,1544920000000.0,1545350000000.0,1,HyeGBj09Fm,HyeGBj09Fm,solid applications paper; final decision is subjective; borderline,Accept (Poster),"This paper presents a novel method for synthesizing fluid simulations, constrained to a set of parameterized variations, +such as the size and position of a water ball that is dropped. The results are solid; there is little related +work to compare to, in terms of methods that can ""compute""/recall simulations at that speed. +The method is 2000x faster than the orginal simulations. This comes with the caveats that: +(a) the results are specific to the given set of parameterized environments; the method is learning a +compressed version of the original animations; (b) there is a loss of accuracy, and therefore +also a loss of visual plausibility. + +The AC notes that the paper should use the ICLR format for citations, i.e., ""(foo et al.)"" rather than ""(19)"". +The AC also suggests that limitations should also be clearly documented, i.e., as seen from the +perspective of those working in the fluid simulation domain. + +The principle (and only?) contentious issue relates to the suitability of the paper for the ICLR audience, +given its focus on the specific domain of fluid simulations. The AC is of two minds on this: +(i) the fluid simulation domain has different characteristics to other domains, and thu +understanding the ICLR audience can benefit from the specific nature of the predictive problems that +come the fluid simulation domain; new problems can drive new methods. There is a loose connection +between the given work and residual nets, and of course res-nets have also been recently reconceptualized as PDEs. +(ii) it's not clear how much the ICLR audience will get out of the specific solutions being described; +it requires understanding spatial transformer networks and a number of other domain-specific issues. +A problem with this type of paper in terms of graphics/SIGGRAPH is that it can also be seen as ""falling short"" +there, simply because it is not yet competitive in terms of visual quality or the generality of +fluid simulators; it really fulfills a different niche than classical fluid simulators. + +The AC leans slightly in favor of acceptance, but is otherwise on the fence. + +",ICLR2019,2: The area chair is not sure +dgedvlIOm,1576800000000.0,1576800000000.0,1,r1gfweBFPB,r1gfweBFPB,Paper Decision,Reject,"While the reviewers generally appreciated the idea behind the method in the paper, there was considerable concern about the experimental evaluation, which did not provide a convincing demonstration that the method works in interesting and relevant problem settings, and did not compare adequately to alternative approach. As such, I believe this paper is not quite ready for publication in its current form.",ICLR2020, +pPGqUwVvxM,1576800000000.0,1576800000000.0,1,HJlMkTNYvH,HJlMkTNYvH,Paper Decision,Reject,"There is a consensus among reviewers that the paper should not be accepted. No rebuttal was provided, so the paper is rejected. ",ICLR2020, +C8_f8GFcla,1576800000000.0,1576800000000.0,1,S1g490VKvB,S1g490VKvB,Paper Decision,Reject,"Using ideas from mean-field theory and statistical mechanics, this paper derives a principled way to analyze signal propagation through gated recurrent networks. 
This analysis then allows for the development of a novel initialization scheme capable of mitigating subsequent training instabilities. In the end, while reviewers appreciated some of the analytical insights provided, two still voted for rejection while one chose accept after the rebuttal and discussion period. And as AC for this paper, I did not find sufficient evidence to overturn the reviewer majority for two primary reasons. + +First, the paper claims to demonstrate the efficacy of the proposed initialization scheme on multiple sequence tasks, but the presented experiments do not really involve representative testing scenarios as pointed out by reviewers. Given that this is not a purely theoretical paper, but rather one suggesting practically-relevant initializations for RNNs, it seems important to actually demonstrate this on sequence data people in the community actually care about. In fact, even the reviewer who voted for acceptance conceded that the presented results were not too convincing (basically limited to toy situations involving Cifar10 and MNIST data). + +Secondly, all reviewers found parts of the paper difficult to digest, and while a future revision has been promised to provide clarity, no text was actually changed making updated evaluations problematic. Note that the rebuttal mentions that the paper is written in a style that is common in the physics literature, and this appears to be a large part of the problem. ICLR is an ML conference and in this respect, to the extent possible it is important to frame relevant papers in an accessible way such that a broader segment of this community can benefit from the key message. At the very least, this will ensure that the reviewer pool is more equipped to properly appreciate the contribution. My own view is that this work can be reframed in such a way that it could be successfully submitted to another ML conference in the future.",ICLR2020, +#NAME?,1610040000000.0,1610470000000.0,1,_ptUyYP19mP,_ptUyYP19mP,Final Decision,Reject,"The reviewers have mixed views about this paper. + +However, it seems to me that the paper is missing some important related work on near-optimal exploration and it is only picking a couple of superficially similar approaches to look at. In particular, the standard benchmarks of Rmax, UCRL and Posterior Sampling do are not mentioned. I encourage the authors to look at such methods closely. They perform exploration byplanning over a sample or set of possible MDPs. + +I also want to raise another issue mentioned by the reviewers. The paper focuses extensively on neural networks, however a count-based metric is inherently for the tabular case. Why would a neural network be appropriate in such a setting? (The authors use a hash table because they are using a large discrete space. However, does it makes sense to essentially uniformly randomly cluster states together? Could there be another, better method? How about continuous spaces?) + +The algorithm idea is interesting, and the core is given already in (1) as: 'give reward in newly visited states'. However, the algorithm as described is incomplete. It is OK as a high-level description, but normally we'd require sufficient detail to reimplement the method from scratch. +You should for example specify how this intrinsic reward value is going to be used. Most of the reviewers, including me, could not understand how a student/teacher network would be combined with (1) to produce the intended exploration. 
Please try to explain in as much detail as possible your algorithm in order for the reviewers to be able to make an informed decision. +",ICLR2021, +9O_s3zFz3J,1576800000000.0,1576800000000.0,1,rJgjGxrFPS,rJgjGxrFPS,Paper Decision,Reject,"This paper proposes to use PCS to replace the conventional decoder for 3D shape reconstruction. It shows competitive performance to the state of the art methods. While reviewer #3 is overall positive about this work, both reviewer #1 and #2 rated weak rejection. Reviewer #1 concerns that important details are missing, and the discussion of results is insufficient. Reviewer #3 has questions on the clarity of the presentation and comparison with SOTA methods. The authors provided response to the questions, but did not change the rating of the reviewers. The ACs agree that this work has merits. However, given the various concerns raised by the reviewers, this paper can not be accepted at its current state.",ICLR2020, +tTdJZZW61tb6,1642700000000.0,1642700000000.0,1,DHLngM1mR3W,DHLngM1mR3W,Paper Decision,Reject,"This paper presents an augmentation-based training of autoencoders with stochastic latent space. The proposed method is examined on the representation learning task on several image datasets. While the reviewers found the submission interesting, simple, and easy to implement, they also raised serious concerns around the novelty of the proposed method and the impact of removing the KL term (which removes the generative interpretability of the model). Unfortunately, the experiments do not provide a convincing utility of the model compared to more popular representation learning methods (i.e., contrastive and non-contrastive methods). Given these concerns, the paper is not ready for presentation at ICLR.",ICLR2022, +LuQtvzo5V7,1610040000000.0,1610470000000.0,1,RSn0s-T-qoy,RSn0s-T-qoy,Final Decision,Reject,"This paper focuses on disentangled representation learning from multi-view data, which is an interesting and hot topic. However, there are several papers published in the last couple of years (especially in NeurIPS2020 and ECCV2020) solving very similar problems with closely related contributions to this paper. The contributions of this paper compared to all recent works in this space is unclear. Contributions and benefits of individual components in the method are not investigated. Although the method is designed for multi-view settings, the authors run experiments on simple settings with only two views. The experiments seem quite limited and do not show the method's capabilities. The rebuttal does not properly address the reviewers' concerns either. + +The paper received four reviews with three recommending below acceptance threshold (rejection) and one above the acceptance threshold (although this one was the least confident scoring). Given all the above shortcomings and reviewer recommendations I do not recommend acceptance of the paper. + + +",ICLR2021, +#NAME?,1610040000000.0,1610470000000.0,1,e60-SyRXtRt,e60-SyRXtRt,Final Decision,Reject,"This work investigates the choice of a 'baseline' for attribution methods. Such a choice is important and can heavily influence the outcome of any analysis that involves attribution methods. The work proposes doing (1) one-vs-one attribution in a sort of contrastive fashion (2) generating baselines using StarGAN. + +The reviewers have brought out a number of valid concerns about this work: + +1. 
One-vs-one attribution appears to be novel, and distinctive enough from the more prevalent ""one-vs-all"" formulations. I am perhaps more optimistic than the reviewers that such a formulation is in fact useful, but I can see where the hesitancy can come from. +2. It's not clear that the evaluation shows that the proposed method is in fact superior to the others. All the reviewers touched upon this one way or another. +3. Somewhat simplistic datasets used for evaluation (noted that there are CIFAR10 results in the rebuttal). + +This was more borderline than the scores would indicate. I thank the authors for the extensive replies and extra experiments. I encourage them to incorporate more of the feedback and resubmit to the next suitable conference. I do believe that doing experiments on ImagetNet (like previous work does, such as IG) would be quite worthwhile and convincing. I suspect the computational expense could be mitigated by re-using pretrained networks, of which there are many available for ImageNet specifically.",ICLR2021, +aRmVwKH7Gz,1610040000000.0,1610470000000.0,1,1EVb8XRBDNr,1EVb8XRBDNr,Final Decision,Reject,"This paper proposes a method of risk-sensitive multi-agent reinforcement learning in cooperative settings. The proposed method introduces several new ideas, but they are not theoretically well founded, which has caused many confusions among the reviewers. Although some of the confusions are resolved through discussion, there remain major concerns about the validity of the method. ",ICLR2021, +bYNnbgrTMqm,1642700000000.0,1642700000000.0,1,XzTtHjgPDsT,XzTtHjgPDsT,Paper Decision,Accept (Oral),"This paper takes inspiration from Global Workspace Theory to propose a modification for attention-based network architectures. This is exemplified both in transformer models and in recurrent models (RIMs). The key idea is to replace the quadratic, pairwise communication between ""specialist"" units (which in transformers corresponding to the positions) by a higher-order communication model which consists in a competitive, sparse writing step into a shared workspace, followed by a reading step where information is broadcasted from the global workspace to all specialists. The competitive writing step establishes a limited bandwidth channel for this communication which encourages specialization. + +The reviewers agree that this is an interesting and very well-written paper which unifies several existing ideas. The main contribution of this paper is in establishing a connection to GWT which may inspire future research to keep developing these ideas. The experiments on relatively small tasks (but challenging ones) provide a good proof of concept. Some concerns pointed out by some of the reviewers include a certain overstatement of the capabilities of the proposed model, as well as lack of experiments that scale up the model to larger and unstructured datasets. The authors replied with additional experiments included in the appendix, which in my opinion address these concerns convincingly. + +Overall, this is a strong paper and I recommend acceptance. I encourage the authors to take into account the reviewer's suggestions in the final version. 
I also think that the connection to related work could be improved, as there is several related works [1, 2, 3] which asks/investigates similar questions to this paper and should probably be acknowledged: +- The ""shared global workspace"" of this paper (Transformer + SW) is reminiscent of the Star-Transformer [1], as well as other more recent works which use special units (e.g. CLS tokens) to encode ""global"" representations. While that work does not include the competitive component (the ""bottleneck""), I think it should be acknowledged. +- Variants of transformers with competition among specialists via sparsity have also been proposed, e.g. adaptively sparse transformers [2]. That framework is an alternative to top-k softmax used in this paper. +- Empirical studies which analyze the redundancy among specialists (in this case attention heads) and propose strategies to prune them have also been made by [3]. + +[1] https://arxiv.org/abs/1902.09113 +[2] https://arxiv.org/abs/1909.00015 +[3] https://arxiv.org/abs/1905.09418 + +Minor point: ""Hence unlike pairwise interaction, messages passed among neural modules in the shared workspace setting also include HO interaction terms"" -- I believe higher-order interaction happens too every two layers with pairwise interaction. Perhaps this should be clarified.",ICLR2022, +-b7-c3SPyGX2,1642700000000.0,1642700000000.0,1,UkgBSwjxwe,UkgBSwjxwe,Paper Decision,Reject,"Unfortunately, the reviewers have unanimously voted to reject this paper. +There was some discussion of whether the paper was out-of-scope for ICLR; +I don't think that it is, necessarily, but I think that we can kind of screen off that topic because the reviewers had plenty of non-scope-related concerns that seem disqualifying to me, including both issues of novelty and issues related to the experimental validation. +Therefore, I am also recommending rejection in this case.",ICLR2022, +M5-jQdt5K9Z,1642700000000.0,1642700000000.0,1,eBCmOocUejf,eBCmOocUejf,Paper Decision,Accept (Poster),"This paper tackles a relatively novel problem that is the result of recent work on prefix tuning - specifically the need to be robust to adversarial perturbation in the context of prefix tuning and they show a method for achieving this without requiring more storage and obtain good results. + +There were some clarity issues that were addressed by the reviewers during the rebuttal. The main issue that was pointed out was the effect of batch size on the success of the model. The authors gave experiments with batch size 1 where results are less impressive but still outperform the baseline. Also the authors say that for now they are not considering the case where only some of the elements in the batch are adversarial, which I think is ok for a research paper on such a cutting-edge topic. + +Thus, the result of the discussion is to lean to accept this paper given that it is now more clear, has experiments that make it clear what the benefits are in realistic settings and obtains improvements.",ICLR2022, +JImidManlZA,1610040000000.0,1610470000000.0,1,p-NZIuwqhI4,p-NZIuwqhI4,Final Decision,Accept (Spotlight),"The paper analyzes the gradient flow dynamics of deep equilibrium models with linear activations and establishes linear convergence for quadratic loss and logistic loss; several exciting results and connections, solid contribution, accept! 
",ICLR2021, +99KeOMSNsFA,1610040000000.0,1610470000000.0,1,pW--cu2FCHY,pW--cu2FCHY,Final Decision,Reject,"The new non linearity proposed in this paper present interesting observations and improvements on image and text datasets. +However, reviewers point out that there should’ve been more comparisons to other efficient transformers and on more datasets. +The speed improvements are also not clear. +I’d encourage the authors to revise and submit in the future.",ICLR2021, +3OEFVB1EFiSq,1642700000000.0,1642700000000.0,1,vpiOnyOBTzQ,vpiOnyOBTzQ,Paper Decision,Reject,"This manuscript tackles an interesting and significant line of research of long-term prediction and out-of-distribution generalization in time series models. I strongly believe this problem is an important one to solve. However, in its current form, its novelty is marginal, and the experiments fail to decisively show advantages. It also lacks of systematic improvements and error analysis. Further work could make it ready for publication at a next conference.",ICLR2022, +BylzGqwNe4,1545010000000.0,1545350000000.0,1,Hyffti0ctQ,Hyffti0ctQ,lack novelty,Reject,This paper proposes a new framework which combines pruning and model distillation techniques for model acceleration. The reviewers have a consensus on rejection due to limited novelty.,ICLR2019,5: The area chair is absolutely certain +OZSvmQrcj,1576800000000.0,1576800000000.0,1,HkeAepVKDH,HkeAepVKDH,Paper Decision,Reject,"main summary: method for quantizing GAN + +discussion: +reviewer 1: well-written paper, but reviewer questions novelty +reviewer 2: well-written, but some details are missing in the paper as well as comparisons to related work +reviewer 3: well-written and interesting topic, related work section and clarity of results could be improved +recommendation: all reviewers agree paper could be improved by better comparison to related work and better clarity of presentation. Marking paper as reject.",ICLR2020, +o5vvckGcag7,1642700000000.0,1642700000000.0,1,QuObT9BTWo,QuObT9BTWo,Paper Decision,Accept (Poster),"This paper develops a ``preference-conditioned” approach to approximate the Pareto frontier for Multi-Objective Combinatorial Optimization (MOCO) problems with a single model (thus dealing with the thorny problem that there can be exponentially-many Pareto-optimal solutions). It appears to provide flexibility for users to obtain various preferred tradeoffs between the objectives without extra search. The basic idea is to use end-to-end RL to train the single model for all different preferences simultaneously. + +The technical soundness and practical performance are strong. This work's approximation guarantee depends on the ability to approximately solve several (weighted) single-objective problems. This may be challenging due to the NP-hardness of the latter. However, this limitation seems to also apply to other end-to-end learning-based approaches. + +One area where the novelty is somewhat limited is that the paper borrows some number of ideas from neural single-objective optimization. The contribution overall seems noteworthy for hard multi-objective problems.",ICLR2022, +7OVzVpDj71,1576800000000.0,1576800000000.0,1,BJxsrgStvr,BJxsrgStvr,Paper Decision,Accept (Spotlight),"This work studies small but critical subnetworks, called winning tickets, that have very similar performance to an entire network, even with much less training. 
They show how to identify these subnetworks early in the training of the entire network, saving computation and time both in identifying them and, subsequently, for the prediction task as a whole.

The reviewers agree this paper is well-presented and of general interest to the community. Therefore, we recommend that the paper be accepted.",ICLR2020,
AYXoPOQ4lQ,1610040000000.0,1610470000000.0,1,1NRMmEUyXMu,1NRMmEUyXMu,Final Decision,Reject,"This paper proposes a model-based RL algorithm which, instead of simply fitting a parameterized transition model and using rollouts for planning, learns latent landmarks via distance-based clustering and conducts planning on the learned graph. Although some of these ideas themselves have appeared in the literature, the overall approach is very nice, novel and sophisticated. The experimental results appear strong and interesting. Most reviewers feel positive about the contributions of the paper, but there remain concerns that need to be addressed.

The proposed approach is highly nontrivial, and more ablations, generalization settings and environments need to be studied to fully justify what's going on. The authors agree to expand the paper and add the needed results, which would require substantial work; thus, reviewers recommend that the paper be submitted again to a future conference and receive another round of review. Showing the generalization is nontrivial, and it would make the paper stronger if the authors put more thought into this issue, although it is not a must.

Minor: Another technical comment is that the approach seems to rely heavily on the choice of embedding distance. Learning the best embedding with a meaningful embedding distance has been considered in other scenarios, see e.g. https://arxiv.org/abs/1906.00302. It would be interesting to try out and compare different choices of the embedding distance.",ICLR2021,
nWgcJS8p_Q,1576800000000.0,1576800000000.0,1,H1g4M0EtPS,H1g4M0EtPS,Paper Decision,Reject,"This paper presents a Markov Random Field (MRF) approach for generating adversarial examples in a black-box setting, where it only has access to loss function evaluations. The method exploits the structure of input data to model the covariance structure of the gradients. Empirically, the resulting method uses fewer queries than the current state of the art to achieve comparable performance. Overall, the paper has valuable contributions. The main issue is the empirical evaluation, which can be strengthened, e.g., by including results with multi-step methods and a more thorough analysis of the estimated gradients.",ICLR2020,
66At-usmmo9o,1642700000000.0,1642700000000.0,1,36rU1ecTFvR,36rU1ecTFvR,Paper Decision,Reject,"This paper suggests a novel defense against adversarial perturbations where during training a loss term is added which enforces similar feature representations. +At test time: i) noise is added, ii) the feature loss is minimized. + +The authors report excellent results against AutoAttack, but the problem is that AutoAttack expects a static, non-randomized defense. Neither is the case for the defense proposed in the present paper. Therefore, the evaluation with AutoAttack could significantly overestimate the actual robustness, and the evaluation of the paper is therefore not valid. Thus adaptive attacks are needed, which are tailored to the defense mechanism; see e.g. Carlini et al, On Evaluating Adversarial Robustness, https://arxiv.org/abs/1902.0670. 
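(To make this concrete, a minimal sketch of one EOT-style attack step of the kind such an adaptive evaluation would use; 'defended_forward' and 'loss_fn' are hypothetical stand-ins for the fully unrolled defense pipeline and a suitable loss, not code from the paper under review.)

import torch

def eot_pgd_step(x, y, defended_forward, loss_fn, step_size=0.01, n_samples=20):
    # Average gradients over the defense's randomness (Expectation Over
    # Transformation), attacking the full pipeline rather than the bare model.
    grad_est = torch.zeros_like(x)
    for _ in range(n_samples):
        x_adv = x.clone().detach().requires_grad_(True)
        loss = loss_fn(defended_forward(x_adv), y)  # each call re-samples the defense noise
        grad_est += torch.autograd.grad(loss, x_adv)[0]
    return x + step_size * (grad_est / n_samples).sign()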
+
As two reviewers noticed, the suggested ""adaptive attack"" in the paper does not properly attack the whole defense mechanism by unrolling the test-time optimization and additionally using EOT. Thus it is unclear at the moment if the method is really robust. Moreover, the inference time is significantly increased, so that it is questionable if this approach is practically relevant. Therefore this paper is not ready for publication yet.",ICLR2022,
SkvlnG8ug,1486400000000.0,1486400000000.0,1,BkXMikqxx,BkXMikqxx,ICLR committee final decision,Reject,"There is consistent agreement on the originality of this work and that the topic here is ""interesting"". Additionally there is consensus that the work is ""clearly written"", and (excepting questions of the word ""cortical"") all would be primed to accept this style of work. 
+ + However there is a shared concern about the quality and potential impact of the work, particularly in terms of the validity of empirical evaluations. Reviewers are generally not inclined to believe that the current empirical evidence validates the conclusions of the work. Suggestions are to: make greater use of a language model, compare to external baselines, or remove the handwriting aspects.",ICLR2017,
Hy-UEkarM,1517250000000.0,1517260000000.0,352,rk8wKk-R-,rk8wKk-R-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"meta score: 5 + +This paper gives a thorough experimental comparison of convolutional vs recurrent networks for a variety of sequence modelling tasks. The experimentation is thorough, but the main point of the paper, that convolutional networks are unjustly ignored for sequence modelling, is overstated as there are several areas where convolutional networks are well explored. 
+Pros: + clear and well-written + thorough set of experiments +Cons + original contribution is not strong + it is not as radical to consider convolutional networks for sequence modeling as the authors seem to suggest +",ICLR2018, +t3ZBZIXhnv,1642700000000.0,1642700000000.0,1,WAid50QschI,WAid50QschI,Paper Decision,Accept (Oral),"This paper proposes mixed distributions over convex polytopes, and provides theory for mixed distributions that is relevant to the machine learning community. All of the reviewers were positive, and agree that this is a solid contribution. I agree, and I believe that this paper stands a chance of being a foundational paper for future work in probabilistic ML and structured learning.",ICLR2022, +4awCnNnpYO,1576800000000.0,1576800000000.0,1,S1e2agrFvS,S1e2agrFvS,Paper Decision,Accept (Spotlight),This paper is consistently supported by all three reviewers and thus an accept is recommended.,ICLR2020, +NflYy4z4ZX,1576800000000.0,1576800000000.0,1,SyeYiyHFDH,SyeYiyHFDH,Paper Decision,Reject,"The reviewers have reached consensus that while the paper is interesting, it could use more time. We urge the authors to continue their investigations.",ICLR2020, +AxddIdJAP-4,1642700000000.0,1642700000000.0,1,OCgCYv7KGZe,OCgCYv7KGZe,Paper Decision,Reject,"This work addresses the problem of learning representations from noisy expert demonstrations in in adversarial imitation learning. The authors build on top of GAIL, which utilizes a discriminator to model a ""pseudo""-reward from demonstrations. In this work, the discriminator is replaced with an auto-encoder. The authors hypothesis is that using an auto-encoder helps in 2 ways: 1) denoising expert trajectories for more ""robust"" learning; 2) using the reconstruction error (instead of binary classification loss) to distinguis experts from samples provides more informative signal for reward learning. + +**Strengths** +on a global perspective this work is well motivated +a novel algorithmic variant of GAIL is proposed +thorough experimental evaluation + +**weaknesses** +The manuscript doesn't clearly distinguish between adversarial imitation learning algorithms (like GAIL) and ""true"" inverse reinforcement learning algorithms. This makes it unclear what the real goal of the proposed method is. The ultimate goal of adversarial IL is to learn a policy (by inferring a pseudo-reward at ""train"" time which is then never used again), while the primary goal of IRL is to learn a reward function at train time, which can then be used at test time. The manuscript motivates the algorithm by saying it will have a more informative signal for learning reward functions, but the algorithm itself is an adversarial IL algorithm which primary goal is to learn a policy from demonstrations. Overall, makes the evaluation and analysis confusing. Ideally, the authors would have focussed on the question ""Does the reconstruction error lead to better policies?"" (through better pseudo-reward modeling) - or would have extended an IRL method. + +Second, the motivation is that the autoencoder helps with more ""robust"" learning, but it's unclear to me that the evaluation really shows that learning is more robust (also because ""robustness"" is not clearly defined) + +The experimental evaluation is a bit of a mixed bag, and it's a unclear why the new algorithm performs better on non-noisy data (when compared to baselines), but not less so on the noisy data. 
+ +**Summary** +Overall, this work provides a promising direction, however in it's current form the manuscript is not yet ready for publication.",ICLR2022, +Rrine5bsTk,1576800000000.0,1576800000000.0,1,Bkeeca4Kvr,Bkeeca4Kvr,Paper Decision,Accept (Poster),The authors propose a method for few-shot learning for graph classification. The majority of reviewers agree on the novelty of the proposed method and that the problem is interesting. The authors have addressed all major concerns.,ICLR2020, +jROSmOi6QSHo,1642700000000.0,1642700000000.0,1,O1DEtITim__,O1DEtITim__,Paper Decision,Accept (Spotlight),"This work considers one-shot pruning in deep neural networks. The main departure from previous work is to consider stochastic Frank-Wolfe. The reported results are convincing although a number of baselines were missing from the initial submission. The authors provide a balanced account of the strengths and weaknesses of the proposed approach. + +The authors adequately addressed the concerns of the reviewers. For instance they ran additional experiments to compare to missing pruning baselines. I would encourage the authors to revise the manuscript by including the missing related work, the additional clarification discussions (e.g., motivation for K-sparse constraints, follow-up analysis, and cost per iteration) and to include the additional experiments that were conducted (e.g., pruning with training).",ICLR2022, +Il1NHE7L68,1610040000000.0,1610470000000.0,1,gYbimGJAENn,gYbimGJAENn,Final Decision,Reject,"All the reviewers shared the concerns about the novelty and the quality of the results. Comparisons with some SOTA results are missing, and the inclusion of deblurring/denosing tasks is not convincing. The authors carefully addressed these issues in the rebuttal but the reviewers didn’t change their mind afterwards. After carefully examining the results in the paper, the AC agrees with the reviewers that the improvement on image quality, if any, seems to be too small to warrant a publication. ",ICLR2021, +EDETE5illl,1610040000000.0,1610470000000.0,1,nhIsVl2UoMt,nhIsVl2UoMt,Final Decision,Reject,"This paper proposes a method for modeling higher-order interactions in Poisson processes. Unfortunately, the reviewers do not feel that the paper, in its current state, meets the bar for ICLR. In particular, reviewers found the descriptions unclear and the justifications lacking. While the responses did aid the reviewers understanding, the paper would benefit from rewriting and more careful thought given to the experimental design.",ICLR2021, +5H730Hl_UOt,1610040000000.0,1610470000000.0,1,hypDstHla7,hypDstHla7,Final Decision,Reject,"The paper analyzes neuron activations for neural networks trained via RL to perform reaching with planar robot arms. This analysis includes an evaluation of the correlation between neurons of different models trained to control arms with different degrees-of-freedom. In performing these evaluations, the paper proposes a heuristic pruning algorithm that reduces the size of the network and increases information density. Correlation is assessed based on a projection of the source network on the target network. + +The paper is well written and considers a challenging problem of interest to the community. The proposed pruning strategy as a means of maximizing information content is reasonable and seems to perform well. However, the significance of the contributions is limited by the experimental evaluation. 
The experiments consider a large number of models; however, the scope of problems on which the method is evaluated is narrow, making it difficult to draw conclusions about the merits and significance of the work. The authors are encouraged to extend the analysis to a more diverse set of problems.",ICLR2021,
bYNnbgrTMqm,1642700000000.0,1642700000000.0,1,XzTtHjgPDsT,XzTtHjgPDsT,Paper Decision,Accept (Oral),"This paper takes inspiration from Global Workspace Theory to propose a modification for attention-based network architectures. This is exemplified both in transformer models and in recurrent models (RIMs). The key idea is to replace the quadratic, pairwise communication between ""specialist"" units (which in transformers correspond to the positions) by a higher-order communication model which consists of a competitive, sparse writing step into a shared workspace, followed by a reading step where information is broadcast from the global workspace to all specialists. The competitive writing step establishes a limited-bandwidth channel for this communication which encourages specialization. 

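(As a minimal illustration of this write/read pattern -- the dimensions, the top-k rule, and all names here are my own sketch, not the paper's exact architecture:)

import torch

def workspace_step(specialists, workspace, k=2):
    # specialists: (n, d), workspace: (m, d)
    scores = workspace @ specialists.t()                 # (m, n) write attention
    topk = scores.topk(k, dim=-1)
    sparse = torch.full_like(scores, float('-inf'))
    sparse.scatter_(-1, topk.indices, topk.values)       # keep only the top-k writers
    write = torch.softmax(sparse, dim=-1) @ specialists  # (m, d) competitive write
    read = torch.softmax(specialists @ write.t(), -1) @ write  # broadcast read
    return specialists + read, write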
The paper shows that when a model is obtained (fine-tuned) from another, then the corresponding hidden semantic spaces are aligned. The paper uses this property to show that without any additional architecture or training, the models can perform diverse tasks such as image translation and morphing. The paper also demonstrates that zero-shot tasks can be performed by learning in the parent domain and transferring to the child domain. + +All reviewers agree that the paper presents an interesting analysis and findings and will make a valuable contribution to the field. The reviewers raised some particular concerns, which were addressed by the authors in their response.",ICLR2022, +HJxgf19NxE,1545020000000.0,1545350000000.0,1,HJMCdsC5tX,HJMCdsC5tX,not fit for ICLR,Reject,"This paper presents an heuristic method to detect periodicity in a time-series such that it can handle noise and multiple periods. + +All reviewers agreed that this paper falls off the scope of ICLR since it does not discuss any learning-related question. Moreover, the authors did not provide any response nor updated manuscript addressing the reviewers remarks. The AC thus recommends rejection. ",ICLR2019,5: The area chair is absolutely certain +O7limtSdqPE,1610040000000.0,1610470000000.0,1,yT7-k6Q6gda,yT7-k6Q6gda,Final Decision,Reject,"This is a tricky one, hence my low confidence rating. + +The reviewers seem to agree that the paper is well written, easy to follow, and that it tests a relevant hypothesis that is of interest to the community. There was some disagreement as to whether the experiments are comprehensive, complete and/or conclusive enough, although on balance it seems reviewers were overall satisfied barring a few additional requests which the authors addressed in their feedback. + +However, no reviewers support the paper strongly (borderline accepts) while R5 remains unconvinced and has raised a technical point in their review about the estimator of the Trace of the Fisher information matrix. The question R5 has raised is central to the paper's methods, arguments and conclusions. In a message to ACs and PCs the authors raised concerns about R5. I personally thought that while R5 could have worded their review more carefully and respectfully (as I pointed out in my respose) the concerns raised were otherwise motivated, the reviewer engaged in a discussion, and the arguments were laid out clearly. I side with R5 and I think that the paper should be rewritten with more clarity on this question - the problem R5 found is likely to trip up others who read or build on the paper. + +The authors have raised that there are two parallel submissions closely related to this one, complicating the decision making somewhat: +[1] https://openreview.net/forum?id=rq_Qr0c1Hyo [2] https://openreview.net/forum?id=3q5IqUrkcF +",ICLR2021, +2ttdwMImxw5,1642700000000.0,1642700000000.0,1,9XhPLAjjRB,9XhPLAjjRB,Paper Decision,Accept (Spotlight),"Overall, the paper provides interesting counter examples for the SGD with constant step-size (that relies on a relative noise model that diminishes at the critical points), which provide critical (counter) insights into what we consider as good convergence metrics, such as expected norm of the gradient. + +The initial submission took a controversial position between the mathematical statements and the presentation of the statements on the behavior of the SGD method in non-convex optimization problems. 
While the mathematical is sufficient for acceptance at ICLR, the presentation was inferring conclusions that could have been misread by the community. + +I am really happy to state that the review as well as the rebuttal processes helped improved the presentation of the results that I am excited to recommend acceptance.",ICLR2022, +rklz0IhVeE,1545030000000.0,1545350000000.0,1,H1lJws05K7,H1lJws05K7,Lack of clarity makes it difficult to discern a clear contribution over recent literature,Reject,"The paper attempts to extend the recent analysis of random deep networks to alternative activation functions. Unfortunately, none of the reviewers recommended the paper be accepted. The current presentation suffers from a lack of clarity and a sufficiently convincing supporting argument/evidence to satisfy the reviewers. The contribution is perceived as too incremental in light of previous work.",ICLR2019,5: The area chair is absolutely certain +BJTHBJTHM,1517250000000.0,1517260000000.0,565,HyEi7bWR-,HyEi7bWR-,ICLR 2018 Conference Acceptance Decision,Reject,"The authors use the Cayley transform representation of an orthogonal matrix to provide a parameterization of an RNN with orthogonal weights. The paper is clearly written and the formulation is simple and elegant. However, I share the concerns of reviewer 3 about the significance of another method for parameterizing orthogonal RNN, as there has not been a lot of evidence that these have been useful on real problems (and indeed, on most of the toys used show the value of orthogonal RNN, one can get good results just by orthogonal initialization, e.g. as in Henaff et. al. as cited in this work). This work does not compare experimentally against many of the other methods, e.g. https://arxiv.org/pdf/1612.00188.pdf, the two Jing et. al. works cited, simple projection methods (either full projections at each step or stochastic projections as in Henaff et. al.). It does not cite or compare against the approach in https://arxiv.org/pdf/1607.04903.pdf. ",ICLR2018, +J_U_H7id_N4,1642700000000.0,1642700000000.0,1,WQc075jmBmf,WQc075jmBmf,Paper Decision,Accept (Poster),"The paper presents a deep learning approach encodes codebases as databases that conform to rich relational schemas. Based on this, a biased graph-walk mechanism efficiently feeds this structured data into a transformer and deepset approach. The results shown a quite good, compared to other approaches present at ICLR. Moreover, one reviewer is strongly voting for accepting the paper, arguing that ""that this paper is of significance to the ML4Code research community, as it shows how to offload the engineering cost of extracting semantic information from programs to a standard tool."" Overall, I have really enjoyed reading the paper, and the use of relational database as codebase together with a transformers is sweat. On the other, it also presented in a rather engineering way, as pointed out by several reviewers, suggesting that some software engineering venue might be a better place for the work. But then ICLR had similar papers, and the present paper demonstrates a benefit of using a relational encoding. Thus, I weight the leaning towards rejects borderlines votes less and suggest an accept overall. 
We all should keep in mind that also deep neural architecture are full of design choices.",ICLR2022, +KtGtnI6RmP,1576800000000.0,1576800000000.0,1,BkxX30EFPS,BkxX30EFPS,Paper Decision,Reject,"The authors present a new training procedure for generative models where the target and generated distributions are first mapped to a latent space and the divergence between then is minimised in this latent space. The authors achieve state of the art results on two datasets. + +All reviewers agreed that the idea was vert interesting and has a lot of potential. Unfortunately, in the initial version of the paper the main section (section 3) was not very clear with confusing notation and statements. I thank the authors for taking this feedback positively and significantly revising the writeup. However, even after revising the writeup some of the ideas are still not clear. In particular, during discussions between the AC and reviewers it was pointed out that the training procedure is still not convincing. It was not clear whether the heuristic combination of the deterministic PGA parts of the objective (3) with the likelihood/VAE based terms (9) and (12,13), was conceptually very sound. Unfortunately, most of the initial discussions with the authors revolved around clarity and once we crossed the ""clarity"" barrier there wasn't enough time to discuss the other technical details of the paper. As a result, even though the paper seems interesting, the initial lack of clarity went against the paper. + +In summary, based on the reviewer comments, I recommend that the paper cannot be accepted. +",ICLR2020, +4fCALXuh8Q7,1642700000000.0,1642700000000.0,1,edN_G_4njyi,edN_G_4njyi,Paper Decision,Reject,"Dear authors, + +I have carefully read the reviews, rebuttals, and the subsequent discussion. Most reviews provide high quality feedback, and the reviewers' combined opinion is strongly oriented towards recommending rejection. I have to concur with this recommendation. While essentially all reviewers agreed that the paper is well written, they also raised several key concerns. I will reiterate (and further elaborate) on some of them here: + +1) I do not agree with the authors' claim that there is a significant difference between client sampling in FL and data sampling in SGD in terms of the underlying mathematics; at least not in the present form. The mathematical formulation of both problems is the same. While in SGD it is possible to access local information and construct more powerful sampling strategies using this information, it is also possible to forgo using this information and propose simpler data-agnostic strategies. Such strategies have been studies in the SGD literature before. I recommend the literature on *arbitrary sampling* pioneered by Richtarik and Takac (""Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function"", Mathematical Programming, 2014) in the context of randomized coordinate descent. The same sampling approach was later adopted for SGD. The paper by Csiba and Richtarik (Importance sampling for minibatches, JMLR 2018) suggested by one of the reviewers is relevant here as it adopts the arbitrary sampling approach to (variance reduced) SGD. Such an arbitrary sampling paradigm is more general than the unbiased sampling strategy you study here (indeed, arbitrary sampling includes also biased samplings). The work of Chen et al mentioned by a reviewer is also more relevant than you appreciate. 
This work also mentions a couple of decomposition statements. They work with the arbitrary sampling framework in the FL setting, and this seems more general than your framework since it includes biased samplings as well. Note that they then proceed to compute the optimal sampling out of all samplings, while you do not attempt to theoretically capture which samplings are best. This prior work thus addresses a similar problem, and goes deeper in this respect. Also, the parameters $v_i$ in their Lemma 1 are simply statistics of the sampling and are unrelated to the data. Furthermore, the Lipschitz smoothness constant of the aggregate loss over all local data samples is not hard to estimate, and is needed anyway to set the stepsize correctly. So, your comments about the unavailability of these quantities in the FL setting seem incorrect. + +2) Theorem 1 is indeed just a simple calculation / observation rather than a result. I agree with the reviewer who said that the value of this simple observation, without a deeper study of its consequences and an explanation of how the consequences lead to new results that are in some sense interesting, is quite limited. As mentioned by one reviewer, sophisticated methods are often non-monotonic: they do *not* attempt to greedily reduce some simple potential (e.g., the distance to the minimizer), as such a strategy may be suboptimal from a total convergence point of view. This observation by the reviewer seems to have been misunderstood by the authors. This limits the impact of Theorem 1, as Theorem 1 assumes that one is interested in a greedy method. + +3) The bounded variance and bounded dissimilarity assumptions *are* strong. The suggestion by one of the reviewers to consider a work on more accurate ways of modeling stochastic gradients (Khaled et al) was appropriate. I suggest the authors read that paper to see detailed reasoning explaining why the types of assumptions made in this paper are problematic. The fact that some other papers use such problematic assumptions, even if they are well known, is not evidence that these assumptions are unproblematic; it is merely evidence that many papers share the same issue. I also have to oppose the authors' view that non-peer-reviewed work should not be brought up. In my view, this is a deeply problematic and unscientific attitude to research that is available online. Peer review does not imply correctness, and vice versa. + +4) Experimental comparison with any other methods is missing. Why do you not compare with the optimal sampling strategy of Chen et al, for example? Does your framework suggest a better strategy in some sense? If yes, show it. If no, then in what sense are your sampling strategies interesting? In any case, all of these have been considered in the arbitrary sampling framework before, as far as I can see. + +5) There are more issues that have been identified by the reviewers. I strongly recommend that the authors take all of them seriously in their revision. + +In summary, this is a solid paper. However, it has some serious issues, and for this reason I cannot recommend it for acceptance. Having said that, I thank the authors for their submission and wish them the best of luck in future research on this project. + +Area Chair",ICLR2022, +-D6vixOfr-,1576800000000.0,1576800000000.0,1,BJg9hTNKPH,BJg9hTNKPH,Paper Decision,Reject,"This paper is an empirical study of methods to stabilize offline (i.e., batch) RL methods, where the dataset is available up front and not collected during learning.
This can be an important setting in, e.g., safety-critical or production systems, where learned policies should not be applied to the real system until their performance and safety are verified. In such settings, poor performance or divergence might result when policies leave the region where training data is present, unless divergence from the reference policy is regularized. This paper studies various methods to perform such regularization. + +The reviewers are all very happy with the thoroughness of the empirical work. The work only studies existing methods (and combinations thereof), so the novelty is limited by design. The paper was also considered well written and easy to follow. The results were very similar across the considered regularizers, which somewhat limits the usefulness of the paper as a practical guideline (although at least now we know that perhaps we do not need to spend a lot of time choosing the best among these). Bigger differences were observed between ""value penalties"" and ""policy regularization"". This seems to correspond to theoretical observations by Neu et al (https://arxiv.org/abs/1705.07798, 2017), which is not cited in the manuscript. Although unpublished, I think that work is highly relevant for the current manuscript, and I strongly recommend that the authors consider its content. Some minor comments about the paper are given below. + +On balance, the strong points of the paper are its empirical thoroughness and clarity, whereas novelty, significance, and theoretical analysis are weaker points. Due to the high selectivity of ICLR, I unfortunately have to recommend rejection for this manuscript. + +I have some minor comments about the contents of the paper: +- The manuscript contains the line: ""Under this definition, such a behavior policy πb is always well-defined even if the dataset was collected by multiple, distinct behavior policies"". Wouldn't simply defining the behavior as a mixture of the underlying behavior policies (when known) work equally well? +- The paper mentions several earlier works that regularize policy updates using the KL from a reference policy (or to a reference policy). The paper of Peters is cited in this context, although there the constraint is actually on the KL divergence between state-action distributions, i.e., on $\mathrm{KL}(\mu^{\pi} \| \mu^{\pi_{\mathrm{ref}}})$ over state-action pairs rather than on $\mathrm{KL}(\pi(\cdot|s) \| \pi_{\mathrm{ref}}(\cdot|s))$ per state, resulting in a different type of regularization.",ICLR2020, +5WEbQYRbQ6r,1610040000000.0,1610470000000.0,1,iEcqwosBEgx,iEcqwosBEgx,Final Decision,Reject,"This paper investigates the interesting problem of policy seeking in reinforcement learning via constrained optimization.
Based on the reviewers' judgements, this is a good submission, but it has not reached the bar of ICLR.",ICLR2021, +ByeGlJp1eN,1544700000000.0,1545350000000.0,1,HkGSniC9FQ,HkGSniC9FQ,Limited contribution,Reject,"Dear authors, + +All reviewers pointed out the fact that your result is about the expressivity of the big network rather than its accuracy, a result which is already known in the literature.
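+ +To make the distinction explicit (in my own notation, since the paper does not state it this way): expressivity of the larger network only guarantees that good parameters exist, i.e. $\inf_{\theta} L(f_{\theta}) \le \epsilon$; a claim about accuracy would additionally require that the parameters $\hat{\theta}$ actually found by training satisfy $L(f_{\hat{\theta}}) \le \epsilon$, which is an optimization and generalization statement that expressivity alone does not provide.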
+ +I encourage you to carefully read all reviews should you wish to resubmit this work to a future conference.",ICLR2019,5: The area chair is absolutely certain +LHk2doSMduF,1642700000000.0,1642700000000.0,1,m7zsaLt1Sab,m7zsaLt1Sab,Paper Decision,Reject,"This paper proposes a theory for understanding the context representation in pretrained language models. The strengths of the paper, as identified by reviewers, are in the importance of an attempt to explain contextualization in language models, and in the novelty of using the category theory to model the connection between contexts and their representations. However, all the reviewers identify several major weaknesses, including flawed/incoherent definitions of concepts in the proposed theory and insufficient experimental results. Although the authors' rebuttal put a great deal of effort to address raised concerns, all five reviewers agree (and provide very detailed justifications along with suggestions for improvements) that the work is not yet ready for publication.",ICLR2022, +SJxCVFYBgN,1545080000000.0,1545350000000.0,1,rkl4M3R5K7,rkl4M3R5K7,Unclear motivation and significance of empirical results,Reject,Four reviewers have evaluated this paper. The reviewers have raised concerns about the specific formulation used for adversarial example generation which requires further clarity in motivation and interpretation. The reviewers have also made the point that the experimental evaluation is against previous work which tried to solve a different problem (black box based attack) and hence the conclusions are unconvincing.,ICLR2019,3: The area chair is somewhat confident +6VOtrhe3Ff,1610040000000.0,1610470000000.0,1,czv8Ac3Kg7l,czv8Ac3Kg7l,Final Decision,Reject,"The paper proposes a method for inference in models with GP priors and neural network likelihoods for multi-output modelling, dealing with the problem of scalability and missing data. The paper builds upon previous work on inducing variables for scalability on GP models and inference networks for amortization (reducing the number of parameters to estimate) and dealing with missing data. + +There are several concerns about the paper in terms of generality/flexibility of the approach, as the proposed model shares the NN parameters across tasks and the results on the small datasets do not show improvements wrt baseline such as GPAR. The authors’ comments provide somewhat satisfactory replies to these issues. Nonetheless, the major drawback of this paper is its novelty as the ideas on the paper have been explored extensively in the GP literature. Although the authors do make a case for scalability when using inference networks, there are other previous works that perhaps the authors are unaware of, for example, https://arxiv.org/abs/1905.10969 and even more sophisticated inference algorithms than can serve as truly state-of-the-art competing approaches (for example based on stochastic gradient Hamiltonian Monte Carlo, https://arxiv.org/abs/1806.05490). ",ICLR2021, +#NAME?,1576800000000.0,1576800000000.0,1,rJl5rRVFvH,rJl5rRVFvH,Paper Decision,Reject,"This paper offers a possibly novel approach to regularizing policy learning to make it suitable for large-scale divergence in the underlying domain. Unfortunately all the reviewers are unanimous that the paper is not acceptable in present form. 
Insufficient clarity regarding the contribution relative to several references, some of which were missing from the submitted version, is perhaps the most significant issue in the view of the AC.",ICLR2020, +POn535UERJ,1576800000000.0,1576800000000.0,1,rkecJ6VFvr,rkecJ6VFvr,Paper Decision,Accept (Poster),"This paper extends the Transformer, implementing higher-dimensional attention generalizing the dot-product attention. The AC agrees that Reviewer3's comment that generalizing attention from 2nd- to 3rd-order relations is an important upgrade, that the mathematical context is insightful, and that this could lead to the further potential development. The readability of the paper still remains as an issue, and it needs to be address in the final version of the paper.",ICLR2020, +iVNq9sylqt,1576800000000.0,1576800000000.0,1,S1gR2ANFvB,S1gR2ANFvB,Paper Decision,Reject,"The paper has received all negative scores. Furthermore, one of the reviewers identified an anonymity violation. This is a reject.",ICLR2020, +Q4gKjqglLn,1610040000000.0,1610470000000.0,1,pQ-AoEbNYQK,pQ-AoEbNYQK,Final Decision,Reject,"The paper was discussed by the reviewers that acknowledged the rebuttal and the authors’ responses. In particular, they appreciated the fact that some of their concerns were alleviated (e.g., going beyond the single ImageNet evaluation). + +More generally, while all the reviewers thought that the problem tackled by the paper was of clear interest (i.e., full end-to-end auto-ML encompassing DA, NAS and HPO), they still expressed concerns (even after the rebuttal), in particular: + +* _Clarity of the methodology_: None of the reviewers could clearly and fully understand the mathematical formulation of the joint optimization, leading to a series of questions regarding the confusing usage of the training/validation set in the experimental setup. This unfortunately made the assessment of (some aspects of) the paper speculative for the reviewers. +* _Comparison with AutoHAS_: AutoHAS and DiffAutoML are obviously related methods. Even if AutoHAS has weaknesses compared to the proposed approach DiffAutoML, e.g., discretization of the continuous hyperparameters and no tuning of DA, it is still meaningful to carry out an actual comparison (possibly normalized by the different costs at play since the authors have highlighted the different memory overheads). Though the listed weaknesses of AutoHAS _should_ play in favor of DiffAutoML, a proper experimental comparison would better support that claim. + +Given those remaining concerns and the overall mixed scores, the paper is recommended for rejection. The detailed comments of the reviewers provide an actionable list of items to improve the paper for a future resubmission. +",ICLR2021, +soXLQ8OSstX,1610040000000.0,1610470000000.0,1,GHCu1utcBvX,GHCu1utcBvX,Final Decision,Reject,"This work considers an apparent problem with current approaches to compositional generalisation (CG) in neural networks. The problem seems to be roughly: +1. prior work in CG aims to extract 'compositional representations' from the training distribution +2. work on CG, the training set and the test set are drawn from different distributions +therefore +3. we don't know whether these models can also extract compositional representations from the test distribution + +All four expert reviewers were, to differing degrees, confused by this problem framing, largely because they consider the premise (1) to be false. 
I am also aware of a large body of recent work on CG (see the papers listed by R2) and, as far as I know, none of it involves extracting 'compositional representations' from the training set. Rather, it involves learning something (from the training set) that enables strong performance on a test set that differs from the training set in a way that is informed by ideas of compositionality. + +As far as I know, there are very few studies that try to identify compositionality by considering the internal representations of neural networks, so it feels incorrect to claim this is standard practice. Any work that goes down this route ought to have a very thorough treatment of the various thorny philosophical and theoretical treatments of compositionality in the literature. As pointed out by R4, the work in its current form does not do this. + +In summary, this work attempts to solve a problem that none of the four expert reviewers consider to be in need of a solution.",ICLR2021, +A9onvXV0IFX,1642700000000.0,1642700000000.0,1,mF5tmqUfdsw,mF5tmqUfdsw,Paper Decision,Reject,"The paper proposes a new reinforcement learning actor-critic type algorithm for parameterized policy spaces. The actor builds gradient estimates derived from perturbations of the policy (in the spirit of simultaneous perturbation stochastic approximation (SPSA) or Flaxman-Kalai-McMahan's ""Gradient Descent without a Gradient"" idea), while the critic is based on standard temporal difference (TD) learning. The algorithm is benchmarked, along with other well-known techniques, on MuJoCo-based environments, where it is seen to often perform well. + +There were several concerns raised by the reviews initially, including the validity of the value function obtained by the rather non-standard perturbation of the behavior policy suggested in the paper, the necessity of the zeroth-order scheme, the impact of the hyperparameter N, the lack of clarity about the overall algorithmic flow, and the lack of more contemporary baselines such as SAC, A3C and TD3. + +Most concerns appear to have been addressed by the author(s) in their detailed responses, and new explanations have been added with significant effort, to the credit of the author(s). While the paper breaks new ground in the conceptual sense, and the reviewers are borderline positive about the paper, I am afraid that parts of the paper, especially those relating to the soundness of the algorithm, are still unclear and not concretely motivated. This, coupled with the low confidence levels expressed in the reviewers' evaluations, renders the paper too preliminary at this stage to merit acceptance. + +For instance, I notice upon a careful reading of the paper the following issues: + +(a) Equation (7) is derived by claiming that $V^\beta(s_t)$ is uncorrelated with the Gaussian noise $\epsilon$. However, I fail to see why this should hold, since the paper mentions, in the paragraph before equation (6), that $\beta = \pi_{\theta + \sigma \epsilon}$, so $\beta$ ostensibly depends on $\epsilon$. + +(b) The motivation behind the objective $J_{ZOAC}$ in (6), and the quantities involved in its definition, is rather opaque. For instance, the right side of (6) suggests an infinite horizon discounted reward criterion, whereas the expectation is taken with respect to $d^\beta$, the ""stationary distribution"" of the policy $\beta$. How/why is this justified?
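For concreteness (my notation, not the paper's): the long-run stationary distribution is $d^\beta(s) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \Pr(s_t = s \mid \beta)$, whereas the discounted occupancy measure is $d_\gamma^\beta(s) = (1 - \gamma) \sum_{t \ge 0} \gamma^t \Pr(s_t = s \mid \beta)$, and only the latter is consistent with an infinite horizon discounted criterion.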
I would expect the use of the discounted occupancy measure here, instead of the (long term) stationary measure which washes out any near-term trajectory effects. + +(c) The paper mentions that $\epsilon$ is a sequence of random perturbations *per time step* in (6) as opposed to the usual ES perturbation of a one-time perturbation. However, the size of the covariance matrix $I$ in (6) and (3) are not explicitly distinguished, leading to much confusion in the mind of the keen reader. + +I hope that the author(s) can utilize the feedback from the reviews in order to put up a significantly clearer and solidly motivated paper in the next round, so that its conceptual merits can be proven without doubt. Thanks and best wishes.",ICLR2022, +SJ4F4yaBz,1517250000000.0,1517260000000.0,396,rJk51gJRb,rJk51gJRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The reviewers agree that the paper is below threshold for acceptance in the main track (one with very low confidence), but they favor submitting the paper to the workshop track. + +The paper considers policy gradient methods for two-player zero-sum Alternating Markov games. They propose adversarial policy gradient (fairly obviously), wherein the critic estimates min rather than mean reward. They also report promising empirical results in the game of Hex, with varying board sizes. I found the paper to be well-written and easy to read, possibly due to revisions in the rebuttal discussions. + +The reviewers consider the contribution to be small, mainly due to the fact that the key algorithmic insights were already published decades ago. Reintroducing them is a service to the community, but its novelty is limited. Other critiques mentioned that results in Hex only provide limited understanding of the algorithm's behavior in general Alternating Markov games. The lack of comparison with modern methods like AlphaGo Zero was also mentioned as a limitation. + +Bottom line: The paper provides a small but useful contribution to the community, as described above, and the committee recommends it for workshop. +",ICLR2018, +tr80T0nmxx4,1610040000000.0,1610470000000.0,1,N07ebsD-lHp,N07ebsD-lHp,Final Decision,Reject,"The paper proposes an algorithm to defend against black-box attacks. All the reviewers think the current experiments are not convincing enough, and the method seems to have some issues (e.g., not scalable). ",ICLR2021, +BJex8Z_JgN,1544680000000.0,1545350000000.0,1,H1eadi0cFQ,H1eadi0cFQ,Meta-review,Reject,"The paper proposes a method to escape saddle points by adding and removing units during training. The method does so by preserving the function when the unit is added while increasing the gradient norm to move away from the critical point. The experimental evaluation shows that the proposed method does escape when positioned at a saddle point - as found by the Newton method. The reviewers find the theoretical ideas interesting and novel, but they raised concerns about the method's applicability for typical initializations, the experimental setup, as well as the terminology used in the paper. The title and terminology were improved with the revision, but the other issues were not sufficiently addressed.",ICLR2019,3: The area chair is somewhat confident +S1gm71PWg4,1544810000000.0,1545350000000.0,1,SJgw_sRqFQ,SJgw_sRqFQ,Intersting empirical study!,Accept (Poster),"This work analyses the use of parameter averaging in GANs. 
It can mainly be seen as an empirical study (while also a convergence analysis of EMA for a concrete example provides some minor theoretical result) but experimental results are very convincing and could promote using parameter averaging in the GAN community. Therefore, even if the technical novelty is limited, the insights brought by the paper are intesting. ",ICLR2019,3: The area chair is somewhat confident +cCGf-N5zMsO,1610040000000.0,1610470000000.0,1,Ua5yGJhfgAg,Ua5yGJhfgAg,Final Decision,Reject,"The goal of the paper is to learn policies that can solve a given task while adhering to certain constraints specified via natural language. The paper closely builds upon prior work on constrained RL and passes the representation of natural language constraints by pre-training an interpreter. Experiments are done in a new proposed 2D grid-world benchmark. Although reviewers liked the premise, the main issue raised is that the way natural language constraints are handled is no different from the way it is done in prior work on constrained RL. The authors provided the rebuttal and addressed some of the concerns regarding paper details. However, upon discussion post rebuttal, the reviewers and AC feel that the paper does not provide clear scientific insight because the natural language part is processed separately from the policy learning part. We also believe that the paper will immensely benefit with results in more complex environments beyond the 2D grid-world. Please refer to the reviews for final feedback and suggestions to strengthen the submission.",ICLR2021, +Gh0gyQ2ZxaO,1642700000000.0,1642700000000.0,1,Zf4ZdI4OQPV,Zf4ZdI4OQPV,Paper Decision,Accept (Poster),"The paper shows that the transfer attack is query efficient and the success rate can be kept high with the zeroth-order score-based attack as a backup. Experiments show state-of-the-art results. + +Pros: +- Simple method based on a simple idea. +- State of the art performance. + +Cons: +- Proposal is a straightforward combination of two methods, and therefore technical contribution is marginal. +- The threat model is easy (surrogate can be trained on the same datasets and use the same loss function) and questionable. Most of the experimental evidence shows that the research for this threat model is almost saturated (and the problem seems almost solved). + +This paper got a borderline score with reviewer's concerns above. I agree with the authors that the simplest method is best among those performing similarly, but the threat setting considered might be not very realistic as the authors admitted. I see the proposed method a kind of egg of Columbus in a negative sense. Namely, the authors found a shortcut to win a game that was created and adopted by the community. Perhaps this paper would give an impact on the small community and would make the community change the game. But to give an impact to a general audience, the authors should convince that there are some situations where the analyzed thread model is realistic and therefore the proposed method is really useful. Or, the authors could adjust the thread model to be more realistic. Serious discussion on the thread model would be a big plus to the marginal technical contributions. 
After discussion with the SAC and PC, our conclusion is that this paper effectively tells the community that the benchmark they are using is too simple, which alone is worth publishing because it may move the community forward (even if the community is small).",ICLR2022, +9pJqh4a5oo,1576800000000.0,1576800000000.0,1,rygG4AVFvH,rygG4AVFvH,Paper Decision,Accept (Poster),"This paper proposes to find the optimal code in DNN compilers using adaptive sampling and reinforcement learning. The method achieves significant speedups in both compilation time and execution time. The authors made strong efforts in addressing the problems raised by the reviewers, and promised to make the code publicly available, which is of particular importance for works of this nature.",ICLR2020, +G2gz8hxPGu_,1642700000000.0,1642700000000.0,1,CpTuR2ECuW,CpTuR2ECuW,Paper Decision,Accept (Poster),"The paper addresses coordination improvement in the MARL setting by learning intrinsic rewards that motivate exploration and coordination. The paper is theoretically founded and the empirical evaluations back up the claims. + +During the rebuttal the authors carried out an impressive amount of work. They provided several additional studies and substantially improved the presentation, addressing all of the reviewers' requests. Although not all the reviewers responded to the authors, the authors' response was taken into account when recommending the decision. + +Minor: +- The authors should comment on learning intrinsic rewards with evolution (Faust et al, 2019): https://arxiv.org/abs/1905.07628",ICLR2022, +15XxZlBJ_,1576800000000.0,1576800000000.0,1,BJe4oxHYPB,BJe4oxHYPB,Paper Decision,Reject,"This paper proposes a new algorithm called Continuous Sparsification (CS) to search for winning tickets (in the context of the Lottery Ticket Hypothesis from Frankle & Carbin (2019)), as an alternative to the Iterative Magnitude Pruning (IMP) algorithm proposed therein. CS continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies. The paper shows empirically that CS finds lottery tickets that outperform the ones learned by IMP, with up to 5 times faster search when measured in number of training epochs. + +While this paper presents a novel contribution to pruning and to finding winning lottery tickets, and is very well written, there are some concerns raised by the reviewers regarding the current evaluation. The paper presents no concrete data on the comparative costs of performing CS and IMP even though the core claim is that CS is more efficient. The paper does not disclose enough detail to compute these costs, and it seems like CS is more expensive than IMP for standard workflows. Moreover, the current presentation of the data through ""pareto curves"" is misleadingly favorable to CS.
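To make the cost question concrete (a back-of-the-envelope of mine, with $k$ and $T$ as placeholder symbols rather than numbers from the paper): if IMP runs $k$ prune-retrain rounds of $T$ epochs each, its search cost is roughly $kT$ epochs, so a ""5 times faster"" CS search should correspond to about $kT/5$ epochs; whether this holds end-to-end depends on the per-epoch cost of each method, which is exactly the data the paper does not report.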
The reviewers suggest including more experiments on ImageNet and a more thorough evaluation as a pruning technique beyond the lottery ticket hypothesis. We recommend that the authors address the reviewers' detailed comments in an eventual resubmission.",ICLR2020, +srXjx1YFsfS,1642700000000.0,1642700000000.0,1,dtt435G80Ng,dtt435G80Ng,Paper Decision,Reject,"### Description +The paper investigates the choice of a fixed quantization grid for weights. Namely, the paper observes that symmetric uniform quantization levels such as {-1.5,-0.5,0.5,1.5} lead to better results than non-symmetric ones, e.g. {-2,-1,0,1}. While this is a small change, it is appreciated that it is investigated systematically and meticulously, proposing an explanation and showing experimentally that the effect is consistently present in favour of symmetric quantization. While the improvement is small, it comes almost at no cost. Part of the contribution proposes an efficient implementation. + +### Decision +Reviewers and AC came to a consensus that the contribution of the paper is marginal. Symmetric quantization schemes themselves were already employed by many models, albeit without analysis or even a discussion of this choice. The analysis presented in the paper was found unconvincing by the reviewers (see below). The efficient implementation follows from basic linear algebra (see below). The potential impact of the work was considered limited due to the rather marginal observed improvement. The average rating of the paper was 4.5. We therefore must reject. + +### Details +Regarding the proposed analysis of CSQ, it is not clear why the number of quantization levels of an elementary product matters, given that these numbers are then summed over all corresponding input channels and spatial dimensions of a convolution kernel applied at a single location. It is questionable whether the number of these quantization levels indeed corresponds to the representation capacity. Finally, the paper fails to demonstrate the effect on binary (1-bit) networks. In this case the standard approach is to use {-1,1} weights and {-1,1} activations. The paper could investigate the case of {0,1} activations, where there would be 50% more unique possible outputs of the product, namely {-1,0,1}, to validate its hypothesis. If the hypothesis holds, an improvement in the binary case would be observed. This is important since the binary case is known to be the hardest and since the respective recommendation of representations would be non-standard. It could be further questioned why the distribution of real-valued weights has any relevance (such as in the arguments in appendix E) if the model is trained from scratch: a training method need not keep any real-valued latent weights in the first place. + +The technical part in section 5 ""efficient realization"" adds very little, if anything, to the paper's contribution. Simple linear algebra suffices to see that + +$(W - 0.5) \ast x = W \ast x - 0.5\,(I \ast x),$ + +where $I$ is the kernel of ones of the same shape as $W$. It is clear that the convolution $I \ast x$ can be implemented efficiently (e.g. it is just a sum over channels followed by a separable, spatial-only convolution) and is not a bottleneck. The final details, such as whether to slice by bits and use popcount or to use 8-bit addition, depend very much on the choice of the bit-packed representation and the hardware available.
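+ +As a sanity check, the identity is nothing more than linearity of convolution in the kernel. A minimal NumPy/SciPy sketch (an illustrative toy of mine, not code from the paper; it assumes SciPy's convolve2d with 'same' padding, a single channel, and arbitrary shapes):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))                      # toy single-channel input
W = rng.integers(-1, 3, size=(3, 3)).astype(float)   # integer levels {-1, 0, 1, 2}
ones = np.ones_like(W)                               # kernel of ones, same shape as W

# Shifting the integer levels by -0.5 gives the symmetric grid {-1.5, -0.5, 0.5, 1.5};
# the shifted convolution splits into the integer convolution plus a cheap correction.
lhs = convolve2d(x, W - 0.5, mode='same')
rhs = convolve2d(x, W, mode='same') - 0.5 * convolve2d(x, ones, mode='same')
print(np.allclose(lhs, rhs))                         # True: convolution is linear in the kernel
```

In a deployed bit-packed kernel, the $W \ast x$ term is the part that runs on integer arithmetic, and the $I \ast x$ correction can be computed once and reused across all output channels.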
Engineers in the field would know how to implement this efficiently.",ICLR2022, +Nohh-gdvmH4,1642700000000.0,1642700000000.0,1,Y3cm4HJ3Ncs,Y3cm4HJ3Ncs,Paper Decision,Reject,"Even though reviewers found some responses by the authors satisfactory, several concerns regarding the paper still remain. The authors are strongly encouraged to: + +1) Explore how dataset size impacts accuracy. +2) Reason about annotation costs via empirical experiments. +3) Include benchmark datasets in experimental evaluations.",ICLR2022, +tLwv-k5cR14,1642700000000.0,1642700000000.0,1,OXRZeMmOI7a,OXRZeMmOI7a,Paper Decision,Accept (Poster),"A new sampling strategy for experience is proposed and compared with alternative sampling strategies. The main weakness of the paper is the limited applicability of the strategy, as it only works well on goal-oriented tasks, and stochasticity reduces its effectiveness. Within this setting, good performance is shown only on two gridworld-like domains: MiniGrid and Sokoban. In the rebuttal phase, the authors added additional experiments that suggest applicability of the approach beyond just goal-oriented tasks, which led several reviewers to raise their scores. While the general applicability of the approach is still somewhat of a concern, the authors have done enough to show the potential of their approach. Hence, I recommend acceptance.",ICLR2022, +Nbyva_njCWq,1642700000000.0,1642700000000.0,1,2-mkiUs9Jx7,2-mkiUs9Jx7,Paper Decision,Accept (Poster),"This paper proposes an unsupervised learning method for GANs, called SLOGAN, which allows conditional generation of samples by utilizing clustering structures of the training data in a latent space. The main significance of the proposal over existing unconditional GANs is that it is capable of dealing with training data with imbalance in the latent space. The proposal consists of the use of implicit reparameterization based on the generalized Stein lemma, which makes learning of the mixing coefficient parameters possible, as well as the introduction of the U2C loss. + +The initial review score distribution is such that two of them are just above the acceptance threshold, and two others are just below it. Upon reading the review comments and the author responses, as well as the paper itself, I think that the evaluations of the reviewers are more or less coherent with each other: + +1. The proposed method is moderately, if not significantly, novel: the differences from DeLiGAN are the use of implicit reparameterization based on the generalized Stein lemma, the learning of the mixing coefficient parameters, and the introduction of the U2C loss. +2. The experimental results, while demonstrating the effectiveness of the proposed method to some extent, were not convincing enough. + +As for item 2, the authors have provided results of additional experiments in their responses, as suggested by the reviewers, and two reviewers have revised their scores upward accordingly.
+ +Yet another point I would like to mention is that in some numerical results summarized in Tables 1 and 2, as well as in several other places, one can notice somewhat large errors, so that one might be able to question the statistical significance of the claimed best-performing methods, shown in bold. (If my guess would be correct, the authors regarded the *best in the mean* as the best, ignoring the standard error, and did not perform any statistical testing to confirm the significance.) I would therefore appreciate additional assessment of significance of the numerical results via proper statistical testing. + +Because of the above, I would like to recommend acceptance of this paper.",ICLR2022, +rYticQZC3fc,1610040000000.0,1610470000000.0,1,vhKe9UFbrJo,vhKe9UFbrJo,Final Decision,Accept (Poster),"The paper proposes an efficient method to train generative models on multimodal data using a contrastive approach. Usually training such models requires significant training data to be able to learn patterns. The authors propose a variational autoencoder approach that enables multimodal learning of models using a data-efficient approach, and shows the effectiveness of the approach on challenging datasets. + +The authors have mostly addressed the feedback of the reviewers and done some of the necessary changes to the paper (e.g., adding more results and missing related work). They should make sure to address any lingering concerns about the paper, mentioned by the reviewers in their post-rebuttal feedback.",ICLR2021, +3lfsznJb8_x,1642700000000.0,1642700000000.0,1,bVuP3ltATMz,bVuP3ltATMz,Paper Decision,Accept (Oral),"This work adapts the widely used DP learning algorithm to language models. Reviewers all agreed that this work tackles an important problem with clear motivation and thorough experiments, and achieved strong performance (memory reduction and effectiveness) on NLP tasks. Thus, we recommend an acceptance.",ICLR2022, +Bygv35mggN,1544730000000.0,1545350000000.0,1,H1xSNiRcF7,H1xSNiRcF7,Metareview,Accept (Oral),"The manuscript presents a promising new algorithm for learning geometrically-inspired embeddings for learning hierarchies, partial orders, and lattice structures. The manuscript builds on the build on the box lattice model, extending prior work by relaxing the box embeddings via Gaussian convolutions. This is shown to be particularly effective for non-overlapping boxes, where the previous method fail. + +The primary weakness identified by reviewers was the writing, which was thought to be lacking some context, and may be difficult to approach for the non-domain expert. This can be improved by including an additional general introduction. Otherwise, the manuscript was well written. + +Overall, reviewers and AC agree that the general problem statement is timely and interesting, and well executed. In our opinion, this paper is a clear accept.",ICLR2019,4: The area chair is confident but not absolutely certain +kAYfyKXLK4,1610040000000.0,1610470000000.0,1,PkqwRo2wjuW,PkqwRo2wjuW,Final Decision,Reject,"I think this is a very promising paper, but the work is not ready for publication. + +The most significant concern shared by several reviewers is the insufficient evaluation. For example, the work is not compared with more traditional approaches to equivalence checking or any other baselines beyond ablations of the proposed method. Given that this is not the first paper to propose the use of deep learning to search for proofs, it seems important to compare to alternative methods. 
There is also a misalignment between the claims of novelty and the evaluation. For example, section 4 cites the novel approach to generating data as key to this approach, but the evaluation does not really address this claim. On the positive side, I was impressed with the ability to search for proofs of length 10 given the large branching factor, and I thought the results were promising. + +The authors should also consider some of the concerns with presentation raised by the reviewers. +",ICLR2021, +QhEW2Hl5itD,1642700000000.0,1642700000000.0,1,YtdASzotUEW,YtdASzotUEW,Paper Decision,Reject,"The problem considered in this paper is of general interest to all reviewers. However, while the reviewers in general appreciate the authors’ effort in providing theoretical analysis for a seemingly effective algorithm, they are unconvinced that the key technical claims are well justified (i.e. separation between theoretical analysis and the algorithm, which ultimately relies on the OOD score), the propositions are clear (e.g., key claims in the quality of kNN density estimator as an OOD detector not well supported by analysis/experiments), or that the experimental results are sufficiently compelling (e.g., lack of controlled experiments/ ablation study) to merit acceptance for the proposed solution.",ICLR2022, +HklJDWR-g4,1544840000000.0,1545350000000.0,1,SyeBqsRctm,SyeBqsRctm,metareview,Reject,This work proposes a modification of gradient based saliency map methods that measure the importance of all nodes at each layer. The reviewers found the novelty is rather marginal and that the evaluation is not up to par (since it's mostly qualitative). The reviewers are in strong agreement that this work does not pass the bar for acceptance.,ICLR2019,5: The area chair is absolutely certain +UiUyNvGyv7,1576800000000.0,1576800000000.0,1,r1xI-gHFDH,r1xI-gHFDH,Paper Decision,Reject,"The paper proposed a general framework to construct unsupervised models for representation learning of discrete structures. The reviewers feel that the approach is taken directly from graph kernels, and the novelty is not high enough. ",ICLR2020, +FrWfjn623x,1610040000000.0,1610470000000.0,1,iqmOTi9J7E8,iqmOTi9J7E8,Final Decision,Reject,"While reviewers believe that the motivation of the paper is strong and the idea is interesting the ultimate execution of the paper is not up to the standards of ICLR. I believe the biggest concern is the precise privacy guarantee of the method. As pointed out, it is an extremely strong assumption that the model structure of the adversary is known (or even approximately known). Standard privacy guarantees are either information theoretic or based in computational hardness. This work does not provide such guarantees. While there has been recent work on using adversarial learning to learn models that are robust to such adversaries, they have been heavily criticized within privacy and security communities due to the lack of such guarantees. I was not convinced by the authors response to such questions: there are plenty of cryptographic/privacy-preserving schemes that work in the honest-but-curious setting, and techniques that use the standard guarantee of differential privacy do not suffer from large slow downs. + +Thus, I would urge the authors to modify this work so that it can leverage the guarantees of well-known cryptographic/privacy-preserving schemes. 
If done so, these arguments about privacy will go away and the paper will have a much better shot at acceptance.",ICLR2021, +Skn7716HM,1517250000000.0,1517260000000.0,109,B14TlG-RW,B14TlG-RW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This work replaces the RNN layer of square with a self-attention and convolution, achieving a big speed up and performance gains, particularly with data augmentation. The work is mostly clear presented, one reviewer found it ""well-written"" although there was a complaint the work did not clear separate out the novel aspects. In terms of results the work is clearly of high quality, producing top numbers on the shared task. There were some initial complaints of only using the SQuAD dataset, but the authors have now included additional results that diversify the experiments. Perhaps the largest concern is novelty. The idea of non-RNN self-attention is now widely known, and there are several systems that are applying it. Reviewers felt that while this system does it well, it is maybe less novel or significant than other possible work. ",ICLR2018, +fJjrUOyUG,1576800000000.0,1576800000000.0,1,ryebG04YvB,ryebG04YvB,Paper Decision,Accept (Poster),"This paper presents an empirical study towards understanding the transferability of robustness (of a deep model against adversarial examples) in the process of transfer learning across different tasks. + +The paper received divergent reviews, and an in-depth discussion was raised among the reviewers. + ++ Reviewers generally agree that the paper makes an interesting study to the robust ML community. The paper provides a nice exploration of the hypothesis that robust models learn robust intermediate representations, and leverages this insight to help in transferring robustness without adversarial training on every new target domain. + +- Reviewers also have concerns that, as an experimental paper, it should perform a larger study on different datasets and transfer problems to eliminate the bias to specific tasks, and explore the behavior when the task relatedness increases or decreases. + +AC agrees with the reviewers and encourages the authors to incorporate these constructive suggestions in the revision, in particular, explore more tasks with different task relatedness. + +I recommend acceptance, assuming the comments will be fully addressed.",ICLR2020, +jyK6mllc0T0T,1642700000000.0,1642700000000.0,1,wqD6TfbYkrn,wqD6TfbYkrn,Paper Decision,Accept (Poster),"This paper presents an approach based on conditional denoising diffusion models for point cloud completion. The reviewers have recognized the significance of contributions, the clarity of presentation, and the comprehensivity of experiments. I am happy to recommend this paper for presentation at ICLR.",ICLR2022, +BJxBDIyblE,1544780000000.0,1545350000000.0,1,H1x3SnAcYQ,H1x3SnAcYQ,Good work but some critical issues need to be addressed ,Reject,"This paper extends the DiCE estimator with a better control variate baseline for variance reduction. +The reviewers all think the paper is fairly clear and well written. However, as the reviews and discussion indicates, there are several critical issues, including lack of explanation of the choice of baseline, the lack more realistic experiments and a few misleading assertions. We encourage the authors to rewrite the paper to address these criticism. We believe this work will make a successful submission with proper modification in the future. 
+",ICLR2019,4: The area chair is confident but not absolutely certain +egPak1cZj2g,1610040000000.0,1610470000000.0,1,defQ1AG6IWn,defQ1AG6IWn,Final Decision,Reject,"This paper proposes a method (via a novel objective) for feature alignment between source and target tasks in an unsupervised domain adaptation scenario. + +Pro +- the proposed approach is sensible in many realistic scenarios of distribution shift +- the submission provides an extensive empirical evaluation establishing state of the art results on several benchmark tasks + +Con +- there is no thorough discussion of the the underlying assumptions (when should we expect them to hold? for shich types of tasks? which types of shifts? can they generally be reliably tested? from which type of data? unlabeled?) +- one reviewer raised concerns over novelty, which should be more clearly addressed before publication +- two reviewers raised concerns over use of target data for hyper-parameter selection, which seem valid; these should be fixed or clearly explained (and implications of this be discussed) in subsequent versions of this work + +I agree with concerns of the reviewers (the last two points), and would therefore not recommend this work for publication in the current state.",ICLR2021, +HhILx7njMf8,1642700000000.0,1642700000000.0,1,U4uFaLyg7PV,U4uFaLyg7PV,Paper Decision,Accept (Poster),"This paper introduces a tree-structured wavelet deep neural network to effectively extract more discriminative and expressive feature representations in time series signals. Based on a frequency spectrum energy analysis, the approach decomposes input signals into multiple subbands and builds a tree structure with data-driven wavelet transforms the bases of which are learned using invertible neural networks. In the end, the scattering subband features are fused using a self-attention-like mechanism. The effectiveness of the proposed approach is verified extensively on a variety of datasets from different domains including follow-up experiments in the rebuttal. Overall, the work is technically novel and provides an interesting way of extracting adaptive finer-grained features to deal with time series signals. The authors' rebuttal is solid which has cleared most of the concerns raised by the reviewers with additional supportive experimental evidence. + +I would recommend accept.",ICLR2022, +BJxbHGYxeE,1544750000000.0,1545350000000.0,1,r1gl7hC5Km,r1gl7hC5Km,Further validation of algorithm needed,Reject,"This paper tackles the problem of using auxiliary losses to help regularize and aid the learning of a ""goal"" task. The approach proposes avoiding the learning of irrelevant or contradictory details from the auxiliary task at the expense of the ""goal"" tasks by observing cosine similarity between the auxiliary and main tasks and ignore those gradients which are too dissimilar. + +To justify such a setup one must first show that such negative interference occurs in practice, warranting explicit attention. Then one must show that their algorithm effectively mitigates this interference and at the same time provides some useful signal in combination with the main learning objective. + +During the review process there was a significant discussion as to whether the proposed approach sufficiently justified its need and usefulness as defined above. One major point of contention is whether to compare against the multi-task literature. 
The authors claim that prior multi-task learning literature is out of scope of this work since their goal is not to measure performance on all tasks used during learning. However, this claim does not invalidate the reviewer's request for comparison against multi-task learning work. In fact, the authors *should* verify that their method outperforms state-of-the-art multi-task learning methods. Not because they too are studying performance across all tasks, but because their method which knows to prioritize one task during training should certainly outperform the learning paradigms which have no special preference to one of the tasks. + +A main issue with the current draft centers around the usefulness of the proposed algorithm. First, whether the gradient co-sine similarity is a necessary condition to avoid negative interference and 2) to show at least empirically that auxiliary losses do offer improved performance over optimizing the goal task alone. Based on the experiments now available the answers to these questions remains unclear and thus the paper is not yet recommended for publication.",ICLR2019,5: The area chair is absolutely certain +ryhMaML_x,1486400000000.0,1486400000000.0,1,SyOvg6jxx,SyOvg6jxx,ICLR committee final decision,Reject,"The paper proposes a simple approach to exploration that uses a hash of the current state within a exploration bonus approach (there are some modifications to learned hash codes, but this is the basic approach). The method achieves reasonable performance on Atari game tasks (sometimes outperformed by other approaches, but overall performing well), and it's simplicity is its main appeal (although the autoencoder-based learned hash seems substantially less simple, so actually loses some advantage there). + + The paper is likely borderline, as the results are not fantastic: the approach is typically outperformed or similarly-performed by one of the comparison approaches (though it should be noted that no comparison approach performs well over all tasks, so this is not necessarily that bad). But overall, especially because so many of these methods have tunable hyperparameters it was difficult to get a clear understanding of just how these experimental results fit. + + Pros: + + Simple method for exploration that seems to work reasonably well in practice + + Would have the potential to be widely used because of its simplicity + + Cons: + - Improvements over previous approaches are not always there, and it's not clear whether the algorithm has any ""killer app"" domain where it is just clearly the best approach + + Overall, this work in its current form is too borderline. The PCs encourage the authors to strengthen the empirical validation and resubmit.",ICLR2017, +rJg4OdAllN,1544770000000.0,1545350000000.0,1,BJl6AjC5F7,BJl6AjC5F7,"rebuttal improved the review scores, no serious issues other than relatively weak novelty .",Accept (Poster),"This paper investigates learning to represent edit operations for two domains: text and source code. The primary contributions of the paper are in the specific task formulation and the new dataset (for source code edits). The technical novelty is relatively weak. + +Pros: +The paper introduces a new dataset for source code edits. + +Cons: +Reviewers raised various concerns about human evaluation and many other experimental details, most of which the rebuttal have successfully addressed. As a result, R3 updated their score from 4 to 6. + +Verdict: +Possible weak accept. 
None of the remaining issues after the rebuttal is a serious deal breaker (e.g., task simplification by assuming the knowledge of when and where the edit must be applied, simplifying the real-world application of the automatic edits). However, the overall impact and novelty of the paper is relatively weak.",ICLR2019,2: The area chair is not sure +PE7Qe9I4yzO,1610040000000.0,1610470000000.0,1,ryUprTOv7q0,ryUprTOv7q0,Final Decision,Reject," A ""quantum deformed"" generalization of a probabilistic binary neural network is introduced, which can be either run on a quantum computer or simulated with a classical computer. Reviewers agreed that the paper is well written, introduces some new ideas merging quantum computing with a variational Bayesian framework, and the reported numbers on MNIST and Fashion MNIST outperform prior QNN approachers. However, reviewers questioned how useful the proposed ideas are, noting that: the reported gains could be attributed to increased parameterization (this was not carefully ablated with baseline approaches). Additionally, while the quantum supremacy experiments seem technically correct, there was no clear motivation for empirically demonstrating quantum supremacy when no theoretical guarantees are provided. Taken together, there was no clear path to practical improvements of real systems from the proposed ideas.",ICLR2021, +PRtwbLuv5kt,1610040000000.0,1610470000000.0,1,NqPW1ZJjXDJ,NqPW1ZJjXDJ,Final Decision,Reject,"In this paper, a network architecture search (NAS) problem in a changing environment is studied and an online adaptation (OA) algorithm for the problem is proposed. Many reviewers found that the OA-NAS problem discussed in this paper is interesting and practically important. However, many reviewers (including those with high review scores) recognize that the weakness of this paper is the lack of sufficient theoretical verification. Furthermore, although extensive experiments are conducted, it is still not clear whether the experimental setups discussed in the paper are generally applicable to other practical problems. Overall, although this is a nice work in that a new practical problem is considered and a workable algorithm for the problem is demonstrated in an extensive simulation study, I could not recommend the acceptance in its current form because of the lack of theoretical validity and evidence of general applicability.",ICLR2021, +mTmENsmTwc,1642700000000.0,1642700000000.0,1,3YqeuCVwy1d,3YqeuCVwy1d,Paper Decision,Accept (Poster),"This paper proposes to use Anderson Acceleration on min-max problems, provides some theoretical convergence rates and presents numerical results on toy bilinear problems and GANs. + +After the discussion, the reviewers agreed that this paper makes a nice contribution to ICLR. Some concerns were originally expressed in terms of incrementality of the theoretical results with respect to previous work (KCYs, gBHU), but the authors have well clarified their contributions in the discussion, and have updated their manuscript accordingly. 
There were also initial concerns about the related work coverage, but this was also properly addressed in the rebuttal, with additional experimental comparisons, an extended related work section, and an additional convergence result for convex-nonconcave problems.",ICLR2022,
q3tWSVEthJD,1610040000000.0,1610470000000.0,1,8FRw857AYba,8FRw857AYba,Final Decision,Reject,"Although originally all reviewers were leaning towards rejection, the authors have done a very good job at addressing their concerns, significantly strengthening the paper. There is now a consensus towards weak acceptance, with the exception of R3. However, I have decided to ignore R3's review for the following reasons:
- The original review was way too short and uninformative
- R3 did not reply to the authors' request for more constructive feedback
- R3 did not reply to my own request (private email)

That being said, even though the other reviewers decided to increase their scores after the rebuttal and discussion period, none of them was particularly enthusiastic about the paper: this remains a borderline paper combining ideas that, although promising, are not particularly original. At this time it falls slightly short of meeting the bar for an ICLR publication. I do believe that combining ideas from the RL and evolutionary research communities is a promising research direction, and I encourage the authors to take into account the reviewers' remaining comments to polish their paper (in particular, adding even stronger empirical results, and ensuring the key take-aways are clearly communicated).",ICLR2021,
SpYko3ApQck,1610040000000.0,1610470000000.0,1,VMAesov3dfU,VMAesov3dfU,Final Decision,Reject,"Dear Authors,

Thank you very much for submitting this very interesting paper.

This work analyzes the effect of gradient descent training on the compositionality of the learned model. Its main argument is that GD tries to use the redundant information in the data and, as a result, does not generalize well. The paper then tries to show this theoretically and empirically with some simple experiments.

There is a general consensus among all the reviewers that this paper is not suitable for publication at ICLR. The authors did not entirely address most of the concerns raised by the reviewers during the rebuttal.

If the authors improve the clarity of the paper, making some of its propositions and theories more concrete and grounded in experiments as well, I would recommend that they resubmit this paper to a different venue, since the premise of the paper is important and interesting.

Some of the reasons:

- The paper claims that gradient descent cannot ignore the redundant information, without providing sufficient empirical results. It is also not clear to me whether this is a credit-assignment or an optimization problem. I agree with R1 that it is not clear what new insights follow from the proofs.

- As R1 mentions, this paper's claims seem too strong and not supported by experiments.

- R2 finds parts of the paper unclear and thinks that some of the paper's propositions and theories are either trivial or wrong. The rebuttal does not seem to do a good job of addressing those concerns.

- R4 is also confused by the paper and thinks that some of the theories are incorrect.
",ICLR2021,
OkSo0oNgqB8,1610040000000.0,1610470000000.0,1,1hkYtDXAgOZ,1hkYtDXAgOZ,Final Decision,Reject,"The paper focuses on the task of finding higher-fidelity action proposals for temporal action proposal detection. As the reviewers mentioned, this task is a pre-task to temporal activity localization/detection in video, which is the main task to be solved. The paper may be perceived differently if it were presented as a detection method instead. Apart from the scope of the paper, the reviewers also unanimously agree on the limited technical novelty of the proposed methodology. The proposed method can be seen as an application of self-attention and transformer techniques to the problem of activity detection. The goal of these techniques is feature enrichment that serves to incorporate information across long-term context, a concept that has appeared previously in other work, though not necessarily with the same machinery (e.g., G-TAD).

Despite its shortcomings, the paper presents promising experimental results on well-known proposal/detection benchmarks, and the authors can benefit from considering the reviewers' comments and suggestions to produce a stronger and more compelling future submission.",ICLR2021,
eOnsyf3S496,1610040000000.0,1610470000000.0,1,e3bhF_p0T7c,e3bhF_p0T7c,Final Decision,Reject,"The reviewers seem to have reached a consensus that the paper is not ready for publication at ICLR. One of the major issues seems to be that the paper only analyzes the case of $d=2$. (In the AC's opinion, $d>2$ might be fundamentally more difficult to analyze than $d=2$.)",ICLR2021,
DbriM9fNH6W,1610040000000.0,1610470000000.0,1,fkhl7lb3aw,fkhl7lb3aw,Final Decision,Reject,"This paper proposes to address the class imbalance problem by defining a new over-sampling strategy. It brings potentially interesting ideas. The reviewers agree that the experiments are limited, that some methodological aspects require clarification, and that the writing needs to be improved.
The authors did not provide any rebuttal.
Hence I recommend rejection.",ICLR2021,
PT5Y5E4AQi,1610040000000.0,1610470000000.0,1,H6ATjJ0TKdf,H6ATjJ0TKdf,Final Decision,Accept (Poster),"The paper proposes a layer-wise magnitude-based pruning method through the adoption of the LAMP score, motivated by minimizing the model output distortion. The new importance score differs from the vanilla magnitude-based score in that it incorporates more layer-wise information. Extensive experiments are conducted on image and language models to show improved accuracy over prior art under the same model compression ratio. An ablation study is also provided to further explain the intuition and to compare LAMP with other pruning methods.

Though the experiments are extensive, some reviewers raised the concern that only image datasets are tested. In the rebuttal, the authors pointed to Appendix D, which provides non-image results, and also modified the abstract to highlight the efficacy on image data. In all, given the extensive empirical evaluation on various datasets and model architectures, the improvement of LAMP over prior methods seems convincing. Nevertheless, we urge the authors to include more experimental results, for example ResNet-18 on ImageNet as promised to Reviewer 1, to make the results more solid. It is also suggested to include and discuss some relevant papers mentioned by the reviewers in the final version.",ICLR2021,
N54NQpyhX3O,1642700000000.0,1642700000000.0,1,9Nk6AJkVYB,9Nk6AJkVYB,Paper Decision,Accept (Poster),"The paper presents a comprehensive analysis of the lottery ticket hypothesis (LTH) on automatic speech recognition. The authors verified the existence of highly sparse ""winning tickets"" in the ASR task, and analyzed their robustness to noise, their transferability to other datasets, and their compatibility with structured sparsity.

As agreed with the reviewers, the paper is well written, the justification is thorough, and the finding that LTH does perform well on ASR is interesting.
Though the novelty is marginal, as it is a direct application of the LTH, this is the first investigation of the LTH for ASR and it brings new insights to the community.

The decision is mainly based on the thorough justification of the LTH on ASR and the new insights it brings to the community.",ICLR2022,
HJleU2sAkE,1544630000000.0,1545350000000.0,1,Hkl84iCcFm,Hkl84iCcFm,novel yet unripe,Reject,"The paper uses dynamical systems theory to evaluate feed-forward neural networks. The theory is used to compute the optimal depth of resnets. An interesting approach, and a good initiative.

At the same time, the approach seems not to be thought through well enough, and the work needs another level of maturation before publication. The application that is realised is too immature, and the corresponding contributions are not significant in their current form. All reviewers agree on rejection of the paper.",ICLR2019,5: The area chair is absolutely certain
H1MNNk6Hf,1517250000000.0,1517260000000.0,328,ry-TW-WAb,ry-TW-WAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),The paper presents a variational Bayesian approach for quantising neural network weights and makes interesting and useful steps in this increasingly popular area of deep learning.,ICLR2018,
BsMUT8wST2,1610040000000.0,1610470000000.0,1,L8BElg6Qldb,L8BElg6Qldb,Final Decision,Reject,"The paper extends results from the recent work of Steinke and Zakynthinou (SZ) for the test loss of randomized learning algorithms. They provide bounds in the single-draw as well as the PAC-Bayes setting. The main result is about fast rates, the proof of which follows with minor modifications from the corresponding result in SZ. It is unclear to me whether the contribution over existing work is sufficient to merit acceptance.",ICLR2021,
oJ4lESJm-a,1576800000000.0,1576800000000.0,1,SyegvgHtwr,SyegvgHtwr,Paper Decision,Reject,"This paper proposes to overcome some fundamental limitations of normalizing flows by introducing auxiliary continuous latent variables. While the problem this paper is trying to address is mathematically legitimate, there is no strong evidence that it is a relevant problem in practice. Moreover, the proposed solution is not entirely novel, as it converts the flow into a latent-variable model. Overall, I believe this paper will be of minor relevance to the ICLR community.",ICLR2020,
BJggy6LTy4,1544540000000.0,1545350000000.0,1,B1ffQnRcKX,B1ffQnRcKX,Nice framing of the problem; architecturally somewhat incremental over routing nets,Accept (Poster),"pros:
- the paper is well-written and presents a nice framing of the composition problem
- good comparison to prior work
- very important research direction

cons:
- from an architectural standpoint the paper is somewhat incremental over Routing Networks [Rosenbaum et al]
- as Reviewers 2 and 3 point out, the experiments are a bit weak, relying on heuristics such as a window over 3 symbols in the multi-lingual arithmetic case, and a pre-determined set of operations (scaling, translation, rotation, identity) in the MNIST case.

As the authors state, there are three core ideas in this paper (my paraphrase):

(1) training on a set of compositional problems (with the right architecture/training procedure) can encourage the model to learn modules which can be composed to solve new problems, enabling better generalization;
(2) treating the problem of selecting functions for composition as a sequential decision-making problem in an MDP;
(3) jointly learning the parameters of the functions and the (meta-level) composition policy.
As discussed during the review period, these three ideas are already present in the Routing Networks (RN) architecture of Rosenbaum et al. However, CRL offers insights and improvements over RN algorithmically in several ways:

(1) CRL uses a curriculum learning strategy. This seems to be key in achieving good results and makes a lot of sense for naturally compositional problems.
(2) The focus in RN was on using the architecture to solve multi-task problems in object recognition. The solutions learned in image domains, while ""compositional"", are less clearly interpretable. In this paper (CRL) the focus is more squarely on interpretable compositional tasks like arithmetic, and it explores extrapolation.
(3) The RN architecture does support recursion (and there are some experiments in this mode) but it was not the main focus. In this paper (CRL) recursion is given a clear, prominent role.

I appreciate the authors' engagement in the discussion period. My feeling is that the paper offers nice improvements, a useful framing of the problem, a clear recursive formulation, and a more central focus on naturally compositional problems. I am recommending the paper for acceptance but suggest that the authors remove or revise their contributions (3) and (4) on pg. 2 in light of the discussion on routing nets.

Routing Networks: Adaptive Selection of Non-Linear Functions for Multi-task Learning, ICLR 2018",ICLR2019,5: The area chair is absolutely certain
Y6JZCZaFlt8,1610040000000.0,1610470000000.0,1,UwOMufsTqCy,UwOMufsTqCy,Final Decision,Reject,"This paper falls in the borderline area, and there are still some concerns (for instance by AnonReviewer5 and AnonReviewer2) that deserve further treatment. Given that most ideas can only be validated in experiments (as the results are not theoretical), some points remain: the comparison with other approaches (there are reasonable comparisons, but very famous contenders such as xgboost are missing; LightGBM is included, so why not the other?); details about the tuning; the significance of the results (the practical and statistical analysis is not complete/detailed enough); and the reasoning about situations with many rules. Interpretability also seems worth exploring/discussing further.",ICLR2021,
1xTBJK6tl8p,1610040000000.0,1610470000000.0,1,P42rXLGZQ07,P42rXLGZQ07,Final Decision,Reject,"The paper presents an evolutionary optimization framework for training discrete VAEs, which is different from the standard way of training VAEs. One of the main criticisms of the paper was the choice of experiments, but the authors addressed this point by adding an inpainting benchmark.

Unfortunately, the reviewers' scores are borderline, and one of the reviewers pointed out the lack of scalability (more precisely, linear scalability with the number of observations) and could not recommend acceptance based on the limited application impact. Given the large number of ICLR submissions, this paper unfortunately does not meet the acceptance bar. That said, I encourage the authors to address this point and resubmit the paper to another (or the same) venue.",ICLR2021,
8r0pvO4u3yr,1610040000000.0,1610470000000.0,1,M9hdyCNlWaf,M9hdyCNlWaf,Final Decision,Reject,"The scores for this paper have been borderline; however, the decision has been greatly facilitated by the participation of the authors and reviewers in the discussion and, more importantly, by active private discussion among reviewers and the AC.
Specifically, from the private discussion it seems that the reviewers find interesting ideas in this paper, but overall are not entirely convinced about its significance, at least in the way the paper is currently positioned and motivated.

More specifically, the reviewers found the main idea of using inducing weights interesting, both technically (e.g., the associated sampling scheme) and in terms of application (sparsity). The results are insightful from a theoretical perspective. That is, the inducing weights and their treatment do seem to result in interesting and potentially useful statistical properties for the model. On the other hand, it is important to note that the high-level idea of variational inducing weights, with the usage of matrix normals in this setting, as well as the connection to deep GPs, has been studied before, as pointed out by R2 (refs [1,2]). Furthermore, even after the discussions the motivation is still not entirely convincing, especially in conjunction with the experiments. Although various interesting ideas exist in the paper, both R2 and R3 in particular remain unconvinced about the main benefit (e.g., pruning or runtime efficiency) stemming from the proposed idea. Another reviewer agreed with this point in the private discussions. Apart from an overall clearer positioning of the paper, the claimed benefit would need to be supported by experiments tailored to illustrate this main point. The authors argued against some of the suggested comparisons (e.g., past pruning methods), and further discussed that there is no established experimental benchmark for the parameter efficiency of BNNs. I indeed sympathize with both of these arguments; however, I believe that if the reviewers' suggestions for extra experiments are rejected, it remains the responsibility of the authors to find a slightly different way of motivating their work and demonstrating its efficiency in some convincing, meaningful and more well-defined setting with the appropriate benchmarks.",ICLR2021,
EKWhKUvcn7,1642700000000.0,1642700000000.0,1,03RLpj-tc_,03RLpj-tc_,Paper Decision,Accept (Poster),"This paper introduces a model, named Crystal Diffusion Variational Autoencoder (CDVAE), that can learn to sample valid material structures. It accounts for known symmetries (SE(3), permutation) of the structure via SE(3)-equivariant GNNs.

The proposed model is a complicated combination of many existing models and modeling techniques (VAEs, NCSNs, diffusion models), but it is not entirely ad hoc; the revised paper does a reasonable job of justifying the many different modeling choices made. Existing model components, often designed in the context of molecule generation, do not account for the periodicity of the crystal's lattice structure, so this paper introduces modifications to account for this periodicity.

The paper evaluates the model on several datasets and also introduces new benchmarks that can be used for further research. The experimental results look promising, but there are a few remaining clarity issues with the metrics used (cf. reviews).",ICLR2022,
hYLvNQJ6XW,1576800000000.0,1576800000000.0,1,SJgn3lBtwH,SJgn3lBtwH,Paper Decision,Reject,"This paper explores the practice of using lower-dimensional embeddings to perform Bayesian optimization on high-dimensional problems. The authors identify several issues with performing such an optimization on a lower-dimensional projection and propose solutions leading to better empirical performance of the optimization routine.
Overall the reviewers found the work well written and enjoyable. However, they were concerned primarily about the connection to existing literature (R2) and the empirical analysis (R1, R3). The authors claim that their method outperforms the state of the art on a range of problems, but the reviewers did not feel there was sufficient empirical evidence to back up this claim. Unfortunately, as such, the paper is not quite ready for publication. The authors claim to have significantly expanded the experiments in the response period, however, which will likely make the work much stronger for a future submission.",ICLR2020,
#NAME?,1610040000000.0,1610470000000.0,1,HOFxeCutxZR,HOFxeCutxZR,Final Decision,Accept (Poster),"All the reviewers rate the paper above the bar. They like the experimental results and think the proposed latent space editing approach makes intuitive sense. While several weak points were raised, including a lack of continuous editing comparison and sometimes vague descriptions, they were not considered major enough to reject the paper. After consolidating the reviews and rebuttal, the AC agrees with the reviewer assessment and recommends accepting the paper.",ICLR2021,
H1sTM1aHf,1517250000000.0,1517260000000.0,35,H15odZ-C-,H15odZ-C-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper presents a modified sampling method for improving the quality of interpolated samples in deep generative models.

There is not a great amount of technical contribution in the paper; however, it is written in a very clear way, makes interesting observations and analyses, and shows promising results. Therefore, it should be of interest to the ICLR community.",ICLR2018,
DmR_LKrU5nmX,1642700000000.0,1642700000000.0,1,Zr5W2LSRhD,Zr5W2LSRhD,Paper Decision,Accept (Poster),"The paper studies the problem of how to construct orthogonal convolutional layers. It is known that a convolution layer is orthogonal if and only if its filters are obtained by certain Fourier operations on an orthogonal matrix. Previous work proposes to learn this orthogonal matrix, parameterized either through the Cayley transform or the exponential of a skew-symmetric matrix. This requires spectral computations with large matrices. The idea of this submission is to reduce the computational cost associated with this construction by letting this “core” matrix P be a periodic extension of a smaller orthogonal matrix P_0. Because of cancellations in the inverse DFT, this leads to sparse filters which can be implemented by dilated convolution.

The review process generated a very detailed discussion between authors and reviewers, with several important clarifications. Reviewers generally found that the paper contributes a novel construction of orthogonal convolution layers, with better efficiency at test time. Remaining concerns held by some reviewers include the limitations vis-a-vis previous constructions of orthogonal convolution layers, questions about the efficacy of using a Taylor expansion, and certain minor limitations of the experiments.
After detailed interaction with the authors, the reviewers converged to a decision to accept, motivated by the novelties of the construction and its advantages for test-time efficiency.",ICLR2022, +Ske_97TJe4,1544700000000.0,1545350000000.0,1,B1l8iiA9tQ,B1l8iiA9tQ,Limited novelty compared to Dropout,Reject,"Dear authors, + +All reviewers pointed out that the proximity with Dropout warranted special treatment and that the justification provided in the paper was not enough to understand why exactly the changes were important. In its current state, this work is not suitable for publication to ICLR. + +Should you decide to resubmit this work to another venue, please take the reviewers' comments into account.",ICLR2019,4: The area chair is confident but not absolutely certain +U2rH0RvFFAg,1610040000000.0,1610470000000.0,1,H6ATjJ0TKdf,H6ATjJ0TKdf,Final Decision,Accept (Poster),"The paper proposes a layer-wise magnitude-based tuning method through the adoption of LAMP score, motivated by minimizing the model output distortion. The new importance score differs from vanilla magnitude-based score in that it incorporate more layer-wise information. Extensive experiments are conducted on image and language models to show the improved accuracy upon prior arts under same model compression ratio. Ablation study is also provided to further explain the intuition and comparison of LAMP with other pruning methods. + +Though the experiments are extensive, some reviewers raised questions that only image datasets are tested. In the rebuttal, the authors addressed more on Appendix D which provides non-image results, and also modified the abstract to highlight the efficacy on image data. In all, given the extensive empirical evaluation on various datasets and model architectures, the improvement of LAMB over prior methods seems convincing. Nevertheless, we urge the authors to include more experimental results, for example ResNet-18 on ImageNet as promised to Reviewer 1, to make the results more solid. It is also suggested to include and discuss some relevant papers mentioned by the reviewers in the final version. ",ICLR2021, +1besX6y6HDH,1610040000000.0,1610470000000.0,1,DILxQP08O3B,DILxQP08O3B,Final Decision,Accept (Poster),"This paper addresses the problem of visual object navigation by defining a novel visual transformer architecture, where an encoder consisting of a pretrained object detector extracts objects (i.e. their visual features, position, semantic label, confidence) that will serve as keys in an attention-based retrieval mechanism, and a decoder computes global visual features and positional descriptors as a coarse feature map. The visual transformer is first pretrained (using imitation learning) on simple tasks consisting in moving the state-less agent / camera towards the target object. Then an RL agent is defined by adding an LSTM to the VTNet and training it end-to-end on the single-room subset of the AI2-Thor environment where it achieves state-of-the-art performance. + +After rebuttal, all four reviewers converged on a score of 6. The reviewers praised the novelty of the method, extensive evaluation with ablation studies, and the SOTA results. Main points of criticism were about clarity of writing and some explanations (which the authors improved), using DETR vs. Faster R-CNN, and the relative simplicity of the task (single room and discrete action space). 
There were also minor questions, a request for more recent transformer-based VLN bibliography, and a request for a new evaluation on RoboThor. One area of discussion -- where I empathise with the authors -- was regarding the difficulty of pure RL training of transformer-based agents and the necessity to pre-train the representations. + +Taking all this into account, I suggest this paper gets accepted. +",ICLR2021, +8YOm7IUTgZp,1642700000000.0,1642700000000.0,1,6ya8C6sCiD,6ya8C6sCiD,Paper Decision,Reject,"This manuscript presents a novel approach to learning a shared language between multiple agents. + +In general, reviewers had difficulty understanding the symbolic mapping component. For such a critical part of the manuscript, questions by multiple reviewers were extremely basic, asking what symbolic mapping even is. Authors did clarify this in the discussion and updated the manuscript, but further improvements to the manuscript are warranted. + +Reviewers had concerns about the novelty of the approach. Including being confused about whether this is just an application of curriculum learning. Reviewers were also concerned about the lack of ablations. + +Reviewers also had concerns about the fact that this is a toy domain. Symbolic mapping as defined in the manuscript appears to be possible only for such toy domains. It fundamentally wouldn't scale to simple language games with real images. This significantly limits the scope of the work. More broadly, reviewers wanted to see symbolic mapping exercised much more. If this is a useful idea, they wanted to see the authors apply it to other domains. + +Reviewers were confused about many other details in the manuscript. For example, about the fact that refdis is later discarded as a metric, which the authors answered is due to redundant symbols (""the symbolic mapping is not a highly compositional representation here because of the redundant symbols""). Why redundant symbols lead to less compositional representations seem unclear. + +With significant additional improvements to the clarity of the manuscript, a demonstration of how symbolic mapping is useful in another domain, and additional experiments suggested by multiple reviewers this could be a strong submission in the future.",ICLR2022, +qEuv1CdS4S,1576800000000.0,1576800000000.0,1,S1g8K1BFwS,S1g8K1BFwS,Paper Decision,Accept (Poster),"The paper proposes a novel method to calibrate a knowledge graph embedding method when ground truth negatives are not available. Essentially, the method relies on generating corrupted triples as negative examples to be used by known approaches (Platt scaling and isotonic regression). + +This is claimed as the first approach of probability calibration for knowledge graph embedding models, which is considered to be very relevant for practitioners working on knowledge graph embedding (although this is a narrow audience). The paper does not propose a wholly novel method for probability calibration. Instead, the value in experimental insights provided. + +Some reviewers would have liked to see a more in-depth analysis, but reviewers appreciated the thoroughness of the results in the clear articulation of the findings and the fact that multiple datasets and models are studied. + +There was an animated discussion about this paper, but the paper seems a useful contribution to the ICLR community and I would like to recommend acceptance. 
",ICLR2020, +r1CXrypBz,1517250000000.0,1517260000000.0,539,BJLmN8xRW,BJLmN8xRW,ICLR 2018 Conference Acceptance Decision,Reject,"meta score: 4 + +This is basically an application in which some different deep learning approaches are compared on the task of automatically identifying domain names automatically generated by malware. The experiments are well-constructed and reported. However, the work does not have novelty beyond the application domain, and thus is not really suitable for ICLR. + +Pros + - good set of experiments carried out on an important task + - clearly written +Cons + - lacks technical novelty +",ICLR2018, +HJleU2sAkE,1544630000000.0,1545350000000.0,1,Hkl84iCcFm,Hkl84iCcFm,novel yet unripe,Reject,"The paper uses dynamical systems theory to evaluate feed-forward neural networks. The theory is used to compute the optimal depth of resnets. An interesting approach, and a good initiative. + +At the same time, the approach seems not to be thought through well enough, and the work needs another level of maturation before publication. The application that is realised is too immature, and the corresponding contributions are not significant in their current form. All reviewers agree on rejection of the paper.",ICLR2019,5: The area chair is absolutely certain +H1MNNk6Hf,1517250000000.0,1517260000000.0,328,ry-TW-WAb,ry-TW-WAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),The paper presents a variational Bayesian approach for quantising neural network weights and makes interesting and useful steps in this increasingly popular area of deep learning.,ICLR2018, +BylM8JTrz,1517250000000.0,1517260000000.0,731,BkPrDFgR-,BkPrDFgR-,ICLR 2018 Conference Acceptance Decision,Reject,"All three reviewers are in agreement that this paper is not ready for ICLR in its current state. Given the pros/cons, the committee feels the paper is not ready for acceptance in its current form.",ICLR2018, +OAVZzGMgR95,1642700000000.0,1642700000000.0,1,SHbhHHfePhP,SHbhHHfePhP,Paper Decision,Accept (Poster),"The manuscript develops a new kind of graph neural network (a Graph Mechanics Network; GMN) that is particularly well suited to representing and making predictions about physical mechanics systems (and data with similar structure). It does so by developing a way to build geometric constraints implicitly and naturally into the forward kinematics of the network, while still allowing for effective learning from data. The manuscript proves some essential properties of the new architecture and runs experiments both with simulated particles, hinges, sticks (and their combination), as well as with motion capture data. +Reviewers were generally impressed by the writing and clarity of the work, as well as the main results. In addition, in those cases where reviewers thought that the experiments were lacking, the authors delivered effective new experiments to address those concerns (e.g. looking at mocap and molecular datasets). One reviewer initially scores the manuscript as a Reject/3 on the basis of concerns about novelty and the scope of the theoretical and experimental contributions of the paper. However, they adjust their score 3->5 based on the rebuttal presented by the authors (including new experiments). The reviewer also downgrades their certainty (from 3->2) on the basis of the engagement from reviewers offering higher scores. 
+Overall, the manuscript presents a promising contribution to the graph networks literature and I agree with the general consensus in favour of publication.",ICLR2022, +y4BBlXf4L,1576800000000.0,1576800000000.0,1,Byx4NkrtDS,Byx4NkrtDS,Paper Decision,Accept (Poster),"Navigation is learned in a two-stage process, where the (recurrent) network is first pre-trained in a task-agnostic stage and then fine-tuned using Q-learning. The analysis of the learned network confirms that what has been learned in the task-agnostic pre-training stage takes the form of attractors. + +The reviewers generally liked this work, but complained about lack of comparison studies / baselines. The authors then carried out such studies and did a major update of the paper. + +Given that the extensive update of the paper seems to have addressed the reviewers' complaints, I think this paper can be accepted.",ICLR2020, +b6DIvscGx,1576800000000.0,1576800000000.0,1,S1x0CnEtvB,S1x0CnEtvB,Paper Decision,Reject,"This paper proposes a neural architecture search method based on greedily adding layers with random initializations. The reviewers all recommend rejection due to various concerns about the significance of the contribution, novelty, and experimental design. They checked the author response and maintained their ratings. +",ICLR2020, +Bygj0yBxgV,1544730000000.0,1545350000000.0,1,Bkx0RjA9tX,Bkx0RjA9tX,Clear accept ratings from reviewers.,Accept (Poster),"All reviewers recommend accept. +Discussion can be consulted below.",ICLR2019,4: The area chair is confident but not absolutely certain +jE9TH5VPW8X,1642700000000.0,1642700000000.0,1,b30Yre8MzuN,b30Yre8MzuN,Paper Decision,Reject,"The paper proposes a new method for subgraph similarity search by learning embeddings via a GNN-based approach to reflect the edit distance between subgraphs. Reviewers highlighted that the paper proposes an intuitive and promising approach to an interesting problem and provides a good balance between theoretical and empirical results. However, reviewers raised also concerns regarding the significance of technical contributions, limited analysis (e.g, performance on large-scale graphs, baselines, evaluation) and comparison to related work. After author response and discussion, reviewers did not come to a full agreement with two reviewers indicating weak acceptance and two reviewers indicating (weak) reject. Taking rebuttal and discussion into account, I agree with the viewpoint that the paper is not yet ready for acceptance at ICLR as it would require an additional revision to fully address the raised concerns. However, I encourage the authors to revise and resubmit their manuscript based on the feedback from this reviewing round.",ICLR2022, +HJKd4ypHz,1517250000000.0,1517260000000.0,387,rybAWfx0b,rybAWfx0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Pros +-- A novel way to incorporate LM into an end-to-end model, with good adaptation results. + +Cons +-- Lacks results on public corpora or results are not close to SOTA (e.g., for Librispeech). + +Given the reviews, it is clear that the experimental evaluations can be improved. But the presented approach is novel and interesting. Therefore I am recommending the paper to the workshop track.",ICLR2018, +HyJlUkaSz,1517250000000.0,1517260000000.0,703,SJ19eUg0-,SJ19eUg0-,ICLR 2018 Conference Acceptance Decision,Reject,"Pros: ++ Clearly written paper. 
+ +Cons: +- Limited empirical evaluation: paper should compare to first-order methods with well-tuned hyperparameters, since the block Hessian-free hyperparameters likely were well tuned, and plots of convergence as a function of time need to be included. +- Somewhat limited novelty in that block-diagonal curvature approximations have been used before, though the application to Hessian-free optimization is new. + +The reviewers liked the clear description of the proposed algorithm and well-structured paper, but after discussion were not prepared to accept it primarily because (1) they wanted to see algorithmic comparisons in terms of convergence vs. time in addition to the convergence vs. updates that were provided; (2) they wanted more assurance that the baseline first-order optimizers had been carefully tuned; and (3) they wanted results on larger scale tasks. +",ICLR2018, +HyUL3fI_e,1486400000000.0,1486400000000.0,1,S1HcOI5le,S1HcOI5le,ICLR committee final decision,Reject,All three reviewers point to significant deficiencies. No response or engagement from the authors (for the reviews). I see no basis for supporting this paper.,ICLR2017, +e5hzbLrzJ,1576800000000.0,1576800000000.0,1,HkxJHlrFvr,HkxJHlrFvr,Paper Decision,Reject,"This paper proposes a new measure for CNN and show its correlation to human visual hardness. The topic of this paper is interesting, and it sparked many interesting discussions among reviews. After reviewing each others’ comments, reviewers decided to recommend reject due to a few severe concerns that are yet to be address. In particular, reviewer 1 and 2 both raised concerns about potentially misleading and perhaps confusing statements around the correlation between HSF and accuracy. A concrete step was suggested by a reviewer - reporting correlation between accuracy and HSF. A few other points were raised around its conflict/agreement with prior work [RRSS19], or self-contradictory statements as pointed out by Reviewer 1 and 2 (see reviewer 2’s comment). We hope authors would use this helpful feedback to improve the paper for the future submission. +",ICLR2020, +BydqQ1pBf,1517250000000.0,1517260000000.0,202,B1J_rgWRW,B1J_rgWRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),Theoretical analysis and understanding of DNNs is a crucial area for ML community. This paper studies characteristics of the relu DNNs and makes several important contributions.,ICLR2018, +SeH3Qv7s4dP,1610040000000.0,1610470000000.0,1,pQq3oLH9UmL,pQq3oLH9UmL,Final Decision,Reject,"This paper proposes a new hard attention model for the image classification as a way to achieve explainability. Two of the reviewers do not find the output of the system interpretable, which is a fatal weakness for a paper on XAI. +R1: The visualization in Fig.5 shows only that the region selected in each timestep indeed has the maximum EIG. But how to interpret the explainability from the glimpse sequence is still confusing. I can hardly perceive the sequence using my knowledge. +R2: However the output of the system is not so appealing either in performance or explainability. +R3: Post-discussion note: For me, it's a bit hard to say the proposed methodology is novel. Authors needs to explain why the proposed model is different from pre-existing methodologies regarding attention mechanism. +R4: Due to the above, the recommendation is Reject - but the authors are strongly encouraged to do experiments on more challenging data and compare to a newer baseline. 
+",ICLR2021, +bz-AZNzLqCNR,1642700000000.0,1642700000000.0,1,e0uknAgETh,e0uknAgETh,Paper Decision,Reject,"The manuscript investigates common adversarial attacks on event-based data for spiking neural networks. +They conclude that also in this setup adversarial attacks can strongly harm SNN performance. + +Although the reviewers agree that the paper presents some solid results and is well written, there was also substantial criticism. + +The main points were: +- It is not very clear how the usual attacks are applied to event-based data, and in general experimental setups are unclear. +- The methodological contribution of the paper seems limited. +- The novelty is limited, in particular Marchisio et al. 2021 investigates a very similar question and goes somewhat further. The author noted that their attacks are not deployed on neuromorphic hardware. A number of other important prior work is not discussed. +- The impact of adversarial defences was not considered. +- A more detailed comparison of event-based attacks to standard ANN attacks would be desired. + +After the reviews, the authors have invested substantial efforts to improve the paper. These efforts were appreciated by the reviewers. In particular, the authors ran additional experiments using the defence method TRADES. The results showed that TRADES is effective, but the attack has still a large success. + +In summary, the reviewers agree that this is a solid manuscript and an interesting direction, however, they see it finally slightly below acceptance threshold for ICLR.",ICLR2022, +L-SEjQfizT,1576800000000.0,1576800000000.0,1,BkgRe1SFDS,BkgRe1SFDS,Paper Decision,Reject,"This paper introduces an approach for structured exploration based on graph-based representations. While a number of the ideas in the paper are quite interesting and relevant to the ICLR community, the reviewers were generally in agreement about several concerns, which were discussed after the author response. These concerns include the ad-hoc nature of the approach, the limited technical novelty, and the difficulty of the experimental domains (and whether the approach could be applied to a more general class of challenging long-horizon problems such as those in prior works). Overall, the paper is not quite ready for publication at ICLR.",ICLR2020, +O0LE7qmXz0,1576800000000.0,1576800000000.0,1,S1eqj1SKvr,S1eqj1SKvr,Paper Decision,Reject,"This paper provides an improved feature space adversarial attack. + +However, the contribution is unclear in its significance, in part due to an important prior reference was omitted (song et al.) + +Unfortunately the paper is borderline, and not above the bar for acceptance in the current pool.",ICLR2020, +SyFbmypBM,1517250000000.0,1517260000000.0,77,H1vEXaxA-,H1vEXaxA-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper considers learning an NMT systems while pivoting through images. The task is formulated as a referential game. From the modeling and set-up perspective it is similar to previous work in the area of emergent communication / referential games, e.g., Lazaridou et al (ICLR 17) and especially to Havrylov & Titov (NIPS 17), as similar techniques are used to handle the variable-length channel (RNN encoders / decoders + the ST Gumbel-Softmax estimator). However, its multilingual version is interesting and the results are sufficiently convincing (e.g., comparison to Nakayama and Nishida, 17). 
The paper would more attractive for those interested in emergent communication than the NMT community, as the set-up (using pivoting through images) may be perceived somewhat exotic by the NMT community. Also, the model is not attention-based (unlike SoA in seq2seq / NMT), and it is not straightforward to incorporate attention (see R2 and author response). + ++ an interesting framing of the weakly-supervised MT problem ++ well written ++ sufficiently convincing results +- the set-up and framework (e.g., non-attention based) is questionable from practical perspective +",ICLR2018, +N1DJKizW8O,1576800000000.0,1576800000000.0,1,SylGpT4FPS,SylGpT4FPS,Paper Decision,Reject,"This provides a simple analysis of an existing algorithm for min-max optimization under some favorable assumptions. The paper is clean and nice, though unfortunately lands just below borderline. + +I urge the authors to continue their interesting work, and amongst other things address the reviewer comments, for example those on stochastic gradient descent. ",ICLR2020, +EumacoGzRvE,1610040000000.0,1610470000000.0,1,AHm3dbp7D1D,AHm3dbp7D1D,Final Decision,Accept (Poster),"There is definite consensus on this paper, with all reviewers expressing very favorable opinions. The author responses are very well articulated and address the main concerns expressed by the reviewers. The paper is very well-written and the ablation study well-executed. Some recent related work was missed in the original submission, but this was adequately addressed in rebuttal. The proposed approach is novel technique for feature representation learning. The clarifications to the manuscript and the new analyses are especially appreciated. ",ICLR2021, +EYie4KEhiIj,1642700000000.0,1642700000000.0,1,MXdFBmHT4C,MXdFBmHT4C,Paper Decision,Accept (Poster),"This work proposes a new embedding for sets of features. A set is represented by the output means of an EM algorithm for fitting the input set with a mixture of Gaussians. The authors draw a new connection to an existing method for set embedding (OTKE). Moreover, their method achieves good experimental results. + +There is general consensus among the reviewers that the paper is sound, well-written and provides new insights for set representation, with convincing experiments. + +The authors have answered to most comments raised by the reviewers and have revised the paper accordingly. + +I recommend acceptance as a poster.",ICLR2022, +ByhY3ML_l,1486400000000.0,1486400000000.0,1,SJgWQPcxl,SJgWQPcxl,ICLR committee final decision,Reject,"The presented approach builds heavily on recent work, but does provide some novelty. The presentation is generally all right, but there are parts of the manuscript that the reviewers feel needed/needs work. All reviewers note that the evaluation and experimental work could be improved.",ICLR2017, +KXrK3zyOY6a,1610040000000.0,1610470000000.0,1,Iw4ZGwenbXf,Iw4ZGwenbXf,Final Decision,Accept (Poster),"The authors propose an intriguing alternative to IFT or unrolled GD as a method for optimizing through arg min layers in a neural net, by using a differentiable sampling-based optimization approach. I found the general idea in the paper to be intriguing and thought-provoking. The reviewers generally seem to have also appreciated the method, and many of the reviewers' concerns were addressed by the authors during the rebuttal. 
Although the paper does have a number of flaws -- in particular, the evaluation is a bit hard to appreciate, since improvement over prior work is either unclear, or no meaningful comparison is offered, -- I think in this case the benefits outweigh the downsides. The work is far from perfect, but the ideas that are presented are interested and valuable to the community, and I think that ICLR attendees will appreciate learning about this work. I would encourage the authors however to improve the paper, and especially the empirical evaluation, as much as possible for the camera-ready, and to take reviewer comments into account insofar as feasible. I'm also not sure how much I buy the ""overfitting to hyperparameters"" argument for unrolled GD, and a less charitable interpretation is that the authors present this issue largely to make up for the comparative lack of other benefits. That's not necessarily a bad thing, but I think making such a big deal of it is a bit strange. It's probably fair to say at this stage that the actual benefits of this approach are a bit modest (though improvements in runtime are a good thing...), but the idea is interesting, and may spur future research.",ICLR2021, +SK8GVzfNbwM,1610040000000.0,1610470000000.0,1,jphnJNOwe36,jphnJNOwe36,Final Decision,Accept (Poster),"This paper studies how to improve the worst-case subgroup error in overparameterized models using two simple post-hoc processing techniques. All reviewers were positive about the paper, though R5 questioned the novelty of the paper which built heavily on a few previous papers (in particular, it builds heavily on Sagawa et al. 2020a,b). The AC is satisfied with the authors`' response clarifying the novelty. Given that this topic is quite timely and of interest to the ICLR community, and that this paper presented a clean investigation on it, the AC recommends acceptance.",ICLR2021, +8rwCvEAs4Z0,1610040000000.0,1610470000000.0,1,EBRTjOm_sl1,EBRTjOm_sl1,Final Decision,Reject,"The authors propose to linearly combine the utility functions of (batch) active learning algorithms. The linear combination coefficients are ""learned"" with Monte Carlo estimators to adapt the coefficients to different kinds of tasks automatically. + +The reviewers find the presentation within the papers generally clear. The simplicity of the approach, which is highlighted in the authors' rebuttal, should be appreciated. The authors also addressed the issue of robustness with respect to the batch size. But the paper left quite a few unanswered issues even after the authors rebuttal. The novelty with respect to several earlier papers require clarification and concrete comparisons, such as the ones in reinforcement learning and bandit learning as pointed out by reviewers. The lack of comparisons to those existing works, both illustratively and empirically, is a key weakness of the current paper. A more careful study of RL setting (such as reward shaping) is also important to understand the value of the work. Finally, the gap between the ensemble approach and the single approach also deserves more investigation to justify the significance of the contribution. +",ICLR2021, +zwkOfj5hyEV,1610040000000.0,1610470000000.0,1,NQbnPjPYaG6,NQbnPjPYaG6,Final Decision,Accept (Poster),"This paper presents a series of negative results regarding the convergence of deterministic, ""reasonable"" algorithms in min-max games. 
The defining characteristic of such algorithms is that (a) the algorithm's fixed points are critical points of the game; and (b) they avoid strict maxima from almost any initialization. The authors then construct a range of simple $2$-dimensional ""market games"" in which every reasonable algorithm fails to converge, from almost any initialization. + +The paper received three positive recommendations and one negative, with all reviewers indicating high confidence. After my own reading of the paper, I concur with the majority view that the paper's message is an interesting one for the community and will likely attract interest in ICLR. + +In more detail, I view the authors' result as a cautionary tale, not unlike the NeurIPS 2019 spotlight paper of Vlatakis-Gkaragkounis et al, and a concurrent arxiv preprint by Hsieh et al. (2020). In contrast to the type of cycling/recurrence phenomena that are well-documented in bilinear games (and which can be resolved through the use of extra-gradient methods), the non-convergence phenomena described by the authors of this paper appear to be considerably more resilient, as they apply to all ""reasonable"" algorithms. Determining whether GANs (or other practical applications of min-max optimization) can exhibit such phenomena is an important open question, and one which needs to be informed by a deeper understanding of the theory. I find this paper successful in this regard and I am happy to recommend acceptance.",ICLR2021, +SkgcPwaGeE,1544900000000.0,1545350000000.0,1,B1g30j0qF7,B1g30j0qF7,Interesting work taking recent advances one step further,Accept (Poster),"There has been a recent focus on proving the convergence of Bayesian fully connected networks to GPs. This work takes these ideas one step further, by proving the equivalence in the convolutional case. + +All reviewers and the AC are in agreement that this is interesting and impactful work. The nature of the topic is such that experimental evaluations and theoretical proofs are difficult to carry out in a convincing manner, however the authors have done a good job at it, especially after carefully taking into account the reviewers’ comments. + +",ICLR2019,5: The area chair is absolutely certain +4uudRwsglG,1576800000000.0,1576800000000.0,1,rkl8sJBYvH,rkl8sJBYvH,Paper Decision,Accept (Spotlight),This paper carries out extensive experiments on Neural Tangent Kernel (NTK) --kernel methods based on infinitely wide neural nets on small-data tasks. I recommend acceptance.,ICLR2020, +BkxwTdDAJE,1544610000000.0,1545350000000.0,1,Hygn2o0qKX,Hygn2o0qKX,ICLR 2019 decision,Accept (Poster),"Existing PAC Bayes analysis gives generalization bounds for stochastic networks/classifiers. This paper develops a new approach to obtain generalization bounds for the original network, by generalizing noise resilience property from training data to test data. All reviewers agree that the techniques developed in the paper (namely Theorem 3.1) are novel and interesting. There was disagreement between reviewers on the usefulness of the new generalization bound (Theorem 4.1) shown in this paper using the above techniques. I believe authors have sufficiently addressed these concerns in their response and updated draft. Hence, despite the concerns of R3 on limitations of this bound and its dependence on pre-activation values, I agree with R2 and R4 that the techniques developed in the paper are of interest to the community and deserve publication. 
I suggest authors to keep comments of R3 in mind while preparing the final version. ",ICLR2019,4: The area chair is confident but not absolutely certain +GfyAvLt4aF,1576800000000.0,1576800000000.0,1,rJehVyrKwH,rJehVyrKwH,Paper Decision,Accept (Spotlight),"This paper addresses to compress the network weights by quantizing their values to some fixed codeword vectors. The paper is well written, and is overall easy to follow. The proposed algorithm is well-motivated, and easy to apply. The method can be expected to perform well empirically, which the experiments verify, and to have potential impact. On the other hand, the novelty is not very high, though this paper uses these existing techniques in a different setting.",ICLR2020, +Mdz7zs_awPR,1610040000000.0,1610470000000.0,1,c5QbJ1zob73,c5QbJ1zob73,Final Decision,Reject,"This paper presents new analysis for self-supervised learning. All reviewers are positive about some new perspectives of the analysis. However, some serious concerns have been raised about the rigorousness and the presentation clarity. The paper would be significantly improved, if the authors could address the concerns.",ICLR2021, +ryebNJaHf,1517250000000.0,1517260000000.0,287,rJlMAAeC-,rJlMAAeC-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper present a functional extension to NPI, allowing the learning of simpler, more expressive programs. + +Although the conference does not put explicit bounds on the length of papers, the authors pushed their luck with their initial submission (a body of 14 pages). It is clear, from the discussion and the reviews, however, that the authors have sought to substantially reduce the length of their paper while improving its clarity. + +Reviewers found the method and experiments interesting, and two out of three heartily recommend it for acceptance to ICLR. I am forced to discount the score of the third reviewer, which does not align with the content of their review. I had discussed the issue of length with them, and am disappointed that they chose not to adjust their score to reflect their assessment of the paper, but rather their displeasure at the length of the paper (which, as stated above, does push the boundary a little). + +Overall, I recommend accepting this paper, but warn the authors that this is a generous decision, heavily motivated by my appreciation for the work, and that they should be careful not to try such stunts in future conference in order to preserve the fairness of the submission process.",ICLR2018, +H1_O2GI_e,1486400000000.0,1486400000000.0,1,Hyvw0L9el,Hyvw0L9el,ICLR committee final decision,Invite to Workshop Track,"The paper extends PixelCNN to do text and location conditional image generation. The reviewers praise the diversity of the generated samples, which seems like the strongest result of the paper. On the other hand, they are concerned with their low resolution. The authors made an effort of showing a few high-resolution samples in the rebuttal, which indeed look better. Two reviewers mention that the work with respect to PixelCNN is very incremental, and the AC agrees. Overall, this paper is very borderline. While all reviewers became slightly more positive, none was particularly swayed. 
The paper will make a nice workshop contribution.",ICLR2017, +M3lIg_-Imk,1610040000000.0,1610470000000.0,1,K5YasWXZT3O,K5YasWXZT3O,Final Decision,Accept (Poster),"Dear Authors, + +Thank you very much for your detailed feedback to the initial reviews and also for further answering additional questions raised by a reviewer. Your effort has been certainly contributed to clarifying some of the concerns raised by the reviewers and improving their understanding of this paper. + +Overall, all the reviewers found a merit in this paper and thus I suggest its acceptance. However, as Reviewer #2 suggested, investigating the convergence in the stochastic case is very important. More discussion on this would be a valuable addition to the paper, which the authors can incorporate in the final version.",ICLR2021, +BkHtVkTSf,1517250000000.0,1517260000000.0,397,BJInEZsTb,BJInEZsTb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper compares autoencoder and GAN-based methods for 3D point cloud representation and generation, as well as new (and welcome) metrics for quantitatively evaluating generative models. The experiments form a good but still a bit too incomplete exploration of this topic. More analysis is needed to calibrate the new metrics. Qualitative analysis would be very helpful here to complement and calibrate the quantitative ones. The writing also needs improvement for clarity and verbosity. The author replies and revisions are very helpful, but there is still some way to go on the issues above. Overall, the committee is intersting and recommends this paper for the workshop track.",ICLR2018, +BJl2MiBxe4,1544740000000.0,1545350000000.0,1,BJg_roAcK7,BJg_roAcK7,Metareview,Accept (Poster),"This manuscript proposes a new algorithm for instance-wise feature selection. To this end, the selection is achieved by combining three neural networks trained via an actor-critic methodology. The manuscript highlight that beyond prior work, this strategy enables the selection of a different number of features for each example. Encouraging results are provided on simulated data in comparison to related work, and on real data. + +The reviewers and AC note issues with the evaluation of the proposed method. In particular, the evaluation of computer vision and natural language processing datasets may have further highlighted the performance of the proposed method. Further, while technically innovative, the approach is closely related to prior work (L2X) -- limiting the novelty. + +The paper presents a promising new algorithm for training generative adversarial networks. The mathematical foundation for the method is novel and thoroughly motivated, the theoretical results are non-trivial and correct, and the experimental evaluation shows a substantial improvement over the state of the art.",ICLR2019,3: The area chair is somewhat confident +d-Z2rILZohQ,1610040000000.0,1610470000000.0,1,hPWj1qduVw8,hPWj1qduVw8,Final Decision,Accept (Poster),"This paper studies the problem of visual question answering in multi-turn dialogues. +The proposed method is to identify relevant dialog turns as a path in a semantic graph that connects the dialogue turns. Empirical performance of the proposed method is strong. Reviewers concerns have been compressively addressed. 
Overall, the paper is novel and explores an interesting direction in this line of work.",ICLR2021,
However, the authors should relate their work to Bayesian optimization, which comes in many flavors, and to black-box optimization techniques in general, as their work shows a number of similarities to these methods but is less principled.",ICLR2021,
Reviewers agree that this result is of technical interest to the community, and with the added reorganization and revisions described by the authors, they and the AC concur that the paper should be accepted. ",ICLR2020,
+ +Note: Theorem 1.7 in (Helgason, 1970) is proved explicitly for the case n=2, not for general n as claimed in (9). Thus the Laplacian eigenspace motivation needs to be re-written/re-examined.",ICLR2021, +MAADcOCQZi,1642700000000.0,1642700000000.0,1,xZ6H7wydGl,xZ6H7wydGl,Paper Decision,Accept (Poster),"This paper proposes an importance-sampling estimator for probabilities of observations of SDEs. The proposed approach has several advantages over conventional methods: it does not require an SDE solver, it has lower gradient variance, and shows nice results with a Gaussian process representation of the function. Reviewers were somewhat split on this paper, with some concerns that experiments were limited. On balance, however, the paper makes several nice contributions, the experiments are in line with related works, and the authors did a good job of clarifying Theorem 1 in the rebuttal. We note that Reviewer K19Y changed their opinion to accept (although they forgot to update the score). Please carefully account for all reviewer comments in the final version.",ICLR2022, +UG66lvgWA3,1610040000000.0,1610470000000.0,1,dx11_7vm5_r,dx11_7vm5_r,Final Decision,Accept (Poster),"The authors propose to provide fast convergence results for the OGDA and OMWU algorithms based on a reinterpretation of the metric subregularity in the saddle point problem setting. During the rebuttal period, the paper improved significantly, not only due to the diligence of the authors but also due to reactive reviewers that provided extremely constructive comments. The technical developments are quite nice: Lemma 2 allows constant step-size parameter as compared to Daskalakis and Panageas, followed by Theorem 3, which establishes the first linear rate under the saddle point metric subregularity. The numerical demonstrations are also helpful in driving the point home. Although it is not surprising that the shape of the polytope matters, it is still impactful to see the linear rate. + + +ps. The authors should consider including a related work comparison to the reflected FB algorithm in [1] since it reduces to the FoRB and it also provides convergence analysis for the sequence in the general monotone inclusions. + +[1] Cevher and Vu, ""A reflected forward-backward splitting method for monotone inclusions involving Lipschitzian operators,'' +https://arxiv.org/pdf/1908.05912.pdf",ICLR2021, +rDePEOO_O,1576800000000.0,1576800000000.0,1,BklC2RNKDS,BklC2RNKDS,Paper Decision,Reject,"This submission proposes a deep network training method to verify desired temporal properties of the resultant model. + +Strengths: +-The proposed approach is valid and has some interesting components. + +Weaknesses: +-The novelty is limited. +-The experimental validation could be improved. + +Opinion on this paper was mixed but the more confident reviewers believed that novelty is insufficient for acceptance.",ICLR2020, +Hy_R81TrG,1517250000000.0,1517260000000.0,898,ByL48G-AW,ByL48G-AW,ICLR 2018 Conference Acceptance Decision,Reject,"Evaluating simple baselines for continuous control is important and nearest neighbor search methods are interesting. However, the reviewers think that the paper lacks citation and comparison to some prior work and evaluation on more challenging benchmarks.",ICLR2018, +JbOeK1pZDj,1610040000000.0,1610470000000.0,1,2ioNazs6lvw,2ioNazs6lvw,Final Decision,Reject,"Given the reviewer's exchange with the authors, and my own examination of the paper, I don't think that it can be accepted in the present form. 
+ +First, since this paper aims at solving an optimization problem (for which existing methods with theoretical guarantees exist) via a NN, it is important to compare appropriately to those methods, which is not done here. + +Further, there are possible issues when applying the method only to 2D data, and it is possible that it would not extend appropriately to other types of geometries and cost functions in OT problems.",ICLR2021,
Some technical details, such as the network architecture and its training, should be elaborated. This paper needs substantial improvement in order to be accepted. No rebuttal was provided by the authors, so all of these concerns still stand.",ICLR2022,
In particular, the paper would be much stronger with a discussion of how the ideas here can help improve the transformer model, and whether these ideas generalize to models other than transformers.",ICLR2021,
The authors are encouraged to explore complementary algorithmic angles and the benefits that their approach provides for this specific class of applications.",ICLR2017,
the normalized loss function, was missing, as well as clear comparisons between the presented results and existing results. I recommend that the authors motivate their results better and contrast them with existing results to fully highlight their impact. Hopefully the suggestions made by the reviewers regarding presentation will help the authors improve the paper.",ICLR2022,
Overall this is good work.",ICLR2020, +lidG9RGTx3g,1642700000000.0,1642700000000.0,1,qXa0nhTRZGV,qXa0nhTRZGV,Paper Decision,Reject,"As evident by the title the paper focuses on understanding sharpness-aware minimization which is a contemporary training procedure based on minimizing the worse case perturbation of the weights in ball. It has been observed that SAM improves the generalization and this paper aims to demystify this success. They also provide a convergence proof of SAM for non-convex objectives in a simplified setting and also discuss benefits of SAM in the noisy label setting.The reviewers thought this paper was an interesting first step The reviewers raised concerns about (1) novelty of the proof technique, (2) interpretation of the analysis. The response mitigated some the concerns but did not resolve them. I concur with the reviewers. The paper has some nice insights and good potential. However, there are a few things that need to be clarified and the paper has to be substantially rewritten to reflect this and thus I do not recommend acceptance at this time.",ICLR2022, +ByljbyaCJV,1544630000000.0,1545350000000.0,1,SJfb5jCqKm,SJfb5jCqKm,Interesting idea,Accept (Poster),"The paper proposes an improved method for uncertainty estimation in deep neural networks. + +Reviewer 2 and AC note that the paper is a bit isolated in terms of comparing the literature. + +However, as all of reviewers and AC found, the paper is well written and the proposed idea is clearly new/interesting.",ICLR2019,4: The area chair is confident but not absolutely certain +wRD_LEFtNX,1576800000000.0,1576800000000.0,1,BJgZBxBYPB,BJgZBxBYPB,Paper Decision,Reject,"This paper aims to estimate the parameters of a projectile physical equation from a small number of trajectory observations in two computer games. The authors demonstrate that their method works, and that the learnt model generalises from one game to another. However, the reviewers had concerns about the simplicity of the tasks, the longer term value of the proposed method to the research community, and the writing of the paper. During the discussion period, the authors were able to address some of these questions, however many other points were left unanswered, and the authors did not modify the paper to reflect the reviewers’ feedback. Hence, in the current state this paper appears more suitable for a workshop rather than a conference, and I recommend rejection.",ICLR2020, +MkUtlPJPl4x,1610040000000.0,1610470000000.0,1,4HGL3H9eL9U,4HGL3H9eL9U,Final Decision,Reject,"I thank the authors and reviewers for their discussions about this paper. The proposed AT-GAN is a GAN-based method to generate adversarial examples. Similar methods (e.g. Song et al) have been proposed to use GANs to generate adv. examples more efficiently. Authors show their method has some numerical benefits. However, more experiments are needed to further justify it. Also, creating ""unrestrictive"" adv. examples can cause a risk of generating samples where the true label is flipped. Authors need to clarify it. Given all, I think the paper needs a bit of more work to be accepted. I recommend authors to address the aforementioned concerns in the updated draft. + +-AC",ICLR2021, +S1RGUy6SM,1517250000000.0,1517260000000.0,741,BJvVbCJCb,BJvVbCJCb,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes an approach to jointly learning a data clustering and latent representation. The main selling point is that the number of clusters need not be pre-specified. 
However, there are other hyperparameters, and it is not clear why trading the number of clusters for other hyperparameters is a win. The empirical results are not strong enough to overcome these concerns.",ICLR2018,
The inability to overfit is here posed as an obvious quality of a training loss. Then, a way to prevent the failure of the meta-training algorithm is presented in Theorem 3. Finally, an experiment is provided, showing that the proposed meta-training algorithm performs better than ""vanilla"" training with respect to adversarial attacks with FGSM. There is no comparison with other defense mechanisms, and no analysis explains the results. Overall, although some interesting aspects may be developed in this paper, they are currently not well served by the writing or the experimental evidence, so I recommend rejection. +",ICLR2021,
Additionally, there were issues raised with the choice of baselines, but in the discussion the reviewers did not agree on any other reasonable baselines to use. + +This is a novel and interesting contribution nonetheless, which could open the field to much additional discussion, and so should be accepted.",ICLR2020, +eq-ilaBwFCx,1610040000000.0,1610470000000.0,1,5qK0RActG1x,5qK0RActG1x,Final Decision,Reject,"The reviewers all found that the Consensus method introduced seemed sensible and applauded the authors on their extensive experiments. However, clearly they struggled to understand the paper well and asked for a clearer and more formal definition of the methods introduced. Unfortunately, the highest scoring review was also the shortest and also indicated issues with clarity. It seems like the authors have gone a long way to improve the notation, organization and clarity of the paper, but ultimately the reviewers didn't think it was ready for acceptance. Hopefully the feedback from the reviewers will help to improve the paper for a future submission.",ICLR2021, +HJxQ9_hCyV,1544630000000.0,1545350000000.0,1,H1x1noAqKX,H1x1noAqKX,Limited novelty,Reject,"The paper addresses the problem of out-of-distribution detection for helping the segmentation process. + +The reviewers and AC note the critical limitation of novelty of this paper to meet the high standard of ICLR. AC also thinks the authors should avoid using explicit OOD datasets (e.g., ILVRC) due to the nature of this problem. Otherwise, this is a toy binary classification problem. + +AC thinks the proposed method has potential and is interesting, but decided that the authors need more works to publish.",ICLR2019,5: The area chair is absolutely certain +ayTZGWuN7Qa,1610040000000.0,1610470000000.0,1,4JLiaohIk9,4JLiaohIk9,Final Decision,Reject,"This paper makes use of the unlikelihood objective from Welleck et al (2019) which was shown in NLP to the problem of forecasting motion trajectories on roads. The unlikelihood term is meant to lower the probability mass in non-driveable areas. The paper makes use of Trajectron++, and existing trajectory forecasting model to demonstrate the idea. While the idea is interesting, the notion of using negative examples to lower the likelihood outside a valid domain has been used in multiple occasions. The paper mentions contrastive learning, but I did not see a meaningful discussion on the difference between unlikelihood training and contrastive learning, beyond what exists in the related works section. Also, due to the unlikelihood term having appeared in Welleck et al, reviewers are hesitant to acknowledge novelty of the method. One of the reviewers also questions the significance of the results, which the authors countered by saying that their method reduces the violation rate from 10.6% to 8.9% in their predictions. This is good, but combined with the former issue implies that the paper needs more work before publication. ",ICLR2021, +LRcdpXpGpDg,1610040000000.0,1610470000000.0,1,R43miizWtUN,R43miizWtUN,Final Decision,Reject,"The paper focuses on the update step in Message-Passing Neural Networks, specifically for GNN. A series of sparse variants of the update step, say complete removal and expander graphs with varying density, are compared in empirical studies. The findings are quite useful for practice, and the paper is organized and written well. As observed by the reviewers, there are several concerns regarding the novelty and contribution of the work. 
In addition, a theoretical analysis of the sparsification approach is lacking. The authors provided a good rebuttal and addressed some concerns, but not to the degree that the reviewers believe the paper passes the bar of ICLR. We encourage the authors to further improve the work to address the key concerns. ",ICLR2021,
In the final recommendation, the simplicity of the metric was not seen as a weakness of the work.",ICLR2022,
After the rebuttal phase, the author who recommended weak acceptance indicated their willingness to let the paper be rejected in light of the other reviews. The reviewer highlighted: ""I think the authors could still to better to relate their theory to practice, and expand on the discussion/presentation of X-regularization."" The main open issue is that the theoretical contributions of the paper are not sufficiently linked to the proposed algorithm.",ICLR2020, +ryWy2GIOe,1486400000000.0,1486400000000.0,1,BJRIA3Fgg,BJRIA3Fgg,ICLR committee final decision,Invite to Workshop Track,"The paper investigates the problem of morphing one convolutional network into another with application to exploring the model space (starting from a pre-trained baseline model). The resulting morphed models perform better than the baseline, albeit at the cost of more parameters and training. Importantly, it has not been demonstrated that morphing leads to training time speed-up, which is an important factor to consider when exploring new architectures starting from pre-trained models. Still, the presented approach and the experiments would be of interest to the community, so I recommend the paper for workshop presentation.",ICLR2017, +Wk6jd2UkAV,1576800000000.0,1576800000000.0,1,H1lVvgHKDr,H1lVvgHKDr,Paper Decision,Reject,"This paper has been assessed by three reviewers scoring it as follows: 6, 3, 8. The submission however attracted some criticism post-rebuttal from the reviewers e.g., why concatenating teacher to student is better than the use l2 loss or how the choice of transf. layers has been made (ad-hoc). Similarly, other major criticism includes lack of proper referencing to parts of work that have been in fact developed earlier in preceding papers. On balance, this paper falls short of the expectations of ICLR 2020, thus it cannot be accepted at this time. The authors are encouraged to work through major comments and resolve them for a future submission.",ICLR2020, +3zqZUS_-IK,1642700000000.0,1642700000000.0,1,ms7xJWbf8Ku,ms7xJWbf8Ku,Paper Decision,Reject,"This paper proposes to re-organize the training data in such a way that padding can be avoided. The novelty is somewhat limited and the results are what one would expect - a nice speed-up of 2x but nothing really game-changing. While the reviewer scores straddle the decision boundary, nobody is very strongly supportive of acceptance and the positive reviews actually have lower confidence.",ICLR2022, +H1xlSOydeE,1545230000000.0,1545380000000.0,1,rkMnHjC5YQ,rkMnHjC5YQ,meta-review,Reject,"The reviewers seem to reach a consensus that the contribution of the paper is somewhat incremental give the prior work of Goel et al and that a main drawback of the paper is that it's not clear the similar technique can be applied to multiple **convolutional filters**. The authors mentioned in the response that some of the techniques can be heuristically applied to multiple layers, but the AC is skeptical about it because, with multiple layers and multiple convolutional filters, one has to deal with the permutation invariance caused by the multiple convolutional filters. (It's unclear to the AC how one could have a meaningful setting with multiple layers but a single convolution filters.) 
",ICLR2019,5: The area chair is absolutely certain +Byeze-HJeN,1544670000000.0,1545350000000.0,1,SkeZisA5t7,SkeZisA5t7,"Interesting visualizations, but more rigor and analysis would help",Accept (Poster),"This paper suggests that noise-regularized estimators of mutual information in deep neural networks should be adaptive, in the sense that the variance of the regularization noise should be proportional to the range of the hidden activity. Two adaptive estimators are proposed: (1) an entropy-based adaptive binning (EBAB) estimator that chooses the bin boundaries such that each bin contains the same number of unique observed activation levels, and (2) an adaptive kernel density estimator (aKDE) that adds isotropic Gaussian noise, where the variance of the noise is proportional to the maximum activity value in a given layer. These estimators are then used to show that (1) ReLU networks can compress, but that compression may or may not occur depending on the specific weight initialization; (2) different nonsaturating noninearities exhibit different information plane behaviors over the course of training; and (3) L2 regularization in ReLU networks encourages compression. The paper also finds that only compression in the last (softmax) layer correlates with generalization performance. The reviewers liked the range of experiments and found the observations in the paper interesting, but had reservations about the lack of rigor in the paper (no theoretical analysis of the convergence of the proposed estimator), were worried that post-hoc addition of noise distorts the function of the network, and felt that there wasn't much insight provided on the cause of compression in deep neural networks. The AC shares these concerns, and considers them to be more significant than the reviewers do, but doesn't wish to override the reviewers' recommendation that the paper be accepted.",ICLR2019,3: The area chair is somewhat confident +BklSvwP6kE,1544550000000.0,1545350000000.0,1,SyxvSiCcFQ,SyxvSiCcFQ,ICLR 2019 decision,Reject,"This paper studies the problem of training binary neural networks using quantum amplitude amplification method. Reviewers agree that the problem considered is novel and interesting. However the consensus is that there are only few experiments in the current paper and the paper needs more experiments on different datasets with comparisons to proper baselines. Reviewers opined that the paper was not so easy to follow initially, though later revisions may have somewhat alleviated this problem.",ICLR2019,4: The area chair is confident but not absolutely certain +ULPGE6t-IGe,1610040000000.0,1610470000000.0,1,c_E8kFWfhp0,c_E8kFWfhp0,Final Decision,Accept (Poster),This paper presents a framework for joint differentiable simulation of physics and image formation for inverse problems. It brings together ideas from differentiable physics and differentiable rendering in a compelling framework.,ICLR2021, +1KpU2HKn2G,1576800000000.0,1576800000000.0,1,BkxgrAVFwH,BkxgrAVFwH,Paper Decision,Reject,"The paper presents a framework named Wasserstein-bounded GANs which generalizes WGAN. The paper shows that WBGAN can improve stability. + +The reviewers raised several questions about the method and the experiments, but these were not addressed. 

I encourage the authors to revise the draft and resubmit to a different venue.",ICLR2020,
BJ0ONyarz,1517250000000.0,1517260000000.0,391,B1Z3W-b0W,B1Z3W-b0W,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper is interesting but has a few flaws that still need to be addressed. As one reviewer noted, ""the authors seem to have simply applied the method of Andrychowicz et al. If they added some discussion and experiments clearly showing why this is a better way to improve the existing inference methods, the paper might have more impact.""
Overall, this work builds on existing work, but does not really dig deep enough for answers to these questions raised by the reviewers. The committee still feels this paper will be of great value at ICLR and recommends it for a workshop paper.
",ICLR2018,
NazKJQg5M,1576800000000.0,1576800000000.0,1,SkxUrTVKDH,SkxUrTVKDH,Paper Decision,Reject,"This paper investigates an existing method for fitting sparse neural networks, and provides a novel proof of global convergence. Overall, this seems like a sensible, if marginal, contribution. However, there were serious red flags regarding the care with which the scholarship was done, which make me deem the current submission unsuitable for publication. In particular, two points raised by R4, which were not addressed even after the rebuttal:

1) ""One important issue with the paper is that it blurs the distinction between prior work and the new contribution. For example, the subsection on Split Linearized Bregman Iteration in the ""Methodology"" section does not contain anything new compared to [1], and this is not clear enough to the reader.""

2) ""The newly-written conclusion is still incorrect, stating again that Split LBI achieves SOTA performance on ImageNet.""

I believe that R3's high score is due to not noticing these unsupported or misleading claims.

",ICLR2020,
qgKYHvFnrox,1642700000000.0,1642700000000.0,1,B0oHOwT5ENL,B0oHOwT5ENL,Paper Decision,Accept (Poster),"The authors introduce a neural network approach for solving the fixed point equations arising in deep equilibrium models. This consists of a tiny network that provides an initial guess for the fixed point, as well as a small network that computes coefficients inside an algorithm inspired by Anderson iteration.

Overall, there is consensus among the reviewers that the paper is well written and is a strong empirical study.

I recommend acceptance as a poster.

Additional remarks:

- The authors argue that DEQs / implicit deep learning models allow a decoupling between representational capacity and inference-time efficiency. Yet, in the ""Regularizing Implicit Models"" paragraph, they write ""Implicit models are known to be slow during training and inference. To address this, recent works have developed certain regularization methods that encourage these models to be more stable and thus easier to solve."", which seems like a contradiction to me. So while in theory I agree with this decoupling, in practice, it seems not completely true.

- Section 3 should include some discussion on conditions on f_theta for the existence of a fixed point.

- Since the initialization and HyperAnderson networks are trained using unrolling, there is some memory overhead compared to vanilla DEQs, which are differentiated purely using implicit differentiation. It would be great to clarify the amount of extra memory needed by these networks. 
It is also necessary to justify that the initialization and HyperAnderson networks are smaller than usual neural networks.",ICLR2022,
uJ5U8t99mw8,1642700000000.0,1642700000000.0,1,bfuGjlCwAq,bfuGjlCwAq,Paper Decision,Accept (Poster),This paper proposes a new approach to online 3D bin packing with deep reinforcement learning. It received mixed reviews. AC finds that the responses from the authors have addressed the concerns satisfactorily.,ICLR2022,
gVbCrFINI2,1642700000000.0,1642700000000.0,1,baUQQPwQiAg,baUQQPwQiAg,Paper Decision,Accept (Poster),"To address the problem of unauthorized use of data, methods are proposed to make data unlearnable for deep learning models by adding a type of error-minimizing noise. Based on the fact that the conferred unlearnability is found to be fragile to adversarial training, the authors design new methods to generate robust unlearnable examples that are protected from adversarial training. In addition, considering the vulnerability of error-minimizing noise in adversarial training, robust error-minimizing noise is then introduced to reduce the adversarial training loss.
The authors have tried to respond to the reviewers' comments along with adding more experiments.
Overall, this manuscript finally received three positive reviews and one negative review, the latter noting that the possible vulnerability or robustness of error-minimizing noise against (simple) image processing operations was not verified.
In comparison with other manuscripts I'm handling that got consistently positive comments, this manuscript is still recommended for acceptance (poster), with a further study of robustness under simple image transformations in the final version.",ICLR2022,
zrbA8uGEx6,1576800000000.0,1576800000000.0,1,rkg6sJHYDr,rkg6sJHYDr,Paper Decision,Accept (Talk),"The authors introduce a framework for automatically detecting diverse, self-organized patterns in a continuous Game of Life environment, using compositional pattern producing networks (CPPNs) and population-based Intrinsically Motivated Goal Exploration Processes (POP-IMGEPs) to find the distribution of system parameters that produce diverse, interesting goal patterns.

This work is really well-presented, both in the paper and on the associated website, which is interactive and features source code and demos. Reviewers agree that it's well-written and seems technically sound. I also agree with R2 that this is an under-explored area and thus would add to the diversity of the program.

In terms of weaknesses, reviewers noted that it's quite long, with a lengthy appendix, and could be a bit confusing in areas. Authors were responsive to this in the rebuttal and have trimmed it, although it's still 29 pages. My assessment is well-aligned with those of R2 and thus I'm recommending accept. In the rebuttal, the authors mentioned several interesting possible applications for this work; it'd be great if these could be included in the discussion.

Given the impressive presentation and amazing visuals, I think it could make for a fun talk.
",ICLR2020,
HkxM4PCAkN,1544640000000.0,1545350000000.0,1,ryM_IoAqYX,ryM_IoAqYX,"Good convergence analysis on convex model training with combined weight and gradient quantization, and empirical evidence for deep networks.",Accept (Poster),"This paper provides the first convergence analysis for distributed training of convex models with quantized weights and gradients. It is well written and organized. Extensive experiments are carried out beyond the assumption of convex models in the theoretical study. 

Analysis with weight and gradient quantization has been separately studied, and this paper provides a combined analysis, which renders the contribution incremental.

As pointed out by R2 and R3, it is somewhat unclear under which problem setting the proposed quantized training would help improve the convergence. The authors provide clarification in the feedback. It is important to include those, together with other explanations in the feedback, in the future revision.

Another limitation pointed out by R3 is that the theoretical analysis applies to convex models only. Nevertheless, it is nice to show in experiments that deep network training benefits from gradient quantization empirically.",ICLR2019,3: The area chair is somewhat confident
ylPSjiQo4V,1576800000000.0,1576800000000.0,1,BklOXeBFDS,BklOXeBFDS,Paper Decision,Reject,"The paper proposes a method for active learning on graphs. Reviewers found the presentation of the method confusing and somewhat lacking novelty in light of existing works (some of which were not compared to). After the rebuttal and revisions, the reviewers' minds were not changed from rejection. ",ICLR2020,
r1lGsE5eeE,1544750000000.0,1545350000000.0,1,H1gFuiA9KX,H1gFuiA9KX,motivation is unclear,Reject,"Although the proposed method could be considered an interesting application of the recently popular hyperbolic space to word embeddings, it is unclear why this needs to be done. Experiments also do not support why or whether the application of hyperbolic space to word embeddings is necessary.",ICLR2019,4: The area chair is confident but not absolutely certain
tJ7OA5cTKl,1642700000000.0,1642700000000.0,1,7fFO4cMBx_9,7fFO4cMBx_9,Paper Decision,Accept (Poster),"Meta Review for Variational Neural Cellular Automata

This paper proposes a generative model, a VAE whose decoder is implemented via neural cellular automata (NCA). The authors show that this model performs well for reconstruction, but they also show that the architecture has some robustness properties against damage during generation.

Experiments were conducted on 3 datasets: MNIST, Noto Emoji, and CelebA, and while experimental results were great on MNIST, the method was less performant on the other two datasets, although there is clear evidence that the model can learn to generate meaningful images. For the robustness experiments, the authors show that VNCA is robust to perturbations (occlusions) and show that the model has a reasonable degree of robustness even without ever seeing any perturbations at training time.

All reviewers agree that this model is an improvement over neural cellular automata, and that the approach is interesting and the results are sound (and even useful). Initially, there were concerns that NCAs were simply convolutional neural networks (the connection is already known, and not the point of the paper), and issues with comparison with baselines for damage reconstruction tasks, but these were addressed by the authors (which the reviewers have acknowledged, and have improved their scores). The authors have also responded to the concerns of reviewer cp9d, and due to the lack of response from cp9d, I assessed the authors' response myself and find that they do address the concerns (in particular, they removed claims of super-resolution, and improved the clarity of the work). With that in mind, the score of 5 is viewed as a score of 6 from my perspective (giving this work effectively an average score of 6). 

After my assessment of the paper and reviews, I agree with reviewer kwgv, as they have summarized the work in their original review:
- The authors propose a variational neural cellular automaton, which learns to generate images by iterating the transition rule.
- The paper is interesting, with good results, and a good fit for ICLR.
- The paper solves an interesting problem on the topic of neural cellular automata.
- There are some doubts/limitations that I have asked the authors to address (mainly concerning the architecture of the model).
- There are some missing references and details that would help the readers to get a better sense of the subject.

Crucially, kwgv has acknowledged that the *authors have improved the paper significantly after the reviews, and they have addressed all questions and comments that [they] raised* (especially with regards to the last 2 points), and kwgv has subsequently championed the work with a score of 8. With the increased scores from kwgv and AnwX in mind, and also with what I view as an increased score of 6 from cp9d (in the absence of a response from the reviewer, the authors have addressed the concerns in my judgement), my conclusion is that this is a nice work that bridges NCAs with generative models, and I think the work will be a useful addition to the growing literature in this space. I will recommend it for acceptance at ICLR 2022 as a poster.",ICLR2022,
r1xGoih6JE,1544570000000.0,1545350000000.0,1,H1xEtoRqtQ,H1xEtoRqtQ,"This paper provides some interesting ideas, but has a mismatch between the title and motivation and what is provided",Reject,"As all the reviewers have highlighted, there is some interesting analysis in this paper on understanding which models can be easier to complete. The experiments are quite thorough, and seem reproducible. However, the biggest limitation---and the one that is making it harder for the reviewers to come to a consensus---is the fact that the motivation seems mismatched with the provided approach. There is quite a lot of focus on security, and being robust to an adversary. Model splitting is proposed as a reasonable solution. However, the Model Completion hardness measure proposed is insufficiently justified, both in that it's not clear what security guarantees it provides nor is it clear why training time was chosen over other metrics (like number of samples, as mentioned by a reviewer). If this measure had been previously proposed, and the focus of this paper was to provide empirical insight, that might be fine, but that does not appear to be the case. This mismatch is also evident in the writing of the paper. After the introduction, the paper largely reads as understanding how retrainable different architectures are under which problem settings, when replacing an entire layer, with little to no mention of security or privacy.

In summary, this paper has some interesting ideas, but an unclear focus. The proposed strategy should be better justified. Or, maybe even better for the larger ICLR audience, the provided analysis could be motivated for other settings, such as understanding convergence rates or trainability in neural networks.",ICLR2019,4: The area chair is confident but not absolutely certain
5z8aGOxQ4,1576800000000.0,1576800000000.0,1,HJg_tkBtwS,HJg_tkBtwS,Paper Decision,Reject,"The paper presents an approach to feature selection. 
Reviews were mixed and questioned whether the paper has enough substance and novelty, the correctness of the theoretical contributions and the experimental details, as well as whether the paper compares to the relevant literature. ",ICLR2020,
gGEtOvN6-j,1610040000000.0,1610470000000.0,1,27acGyyI1BY,27acGyyI1BY,Final Decision,Accept (Poster),"This work proposes a stochastic process variant that extends existing work on neural ODEs. The resulting method is fast and data-adaptive, and can fit well to sparser time series settings without retraining. The methodology is backed up empirically, and after the response period, the reviewers' concerns are sufficiently addressed and reviewers are in agreement that the contributions are clear and correct.",ICLR2021,
uw6WEChgMUX,1642700000000.0,1642700000000.0,1,dwg5rXg1WS_,dwg5rXg1WS_,Paper Decision,Accept (Spotlight),"The paper proposes a GAN architecture with a ViT-based discriminator and a ViT-based generator. The paper initially received a mixed rating with two ""slightly above the acceptance threshold"" ratings and three ""slightly below the acceptance threshold"" ratings. Several concerns were raised in the reviews, including whether there are advantages of using a ViT-based GAN architecture over the CNN-based GAN and whether the proposed method can be extended to high-resolution image synthesis. These concerns are well-addressed in the rebuttal with most of the reviewers increasing their ratings to be above the bar. The meta-reviewer agrees with the reviewers' assessments and would like to recommend acceptance of the paper.",ICLR2022,
rkx6RZ_uiyV,1544420000000.0,1545350000000.0,1,S1eVe2AqKX,S1eVe2AqKX,metareview: no rebuttal,Reject,"All reviewers rate the paper as below threshold. While the authors responded to an earlier request for clarification, there is no rebuttal to the actual reviews. Thus, there is no basis by which the paper can be accepted.",ICLR2019,5: The area chair is absolutely certain
ub8dq2rzgdSx,1642700000000.0,1642700000000.0,1,demdsohU_e,demdsohU_e,Paper Decision,Reject,"The paper proposes a method for inferring which of a set of pretrained neural networks, once fine-tuned on a transfer task, will generalize the best. This is accomplished by deriving a quantity based on a mean-field approximation of a dynamical system defined on the adjacency matrix of the weights of a neural network, known as the ""neural capacitance"". The model selection procedure involves attaching a fixed, randomly initialized network onto the outputs of the pretrained network and fine-tuning for a small number of iterations, and computing the metric; the fixed network is called the ""neural capacitance probe"" (NCP).

Reviews, though low confidence, awarded borderline scores, and a central concern was clarity and motivation, in particular the role of the NCP. acZh, the highest confidence and most verbose reviewer, echoed these concerns along with specific criticisms, for example about the heavy reliance on Gao et al (2016) without elaboration. The authors have responded in considerable depth but unfortunately the reviewer has not acknowledged these responses. On the NCP, the authors note that this is an approximation to the ideal metric that they have empirically validated.

Reading the updated draft, I find myself still concurring with reviewer acZh in large degree. The draft has improved with the noted additions, such as Appendix G devoted to an explanation of Gao et al (2016), but the presentation is still quite challenging to follow. 
I am left with fundamental questions about the soundness of the approximation being made, its wider applicability, and the many arbitrary decisions regarding the architecture of the NCP that appear out of nowhere. How sensitive is the procedure to these choices? Did the authors tune these architectural hyperparameters? Using what data? The table of results does not include units, and for a paper proposing a general purpose metric I'd ideally want to see a robust rationale for the selection of method-specific hyperparameters as well as a rigorous statistical treatment of the method's performance. Since it involves an approximation, a comparison to the ""ideal"" or ""exact"" procedure on a toy problem where the latter is feasible would strengthen the paper considerably. I do appreciate the breadth of architectures and datasets examined, but I believe the central focus of the paper should be explaining the mathematical motivation (perhaps at a higher level and deferring more detail to the appendix), why precisely it makes sense in the context of neural networks (also raised by acZh, with an answer provided that I believe partially addresses this) and justifying the concrete, approximate instantiation of the method involving the NCP and the hyperparameter selection and evaluation protocol that led you to the particular NCP employed.

At a higher level, this is a very mathematically dense paper that relies considerably on concepts outside of what might be considered typical expertise in the ICLR community, reflected in the confidence scores of the reviewers. While I feel that the issues described above already preclude acceptance at this time, I believe it may be difficult to do the proposed method justice in the short conference paper format, and would suggest that the authors consider a journal submission instead, where a didactic presentation can be given the full attention it deserves without the difficulty created by length constraints.

Finally, I'd like to apologize to the authors for the non-responsiveness of the Area Chair. The original Area Chair was not able to complete their duty and I have been belatedly assigned this paper to evaluate it, and it is clear that not as much discussion took place as would have been ideal.",ICLR2022,
D9LxjMLt5hz,1610040000000.0,1610470000000.0,1,ni_nys-C9D6,ni_nys-C9D6,Final Decision,Reject,"After reading the paper, the reviews, and the authors’ feedback, the meta-reviewer agrees with the reviewers that the paper touches on an interesting topic (reversible computing) but could be improved in the areas of presentation and evaluation. Therefore this paper is rejected.

Thank you for submitting the paper to ICLR.
",ICLR2021,
Q0qpdigPU,1576800000000.0,1576800000000.0,1,Hyx-jyBFPr,Hyx-jyBFPr,Paper Decision,Accept (Spotlight),"The paper focuses on supervised and self-supervised learning. The originality is to formulate the self-supervised criterion in terms of optimal transport, where the trained representation is required to induce $K$ equidistributed clusters. The formulation is well founded; in practice, the approach proceeds by alternately optimizing the cross-entropy loss (SGD) and the pseudo-loss, through a fast version of the Sinkhorn-Knopp algorithm, and scales up to millions of samples and thousands of classes.

Some concerns about the robustness w.r.t. imbalanced classes, the ability to deliver SOTA supervised performances, and the computational complexity have been answered by the rebuttal and handled through new experiments. 
The convergence toward a local minimum is shown; however, increasing the number of pseudo-label optimization rounds might degrade the results.

Overall, I recommend accepting the paper as an oral presentation. A fancier title would do better justice to this very nice paper (""Self-labelling learning via optimal transport""?). ",ICLR2020,
rCMr0PvrZVL,1642700000000.0,1642700000000.0,1,bOcUqfdH3S8,bOcUqfdH3S8,Paper Decision,Reject,"The paper studies an important problem of quantifying uncertainty (as measured by calibration) of predictions made by an ML algorithm in the presence of distribution drift. However, all reviewers point out a slew of concerns that went unrebutted by the authors. The reviewers concurred that the paper deserved to be rejected at the current stage, and I concur. I recommend that the authors take the critical and constructive feedback into account to improve the paper and perhaps resubmit to a different venue in 2022.",ICLR2022,
3bPcn1wwYfI,1642700000000.0,1642700000000.0,1,8Wdj6IJsSyJ,8Wdj6IJsSyJ,Paper Decision,Reject,"The reviewers are in consensus that this manuscript falls just short of the bar. I recommend that the authors take their recommendations into consideration in revising their manuscript, with a particular focus on comparison to the state of the art.",ICLR2022,
5krNxRk7My9,1610040000000.0,1610470000000.0,1,qbRv1k2AcH,qbRv1k2AcH,Final Decision,Reject,"*Overview* This paper applies RL to automated theorem proving to eliminate the need for human-written proofs as training data. The method uses TF-IDF for premise selection. The experiments, compared with a supervised baseline, demonstrate good performance.

*Pro* The paper provides a side-by-side comparison of the effect of the availability of human proofs on the final theorem proving. The experiments, compared with a supervised baseline, show that the proposed method has good performance even without human knowledge. The proposed TF-IDF selection algorithm addresses a challenging issue in the exploration of RL.

*Con* The reviewers are primarily concerned about the novelty of the methods. It appears the method is not new since there exists a body of work leveraging RL to learn theorem provers. The tasks are also not novel. After rebuttal, the reviewers are not convinced that the novelty is significant enough for ICLR. The reviewers are also concerned that the proposed method might not be easily generalized to other tasks.

*Recommendation* Although the proposed method and experiments demonstrate some merit, there is a lack of novelty in terms of approaches. Since existing results already consider similar methods and similar tasks, it would make the paper stronger if thorough experimental comparisons were performed.
",ICLR2021,
SJxVRzyWgE,1544770000000.0,1545350000000.0,1,SkeK3s0qKQ,SkeK3s0qKQ,Interesting idea with relevance to some common settings,Accept (Poster),"
The authors present a novel method for tackling exploration and exploitation that yields promising results on some hard navigation-like domains. The reviewers were impressed by the contribution and had some suggestions for improvement that should be addressed in the camera ready version.
",ICLR2019,4: The area chair is confident but not absolutely certain
93iQ0Ft4pyc,1610040000000.0,1610470000000.0,1,2hT6Fbbwh6,2hT6Fbbwh6,Final Decision,Reject,"Dear Authors,

Thanks for your detailed feedback to, and even communications with, the reviewers. 
Your additional input certainly clarified some of the concerns raised by the reviewers and also improved their understanding of your work.

However, we still think that the notion of sequential bias is unclear, and the authors overclaim what they have done.
For these reasons, this paper cannot be recommended for acceptance.
I hope that the detailed feedback from the reviewers will help improve this work for future publication.
",ICLR2021,
rHhhysmoB7m,1610040000000.0,1610470000000.0,1,IDFQI9OY6K,IDFQI9OY6K,Final Decision,Accept (Poster),"The paper proposes a user-interaction framework where users choose a subset of LFs from a family of LFs generated using some template (e.g. keywords for text classification). The proposed criterion is not very surprising, but the authors present a practical and useful system that is well demonstrated both in the paper and the very careful author feedback. These enhancements have also been incorporated in the revised version.

Apart from the literature pointed out by the reviewers, here are some more papers that are related to this paper:
1. Gregory Druck, Burr Settles, Andrew McCallum: Active Learning by Labeling Features. EMNLP 2009: 81-90
2. Gregory Druck, Gideon S. Mann, Andrew McCallum: Learning from labeled features using generalized expectation criteria. SIGIR 2008: 595-602
3. Data Programming using Continuous and Quality-Guided Labeling Functions. In AAAI, 2020.",ICLR2021,
3kGSD2RZvU,1610040000000.0,1610470000000.0,1,4YzI0KpRQtZ,4YzI0KpRQtZ,Final Decision,Reject,"The paper proposes a Bayesian neural network model for tensor factorization, with particular focus on streaming data. The key contribution is the streaming posterior inference of the deep TF models. The combination of online tensor factorization, Bayesian NNs with sparsity priors, and posterior inference is new and interesting. However, there are many approximation steps, and the quality of the approximation and the convergence of the algorithm are not well justified. 
",ICLR2021, +CnZiloV-j,1576800000000.0,1576800000000.0,1,SylO2yStDr,SylO2yStDr,Paper Decision,Accept (Poster),"This paper presents Layerdrop, which is a method for structured dropout which allows you to train one model, and then prune to a desired depth at test time. This is a simple method which is exciting because you can get a smaller, more efficient model at test time for free, as it does not need fine tuning. They show strong results on machine translation, language modelling and a couple of other NLP benchmarks. The reviews are consistently positive, with significant author and reviewer discussion. This is clearly an approach which merits attention, and should be included in ICLR.",ICLR2020, +iCKrBLs28s,1610040000000.0,1610470000000.0,1,Ovp8dvB8IBH,Ovp8dvB8IBH,Final Decision,Accept (Poster),"All reviewers find the proposed data augmentation approach simple, interesting and effective. They agree that paper does a good job exploring this idea with number of experiments. However the paper also suffers from some drawbacks, and reviewers raise questions about some of the conclusions of the paper - in particular how to designate an augmentation as either negative or positive is not clear apriori to training. While I agree with this criticism, I believe the paper overall explores an interesting direction and provides a good set of experiments than can be built on in future works, and I suggest acceptance. I encourage authors to address all the reviewers concerns as per the feedback in the final version.",ICLR2021, +Bke8gY5HlV,1545080000000.0,1545350000000.0,1,ryxeB30cYX,ryxeB30cYX,reject,Reject,"While the paper contains interesting ideas, the reviewers suggest improving the clarity and experimental study of the paper. The work holds promises but is not ready for publication at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +r5Iy5tlmjt,1576800000000.0,1576800000000.0,1,Hkg9HgBYwH,Hkg9HgBYwH,Paper Decision,Reject,"Main content: + +Blind review #3 summarizes it well: + +This paper presents a technique for encoding the high level “style” of pieces of symbolic music. The music is represented as a variant of the MIDI format. The main strategy is to condition a Music Transformer architecture on this global “style embedding”. Additionally, the Music Transformer model is also conditioned on a combination of both “style” and “melody” embeddings to try and generate music “similar” to the conditioning melody but in the style of the performance embedding. + +-- + +Discussion: + +The reviewers questioned the novelty. Blind review #2 wrote: ""Overall, I think the paper presents an interesting application and parts of it are well written, however I have concerns with the technical presentation in parts of the paper and some of the methodology. Firstly, I think the algorithmic novelty in the paper is fairly limited. The performance conditioning vector is generated by an additional encoding transformer, compared to the Music Transformer paper (Huang et. al. 2019b). However, the limited algorithmic novelty is not the main concern. The authors also mention an internal dataset of music audio and transcriptions, which can be a major contribution to the music information retrieval (MIR) community. 
However, it is not clear if this dataset will be publicly released or is only for internal experiments.""

However, after revision, the same reviewer has upgraded the review to a weak accept, as the authors wrote ""We emphasize that our goal is to provide users with more fine-grained control over the outputs generated by a seq2seq language model. Despite its simplicity, our method is able to learn a global representation of style for a Transformer, which to the best of our knowledge is a novel contribution for music generation. Additionally, we can synthesize an arbitrary melody into the style of another performance, and we demonstrate the effectiveness of our results both quantitatively (metrics) and qualitatively (interpolations, samples, and user listening studies).""

--

Recommendation and justification:

This paper is borderline for the reasons above, and due to the large number of strong papers, is not accepted at this time. As one comment, this work might actually be more suitable for a more specialized conference like ISMIR, as its novel contribution is more to music applications than to fundamental machine learning approaches.",ICLR2020,
nKfM2eVSOti,1642700000000.0,1642700000000.0,1,kj8TBnJ0SXh,kj8TBnJ0SXh,Paper Decision,Reject,"The reviewers raised a number of major concerns, including poor readability of the presented material, incremental novelty of the presented approach and, most importantly, insufficient and unconvincing ablation and experimental evaluation studies. The authors’ rebuttal failed to address all reviewers’ questions and failed to alleviate the reviewers’ concerns. The authors explain that due to the lack of time they could not complete all experimental studies. A major revision of the paper is needed before the paper can be accepted for publication. Hence, I cannot suggest this paper for presentation at ICLR.",ICLR2022,
HklRzRxmeV,1544910000000.0,1545350000000.0,1,ryl8-3AcFX,ryl8-3AcFX,meta review,Accept (Poster),"This paper proposes an approach for probing an environment to quickly identify the dynamics. 
The problem is relevant to the ICLR community. The paper is well-written, and provides a detailed empirical evaluation. The main weakness of the paper is the somewhat small originality over prior methods on online system identification. Despite this, the reviewer's agreed that the paper exceeds the bar for publication at ICLR. Hence, I recommend accept. + +Beyond the related work mentioned by the reviewers, the approach is similar to work in meta-learning. Meta-RL and multi-task learning has typically been considered in settings where the reward is changing (e.g. see [1],[2],[3],[4], where [4] also uses an embedding-based approach). However, there is some more recent work on meta-RL across varying dynamics, e.g. see [5],[6]. The authors are encouraged to make a conceptual connection between this approach and the line of work in model-based meta-RL (particularly [5] and [6]) in the final version of the paper. + +[1] Duan et al. https://arxiv.org/abs/1611.02779 +[2] Wang et al. CogSci '17 https://arxiv.org/abs/1611.05763 +[3] Finn et al. ICML '17 https://arxiv.org/abs/1703.03400 +[4] Hausman et al. ICLR '17: https://openreview.net/forum?id=rk07ZXZRb +[5] Sæmundsson et al. https://arxiv.org/abs/1803.07551 +[6] Nagabandi et al. https://arxiv.org/abs/1803.11347 +",ICLR2019,4: The area chair is confident but not absolutely certain +HhY7xXLJcyt,1610040000000.0,1610470000000.0,1,fw-BHZ1KjxJ,fw-BHZ1KjxJ,Final Decision,Accept (Poster),"The paper proposes an approach to learn sparse embeddings for documents/labels which can be trained by using multiple GPUs in parallel, and are more amenable to nearest neighbor search. + +The paper certainly seemed to have botched comparison to SNRM and requires to fix the claims in section 5.1. +But, the impressive performance on extreme classification tasks is quite convincing. Also, reviewers in general are quite enthusiastic about the paper. So we would recommend the paper for acceptance, but authors certainly need to take comments of reviewers into account (especially around baselines and comparison to SNRM).",ICLR2021, +c8I8ONyEoc,1576800000000.0,1576800000000.0,1,S1ef6JBtPr,S1ef6JBtPr,Paper Decision,Reject,"The paper takes the perspective of ""reinforcement learning as inference"", extends it to the multi-agent setting and derives a multi-agent RL algorithm that extends Soft Actor Critic. Several reviewer questions were addressed in the rebuttal phase, including key design choices. A common concern was the limited empirical comparison, including comparisons to existing approaches. ",ICLR2020, +Rhe1zFsi9Fw,1642700000000.0,1642700000000.0,1,TIdIXIpzhoI,TIdIXIpzhoI,Paper Decision,Accept (Spotlight),"This paper presents a faster sampling method for diffusion based generative models which are usually slow in practice. The key idea is based a progressive distillation approach (e.g., how to distill a 4 step sampler into a 1 step sampler). The paper studies the various design choices for diffusion models which existing work hasn't looked at that deeply and sheds light on the effects of these choices. The paper also shows that DDIM can be seen as a numerical integrator for probability flow ODE. The experimental results are impressive. + +There were some concerns such as the effect of progressive distillation and the overhead of distilling the diffusion model but the authors provided a satisfactory response and backed it up with additional results. 
+ +Overall, this is a nice paper on making diffusion based generative models generate faster samples and also provides novel insights into the behavior of these models under various design choices. Given the significant recent interest in these models which are pretty impressive in terms of generation quality but slow, the paper indeed makes a timely contribution which will fuel further interest in these models. All the reviewers have voted for acceptance. Based on my own reading, the reviewers' assessments, the discussions, and the authors' response, I would vote for acceptance.",ICLR2022, +BJWK4y6HG,1517250000000.0,1517260000000.0,394,BkfEzz-0-,BkfEzz-0-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track," +The reviewers have significantly different views, with one strongly negative, +one strongly positive, and one borderline negative. However, all three +reviews seem to regard the NaaA framework as a very interesting and novel approach to training neural nets. They also concur that the major issue with the paper is very confusing technical exposition regarding the motivation, math details, and how the idea works. The authors indicate that they have significantly revised the manuscript to improve the exposition, but none of the reviewers have changed their scores. One reviewer states that ""technical details are still too heavy to easily follow."" My own take regarding the current section 3 is that it is still very challenging to parse and follow. Given this analysis, the committee recommends this for workshop. + +Pros: + Interesting and novel framework for training NNs + ""Adaptive DropConnect"" algorithm contribution + Good empirical results in image recognition and ViZDoom domains + +Cons: + Technical exposition is very challenging to parse and follow + Some author rebuttals do not inspire confidence. For example, +motivation of method due to ""$100 billion market cap of Bitcoin"" and in reply to unconvincing neuroscience motivation, saying ""throw away the typical image of auction.""",ICLR2018, +kt3lMzWDPRn,1642700000000.0,1642700000000.0,1,_fLxZ6VpXTH,_fLxZ6VpXTH,Paper Decision,Reject,"The paper proposes the use of a state distribution estimation objective with a classic behavioral cloning objective for imitation learning. +The submission also proposes the use of a continuous normalizing flow training technique coined ""denoising normalizing flow"" to learn the state distribution. The authors experimentally validate their method on several MuJoCo continuous control benchmarks. +The theorem 4.1 does validate the fact that this proposed objective is can be maximized by the target policy. +However, the technical contributions (proposal of new objective and the denoising normalizing flow method) are marginal compared to previous work (e.g., SoftFlow or Energy-Based Imitation Learning). +The empirical validation is lacking more extensive comparison with PWIL or NDI, which are more recent methods attempting to address the challenges described in the submission. +I'm recommending this paper for rejecting for this conference.",ICLR2022, +HyglvaKxlN,1544750000000.0,1545350000000.0,1,HJMCcjAcYX,HJMCcjAcYX,meta-review,Accept (Poster),"The paper proposes an architecture to learn over sets, by proposing a way +to have permutations differentiable end-to-end, hence learnable by gradient +descent. Reviewers pointed out to the computational limitation (quadratic in +the size of the set just to consider pairwise interactions, and cubic overall). 
+One reviewer (with low confidence) though the approach was not novel but +didn't appreciate the integration of learning-to-permute with a differentiable +setting, so I decided to down-weight their score. Overall, I found the paper +borderline but would propose to accept it if possible.",ICLR2019,4: The area chair is confident but not absolutely certain +SkdVNJ6Bz,1517250000000.0,1517260000000.0,332,H1Y8hhg0b,H1Y8hhg0b,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The results in the paper are interesting, and the modifications improve the paper further. Reviewers found teh paper interesting and potentailly applicable to many models.",ICLR2018, +S1xX9W7lgN,1544720000000.0,1545350000000.0,1,HkNDsiC9KQ,HkNDsiC9KQ,Well written paper with an interesting idea,Accept (Oral),"The reviewers all agree that the idea is interesting, the writing clear and the experiments sufficient. + +To improve the paper, the authors should consider better discussing their meta-objective and some of the algorithmic choices. ",ICLR2019,4: The area chair is confident but not absolutely certain +Q1U_58GGCc,1642700000000.0,1642700000000.0,1,qNcedShvOs4,qNcedShvOs4,Paper Decision,Reject,"The paper aims to integrate Stein variational inference methods into the existing probabilistic programming language NumPyro. The implemented methods include variantions of Stein variational gradient descent with different types of kernel functions, non-linear scaling of update terms, and matrix-valued kernels. The paper includes empirical results with a comparsion with existing baselines in real-world problems. Using this framework, the authors developed a new Stein mixture algorithm for deep Markov models, which shows better performance than existing methods. + +Strengths: + +- The paper is overall well-written and the method is clearly explained. +- The literature review is thorough. +- Integration of SteinVI into numpyro seems useful. Users can easily take advantage of the state-of-the-art SteinVI algorithms for their own Bayesian modelings. +- Extending the stein mixture method to deep Markov models is a novel application. + +Weaknesses: + +- The originality is low the authors propose algorithms that are very similar to previous work and there is a lack of experiments to verify the usefulness of the proposed method, for example, + to verify the decreased variance of the gradient estimates claimed by the authors. +- Efforts are required to illustrate why ELBO-within-Stein is preferred over the existing work. +- Some important Stein VI methods seem lacking. +- No experiments to support the usefulness of EinSteinVI for Non-linear Stein VI, Matrix-valued kernel stein VI, and message passing stein VI. + +All reviewers vote for rejection. I recommend the authors to addrss the limitatoins mentioned above and improve the paper before its resubmission to another venue.",ICLR2022, +rSVmSsS6Cpb,1642700000000.0,1642700000000.0,1,aWA3-vIQDv,aWA3-vIQDv,Paper Decision,Reject,"This paper observes the similarity between the universality in renormalization group and the lottery ticket hypothesis and proposes that the iterative magnitude pruning, which is used to find the winning tickets, could be a renormalization group scheme. The authors also provide some evidence on their theory on vision model of ResNet families. 
While it is interesting to try a theoretical explanation of the transferability of lottery ticket used in similar tasks using the theory from statistical physics, the paper does not provide enough experimental results to show how to use such an explanation to improve iterative magnitude pruning or determine the best architecture that can be transferred for different tasks. Therefore the work is more like working in the progress report and not ready for publication yet.",ICLR2022, +HJgeJKWbxE,1544780000000.0,1545350000000.0,1,HyVxPsC9tm,HyVxPsC9tm,lacking experiments against simple baselines,Reject,"The paper proposes a method for saving computation in surveillance videos (videos without camera motion) by re-using features from parts of the image that do not change. The results show that this significantly saves computation time, which is a big benefit, given also the amount of surveillance video input available for processing nowadays. Reviewers request comparisons to obvious baselines, e.g., selecting a subset of frames for processing or performing a low level pixel matching to select the pixels to compute new features on. Such experiments would make this paper much stronger. There is no rebuttal and thus no ground for discussion or acceptance. +",ICLR2019,5: The area chair is absolutely certain +S1SpG1TBM,1517250000000.0,1517260000000.0,30,Byt3oJ-0W,Byt3oJ-0W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper with the self-explanatory title was well received by the reviewers and, additionally, comes with available code. The paper builds on prior work (Sinkhorn operator) but shows additional, significant amount of work to enable its application and inference in neural networks. There were no major criticisms by the reviewers, other than obvious directions for improvement which should have been already incorporated in the paper, issues with clarity and a little more experimentation. To some extent, the authors addressed the issues in the revised version. ",ICLR2018, +k5JHM7N2e,1576800000000.0,1576800000000.0,1,HJxdTxHYvB,HJxdTxHYvB,Paper Decision,Accept (Poster),"This work presents a ""shadow attack"" that fools certifiably robust networks by producing imperceptible adversarial examples by search outside of the certified radius. The reviewers are generally positive on the novelty and contribution of the work. ",ICLR2020, +TErejuvPEXT,1642700000000.0,1642700000000.0,1,GIEPR9OomyX,GIEPR9OomyX,Paper Decision,Reject,"This paper proposes an amortization strategy for MC sampling from a single chain rather than per-datapoint chains, and uses this strategy to define a new Bayesian autoencoder based on Langevin dynamics. + +The reviewers find the line of thought very promising, and a potentially interesting addition to the latent variable literature, while also raising some concerns. The dimension of the single chain must match the dataset size, which limits the computational benefits coming from amortization, and in fact this restriction seems hard, as empirical results (added in the discussion period) are qualitatively worse in the `d this seems like a typo (?) and otherwise I don't really understand how this defines a proper definition over states +* As at least one reviewer pointed out, authors start from the mutual information $I$ but drop the entropy term $H(a_t | R_{t+1})$ by claiming it doesn't matter since the goal is to learn the ASR. 
However, in that objective the distribution over actions $p_{\alpha}$ seems to be learnable (through the $\alpha$ parameters), so if we try to minimize the mutual information $I$ including over $\alpha$ it would have been important to retain the entropy over actions as well. + +In terms of the relevance of the results, they look pretty good but: +* The proposed algorithm ends up being somewhat complex, with a lot of terms in the loss (eq. 4), and a lack of empirical validation of what actually matters. I see a single ablation study in the Appendix (Fig. 10), and possibly also the comparison to VRL (but it isn't entirely clear to me what this baseline is implementing as it lacks details). I would have appreciated a more thorough empirical analysis of how each term in eq. 4 matters. +* CarRacing experiments consistently use 21 dimensions ""for a fair comparison"", but this dimensionality was chosen specifically for and by ASR. As a result, it doesn't really look ""fair"" to me: a fairer comparison would have either selected the optimal dimensionality for each method, or shown results across a range of different dimensionalities. + +I also have some concerns regarding the applicability of the algorithm: +1. Relying on random actions to build a world model only works if random actions allow sufficient enough exploration of the state space. There are many situations where this isn't a realistic assumption (also alluded to by at least one reviewer). +2. Minor: in the setup of eq. 1 the reward $r_t$ doesn't directly depend on $s_t$. I'm not sure to which extent this matters for the proposed algorithm, but if this is a necessary condition for it to work properly, it may cause issues in many stochastic environments. + +As a result, I am recommending rejection as I believe the paper is not quite ready for publication. I would encourage the authors to try and simplify the presentation (the paper is very notation-heavy and not an easy read), focusing on showing convincing theoretical and empirical justification for all components of the proposed technique.",ICLR2022, +ByIdLJ6BG,1517250000000.0,1517260000000.0,816,rJqfKPJ0Z,rJqfKPJ0Z,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers have various reservations. +While the paper has interesting suggestions, it is slightly incremental and the results are not sufficiently compared to other techniques. +We not that one reviewer revised his opinion ",ICLR2018, +BJlDoe1xg4,1544710000000.0,1545350000000.0,1,SJg6nj09F7,SJg6nj09F7,"Interesting application of DRL, but too confuse presentation and too little experimental for the task and results",Reject,"The paper trains a classifier to decide if a program is a malware and when to halt its execution. The malware classifier is mostly composed of an RNN acting on featurized API calls (events). The presentation could be improved. The results are encouraging, but the experiments lack solid baselines, comparisons, and grounding of the task usefulness, as this is not done on an established benchmark.",ICLR2019,4: The area chair is confident but not absolutely certain +vuw3-o8vjE,1576800000000.0,1576800000000.0,1,r1gIwgSYwr,r1gIwgSYwr,Paper Decision,Reject,"This paper proposes PAC-Bayes bounds for meta-learning. The reviewers who are most knowledgeable about the subject and who read the paper most closely brought up several concerns regarding novelty (especially a description of how the proposed bounds relate to those in prior works (Pentina el al. (2014), Galanti et al. 
(2016) and Amit and Meir (2018))) and regarding clarity. The reviewers found theoretical analysis and proofs hard to follow. For these reasons, the paper isn't ready for publication at this time. See the reviewer's comments for details.",ICLR2020, +ppp2HL2SeTW,1642700000000.0,1642700000000.0,1,VFBjuF8HEp,VFBjuF8HEp,Paper Decision,Accept (Poster),"The paper tackles a very interesting problem in the context of diffusion-based generative models and provides empirical improvements. Pre-rebuttal, reviewers' main concerns lie in the motivation and clarification of the method, while after rebuttal, all reviewers satisfied the response and gave positive scores. The authors should include the additional results to well address the reviewers' concerns in the final version.",ICLR2022, +X4P5WYPW9O,1576800000000.0,1576800000000.0,1,BygXFkSYDH,BygXFkSYDH,Paper Decision,Accept (Talk),"The paper presents a general view of supervised learning models that are jointly trained with a model for embedding the labels (targets), which the authors dub target-embedding autoencoders (TEAs). Similar models have been studied before, but this paper unifies the idea and studies more carefully various components of it. It provides a proof for the specific case of linear models and a set of experiments on disease trajectory prediction tasks. The reviewer concerns were addressed well by the authors and I believe the paper is now strong. It would be even stronger if it included more tasks (and in particular some ""typical"" tasks that more of the community is focusing on), and the theoretical part is to my mind not a major contribution, or at least not as large as the paper implies, because it analyzes a much simpler model than anyone is likely to use TEAs for.",ICLR2020, +ByRxSkaSG,1517250000000.0,1517260000000.0,500,ryjw_eAaZ,ryjw_eAaZ,ICLR 2018 Conference Acceptance Decision,Reject,"The updated draft has helped to address some of the issues that the reviewers had, however the reviewers believe there are still outstanding issues. With regard to the technical flaw, one reviewer has pointed out that the update changes the story of the paper by breaking the connection between the generative and discriminative model in terms of preserving or ignoring conditional dependencies. + +In terms of the experiments, the paper has been improved by the reporting of standard deviation, and comparison to other works. However it is recommended that the authors compare to NAS by fixing the number of parameters and reporting the results to facilitate an apples-to-apples comparison. Another reviewer also recommends comparing to other architectures for a fixed number of neurons.",ICLR2018, +11lSkS8kbN,1576800000000.0,1576800000000.0,1,rke3U6NtwH,rke3U6NtwH,Paper Decision,Reject,All three reviewers are consistently negative on this paper. Thus a reject is recommended.,ICLR2020, +Kwft8zr-RL,1576800000000.0,1576800000000.0,1,S1gKkpNKwH,S1gKkpNKwH,Paper Decision,Reject,"This paper describes a method for learning compact RL policies suitable for mobile robotic applications with limited storage. The proposed pipeline is a scalable combination of efficient neural architecture search (ENAS) and evolution strategies (ES). Empirical evaluations are conducted on various OpenAI Gym and quadruped locomotion tasks, producing policies with as little as 10s of weight parameters, and significantly increased compression-reward trade-offs are obtained relative to some existing compact policies. 
+ +Although reviewers appreciated certain aspects of this paper, after the rebuttal period there was no strong support for acceptance and several unsettled points were expressed. For example, multiple reviewers felt that additional baseline comparisons were warranted to better calibrate performance, e.g., random coloring, wider range of generic compression methods, classic architecture search methods, etc. Moreover, one reviewer remained concerned that the scope of this work was limited to very tiny model sizes whereby, at least in many cases, running the uncompressed model might be adequate.",ICLR2020, +sNbwqRtqb1,1576800000000.0,1576800000000.0,1,SJxzFySKwH,SJxzFySKwH,Paper Decision,Accept (Poster),"The paper shows the relationship between node embeddings and structural graph representations. By careful definition of what structural node representation means, and what node embedding means, using the permutation group, the authors show in Theorem 2 that node embeddings cannot represent any extra information that is not already in the structural representation. The paper then provide empirical experiments on three tasks, and show in a fourth task an illustration of the theoretical results. + +The reviewers of the paper scored the paper highly, but with low confidence. I read the paper myself (unfortunately not with a lot of time), with the aim of increasing the confidence of the resulting decision. The main gap in the paper is between the phrases ""structural node representation"" and ""node embedding"", and their theoretical definitions. The analogy of distribution and its samples follows unsurprisingly from the definitions (8 and 12), but the interpretation of those definitions as the corresponding English phrases is not obvious by only looking at the definitions. There also seems to be a sleight of hand going on with the most expressive representations (Definitions 9 and 11), which is used to make the conditional independence statement of Theorem 2. The authors should clarify in the final version whether the existence of such a representation can be shown, or even better a constructive way to get it from data. + +Given the significance of the theoretical results, the authors should improve the introduction of the two main concepts by: +- relating them to prior work (one way is to move Section 5 towards the front) +- explaining in greater detail why Definitions 8 and 12 correspond to the two concepts. For example expanding the part of the proof of Corollary 1 about SVD, to make clear what Definition 12 means. +- a corresponding simple example of Definition 8 to relate to a classical method. + +The paper provides a nice connection between two disparate concepts. Unfortunately, the connection uses graph invariance and equivariance, which is unfamiliar to many of the ICLR audience. On balance, I believe that the authors can improve the presentation such that a reader can understand the implications of the connection without being an expert in graph isomorphism. As such, I am recommending an accept. + +",ICLR2020, +uLYEYd-MdZI,1642700000000.0,1642700000000.0,1,eIvzaLx6nKW,eIvzaLx6nKW,Paper Decision,Reject,"This paper introduces a multi-domain self-supervised representation learning method. Its objective consists of three terms: the first two terms are identical to SimCLR and the last one is to minimize the similarity of pairs across different datasets which is similar to the second term of SimCLR. In the experiment, it tests the methods across multiple common datasets. 
The method is simple, but the results are pretty good in the multi-domain setting. It seems to demonstrate the importance of domain clustering and of moving the domains apart. However, there are several important questions on which the paper needs more clarification: +1. What is the definition of a domain? How is it determined whether a pair of data points comes from different domains? What is the motivation/theory that you used to choose those datasets as different domains in your experiment? +2. Are there any public datasets that would cover multiple domains? +Without resolving these questions, I think future research on and adoption of the method would be constrained.",ICLR2022, +7AMZBL03nvl,1610040000000.0,1610470000000.0,1,bVzUDC_4ls,bVzUDC_4ls,Final Decision,Reject,"There are many recent methods for the formal verification of neural networks. However, most of these methods do not soundly model the floating-point representation of real numbers. This paper shows that this unsoundness can be exploited to construct adversarial examples for supposedly verified networks. The takeaway is that future approaches to neural network verification should take into account floating-point semantics. + +This was a borderline paper. On the one hand, to anyone well-versed in formal methods, it is not surprising that unsound verification leaves the door open for exploits. Also, there is prior work (Singh et al., NeurIPS 2018) on verification of neural networks that explicitly aims for soundness w.r.t. floating-point arithmetic. On the other hand, it is true that many adversarial learning researchers do not appreciate the value of this kind of soundness. In the end, the decision came down to the significance of the result. Here I have to side with Reviewer 1: the impact of this problem is limited in the first place, and also, the issue of floating-point soundness has come up in prior work on neural network verification. For these reasons, the paper cannot be accepted this time around.",ICLR2021, +S130IypHM,1517250000000.0,1517260000000.0,902,H1l8sz-AW,H1l8sz-AW,ICLR 2018 Conference Acceptance Decision,Reject,"Dear authors, + +Despite the desirable goal, that is, to move away from regularization in parameter space toward regularization in function space, the reviewers all thought that the paper was not convincing enough, both in the choice of the particular regularization and in the experimental section. + +While I appreciate that you have done a major rework of the paper, the rebuttal period should not be used for that, and we cannot expect the reviewers to do a complete re-review of a new version. + +This paper thus cannot be accepted to ICLR.",ICLR2018, +HkLaQk6HM,1517250000000.0,1517260000000.0,235,BkpiPMbA-,BkpiPMbA-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The authors propose an approach to generating adversarial examples that jointly examines the effects on classification within a local neighborhood, to yield a more robust example. This idea is taken a step further for defense, whereby the classification boundaries within a local neighborhood of a presented example are examined to determine whether the data was adversarially generated or not. + + +Pro: +- The idea of examining local neighborhoods around data points appears new and interesting. +- The evaluation and investigation are thorough and insightful. +- The authors made reasonable attempts to address reviewer concerns. 
+ Con + - The generation of adversarial examples is an incremental improvement over prior methods + +",ICLR2018, +SJcSnGL_x,1486400000000.0,1486400000000.0,1,r1aGWUqgg,r1aGWUqgg,ICLR committee final decision,Reject,"The authors propose to explore an important problem -- learning state representations for reinforcement learning. However, the experimental evaluation raised a number of concerns among the reviewers. The method is tested on only a single, relatively easy domain, with some arbitrary choices justified in unconvincing ways (e.g. computational limitations as a reason to not fix aliasing). The evaluation also compares to extremely weak baselines. In the end, there is insufficient evidence to convincingly show that the method actually works, and since the contribution is entirely empirical (as noted by the reviewers in regard to some arbitrary parameter choices), the unconvincing evaluation makes this method unsuitable for publication at this time. The authors are encouraged to evaluate the method rigorously on a wide range of realistic tasks.",ICLR2017, +J80hmqFQSXx,1610040000000.0,1610470000000.0,1,2V1ATRzaZQU,2V1ATRzaZQU,Final Decision,Reject,"In this paper, the authors study the behavior of the Lookahead dynamics of Zhang et al. (2019) in bilinear zero-sum games. These dynamics work as follows: given a base algorithm for solving the game (such as gradient descent-ascent or extra-gradient), the Lookahead dynamics perform $k$ iterations of the base algorithm followed by an exponential moving average step with weight $\alpha$. The authors then provide a range of sufficient conditions on the eigenvalues of the matrix defining the game under which the Lookahead dynamics become more stable and converge faster than the base method. + +This paper received four reviews and generated a very lively discussion between the authors and reviewers. Reviewer 4 was enthusiastic about the paper; the other three initially recommended rejection. During the discussion phase, the authors revised their paper extensively, and Reviewer 3 increased their score to an ""accept"" recommendation as a result. In the end, the reviewers were evenly split, and I also struggled a lot to reach a recommendation decision. + +On the plus side, the paper treats an interesting problem: prior empirical evidence suggests that the Lookahead dynamics can improve the training of some adversarial machine learning models, so a theoretical study is very welcome and of clear value. On the other hand, the setting treated by the paper (bilinear min-max games) is somewhat restrictive, and the authors' theoretical conclusions do not always admit as clear an interpretation as one would like. + +The issues that ended up playing the most important role in my recommendation were as follows: +1. The Lookahead dynamics with period $k$ involve $k$ gradient evaluations, so their rate of convergence should be compared at a $k:1$ ratio to GD and EG (with an additional $2:1$ ratio between GD and EG to put things on an even scale). To a certain degree, this $k:1$ ratio is present in the last part of Lemma 3; however, the exact acceleration achieved by the ""shrinkage"" of the spectral radius is not clear. This can also be seen in the semi-log plots provided by the authors, where the corresponding slopes of the GD/EGD methods should be multiplied by $k$ when compared to the respective LA variants. 
In this regard, a comparison with the values of $k$ provided in Appendix D reveals that the performance of the Lookahead variants in terms of gradient queries is very similar to (if not worse than) that of the non-LA variants. This is a cause of concern because, if LA does not accelerate convergence in simple bilinear games, it is not credible to expect faster convergence in more complicated problems. During the AC/reviewer discussion of this point, Reviewer 3 pointed out that this might be due to a suboptimal tuning of $\alpha$ (i.e., that it was not chosen ""small enough""), and went on to note that this echoes the arguments of other reviewers that the characterization of acceleration may be problematic and not significant (even if it takes place). +2. Another major concern has to do with the stabilization provided by the Lookahead dynamics: using a benchmark game proposed in a recent paper by Hsieh et al. (2020), the authors showed that the Lookahead dynamics converge to a point which is unstable under GDA/EG (and hence avoided). This is fully consistent with the authors' theoretical analysis, but it also highlights an important problem with the Lookahead optimizer: if $k$ and $\alpha$ are tuned to suitable values for stabilization, the algorithm converges to a non-desirable critical point (a max-min instead of a min-max solution). This is a major cause of concern because it shows that the algorithm may, in general, converge to highly suboptimal states. + +The points above create an inconsistency in the main story of the paper. In fact, it seems to me that the authors' results form more of a ""cautionary tale in hiding"": even in very simple bilinear problems, the lookahead step may not provide acceleration and, even worse, it could converge to highly undesirable critical points. I find this ""negative"" contribution quite valuable from a theoretical standpoint, and I believe that a thoroughly revised paper along these lines would be of interest in the top venues of the community (though a more theoretical outlet like COLT might be more appropriate). However, this would require a drastic rewrite of the paper, to the extent that it should be treated as a new submission. + +In view of all this, I am recommending a rejection at this stage. I insist however that this should not be seen as a critique of the mathematical analysis of the authors (which was appreciated by the reviewers), but as a recommendation to reframe the paper's narrative to bring it in line with the algorithm's observed behavior. I strongly encourage the authors to resubmit at the next top-tier opportunity.",ICLR2021, +3IU7c3j5CL,1642700000000.0,1642700000000.0,1,FEDfGWVZYIn,FEDfGWVZYIn,Paper Decision,Accept (Spotlight),The paper proposes an approach and specific training algorithm to defend against membership inference attacks (MIA) in machine learning models. Existing MIA attacks are relatively simple and rely on the test loss distribution at the query point; the proposed algorithm therefore sets a positive target mean training loss value and applies gradient ascent whenever the average loss of the current training batch is smaller than this target (in addition to the standard gradient descent step). The submission gives extensive experimental results demonstrating an advantage over existing defense methods on several benchmarks. 
The primary limitation of the work is that it defends only against rather naive existing attacks, which do not examine the model (but rely only on the loss functions).",ICLR2022, +LB86Nt4JaL,1576800000000.0,1576800000000.0,1,H1lBYCEFDB,H1lBYCEFDB,Paper Decision,Reject,The authors analyze the natural gradient algorithm for training a neural net from a theoretical perspective and prove connections to the K-FAC algorithm. The paper is poorly written and contains no experimental evaluation or well-established implications regarding the practical significance of the results.,ICLR2020, +VAeW3Ppohl,1576800000000.0,1576800000000.0,1,B1eZweHFwr,B1eZweHFwr,Paper Decision,Reject,"This paper proposes a smoothing-based certification against various forms of transformations, such as rotations and translations. The reviewers have concerns about the novelty of the work and several technical issues. The authors have made efforts to address some of the issues, but the work may still significantly benefit from a thorough improvement in both presentation and technical contribution.",ICLR2020, +H8aVcZpeu2J,1642700000000.0,1642700000000.0,1,iMSjopcOn0p,iMSjopcOn0p,Paper Decision,Accept (Spotlight),"This work concerns Automatic Music Transcription (AMT) -- transcribing notes given the audio of the music. The paper demonstrates that a single general-purpose transformer model can perform AMT for many instruments across several different transcription datasets. The method represents the first unified AMT model that can transcribe music audio with an arbitrary number of instruments. + +All reviewers rated this paper highly and are excited about seeing it at the conference. One reviewer noted that ""This paper seems to be a great milestone in the AMT research. It is probably the first unified AMT model that can take music audio with an arbitrary number of instruments."" + +The reviewers had some suggestions and comments, which appear to have been addressed by the authors.",ICLR2022, +Hy0NEkpHM,1517250000000.0,1517260000000.0,336,SkFqf0lAZ,SkFqf0lAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper provides a comparison of different types of memory-augmented models and extends some of them beyond their simple form. Reviewers found the paper to be clearly written, calling it a ""nice introduction to the topic"" and noting that they ""enjoyed reading this paper"". In general, though, there was a feeling that the ""substance of the work is limited"". One reviewer complained that experiments were limited to the small English datasets PTB and Wikitext-2 and asked why they didn't try ""machine translation or speech recognition"". (The authors note that they did try the Linzen dataset, and while the reviewers found the experiments impressive, the task itself felt artificial.) Another felt that the ""multipop model"" alone was not too large a contribution. The actual experiments in the work are well done, although given the fact that the models are known, there was an expectation of ""more ""in-depth"" analysis of the different models"". 
Overall, this is a good empirical study, which shows the limited gains achieved by these models, a nevertheless useful piece of information for those working in this area.",ICLR2018, +BJxq-1b2JE,1544450000000.0,1545350000000.0,1,HyGEM3C9KQ,HyGEM3C9KQ,Good paper; solid improvements to DNC,Accept (Poster)," +pros: +- Identification of several interesting problems with the original DNC model: masked attention, erasure of de-allocated elements, and sharpened temporal links +- An improved architecture which addresses the issues and shows improved performance on synthetic memory tasks and bAbI over the original model +- Clear writing + +cons: +- Does not really show that this modified DNC can solve a task that the original DNC could not, and the bAbI tasks are effectively solved anyway. It is still not clear whether the DNC, even with these improvements, will have much impact beyond these toy tasks. + +Overall, the reviewers found this to be a solid paper with a useful analysis, and I agree. I recommend acceptance. + +",ICLR2019,4: The area chair is confident but not absolutely certain +9eKmyGJ7PsH3,1642700000000.0,1642700000000.0,1,1bEaEzGwfhP,1bEaEzGwfhP,Paper Decision,Reject,"This paper introduces a novel task (i.e., modelling the iterative process of editing sequences) and proposes a Transformer-based architecture to address it tractably. The paper also selects a number of metrics that are argued to shed enough light on the merits of the proposed architecture. + +In our view, the current version is not ready for acceptance. Here are some of the reasons I'd highlight: + +* It is not entirely clear to us that the task under consideration has enough substance to grant acceptance, nor that it speaks to a large enough audience. Perhaps the challenges identified here are more general and the developments for this task can be extended to related generation problems? If so, this is something one could consider for a revised version of the paper. +* The motivation does not seem to align well with the datasets used to demonstrate the task. Perhaps the difficulty of finding a dataset that matches the motivation is an indication that the task and its challenges are a tad too specific. +* It is the impression of more or less everyone involved that the paper lacks comparisons and that the evaluation is not thorough enough, and the rebuttal did not ease our concerns sufficiently.",ICLR2022, +ct1TE-pi2Y,1642700000000.0,1642700000000.0,1,XeqjsCVLk1m,XeqjsCVLk1m,Paper Decision,Reject,"This paper studies the use of natural language explanations during the training of an agent for odd-one-out tasks. Experimental results show that using quality explanations as abstract information about object properties helps agent performance compared with the vanilla method. + +Strengths: +- Experiments are conducted thoroughly to support the major claims made by the paper +- The problem is well motivated and has important implications + +Weaknesses: +- There has been extensive discussion about whether the paper lacks a more formal and rigorous definition of ""explanation"" as considered in the scope of this paper. +- Concerns are raised regarding the gaps between the broad claims in the paper and the restricted experimental settings",ICLR2022, +ccZjzpHCsU,1576800000000.0,1576800000000.0,1,BJl7mxBYvB,BJl7mxBYvB,Paper Decision,Reject,"The authors address the problem of robust reinforcement learning. They propose an adversarial perspective on robustness. 
Improving the robustness can now be seen as two agents playing a competitive game, which means that in many cases the first agent needs to play a mixed strategy. The authors propose an algorithm for optimizing such mixed strategies. + +Although the reviewers are convinced of the relevance of the work (as a first application of Bayesian learning to reach mixed Nash equilibria, which is useful not only for robustness but for any problem that can be formulated as a zero-sum game requiring a mixed strategy), they are not completely convinced by the work in its current state. Three of the reviewers commented on the experiments not being rigorous and convincing enough in their current form, and thus were not (yet!) able to recommend acceptance to ICLR. ",ICLR2020, +SJxxqV2bg4,1544830000000.0,1545350000000.0,1,BklACjAqFm,BklACjAqFm,Meta-review,Reject,"Pros: +- interesting algorithmic idea for using successor features to propagate uncertainty for use in exploration +- clarity + +Cons: +- moderate novelty +- initially only simplistic experiments (later complemented with Atari results) +- initially missing baseline comparisons +- no regret-based analysis +- questionable soundness because uncertainty is not guaranteed to go down + +All the reviewers found the initial submission to be insufficient for acceptance, and the one reviewer who read the rebuttal/revision did not change their mind, despite the addition of some large-scale results (Atari).",ICLR2019,4: The area chair is confident but not absolutely certain +rklhGf2ggN,1544760000000.0,1545350000000.0,1,H1g0Z3A9Fm,H1g0Z3A9Fm,"Good paper, accept",Accept (Poster),"This paper introduces a new graph convolutional neural network, called LGNN, and applies it to solve the community detection problem. The reviewers think LGNN yields a nice and useful extension of graph CNNs, especially in using the line graph of edge adjacencies and a non-backtracking operator. The empirical evaluation shows that the new method provides a useful tool for real datasets. The reviewers raised some issues with the writing and references, for which the authors have provided clarifications and modified the paper accordingly. ",ICLR2019,4: The area chair is confident but not absolutely certain +m8KdMkJC-mv,1610040000000.0,1610470000000.0,1,DlPnp5_1JMI,DlPnp5_1JMI,Final Decision,Reject,"This paper proposes a method for regularizing image classifiers by encouraging their hidden activations to conform to a PDE. This is a reasonable idea, and the authors clearly improved the paper a lot in response to the reviews. However, the main tasks of MNIST and SVHN classification seem way too easy, and the baselines all need to be tuned to be as fast as possible for a given accuracy, if that's the relevant metric. I agree with the reviewers that this line of work is promising but that the current paper is not sufficiently illuminating or well-executed to meet ICLR standards.",ICLR2021, +WR3BHJ7zkwK,1642700000000.0,1642700000000.0,1,kz6rsFehYjd,kz6rsFehYjd,Paper Decision,Reject,The paper considers the question of identifying bad data so that models can be trained on the subset of data that is good. This question is formulated as a utility optimization problem. 
The authors also found that the block structure can easily disappear by removing the dominant datapoints, and the authors also proposed a method to suppress the block structure by regularizing PCs, Shake-Shake regularization, and transfer learning. + +This paper gives thorough experiments that clarify the mechanism of appearance of a block structure. However, its significance is a bit minor. Indeed, the block structure does not affect the generalization ability very much, and it can be removed without changing the predictive performance. I agree that investigating the behavior of the internal representation is of scientific interest as the authors pointed out, but on the other hand, its significance would not be convincing. Indeed, this concern was pointed out by several reviewers. Next, the main focus of this study is about the setting of large model with small data size. It is not clear whether it is universal across different model size relative to the dataset size. There is no theoretical investigation (for example, the block-structure phenomenon could be explained by a high dimensional random matrix theory). + +In summary, this paper investigates a somehow interesting phenomenon but its significance is not convincing. Thus, it would be a bit below the threshold of the acceptance.",ICLR2022, +Ig0vX8XsZO,1642700000000.0,1642700000000.0,1,YedA6OCN6X,YedA6OCN6X,Paper Decision,Reject,"This paper proposes a new evaluating metric for assessing the quality of model-generated images, that aims to correct some of the problems with the popular FID metric. The reviewers acknowledge the importance of this problem, but do not find the empirical evaluation convincing. In particular, they highlight the following issues +* Comparing FID and the new metric on examples that are adversarially selected against FID does not provide a fair comparison. +* The methods are compared on images of bad quality (FID > 25) that are therefore not informative. +* The comparison against existing techniques is incomplete +* The reviewers raise concerns about how the comparison is done quantitatively + +The reviewers are not sufficiently convinced by the author response regarding these issues. I therefore recommend not accepting the paper.",ICLR2022, +QWCDJor6b-g,1642700000000.0,1642700000000.0,1,PC8u74o7xc2,PC8u74o7xc2,Paper Decision,Reject,"The paper proposes a new framework to express and analyze embedding methods based on the stable coloring problem. Reviewers highlighted as strengths that the paper provides an interesting perspective for understanding one of the central approaches in NLP, graph learning, and other fields --- and as such could inspire promising research directions. However, reviewers raised concerns regarding the significance of contributions (theoretical insights and analysis, relation to prior work, missing empirical evaluation etc.) as well as the clarity of presentation (also with regard to correctness and scope). All reviewers and the AC agree that the paper is not yet ready for publication at ICLR and would require an additional revision to address the aforementioned issues.",ICLR2022, +6pmNfHuOyuh,1610040000000.0,1610470000000.0,1,YZrQKLHFhv3,YZrQKLHFhv3,Final Decision,Reject,"This work proposes to train networks with mixed image sizes to allow for faster inference and also for robustness. The reviewers found the paper was well-written and appreciated that the code was available for reproducibility. However, the paper does not sufficiently compare to related methods. 
The authors should resubmit once the comparisons suggested by the reviewers have been added to the paper.",ICLR2021, +Hyx9kE0gg4,1544770000000.0,1545350000000.0,1,Hyxsl2AqKm,Hyxsl2AqKm,Relatively weak novelty and empirical results.,Reject,"This paper presents the empirical relation between the task granularity and transfer learning, when applied between video classification and video captioning. The key take away message is that more fine-grained tasks support better transfer in the case of classification---captioning transfer on 20BN-something-something dataset. + +Pros: +The paper presents a new empirical study on transfer learning between video classification and video captioning performed on the recent 20BN-something-something dataset (220,000 videos concentrating on 174 action categories). The paper presents a lot of experimental results, albeit focused primarily on the 20BN dataset. + +Cons: +The investigation presented by this paper on the effect of the task granularity is rather application-specific and empirical. As a result, it is unclear what generalizable knowledge or insights we gain for a broad range of other applications. The methodology used in the paper is relatively standard and not novel. Also, according to the 20BN-something-something leaderboard (https://20bn.com/datasets/something-something), the performance reported in the paper does not seem competitive compared to current state-of-the-art. There were some clarification questions raised by the reviewers but the authors did not respond. + +Verdict: +Reject. The study presented by the paper is a bit too application-specific with relatively narrow impact for ICLR. Relatively weak novelty and empirical results.",ICLR2019,4: The area chair is confident but not absolutely certain +xXbI1gEMfU,1642700000000.0,1642700000000.0,1,vSix3HPYKSU,vSix3HPYKSU,Paper Decision,Accept (Spotlight),"This paper proposes a message passing neural network to solve PDEs. The paper has sound motivation, clear methodology, and extensive empirical study. However, on the other hand, some reviewers also raised their concerns, especially regarding the lack of clear notations and sufficient discussions on the difference between the proposed method and previous works. Furthermore, there is no ablation study and the generalization to multiple spatial resolution is not clearly explained. The authors did a very good job during the rebuttal period: many concerns/doubts/questions from the reviewers were successfully addressed and additional experiments have been performed to support the authors' answers. As a result, several reviewers decided to raise their scores, and the overall assessment on the paper turned to be quite positive.",ICLR2022, +QwmqusbVHJ,1576800000000.0,1576800000000.0,1,BylD9eSYPS,BylD9eSYPS,Paper Decision,Reject,"The paper discusses a simple but apparently effective clustering technique to improve exploration. There are no theoretical results, hence the reader relies fully on the experiments to evaluate the method. Unfortunately, an in-dept analysis of the results is missing making it hard to properly evaluate the strength and weaknesses. Furthermore, the authors have not provided any rebuttal to the reviewers' concerns.",ICLR2020, +XN7ATNrsemp,1610040000000.0,1610470000000.0,1,yOkSW62hqq2,yOkSW62hqq2,Final Decision,Reject,"Knowledge distillation (KD) has been widely used in practice for deployment. 
In this paper, a variant of KD is proposed: given a student network, an auxiliary teacher architecture is temporarily generated via dynamic additive convolutions; dense feature connections are introduced to co-train the teacher and student models. The proposed method is novel and interesting. Empirical results showed that the proposed method can perform better than several KD variants. However, it is unclear why the proposed method works, although the authors tried to address this issue in their rebuttal. Besides this, a bigger concern on this work is that it missed a comparison with a recent approach in [1] which looks much simpler and performs significantly better on similar experiments. In [1], their ResNet50 (0.5x) is smaller than the student model in this paper (which used more filters on the top) but showed much stronger performance on both relative and absolute improvements over the same baseline (training from scratch) for the ImageNet classification task. On the technical side, the method in [1] simply uses the original ResNet50 as the teacher model, and the student model ResNet50 (0.5x) progressively mimics the intermediate outputs of the teacher model from layer to layer. [1] also contains a theoretic analysis (mean-field analysis based) to support their method. Comparing with the method in [1], the proposed method here is more complicated, less motivated, and less efficient. + +[1] D. Zhou, M. Ye, C. Chen, T. Meng, M. Tan, X. Song, Q. Le, Q. Liu and D. Schuurmans. Go Wide, Then Narrow: Efficient Training of Deep Thin Networks. ICML 2020.",ICLR2021, +8wHweZ56jJV,1610040000000.0,1610470000000.0,1,aD1_5zowqV,aD1_5zowqV,Final Decision,Accept (Poster),"This work proposes to train EBMs using multi-stage sampling. The EBMs are then used for generating high dimensional images, performing image to image translation, and out-of-distribution detection. The reviewers are impressed with the results, but indicate that the novelty is limited. While I agree that the work can be seen as a combination of previously proposed techniques, demonstrating that this combination can be made to work well is still a significant contribution to the field. In addition, the paper demonstrates strong results in using Langevin dynamics to translate between images, which I do think is novel. I therefore recommend accepting the paper for a poster presentation.",ICLR2021, +j6yD6eMB01u,1610040000000.0,1610470000000.0,1,1wtC_X12XXC,1wtC_X12XXC,Final Decision,Reject,"The authors propose an algorithm to perform backprop in a feed-forward neural network without the need to backpropagate errors. They hence claim that this algorithm is a biologically plausible variant of Backprop. + +After a forward-propagation phase, the method introduces a relaxation phase and they remark that at the equilibrium of this phase, the activity is equal to the derivatives. Some related algorithms have been proposed previously (predictive coding, equilibrium-prop, target-propagation). Advantages of the proposed algorithm are that it does not need multiple distinct backwards phases and that it only utilizes a single type of neuron instead of separate populiations (such as in predictive coding). + +Their method is tested on MNIST and fashion MNIST using a 4-layer fully connected network. In a revised version after the initial reviews, the authors added preliminary results on CIFAR-10 with a 4 layer CNN. + +The authors then study the impact of some unbiological constraints such as symmetric weights. 
While the reviewers agreed that the work is interesting, there was some disagreement on the significance of the model. In particular, it was noted that while the learning rules are indeed local in space, they are not local in time (the network has to remember variables from the forward phase until the update at the relaxation phase), which was deemed questionable from the biological perspective. + +In addition, it was criticized that the simulations are not sufficient to support the claims of the paper. The datasets are relatively simple, the networks are shallow, and the performance of the baseline models is not state-of-the-art. In a revised version after the initial reviews, the authors added preliminary results on CIFAR-10 with a 4-layer CNN. However, these results do not seem conclusive, as the baselines are far below SOTA and the networks are still quite shallow (a study by Lillicrap found that biological approximations to backprop struggle especially when applied to deep networks). + +In summary, the work looks promising, but some questions remain about the locality of learning and its applicability to more demanding tasks. + +I add that one reviewer gave a very good rating with a poor review and did not respond to any questions about the justification. Therefore, I had to disregard this review.",ICLR2021, +0QJWbXULBlR,1610040000000.0,1610470000000.0,1,jHykXSIk3ch,jHykXSIk3ch,Final Decision,Reject,"Three reviewers recommend rejection or weak rejection. The studied problem is interesting, but as one reviewer pointed out, it is not that clear how this work changes our theoretical understanding of those methods or what the results imply for applications. Overall, I feel this work is on the borderline (it probably deserves a higher score than the current one), but it is probably below the acceptance bar in its current form. ",ICLR2021, +GXXnPn_EC5b,1642700000000.0,1642700000000.0,1,4JlwgTbmzXQ,4JlwgTbmzXQ,Paper Decision,Reject,"This paper proposes to learn a latent space representation such that some linear equivariance and symmetry constraints are respected in the latent space, with the goal of improving sample efficiency. One core idea is that the latent space is also the same as the space of linear transformations used in the constraints, which is shown to simplify some of the mathematical derivations. Experiments on the Atari 100K benchmark demonstrate a statistical improvement over the SPR baseline when using the SE(2) group of linear transformations as the latent space. + +Following the discussion period, most reviewers were in favor of acceptance. However, one reviewer remained unconvinced, and after carefully reading the paper, I actually share the same concerns, i.e., that it is unclear under which conditions the proposed approach actually works, and what makes it work. I believe that, as a research community, we should value understanding over moving the needle on benchmarks, especially when proposing such a complex method as this one (see Fig. 5). + +More specifically: + +1. The method is only evaluated on Atari games, showing some improvements when using SE(2), and arguing that there are corresponding symmetries in such games. There is, however, no analysis demonstrating (or even hinting at the fact) that the proposed technique is actually learning to take advantage of such symmetries (NB: I had a quick look at the animation added by the authors in the supplementary material, but I do not see if/how it helps on this point). 
Even if analyzing representations on Atari may be tricky, I believe that, given the motivation of this new algorithm, it *must* be evaluated on some toy example (e.g., the pendulum mentioned throughout the paper) to validate that it is learning what we want it to learn (although I also agree with the authors that experimenting on a more complex benchmark like Atari is equally important). + +2. The idea of embedding states into the same space as transformations is interesting, and brings some advantages when writing down equations, as demonstrated by the authors. However, there is no justification besides mathematical convenience, and it doesn't seem intuitive to me at all why this should be a good idea, considering that it ties the state representation to the mathematical representation of group transformations. For instance, what does the special group element $e$ mean for a state? And this coupling makes it difficult to interpret the effect of using a different group of transformations: for instance, when moving from GL(2) to SE(2), is the observed benefit because we are using only specific transformations, or simply because we are reducing the dimensionality of the state embedding? (note that in Fig. 4(c) the MLP variant has similar performance to GL(2), and based on my understanding they use the same embedding dimensionality ==> I believe it would be important to check what would happen with an MLP variant using the same dimensionality as SE(2)) + +3. The effect of the $L_{GET}$ loss is not convincing, as pointed out by several reviewers. I think it would have been an opportunity for the authors to investigate why, especially since it seems to work in some games and not others. But just focusing on ""here are the 17/26 games where it works better"" doesn't really bring added value here. Do these games have some specific properties that make them better candidates to take advantage of $L_{GET}$? This could have been a very interesting insight if that was the case, but as it is now, I am not sure what we can learn from that. + +4. There are several implementation ""details"", some moving the final algorithm farther from its theoretical justification, that are not ablated, making it difficult to understand their impact (e.g., using target networks, the choice of the value of M, using projections onto the unit sphere of some arbitrary dimensionality, how the $s'$ state is chosen in $L_{GET}$). + +As a result, we have here an algorithm with some interesting theoretical background, but with a lot of moving components which -- when properly tweaked -- can lead to a statistically meaningful improvement on Atari 100K -- without really understanding why. I believe this is not quite enough for publication at ICLR, and I would encourage the authors to delve deeper into the understanding of their algorithm, which I hope will bring useful insights to the research community working on representation learning.",ICLR2022, +BJe0xdMggE,1544720000000.0,1545350000000.0,1,B1e7hs05Km,B1e7hs05Km,A neat idea with impressive results but has technical flaws and issues with clarity,Reject,"There was a significant amount of discussion on this paper, both from the reviewers and from unsolicited feedback. This is a good sign, as it demonstrates interest in the work. Improving exploration in Deep Q-learning through Thompson sampling using uncertainty from the model seems sensible, and the empirical results on Atari seem quite impressive. 
However, the reviewers and others argued that there were technical flaws in the work, particularly in the proofs. Also, reviewers noted that clarity of the paper was a significant issue, even more so than a previous submission. + +One reviewer noted that the authors had significantly improved the paper throughout the discussion phase. However, ultimately all reviewers agreed that the paper was not quite ready for acceptance. It seems that the paper could still use some significant editing and careful exposition and justification of the technical content. + +Note, one of the reviews was disregarded due to incorrectness and a fourth reviewer was brought in.",ICLR2019,5: The area chair is absolutely certain +uvNdr51rdc_,1610040000000.0,1610470000000.0,1,vY0bnzBBvtr,vY0bnzBBvtr,Final Decision,Reject,"This paper explores the performance of Q-learning in the presence of either one-sided feedback or full feedback. Such feedbacks play an important role in improving the resulting regret bounds, which are (almost) not affected by the dimension of the state and action space. The motivation of such feedback settings stems from problems like inventory control. However, the assumptions underlying the theory herein are often quite strong, which might limit the applicability of the theory. The dependency on the length per episode H can also be improved.   ",ICLR2021, +0dBwKL7pvsb,1610040000000.0,1610470000000.0,1,F8lXvXpZdrL,F8lXvXpZdrL,Final Decision,Reject,"This paper collects a variety of results that cast straight-through estimators as arising as principled methods that make a linearization assumption on the loss for functions with binary arguments. R1 & R3 recommended against acceptance, citing clarity concerns and a lack of novelty. R2 & R4 recommended acceptance, but had low confidence. This paper had uncharacteristically low confidence on behalf of the reviewers, and this is my fault. I apologize to the authors for this. + +I have read the paper myself. I believe that this paper contains many interesting ideas, but I agree with R1 & R3 that the paper suffers from clarity issues. Unfortunately, these issues persist in the recent revisions, despite having been asked by R1 & R3 to improve the clarity. The authors asked for concrete reference points. Here are some: + +- ""proxy function"" is not well-defined, despite being critical to the arguments. +- deterministic ST is not defined clearly before it is discussed. +- The section structure of Sec 2 could be improved. At the moment it seems to flow from the loss function to the standard ST algorithm through to a disjointed list of questions addressed in the paper. +- The section titles are not particularly informative. +- It is difficult to know which results are known and which results are new. + +In general, I believe this work could benefit from a significant restructuring. It would be best to delineate preceding work in its own section, then lay out the new results, making sure that all of the important concepts are clearly defined. I think many of these results are valuable for the community, but the current draft makes it challenging for these great ideas to reach their full potential. +",ICLR2021, +WR3BHJ7zkwK,1642700000000.0,1642700000000.0,1,kz6rsFehYjd,kz6rsFehYjd,Paper Decision,Reject,The paper considers the question of identifying bad data so that models can be trained on the subset of data that is good. This question is formulated as a utility optimization problem. 
The paper shows that some popular heuristics are quite bad in the framework they propose. They also propose a new algorithmic framework called DataSifter. There is empirical evaluation provided for this. Questions have been raised in the reviews about the size of the models that have been used in the empirical evaluation. The authors have responded to this by suggesting the use of proxy model techniques. There are also questions about learnability of data utility for which some responses are provided in the rebuttal.,ICLR2022, +EzlkP0iMnR,1610040000000.0,1610470000000.0,1,BntruCi1uvF,BntruCi1uvF,Final Decision,Reject,"This paper proposes a deterministic policy gradient method that does not require to inject noise in the action selection. +Although the reviewers acknowledge that this paper has merits (novel and interesting idea, well written, technically sound), they have some doubts about the motivations for the proposed approach and about its empirical performance: a deeper analysis is requested. +The paper is borderline and needs to be revised before being ready for publication.",ICLR2021, +_UvAmGw3ak0d,1642700000000.0,1642700000000.0,1,ECvgmYVyeUz,ECvgmYVyeUz,Paper Decision,Accept (Poster),"The paper under review provides a theoretical analysis for contrastive representation learning. The paper proposes a guarantee on the performance (specifically upper and lower bounds) without resorting to previously used conditional independence assumptions. Throughout, the theoretical results and assumptions are supported by experiments. + +After a lively discussion, and after changes made to the paper in the revision stage, all four reviewers recommend this paper for acceptance. +- Reviewer tWSB appreciates that the paper makes weaker assumptions than prior work (i.e., not assuming conditional independence), but raises a number of serious concerns on the theoretical results: The review questions whether assumption 4.6 used in the theory can be true, and whether the bound is vacuousness. The authors argue that this assumption was used in prior work, point out that only some of their results rely on this assumption, and that the assumption is compatible with the theory. The response of the authors partly resolved the reviewers concern and the reviewer raised their score. +- Reviewer bTLa finds the idea of understanding contrastive learning for intra-class samples interesting, but finds some key assumptions too strong, a critique similar to that raised by reviewer tWSB. The authors responded and the reviewer increased their score, and mentioned that most concerns were addressed. The response partially resolved the reviewers concern, and the reviewer now also recommends acceptance. + +I recommend to accept the paper. Understanding contrastive learning better is an important problem, and based on my own reading, I agree with the reviewers that the paper contributes to the understanding of contrastive learning. Two reviewers had concerns about unrealistic assumptions, but those have been largely resolved in the discussion.",ICLR2022, +yL-UVcdFGIq,1610040000000.0,1610470000000.0,1,0n3BaVlNsHI,0n3BaVlNsHI,Final Decision,Reject,"This paper proposes to improve the robustness of computer vision models through a new augmentation strategy. 
There are two primary contributions of the work, first the use of a bottleneck autoencoder to generate discretized variants of the clean image, and second a slight variant of the task loss, where the task loss is evaluated on the augmented image vs the clean image as is done in prior work. Reviewers argued that the method did not meaningfully improve upon prior work, the method alone underperforms AugMix on existing benchmarks, and when combined with some additional augmentations from AugMix the gains were marginal. Additionally, when there were gains in robustness it was unclear as to the source. The work would be improved with additional experimental evidence that the claimed benefit of the information bottleneck is substantial for improving robustness (for example, when DJMix+RA outperforms AugMix, is this due to the use the of autoencoder or is it primarily due to the new task loss?). I recommend the authors incorporate additional reviewer feedback and resubmit.",ICLR2021, +lwznwaQdE_,1576800000000.0,1576800000000.0,1,BkleBaVFwB,BkleBaVFwB,Paper Decision,Reject,"The paper proposed an efficient way of generating graphs. Although the paper claims to propose simplified mechanism, the reviewers find that the generation task to be relatively very complex, and the use of certain module seems ad-hoc. Furthermore, the results on the new metric is at times inconsistent with other prior metrics. The paper can be improved by addressing those concerns concerns. ",ICLR2020, +20LZpQD4al,1576800000000.0,1576800000000.0,1,BJevihVtwB,BJevihVtwB,Paper Decision,Reject,"This paper introduces a closed-form expression for the Stein’s unbiased estimator for the prediction error, and a boosting approach based on this, with empirical evaluation. While this paper is interesting, all reviewers seem to agree that more work is required before this paper can be published at ICLR. ",ICLR2020, +BkzY41TSf,1517250000000.0,1517260000000.0,395,Hk91SGWR-,Hk91SGWR-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper turned out to be quite difficult to call. My take on the pros/cons is: + +1. The research topic, how and why humans can massively outperform DQN, is unanimously viewed as highly interesting by all participants. + +2. The authors present an original human subject study, aiming to reveal whether human outperformance is due to human knowledge priors. The study is well conceived and well executed. I consider the study to be a contribution by itself. + +3. The study provides prima facie evidence that human priors play a role in human performance, by changing the visual display so that the priors cannot be used. + +4. However, the study is not definitive, as astutely argued by AnonReviewer2. Experiments using RL agents (with presumably no human priors) yield behavior that is similar to human behavior. So it is possible that some factor other than human prior may account for the behavior seen in the human experiments. + +5. It would indeed be better, as argued by AnonReviewer2, to use some information-theoretic measure to distinguish the normal game from the modified games. + +6. The paper has been substantially improved and cleaned up from the original version. + +7. AnonReviewer1 provided some thoughtful detailed discussion of how the authors may be overstating the conclusions that one can draw from the paper. + +Bottom line: Given the procs and cons of the paper, the committee recommends this for workshop. 
+",ICLR2018, +LL7QC1A0EKN,1642700000000.0,1642700000000.0,1,6NePxZwfae,6NePxZwfae,Paper Decision,Accept (Poster),"The main detractor of this paper feels that the paper makes a relatively small technical and empirical contribution given existing results on HER (Andrychowicz et al., NeurIPS 2017). However, several other reviewers, who had more engagement in the discussion, were strong supporters. Having looked at the paper myself I thought the selection of experimental problems undermined the results. Experiments are most compelling when many unaffiliated groups compete on the same benchmarks. But the basic idea of integrating HER with AlphaZero, and a reasonable attempt at this, seems to be interesting enough to warrant a poster.",ICLR2022, +SkyjIkprf,1517250000000.0,1517260000000.0,850,SJlhPMWAW,SJlhPMWAW,ICLR 2018 Conference Acceptance Decision,Reject,"The authors present GraphVAE, a method for fitting a generative deep model, a variational autoencoder, to small graphs. Fitting deep learning models to graphs remains challenging (although there is relevant literature as brought up by the reviewers and anonymous comments) and this paper is a strong start. + +In weighing the various reviews, AnonReviewer3 is weighed more highly than AnonReviewer1 and AnonReviewer2 since that review is far more thorough and the reviewer is more expert on this subject. Unfortunately, the review from AnonReviewer1 is extremely short and of very low confidence. As such, this paper sits just below the borderline for acceptance. In general, the main criticisms of the paper are that some claims are too strong (e.g. non-differentiability of discrete structures), treatment of related work (missing references, etc.) and weak experiments and baselines. The consensus among the reviews (even AnonReviewer2) is that the paper is preliminary. The paper is close, however, and addressing these concerns will make the paper much stronger. + +Pros: +- Proposes a method to build a generative deep model of graphs +- Addresses a timely and interesting topic in deep learning +- Exposition is clear + +Cons: +- Treatment of related literature should be improved +- Experiments and baselines are somewhat weak +- ""Preliminary"" +- Only works on rather small graphs (i.e. O(k^4) for graphs with k nodes)",ICLR2018, +B53UQddfwa,1576800000000.0,1576800000000.0,1,SkxJ8REYPH,SkxJ8REYPH,Paper Decision,Accept (Poster),"This paper presents a new approach, SlowMo, to improve communication-efficient distribution training with SGD. The main method is based on the BMUF approach and relies on workers to periodically synchronize and perform a momentum update. This works well in practice as shown in the empirical results. + +Reviewers had a couple of concerns regarding the significance of the contributions. After the rebuttal period some of their doubts were clarified. Even though they find that the solutions of the paper are an incremental extension of existing work, they believe this is a useful extension. For this reason, I recommend to accept this paper.",ICLR2020, +jt9gk0Eq0,1576800000000.0,1576800000000.0,1,BkeoaeHKDS,BkeoaeHKDS,Paper Decision,Accept (Poster),"The paper makes a reasonable contribution to extracting useful features from a pre-trained neural network. The approach is conceptually simple and sufficient evidence is provided of its effectiveness. In addition to the connection to tangent kernels there also appears to be a relationship to holographic feature representations of deep networks. 
The authors did do a reasonable job of providing additional ablation studies, but the paper would be improved if a clearer study were added to investigate applying the technique to different layers. All of the reviewer comments appear worthwhile, but AnonReviewer2 in particular provides important guidance for improving the paper.",ICLR2020, +LANN56zuC6,1576800000000.0,1576800000000.0,1,BJe932EYwS,BJe932EYwS,Paper Decision,Reject,"This paper presents a non-autoregressive NMT model which predicts the positions of the words to be produced as a latent variable in addition to predicting the words. This is a novel idea in the field of several other papers which are trying to do similar things, and obtains good results on benchmark tasks. The major concerns are systematic comparisons with the FlowSeq paper which seems to have been published before the ICLR submission deadline. The reviewers are still not convinced by the empirical performance comparison as well as speed comparisons. With some more work this could be a good contribution. As of now, I am recommending a Rejection.",ICLR2020, +anGgYHvDN8V,1610040000000.0,1610470000000.0,1,E4PK0rg2eP,E4PK0rg2eP,Final Decision,Reject,"This paper studies a problem setup of parameter-efficient transfer learning for large-scale deep models. The approach consists of learning a diff vector with a sparsity constraint and then pruning the vector using magnitude pruning. A group penalty is also introduced to enhance structured sparsity. The main motivation is that for each new task, we only need to add a few parameters based on a pre-trained model without fine-tuning it. + +The proposed approach possesses technical soundness and shows empirical efficacy for the studied problem setup. During the rebuttal and discussion phases, two of the reviewers raised two major concerns based on which they strongly disagreed with acceptance: +- The problem setup is not elaborated sufficiently and falls short of plausibility. An approach targeting at efficiency should either improve inference speed or reduce storage cost. Unfortunately, neither advantage has been well approached. +- The technical novelty is somewhat incremental, given the rich previous work on residual adapter, network re-parameterization, and network compression (pruning, sparsity etc.). + +AC read the paper and agreed that, while the paper has some merit such as a better model for the particular problem setup, the reviewers' concerns are reasonable and need to be addressed in a more convincing way. For example, try to study a practical application in which the proposed approach is essential and useful for efficiency enhancement.",ICLR2021, +S1eOeKDmeN,1544940000000.0,1545350000000.0,1,ryGgSsAcFQ,ryGgSsAcFQ,Analysis of obstructions in skinny networks,Accept (Poster),"The paper shows limitations on the types of functions that can be represented by deep skinny networks for certain classes of activation functions, independently of the number of layers. With many other works discussing capabilities but not limitations, the paper contributes to a relatively underexplored topic. + +The settings capture a large family of activation functions, but exclude others, such as polynomial activations, for which the considered type of obstructions would not apply. Also a concern is raised about it not being clear how this theoretical result can shed insight on the empirical study of neural networks. 
+ +The authors have responded to some of the comments of the reviewers, but not to all comments, in particular comments of reviewer 1, who's positive review is conditional on the authors addressing some points. + +The reviewers are all confident and are moderately positive, positive, or very positive about this paper. ",ICLR2019,4: The area chair is confident but not absolutely certain +Qu7Ac7yzbuK,1610040000000.0,1610470000000.0,1,hkMoYYEkBoI,hkMoYYEkBoI,Final Decision,Accept (Poster),"This paper aims at answering an interesting question that puzzles the whole community of deep learning: why CNNs perform better than FCNs? The authors show that CNNs can solve the k-pattern problem much more efficiently than FCNs, which partially contributes to the answer of the question. + +Pros: +1. Studies an interesting question on DNNs. +2. Constructs a specific problem, the k-pattern problem, so that CNNs can solve much more efficiently than FCNs. + +Cons: +1. The analysis is only a very limited answer to the question. It only shows that CNNs are more efficient than FCNs on a very specific problem, which is of little interest to the community. On the one hand, people want to see the advantage of CNNs on more common problems, perhaps the image recognition problem (The AC understands that analyzing this problem is nearly impossible. It is just for hinting the choice of problems to analyze)? On the other hand, maybe others can find another specific problem that FCNs can solve much more efficiently than CNNs. If so, the value of this paper will be totally gone. The authors did not exclude such a possibility (Nonetheless, it is still a computational ""separation"" between CNNs and FCNs :)). +2. Reviewer #4 pointed out an issue in the proof. The response from the authors, though looked promising, did not fully convince the reviewer (in the confidential comment). Reviewer #3 also raised a question on the bounded stepsize. The authors should address both issues. + +Overall, since the problem studied is of great interest to the community and the analysis is mostly sound, the AC recommended acceptance.",ICLR2021, +OqzNx2PIWcC,1610040000000.0,1610470000000.0,1,FUtMxDTJ_h,FUtMxDTJ_h,Final Decision,Reject,"This paper proposes to learn symmetries of a physical system jointly with its Hamiltonian from data by learning a canonical transformation that render some of the coordinates constant. +The Hamiltonian dynamics and ""canonical"" transformation are softly enforced via loss terms. +A few experiments are performed demonstrating that the idea works and can learn a few approximate invariants, as well as some improvements over baselines agnostic to the symmetries. +The idea is interesting, but the experiments are limited in scope. It is not clear how to extend this idea to more complex systems where we do not know the number of conserved quantities in advance. It is also not clear how good are the learned invariants, as the results showing errors in conserved quantities (Fig 3) suggest that it is not very precise beyond a few time steps. +",ICLR2021, +dIMVu9eEDe,1576800000000.0,1576800000000.0,1,SyxGoJrtPr,SyxGoJrtPr,Paper Decision,Reject,"This paper proposes a new training technique to produce a learned model robust against adversarial attacks -- without explicitly training on example attacked images. The core idea being that such a training scheme has the potential to reduce the cost in terms of training time for obtaining robustness, while also potentially increasing the clean performance. 
The method does so by proposing a version of label smoothing and applying two forms of data augmentation (Gaussian noise and mixup). + +The reviewers were mixed on this work. Two recommended weak reject while one recommended weak accept. All agreed that this work addressed an important problem and that the proposed solution was interesting. The authors and reviewers actively engaged in a discussion, in some cases with multiple back-and-forths. The main concern of the reviewers is the inconclusive experimental evidence. Though the authors did demonstrate strong performance on PGD attacks, the reviewers had concerns about some attack settings, like epsilon, and how they may unfairly disadvantage the baselines. In addition, the results on CW presented a different story than the results with PGD. + +Therefore, we do not recommend this work for acceptance in its current form. The work offers strong preliminary evidence of a potential solution to provide robustness without direct adversarial training, but more analysis and explanation of when each component of the proposed solution should increase robustness are needed. +",ICLR2020, +BkeutKjWgV,1544830000000.0,1545350000000.0,1,SkeRTsAcYm,SkeRTsAcYm,"Application specific paper, but well written with interesting evaluations and analysis",Accept (Poster),"The authors propose an algorithm for enhancing noisy speech by also accounting for the phase information. This is done by adapting UNets to handle features defined in the complex space, and by adapting the loss function to improve an appropriate evaluation metric. + +Strengths +- Modifies existing techniques well to better suit the domain for which the algorithm is being proposed. Modifications like extending UNet to a complex UNet to deal with phase, and redefining the mask and loss, are all interesting improvements. +- Extensive results and analysis. + +Weaknesses +- The work is centered around speech enhancement, and hence has limited focus. + +Even though the paper is limited to speech enhancement, the reviewers agreed that the contributions made by the paper are significant and can help improve related applications like ASR. The paper is well written with interesting results and analysis. Therefore, it is recommended that the paper be accepted. +",ICLR2019,5: The area chair is absolutely certain +5C0mw9s2jN,1610040000000.0,1610470000000.0,1,VqzVhqxkjH1,VqzVhqxkjH1,Final Decision,Accept (Spotlight),"The paper presents a new idea for the detection of model stealing attacks. 
The new method generates ""fingerprints"", i.e., adversarial examples that transfer to surrogate models (extracted in model stealing attacks) but not to reference models (i.e., models obtained independently from the same data). If a model owner suspects that some model is stolen, fingerprints can be used to verify such claims. + +The paper's contribution is novel and significant. It is the first practical tool, to my knowledge, suitable for a reliable characterization of stolen models. The empirical results are quite impressive, demonstrating the detection of stolen models with an AUC = 1.0. Some presentation issues have been addressed by the authors during the revision. ",ICLR2021, +SkHfmkpSf,1517250000000.0,1517260000000.0,88,HyH9lbZAW,HyH9lbZAW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Thank you for submitting your paper to ICLR. The paper presents a general approach for handling inference in probabilistic graphical models that employ deep neural networks. The framework extends Johnson et al. (2016) and Khan & Lin (2017). The reviewers are all in agreement that the paper is suitable for publication. The paper is well written and the use of examples to illustrate the applicability of the methods brings great clarity. The experiments are not the strongest suit of the paper and, although the revision has improved this aspect, I would encourage a more comprehensive evaluation of the proposed methods. Nevertheless, this is a strong paper.",ICLR2018, +7OztopXgxO,1576800000000.0,1576800000000.0,1,SygEukHYvB,SygEukHYvB,Paper Decision,Reject,"This paper proposes CEB, the Conditional Entropy Bottleneck, as a way to improve the robustness of a model against adversarial attacks and noisy data. The model is tested empirically using several experiments and various datasets. + +We appreciate the authors for submitting the paper to ICLR and providing detailed responses to the reviewers' comments and concerns. After the initial reviews and rebuttal, we had extensive discussions to judge whether the contributions are clear and sufficient for publication. In particular, we discussed the overlap with a previous (arXiv) paper and decided that the overlap should not be considered because it is not published at a conference or journal. Moreover, the paper makes additional contributions. + +However, reviewers in the end did not think the paper showed sufficient explanation and proof of why and how this model works, and whether this approach improves upon other state-of-the-art adversarial defense approaches. + +Again, thank you for submitting to ICLR, and I hope to see an improved version in a future publication.",ICLR2020, +B1xnvXEZeN,1544790000000.0,1545350000000.0,1,rkzjUoAcFX,rkzjUoAcFX,Limited novelty but sound experimental work,Accept (Poster),"The paper benchmarks three strategies to adapt an existing TTS system (based on WaveNet) to new speakers. + +The paper is clearly written. The models and adaptation strategies are not very novel, but still a scientific contribution. Overall, the experimental results are detailed and convincing. The rebuttals addressed some of the concerns. + +This is a welcome contribution to ICLR 2019.",ICLR2019,4: The area chair is confident but not absolutely certain +O4Khubgt9G,1642700000000.0,1642700000000.0,1,uSE03demja,uSE03demja,Paper Decision,Accept (Oral),"This paper proposes a method to solve the inverse problem of identifying parameters of a dynamic physical system from image observations. 
The main idea is to train a rendering-invariant state prediction (RISP) model, which estimates the inverse mapping from the pixel domain to the state domain. The authors introduce a new loss to this end, and an efficient gradient computation for the loss. + +The paper received three clear accept recommendations. The reviewers discussed the potential improvement of RISP when combined with disentanglement methods, and also raised several concerns regarding experiments, e.g. rendering conditions during training and testing, or evaluation on real data. The rebuttal did a good job of answering reviewers' concerns, and the reviewers especially appreciated the new results on real videos. Eventually, all reviewers recommended a clear acceptance of the paper. + +The AC's own reading confirmed the reviewers' recommendations. The paper introduces very solid contributions for solving the complex task of physical parameter identification in the unobservable setting. The paper is also clear and well written, and validated with convincing experimental results. Therefore, the AC recommends acceptance.",ICLR2022, +8BwxKqYObfLq,1642700000000.0,1642700000000.0,1,IEsx-jwFk3g,IEsx-jwFk3g,Paper Decision,Reject,"
This paper proposes a Graph Neural Network model to estimate latent dynamics in the human brain using functional Magnetic Resonance Imaging (fMRI) and Diffusion Weighted Imaging (DWI). The representation is tested on a classification task. While reviewers acknowledge the importance of this application, various concerns have been raised and only partially addressed. The work focuses on graph deep learning and offers limited evidence of its superiority over more traditional ML or non-graph-based deep learning. Besides, the methodological novelty is not clearly argued, which is not ideal for the audience of a conference like ICLR. + +For all these reasons, this work cannot be endorsed for publication at ICLR 2022.",ICLR2022, +4QT62hcEVr-y,1642700000000.0,1642700000000.0,1,anWCFENEc5H,anWCFENEc5H,Paper Decision,Reject,"This paper aims to model adversarial noise by learning the transition relationship between adversarial labels and natural labels. In particular, an instance-dependent transition matrix is learned to relate adversarial and natural labels. Reviewers agreed that the paper is well motivated and well written, and the proposed method is novel. Meanwhile, reviewers raised some concerns about the experiments and the paper's presentation. During the discussion, the authors provided a lot of additional results that partially addressed the reviewers' concerns. However, the reviewers still think the experimental part of this paper should be further strengthened before acceptance. + +Thus, I recommend rejecting this paper. I encourage the authors to take the review feedback into account and submit a future version to another venue.",ICLR2022, +8OrVU4Ux26,1642700000000.0,1642700000000.0,1,tCx6AefvuPf,tCx6AefvuPf,Paper Decision,Reject,"This paper proposes a variant of stochastic gradient descent that parallelizes the algorithm for distributed training via delayed gradient averaging. 
While the proposed algorithm (DaSGD) is sensible and seems to work, it also seems to miss a lot of related work. As pointed out by one of the reviewers, the class of asynchronous decentralized methods already seems to cover the space of DaSGD, and it's not clear how DaSGD differs from the existing methods in this space. As a result of this lack of comparison to related work, the reviewers recommended that the paper not be accepted at this time, and this evaluation was not challenged by an author response. I agree with this consensus.",ICLR2022, +#NAME?,1610040000000.0,1610470000000.0,1,u9ax42K7ND,u9ax42K7ND,Final Decision,Reject,"The reviewers agreed that the paper presents interesting ideas but the presentation of the paper needs to be improved. Also, the experiments and the related work section need to be improved.",ICLR2021, +yvpkM6pA4BZ,1610040000000.0,1610470000000.0,1,OEgDatKuz2O,OEgDatKuz2O,Final Decision,Reject,"
+ +I would recommend acceptance. I would request the authors to release a sufficiently documented and easy to use implementation. This not only allows readers to build on this work but also increase the overall impact of this method.",ICLR2021, +qcOxhxY2Ezc,1610040000000.0,1610470000000.0,1,HQoCa9WODc0,HQoCa9WODc0,Final Decision,Reject,"The paper proposes to use reconstruction error of autoencoder as the energy function and normalize the resulting density for detecting anomalous/OOD examples. Reviewers have raised several concerns with the paper, including, lack of insights into why the AE energy is better for OOD detection than other energy function parameterizations, and incremental nature of the proposed method. Authors have not responded to these concerns. The paper is not suitable for publication in its current form. ",ICLR2021, +xAg8qFgNyh4,1610040000000.0,1610470000000.0,1,2kImxCmYBic,2kImxCmYBic,Final Decision,Reject,"While I'm sure there are many merits to the underlying work here, the consensus of the reviews is to recommend a rejection as an ICLR paper. That recommendation is based on issues with significance as well as on clarity issues, noted by reviewers even after the revisions. + +One pattern I noticed was that it seemed unclear whether the paper was to be regarded primarily as a software paper or as a paper on preprocessing. Most initial reviews evaluated it primarily as a software paper, but some comments from the authors in the discussion period seemed to frame it instead as a paper about research on preprocessing (independent of software). See my other comment for more detail on this question. + +Regardless of the intended framing, on significance as an ICLR submission, reviewers did not support its acceptance by either standard: +* R1 post-response: ""Regarding how to frame the paper (either about feature pre-processing or the software library), my (favourable) interpretation is to frame it as about the software library. As a paper about feature pre-processing it would have even less merit."" +* R3 post-response: ""the work has more upside in the software contribution than the feature pre-processing research"" +* R4 post-respones: ""After reading the revised version and the author response, I am still not convinced that the paper makes a substantial contribution either on the fundamental research angle or the software library angle"" + +One specific issue was that the experiments did not adequately support the main claims: +* R2 post-response: ""I feel that the experiments in Section 8 are still too limited to demonstrate the value of the techniques and software"" +* R4 post-response: ""While the main contributions of the paper are still a little ambiguous, the experiments don't seem to support the claims (ease of use of the library). The results seem to suggest that one of the contributions is the improved accuracy of results, but then the experimentation is far too limited to draw such a conclusion."" + +On clarity: +* R1 post-response: ""The new section 5 is still very hard to understand and I still couldn't make sense of the family tree primitive mechanism"" +* R3 post-response: ""I agree with AnonReviewer1 about the difficulty of understanding the family tree primitives, both before and after the revision"" +* R2 post-response: ""I am still unclear on the family tree primitives [...] it's not clear to me what problem the family tree primitives solve"" + + +The authors showed a lot of enthusiasm and good spirits in working to improve the submission. 
I hope the feedback provided here is useful. + +However, based on the consensus of the reviews, I recommend rejection of this submission.",ICLR2021, +Az6NxVlMzn8,1642700000000.0,1642700000000.0,1,7AzOUBeajwl,7AzOUBeajwl,Paper Decision,Reject,"This paper studies text style transfer which aims to edit a given sentence to possess a desired style value (e.g., positive sentiment) while keeping all other styles and content unchanged. The paper specifically focuses on a challenging setting where besides the target style (e.g., sentiment) to transfer, there exists confounding attributes (e.g., product category) that correlate with the target style, making it hard to change only the target style while preserving the other. The proposed approach is to learn an invariant/unbiased style classifier using Invariance Risk Minimization (IRM), together with an orthogonal classifier for monitoring style-independent changes (e.g., product category), to supervise the generator training. The main concerns are on the experiments -- it's suggested to include experiments on other styles besides sentiment; human evaluation and/or other metrics are needed for more convincing comparison; it's also encouraged to experiment with large language models (e.g., GPT-2, BART) besides the small LSTM/CNN networks as in the present work.",ICLR2022, +f1nr9JxkG7H,1642700000000.0,1642700000000.0,1,2_dQlkDHnvN,2_dQlkDHnvN,Paper Decision,Reject,"The paper proposes a new defense against backdoor attacks utilizing an improved version of defenses against noisy label attacks. The connection between these two problems is interesting and novel, which is acknowledge by all reviewers. The main drawback of the paper is, however, its experimental evaluation. The experiments are carried out only on one benchmark and the considered attack ratios are indicative for indiscriminative poisoning attacks rather than targeted backdoors. The AC addressed the summarized response and is not convinced that the reviews are biased. Despite the scarce response of the reviewers to the author rebuttal, the limitations of the experimental evaluation seem to persist in the revised version of the paper. While acknowledging the novelty and the overall good quality of the paper, the weakness of its experimental evaluation puts at in the position marginally below the acceptance threshold. The AC encourages the authors to revise the paper and improve on the pointed out weaknesses and is confident that this work will be well accepted by the scientific community.",ICLR2022, +Bye3zdIWxE,1544800000000.0,1545350000000.0,1,Hylyui09tm,Hylyui09tm,Some interesting ideas with concerns about motivation and experiments,Reject,"This paper proposes a method to compute embeddings of states and actions that facilitate computing measures of surprise for intrinsic reward. Though some of the ideas are quite interesting, there are currently issues with the experiments and the motivation. + +The experiments have high variance across the 5 runs, with significant overlap of shaded regions representing just one standard deviation from the mean. 
It is hard to draw any conclusions about improved performance, and statements like the following are much too strong: ""For vision-based exploration tasks, our results in Figure 5 show that EMI achieves the state of the art performance on Freeway, Frostbite, Venture, and Montezuma’s Revenge in comparison to the baseline exploration methods."" Further, the proposed approach has three new hyperparameters (lambdas), without much understanding into how to set them or their effect on the results. Specific values are reported for the different game types, without explanation for how or why these values were chosen. + +Similarly strong claims, that are not well substantiated, are given for the proposed approach. This paper seems to suggest that this is a principled approach to using surprise for exploration, contrasted to other ad-hoc approaches (""Other approaches utilize more ad-hoc measures (Pathak et al., 2017; Tang et al., 2017) that aim to approximate surprise.""). Yet, the paper does not define surprise (say by citing work by Itti and Baldi on Bayesian surprise), and then proposes what is largely a intuitive approach to providing a good intrinsic reward related to surprise. For example, ""we show that imposing linear topology on the learned embedding representation space (such that the transitions are linear), thereby offloading most of the modeling burden onto the embedding function itself, provides an essential informative measure of surprise when visiting novel states."" This might be intuitively true, but I do not see a clear demonstration in Section 4.2 actually showing that this restriction provides a measure of surprise. Additionally, some of the choices in Section 4.2 are about estimating ""irreducible error under the linear dynamics model"", but irreducible error is about inherent uncertainty (due to stochasticity and partial observability), not due to the choice of modeling class. In general, many intuitive choices in the algorithm need to be better justified, and some claims disparaging other work for being ad-hoc should be toned down. + +Overall, this paper is as yet a bit preliminary, in terms of clarity and experiments. In a further iteration, with some improvements, it could be a useful contribution for exploration in image-based environments. ",ICLR2019,4: The area chair is confident but not absolutely certain +KysmspMNp8,1642700000000.0,1642700000000.0,1,pC00NfsvnSK,pC00NfsvnSK,Paper Decision,Reject,"The paper proposes a new offline RL technique to generalize across domains. The paper was initially confusing (i.e., MDP vs POMDP) and weak empirically. The authors greatly improved the paper. However, a the end of the day, it is still not clear why the proposed approach performs better than existing techniques. We can think of the cumulant function with the discrete labels as essentially computing some statistics of future actions, observations and rewards. This is what every self supervised technique does. They differ in terms of their particular choice of statistics and architecture. The paper does not sufficiently motivate the particular architecture. Interestingly, in the experiments, the best statistics are cumulative rewards, which are closely related to the Q-values. In that case, it is even less clear why the approach should be beneficial since RL techniques that generalize across domains by learning state representations to predict Q-values seem very closely related. + +Despite the updates to the paper, the POMDP references are still confusing. 
The issue is that the paper embeds observations as if they were sufficient to predict future observations and rewards. This corresponds to the memoryless approach where a policy is optimized based on the last observation instead of the history of past actions and observations. Memoryless strategies are effective only when the last observation is a sufficient statistic, meaning that we really have a (near) fully observable MDP. The paper should discuss this and acknowledge that the approach will suffer in domains where memory of past actions and observations is critical.",ICLR2022, +Byx-KBrxg4,1544730000000.0,1545350000000.0,1,Byx7LjRcYm,Byx7LjRcYm,Strong agreement for rejection,Reject,"Average score of 3.33, highest score of 4. +The AC recommends rejection. +",ICLR2019,4: The area chair is confident but not absolutely certain +FMLO_8jk-b,1610040000000.0,1610470000000.0,1,7MjfPd-Irao,7MjfPd-Irao,Final Decision,Reject,"I thank the authors for their submission and very active participation in the author response period. I want to start by stating that I rank the paper higher as is currently reflected in the average score of the reviewers. The reasons for this are that a) R2 and R3, while responding to the author's rebuttal, do not seem to have updated their score or indicated that they want to keep their initial assessment of the paper -- in particular, R2 has acknowledged that additional experiments by the authors were useful and results on KeyCorridorS4/S5R3 are nice, and b) I disagree with R2's sentiment that MiniGrid is not a suitable testbed -- it is by now an established benchmark for evaluating RL exploration and representation learning methods (see list of publications on https://github.com/maximecb/gym-minigrid). However, despite my more positive stance on the paper, I fully agree with R1 and R2 that a comparison to EC is needed in order to shed light into which factors of EC-SimCLR actually led to improvements in comparison to RIDE. I therefore recommend rejection, but I strongly encourage the authors to take the feedback from the reviewers and work on a revised submission to the next venue.",ICLR2021, +gsh6tfUcK2,1576800000000.0,1576800000000.0,1,H1loF2NFwr,H1loF2NFwr,Paper Decision,Accept (Poster),"This is one of several recent parallel papers that pointed out issues with neural architecture search (NAS). It shows that several NAS algorithms do not perform better than random search and finds that their weight sharing mechanism leads to low correlations of the search performance and final evaluation performance. Code is available to ensure reproducibility of the work. + +After the discussion period, all reviewers are mildly in favour of accepting the paper. + +My recommendation is therefore to accept the paper. The paper's results may in part appear to be old news by now, but they were not when the paper first appeared on arXiv (in parallel to Li & Talwalkar, so similarities to that work should not be held against this paper).",ICLR2020, +HklkprlZlV,1544780000000.0,1545350000000.0,1,HkNGYjR9FX,HkNGYjR9FX,Simple but strong method,Accept (Poster),"This work proposes a simple but useful way to train RNN with binary / ternary weights for improving memory and power efficiency. The paper presented a sequence of experiments on various benchmarks and demonstrated significant improvement on memory size with only minor decrease of accuracy. Authors' rebuttal addressed the reviewers' concern nicely. 
+ +",ICLR2019,5: The area chair is absolutely certain +y5mxAZhlLO2,1642700000000.0,1642700000000.0,1,YxWU4YZ4Cr,YxWU4YZ4Cr,Paper Decision,Reject,"This paper studies different inductive biases that would improve OOD generalization (and in particular under translation, rotation and scaling) for image tasks. The study is focused on a toy dataset which allows authors to have more control over the data generation process and the transformations. Authors further show that iterative training using an auto-encoder and presenting data in log-polar space helps with rotation and scaling transformations on their toy dataset. + + +Strong Points: +- The paper is well written and easy to follow. +- The data generation process and the resulting toy dataset are novel and interesting. +- The experiment design and evaluations are solid. + +Weak Points: +- No natural image datasets: While using a toy dataset has several benefits it does not grant that the conclusions would generalize to realistic settings. Reviewers have suggested several realistic datasets and I encourage authors to evaluate their findings on some of these datasets. +- Limited Baselines: As reviewers have pointed, comparison with baselines can be improved by including stronger baselines as well as more clear discussion about other techniques such as augmenting the data with the transformations. +- Related Work: A proper discussion of related work to set the context and highlight the contributions of this paper is missing. In particular, reviewers have pointed to prior work on the benefits of presenting the image in log-polar space. + +Unfortunately, authors did not engage with reviewers during the discussion period. Given the prior work and lack of any natural image dataset, I think the novelty and significance of this work is limited. Therefore, I recommend rejecting the paper. However, I encourage authors to improve the paper by addressing the points raised by the reviewers.",ICLR2022, +EsM4dbXVtaJ,1610040000000.0,1610470000000.0,1,qYZD-AO1Vn,qYZD-AO1Vn,Final Decision,Accept (Poster),"This paper proposes a differentiable trust region based on closed-form projects for deep reinforcement learning. The update is derived for three types of trust regions: KL divergence, Wasserstein L2 distance, and Frobenius norm, applied to PPO and PAPI, and shown to perform comparably to the original algorithms. + +While empirically the proposed solutions does not bring clear benefits in terms of performance, as correctly acknowledged by the authors, it is rigorously derived and carefully described, bringing valuable insights and new tools to the deep RL toolbox. The authors improved the initial submission substantially based on the reviews during the discussion period, and the reviewers generally agree that the work is of sufficient quality that merits publication. To improve the paper and its impact, I would recommend applying the method to also off-policy algorithms for completeness. Overall, I recommend accepting this submission.",ICLR2021, +Byok41prM,1517250000000.0,1517260000000.0,269,SJLlmG-AZ,SJLlmG-AZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"An interesting model, for an interesting problem but perhaps of limited applicability - doesn't achieve state of the art results on practical tasks. +Paper has other limitations, though the authors have addressed some in rebuttals.",ICLR2018, +SJg6oHJK1N,1544250000000.0,1545350000000.0,1,SJMZRsC9Y7,SJMZRsC9Y7,meta-review,Reject,"The paper is poorly written and below the bar of ICLR. 
The paper could be improved with better exposition and stronger experiment results (or clearer exposition of the experimental results.) +",ICLR2019,4: The area chair is confident but not absolutely certain +add47weFkz-,1610040000000.0,1610470000000.0,1,WZnVnlFBKFj,WZnVnlFBKFj,Final Decision,Reject,"Although the reviewers acknowledge some contributions of the paper, it has some limitations on both theoretical results and numerical experiments. It is still unclear about the effectiveness of the proposed method. The authors should consider the following issues for the future submission: + +1) The justification of $\tau$ is not clear in federated learning with respect to communication efficiency (please see Reviewer 1’s comments). + +2) The bounded stochastic gradient assumption is not applicable in the strongly convex case. This issue has been discussed clearly in [Nguyen et. al, “SGD and Hogwild! Convergence Without the Bounded Gradients Assumption”, ICML 2018]. Therefore, the constant G in Section 3.2. would damage the complexity bound since it could be arbitrarily large. + +3) Although the goal is to illustrate the benefits of the proposed quantization approach, the numerical experiments and the theoretical contributions are limited. The theoretical results are incremental from the existing optimization theory (both strongly convex and non-convex). Moreover, network architecture and data sets are not enough to justify the efficiency of the method. +",ICLR2021, +3DJag8mAbKw,1610040000000.0,1610470000000.0,1,QxQkG-gIKJM,QxQkG-gIKJM,Final Decision,Reject,"This paper represents a practical extension of sound theoretical uncertainty propagation ideas for exploration in deepRL. All the reviewers agreed this was a promising direction and the empirical results strong. It was nice to see additional qualitative analysis of the proposed method, beyond the typical ""my number is bigger than yours"" type of claims. The discussion was extensive; reviewers with specific subject matter expertise provided high quality and detailed reviews. Several reviewers were in favour of the paper, but none were willing to champion it as a clear accept. + +Indeed the discussion highlighted important concerns with the paper. Several reviewers found the paper to overclaim: most importantly the paper suggests strong theoretical underpinnings of the method without clear evidence. Some found the text very imprecise. The reviewers were torn if such changes represented wording changes or major rewrites. The original submission missed two key pieces of work which were added during the rebuttal phase---UBE was added by including the scores from the literature, and the other (Bayesian-DQN) was only added to the discussion. Good on the authors for doing so, though ideally both would be implemented again. + +The AC's own reading of the paper highlighted a few other concerns. The writing needs work. In addition, the majority of improvement in overall performance appears to be due to very large improvements in a handful of games (e.g. Atlantis, Krull) and significant losses in performance in other games. This was not discussed at all. More surprisingly these games with huge performance gains were not used in the qualitative visualizations of the utility of the bonus found in the paper. The game breakout was used for analysis instead. Oddly the proposed method actually does worse or the same as SOTA methods (e.g., Adaptive EBU, Boot-DQN, UBE) in breakout. 
This is difficult to get to the bottom of because: (a) the per-game score tables in the appendix don't include the scores achieved by important baselines (e.g., Boot-DQN, EBU, Adapt EBU), (b) different setups are used across the relevant literature (Boot-Q & UBE papers use 200m frames, EBU paper uses 10m frames, and this paper uses 20m), (c) the per-game analysis in the appendix focuses on comparing methods proposed in the paper under review. That might all make sense, but it is left to the reader to figure out and I never got to the bottom of it all (table 3 of the Lee et al contains some of the relevant comparison data). Such missing details and lack of analysis are particularly important when the paper boldly claims state of the art performance improvement. + +All put together a clear picture emerges: the paper needs polishing, is unclear in places, over-claiming, and missing important analysis and explanations---in regards to both the theory and the experiments. The reviews are extensive and have provided many insights in how to improve the paper. ",ICLR2021, +YIIbXTrMDV,1576800000000.0,1576800000000.0,1,SJeeL04KvH,SJeeL04KvH,Paper Decision,Reject,"This manuscript proposes strategies to improve both the robustness and accuracy of federated learning. Two proposals are online reinforcement learning for adaptive hyperparameter search, and local distribution matching to synchronize the learning trajectories of different local models. + +The reviewers and AC agree that the problem studied is timely and interesting, as it addresses known issues with federated learning. However, this manuscript also received quite divergent reviews, resulting from differences in opinion about the novelty and clarity of the conceptual and empirical results. Taken together, the AC's opinion is that the paper may not be ready for publication.",ICLR2020, +RkfrS1Ai6nO,1642700000000.0,1642700000000.0,1,C81udlH5yMv,C81udlH5yMv,Paper Decision,Reject,The reviewers are in consensus. I recommend that the authors take their recommendations into consideration in revising their manuscript.,ICLR2022, +zFYzaW1i65C,1610040000000.0,1610470000000.0,1,vcKVhY7AZqK,vcKVhY7AZqK,Final Decision,Reject,"This paper proposes a measure of task complexity based on a decision-DAG like ""encoder"" where we iteratively branch on some test on the input and the selection of future tests depends on the answer to previous tests until we reach a terminal node in the DAG. We require that if $x$ and $x'$ reach the same terminal node then $P(y|x) = P(y|x')$. The complexity of the task (the complexity of the distribution $p(x,y)$) is the minimum over all such DAGs of the expected depth of the terminal node for $x$ when drawing $x$ from the marginal $p(x)$. + +The reviewers are not enthusiastic and I agree.",ICLR2021, +SylmTLmLeN,1545120000000.0,1545350000000.0,1,BJzuKiC9KX,BJzuKiC9KX,"Interesting empirical observations of the advantage of RWS, but lacking formal theoretical analysis, and larger scale experiments",Reject,"The paper presents a well conducted empirical study of the Reweighted Wake Sleep (RWS) algorithm (Bornschein and Bengio, 2015). It shows that it performs consistently better than alternatives such as Importance Weighted Autoencoder (IWAE) for the hard problem of learning deep generative models with discrete latent variables acting as a stochastic control flow. 
+The work is well-written and extracts valuable insights supported by empirical observations: in particular the fact that increasing the number of particles improves learning in RWS but hurts in IWAE, and the fact that RWS can also be successfully applied to continuous variables. +The reviewers and AC note the following weaknesses of the work as it currently stands: a) it is almost exclusively empirical and while reasonable explanations are argued, it does not provide a formal theoretical analysis justifying the observed behaviour b) experiments are limited to MNIST and synthetic data, confirmation of the findings on larger-scale real-world data and model would provide a more complete and convincing evidence. +The paper should be made stronger on at least one (and ideally both) of these accounts. + +",ICLR2019,4: The area chair is confident but not absolutely certain +ICoYgsbIug1,1642700000000.0,1642700000000.0,1,Ucx3DQbC9GH,Ucx3DQbC9GH,Paper Decision,Accept (Poster),"We appreciate the authors for addressing the comments raised by the reviewers during the discussion period, which includes providing more experimental results to address the concerns. We believe the publication of this paper can contribute to the important topic of data augmentation. + +The authors are highly recommended to consider all the comments and suggestions made by the reviewers when further revising their paper for publication.",ICLR2022, +604L9jxmtrSu,1642700000000.0,1642700000000.0,1,6kruvdT0yfY,6kruvdT0yfY,Paper Decision,Reject,"This paper presents work on classification with a background class. The reviewers appreciated the important, standard problem the paper considers. However, concerns were raised regarding presentation, empirical evaluation, clarity, novelty, and signficance of the work. The reviewers considered the authors' response in their subsequent discussions but felt the concerns were not adequately addressed. Based on this feedback the paper is not yet ready for publication in ICLR.",ICLR2022, +ByyO8J6rz,1517250000000.0,1517260000000.0,810,B1ydPgTpW,B1ydPgTpW,ICLR 2018 Conference Acceptance Decision,Reject,"Reviewers concur that the paper and the application area are interesting but that the approaches are not sufficiently novel to justify presentation at ICLR. +",ICLR2018, +Gl2Gw4q727S,1642700000000.0,1642700000000.0,1,o_HsiMPYh_x,o_HsiMPYh_x,Paper Decision,Accept (Poster),"The authors propose a simple method to estimate the accuracy of a classifier on an unlabeled dataset given an in-distribution validation set. In extensive experiments the authors show that the proposed method is significantly more accurate than previous methods and other baselines. + +The reviewers are quite consistent in their judgement, just the weighting of the different aspects is different. +After the rebuttal four out of five reviewers recommend acceptance. + +Strong points: +- simplicity of the method +- strong experimental results for various tasks and domain shift problems + +Weak points: +- there is no clear theoretical statement when the method is supposed to work +- the discussion in Section 3.1 is pretty obvious and seems a bit like a waste of space whereas the motivation for the actual method is very short + +While I agree with the reviewers that there is little theoretical justification for the method, the strong experimental results on various datasets, tasks and different domain shifts make this paper interesting for a large audience. Thus this paper is a nice contribution to ICLR and I recommend acceptance. 
+However, I strongly recommend to the authors to add more motivation in Section 4 and add a limitation section where the cases are discussed where the method is definitely not working. Section 3 is pretty obvious and could be significantly shortened or integrated into the limitations section. + +One case which is highly relevant for this limitations section is the provable asymptotic overconfidence of neural networks which is discussed in + +Hein et al, Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem, CVPR 2019 + +This would definitely lead to a failure of the presented method as all predictions would get a score above the threshold. I would also assume that the method would predict high accuracy values for out-of-distribution tasks which are semantically similar e.g. training on CIFAR10 and then using CIFAR100 as unlabeled dataset. In that context it would be interesting to evaluate OOD-aware classifiers using ATC such as discussed in + +Hendrycks et al, Deep Anomaly Detection with Outlier Exposure, ICLR 2019 + +Also it would be helpful to understand better the influence of the classifier performance on the original task on the performance of ATC on unlabeled data.",ICLR2022, +nKvormicLoX,1642700000000.0,1642700000000.0,1,qhkFX-HLuHV,qhkFX-HLuHV,Paper Decision,Accept (Poster),"This paper regards video understanding as an image classification task, and reports promising performance against state of the arts on several standard benchmarks. Though the method is quite simple, it achieves good results. The visualization in this paper also provides good insight. All reviewers give positive recommendations for this paper.",ICLR2022, +ryKaoM8Ox,1486400000000.0,1486400000000.0,1,Hk8TGSKlg,Hk8TGSKlg,ICLR committee final decision,Accept (Poster),"This paper proposes a memory-enhanced RNN in the vein of NTM, and a novel training method for this architecture of cloze-style QA. The results seem convincing, and the training method is decently novel according to reviewers, although the evaluation seemed somewhat incomplete according to reviewers and my own reading. For instance, it is questionable whether or not the advertised human performance on CNN/DM is accurate (based on 100 samples from a 300k+ dataset), so I'm not sure this warrants not evaluating or reporting performance on it. Overall this looks like an acceptable paper, although there is room for improvement.",ICLR2017, +9imcaj1Tnn,1576800000000.0,1576800000000.0,1,Skep6TVYDB,Skep6TVYDB,Paper Decision,Accept (Spotlight),The paper considers an interesting algorithm on zeorth-order optimization and contains strong theory. All the reviewers agree to accept.,ICLR2020, +x_SYovaK2DM,1610040000000.0,1610470000000.0,1,XPZIaotutsD,XPZIaotutsD,Final Decision,Accept (Poster),"All reviewers gave, though not very strong, positive scores for this work. Although the technical contribution of the paper is somewhat incremental, the reviewers agree that it solidly addresses the known important issues in BERT, and the experiments are extensive enough to demonstrate the empirical effectiveness of the method. The main concerns raised by the reviewers are regarding the novelty and the discussion with respect to related work as well as some unclear writings in the detail, but I think the pros outweigh the cons and thus would like to recommend acceptance of the paper. + +We do encourage authors to properly take in the reviewers' comments to further polish the paper in the final version. 
+ +",ICLR2021, +qB2-Mhcl4Rk,1642700000000.0,1642700000000.0,1,iFf26yMjRdN,iFf26yMjRdN,Paper Decision,Reject,"Dear authors, + +I have read the reviewers and your careful rebuttals. I would have liked to see much more engagement from the reviewers. However, even after your rebuttal, no reviewer suggested acceptance, with two reviewers proposing reject (3) and two proposing weak reject (5). + +The reviewers found the paper well written. I concur. The reviewers also notice that the contributions are very marginal compared to prior literature. Personalized FL formulation studied here was in a simpler form first proposed by Hanzely and Richtarik (Federated learning of a mixture of global and local models, 2020) and later generalized by Hanzely et al - a paper the authors cite. That work performed an in-depth analysis, also including the nonconvex case, which the authors (claim) did not notice. Compared to that work, the authors perform an analysis in the partial participation regime. However, partial participation is by now a standard technique which can usually be combined with other techniques without much difficulty. The authors tried to argue that their analysis approach is unique, but the reviewers remained unconvinced. + +In summary, I think this is a solid piece of work which is perhaps judged, looking at the raw scores, a bit too harshly. However, most verbal comments are indeed fair. I am also of the opinion that the paper in its current form does not reach the necessary bar for acceptance. I would encourage the authors to carefully revise the manuscript, taking into account all feedback that they find useful. I think the paper can be improved, with not too much effort perhaps, to a state in which the bar could be reached. + +Kind regards, + +Area chair",ICLR2022, +P_-HXdVY_1b,1610040000000.0,1610470000000.0,1,Hrtbm8u0RXu,Hrtbm8u0RXu,Final Decision,Reject,"The AC, the reviewers, and the authors had many discussions about the results in the paper during the discussion period. Below is a brief summary. + +1. The paper shows that with $O(N^{⅔})$ parameters, a feedforward neural network can memorize $N$ inputs with arbitrary labels if the inputs satisfy some mild assumptions. + +2. AC brought up in the discussion phase two central questions (one of which has been raised by R3 as well) + +a. The results rely on using the infinite precisions of real values in the neural networks. The results wouldn’t hold if the precision of the neural nets is finite. The subtlety about the infinite precision was not prominently discussed in the paper. + +b. It’s unclear to the AC what’s the practical implication of the results to generalization or optimization. In particular, it’s unclear to the AC what a finite-sample memorization result within infinite precisions would entail. The AC thinks there is a fundamentally big difference between expressivity and finite-sample expressivity. Expressivity is a very important topic to study, whereas the motivation for studying finite sample expressivity with infinite precision is unclear. (This is raised by R3 in the reviews as well.) + +3. R1 supports the paper with the following main points (The AC rephrased these with some approximations, and might misinterpret to a certain degree.) + +a. The paper’s result is surprising and mathematically non-trivial. + +b. Memorization is an important question to study. Many prior works study it, e.g., for showing tight VC dimension bound. It can be considered as an established setting. + +c. 
Relying on the infinite dimension is not uncommon in ML theory. + +4. R2 does not object R1’s point 3a, but seems to have a reservation to strongly recognize the technical significance of the results because it seems potentially likely to obtain the results by combining existing methods. Both R2 and the AC had some (partial) arguments to obtain the results of the paper with non-standard architecture or non-standard activations (which doesn’t subsume the paper’s results because the paper uses standard activations and feedforward net). This does make the AC unwilling to strongly recognize the technical significance of the result, but the AC doesn’t think the results are trivial either. In any case, this issue is not among the main concerns of the AC. + +5. Regarding 3b, the AC thinks that unlike the prior work, the memorization results in this paper do not have an implication to the VC dimension (and in return, the dependency on $N$ is better), and this makes the significance and impact of the result in this paper somewhat unclear. + +6. In summary, because the paper’s average score is somewhat borderline and because the AC has the concern 2a and 2b and was not quite convinced by the R1’s points or the authors’ responses, the AC is recommending rejection for the paper. The AC personally thinks the paper’s result has a strong potential and with additional clarification for the subtlety in 2a and additional results on the connections to generalization or optimization, the paper can be a strong one for future ML venues. +",ICLR2021, +Wgc2c03yMcr,1642700000000.0,1642700000000.0,1,ArY-zkyHI_l,ArY-zkyHI_l,Paper Decision,Reject,"This paper proposes a new ensemble training method for improving adversarial robustness to multiple attacks (e.g., $\ell_2$, $\ell_1$ and $\ell_\infty$). Specifically, authors adopt the recent Multi-Input Multi-Output (MIMO) ensemble architecture for computational efficiency. Then, the authors construct the adversarial examples using the outputs of multiple attacks simultaneously. With these examples, standard adversarial training is conducted on MIMO ensemble. + +All reviewers are on the negative side. AC agrees with reviewers’ concerns on limited novelty and insufficient empirical evaluation. AC also thinks that the improvement is not that significant compared to the existing method, especially concerning the real-world dataset. Overall, AC recommends rejection.",ICLR2022, +6VLKHFWHAMR,1642700000000.0,1642700000000.0,1,KLh86DknDj7,KLh86DknDj7,Paper Decision,Reject,"The reviewers are rather critical about the paper and the authors did not take a part in the discussion phase. Let me also add that the paper ignores a vast number of papers dealing with a similar problem. The column generation algorithm is a core of LPBooting also used for rule learning (""Rule Learning with Monotonicity Constraints"", ""The Linear Programming Set Covering Machine""). There are many other papers also using linear relaxation of integer programming to build rule models. Logical analysis of Data is also a well-known method being close to such approaches. There is also a plenty other rule learning systems that should be compared in the experimental study such as Ripper, Slipper, MLRules, or Ender (to mention only a few of them).",ICLR2022, +0QZH9Xb2Zj,1610040000000.0,1610470000000.0,1,GPuvhWrEdUn,GPuvhWrEdUn,Final Decision,Reject,"In this paper, the authors change the loss function of NNs to reduce the separability of the different classes in one of the hidden layers. 
The rationale for this assumption that the trained network will be more robust against white-box model inversion attack. The reviewers all concur that the paper had some merit, but that the paper is not well presented and believe the paper is not ready to be presented at ICLR. + +Also, the separability issue is not totally explained, because a reduced L2 norm might not be the whole story that explains why a white-box model inversion would rely on for leaking information. This might need to be proof further and a couple of experiments in which there is still leakage of information shows the additional robustness from the new penalty. +",ICLR2021, +3oNYtwtzz,1576800000000.0,1576800000000.0,1,Bke89JBtvB,Bke89JBtvB,Paper Decision,Accept (Poster),"The paper describes a method to train a convolutional network with large capacity, where channel-gating (input conditioned) is implemented - thus, only parts of the network are used at inference time. The paper builds over previous work, with the main contribution being a ""batch-shaping"" technique that regularizes the channel gating to follow a beta distribution, combined with L0 regularization. The paper shows that ResNet trained with this technique can achieve higher accuracy with lower theoretical MACs. Weakness of the paper is that more engineering would be required to convert the theoretical MACs into actual running time - which would further validate the practicality of the approach. +",ICLR2020, +ZDA3DWpwB4d,1610040000000.0,1610470000000.0,1,9hgEG-k57Zj,9hgEG-k57Zj,Final Decision,Reject,"This paper addresses an distribution shift and biased Q-values that happens when offline agents are finetuned in an online manner. The final revision of the paper is very well written and easy to understand. The proposed method in the paper is interesting, and aiming to address an important issue in RL. The proposed method involves a combination of two well-known methods in RL to tackle the distribution shift issue, the paper first suggests to use a balanced replay mechanism a replay for online experiences and another one for the offline. The second improvement is coming from the ensemble distillation. + +It seems like in the light of the reviews, the authors have improved manuscript. However, I would like to recommend the paper for rejection. I would like the authors to do further experiments on the individual components of the algorithms, for example what if we run all the experiments only with BR or only using ED how would the performance change. How much improvement is coming from each one of those individual components? As it stands, it is not clear to me right now, and the proposed solution looks a bit complicated and hacky. + +The balanced replay mechanism is very similar to the replay approaches that are used for learning from demos methods like R2D3 [1] and DQfD. Also the ensemble distillation approach is very akin to RAND [2] and distillation approaches that are used in lifelong learning algorithms. It is not clear, why it is that important for offline RL. It should potentially improve online RL as well, perhaps some experiments on online RL would be interesting. + +Nevertheless, I think the paper is very interesting and attempting to address a very important problem in RL. I would recommend the authors to resubmit the paper to a different venue after doing some small changes on it. + +[1] Gulcehre, C., Le Paine, T., Shahriari, B., Denil, M., Hoffman, M., Soyer, H., ... & Barth-Maron, G. (2019, September). 
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems. In International Conference on Learning Representations. + +[2] Burda, Y., Edwards, H., Storkey, A., & Klimov, O. (2018). Exploration by random network distillation. arXiv preprint arXiv:1810.12894. + + + +",ICLR2021, +
Rsme9Mbtacf,1642700000000.0,1642700000000.0,1,EG5Pgd7-MY,EG5Pgd7-MY,Paper Decision,Reject,"This work provides a formal framework for discussing membership inference attacks (MIA). It then examines existing attacks and proposes some new ones. The attacks are evaluated on several datasets. The framework mostly formalizes the types of information and error types that an attack may use, and is presented as the main contribution of this work. However, the presented formalizations do not appear to contribute significantly beyond the existing work on MIAs. The new attacks may be of interest and, according to the presented experiments, (mildly) improve on some of the existing MIAs. At the same time, as presented, the discussion of the benefits of the new attacks is relatively short, and reviewers did not find the results to be sufficiently convincing. Therefore I cannot recommend acceptance for this work in its current form.",ICLR2022, +
r1pciMUdl,1486400000000.0,1486400000000.0,1,HkNKFiGex,HkNKFiGex,ICLR committee final decision,Accept (Poster),"Here is a summary of strengths and weaknesses as per the reviews: + + Strengths + Work/application is exciting (R3) + Enough detail for reproducibility (R3) + May provide a useful analysis tool for generative models (R1) + + Weaknesses + Clarity of the IAN model - presentation is scattered and could benefit from empirical analysis to tease out which parts are most important (R3,R2,R1); AC comments that the paper was revised in this regard and R3 was satisfied, updating their score + Lack of focus: is it the visualization/photo manipulation technique, or is it the generative model? (R3, R2) + Writing could use improvement (R2) + Mathematical formulation of IAN not precise (R2) + + The authors provided a considerable overhaul of the paper, re-organizing/re-focusing it in response to the reviews and adding additional experiments. + + This, in turn, resulted in R1, R2 and R3 upgrading their scores. The paper is more clearly in accept territory, in the AC's opinion. The AC recommends acceptance as a poster.",ICLR2017, +
y8aDfpoSc_,1576800000000.0,1576800000000.0,1,Bye3P1BYwr,Bye3P1BYwr,Paper Decision,Reject,"This was a borderline paper, with both pros and cons. In the end, it was not considered sufficiently mature to accept in its current form. The reviewers all criticized the assumptions needed, and lamented the lack of clarity around the distinction between reinforcement learning and planning. 
The paper requires a clearer contribution, based on a stronger justification of the approach and a weakening of the assumptions. The submitted comments should be able to help the authors strengthen this work.",ICLR2020, +
dY-DZPZv_s,1610040000000.0,1610470000000.0,1,Tq_H_EDK-wa,Tq_H_EDK-wa,Final Decision,Reject,"This paper studies a timely problem and considers an interesting approach, but overall there were many concerns about technical details and the validity of the framework. The positive reviewer also mentioned concerns about the experiments, which others also found to be an insular comparison with weak baselines. Following the response period, additional concerns arose in discussion related to the lack of details, for instance the possible unidentifiability of the model. As one reviewer discusses, the authors are attempting to use RNNs to impute missing infection status labels when the missingness mechanism is assumed to be (i) not at random, (ii) playing out over time (as it is unclear whether Y^t is assumed (conditionally) independent of Y^t' with t' << t), and (iii) subject to interference (whether someone is tested is the 'treatment' here, since it's a missingness problem and one person's propensity to be tested could causally affect another person's downstream infection status, since apparently no Markov independence is assumed). There is also consensus that the writing quality can be greatly improved. Overall, this work contains some ideas with potential that could be realized in a thorough revision. ",ICLR2021, +
xjw8HecmS8j,1642700000000.0,1642700000000.0,1,Zq2G_VTV53T,Zq2G_VTV53T,Paper Decision,Accept (Poster),The reviewers agree that the paper introduces an interesting approach for estimating Shapley values in real time. The effectiveness of the method is well demonstrated across different tasks/datasets.,ICLR2022, +
pqSkNdc3e282,1642700000000.0,1642700000000.0,1,9kpuB2bgnim,9kpuB2bgnim,Paper Decision,Accept (Poster),This paper proposes an adaptive sparse Huber additive model for forecasting non-stationary time series. The prior work has considered similar models for Gaussian innovations, which is overly restrictive for a variety of applications such as finance. The results are supported both by theory and experiments. The results are novel and are of interest to ICLR and machine learning communities in general.,ICLR2022, +
t04_C3A726X,1610040000000.0,1610470000000.0,1,bFnn6lPn3Sp,bFnn6lPn3Sp,Final Decision,Reject,The reviewers pointed out several opportunities for improvement and concurred that the paper needs significant work before it is ready for publication. The authors did not provide a rebuttal. We hope the review process was useful to the authors. ,ICLR2021, +
5bgvZ4BKJ,1576800000000.0,1576800000000.0,1,Bye3P1BYwr,Bye3P1BYwr,Paper Decision,Reject,"The authors propose an approach for anomaly detection in the setting where the training data includes both normal and anomalous data. Their approach is a fairly straightforward extension of existing ideas, in which they iterate between clustering the data into normal vs. anomalous and learning an autoencoder representation of normal data that is then used to score normality of new data. The results are promising, but the experiments are fairly limited. 
The authors argue that their experimental settings follow those of prior work, but I think that for such an incremental contribution, more empirical work should be done, regardless of the limitations of particular prior work.",ICLR2020, +
SklDy910y4,1544580000000.0,1545350000000.0,1,Hyg74h05tX,Hyg74h05tX,Lovely main idea,Reject,"Strengths: +-------------- +This paper was clearly written, contained novel technical insights, and had SOTA results. In particular, the explanation of the generalized dequantization trick was enlightening and I expect will be useful in this entire family of methods. The paper also contained ablation experiments. + +Weaknesses: +------------------ +The paper went for a grab-bag approach, when it might have been better to focus on one contribution and explore it in more detail (e.g. show that the learned pdf is smoother when using variational quantization, or show the difference in ELBO when using uniform q as suggested by R2). + +Also, the main text contains many references to experiments that hadn't converged at submission time, but the submission wasn't updated during the initial discussion period. Why not? + +Points of contention: +----------------------------- +Everyone agrees that the contributions are novel and useful. The only questions are whether the exposition is detailed enough to reproduce the new methods (the authors say they will provide code), and whether the experiments, which meet basic standards, are of a high enough standard for publication, because there was little investigation into the causes of the difference in performance between models. + +Consensus: +---------------- +The consensus was that this paper was slightly below the bar.",ICLR2019,2: The area chair is not sure +
SkxjFRjWgV,1544830000000.0,1545350000000.0,1,HJl2Ns0qKX,HJl2Ns0qKX,Interesting idea but insufficient evaluation of the method to establish benefits over existing methods,Reject,The idea of the paper -- imposing a GAN-type loss on the latent interpolations of an autoencoder -- is interesting. However, there are strong concerns from R2 and R3 about the limited experimental evaluation of the proposed method, which falls short of demonstrating its advantages over latent spaces learned by existing GANs. Another point of concern was the use of only one real dataset (CelebA). The authors made substantial revisions to the paper in addressing many of the reviewers' points, but these core concerns still persist with the current draft and it's not ready for publication at ICLR. The authors are encouraged to address these concerns and resubmit to another venue. ,ICLR2019,5: The area chair is absolutely certain +
dXOs7kkw2,1576800000000.0,1576800000000.0,1,SkxlElBYDS,SkxlElBYDS,Paper Decision,Reject,"There is no author response for this paper. The paper addresses the issue of catastrophic forgetting in continual learning. The authors build upon the idea from [Zheng,2019], namely finding gradient updates in the space perpendicular to the input vectors of the previous tasks, resulting in less forgetting, and propose an improvement, namely to use principal component analysis to enable learning new tasks without restricting their solution space as in [Zheng,2019]. 
+While the reviewers acknowledge the importance of studying continual learning, they raised several concerns that were viewed by the AC as critical issues: (1) convincing experimental evaluation -- an analysis that clearly shows how and when the proposed method can solve the issue that [Zheng,2019] faces (the task similarity/dissimilarity scenario) would substantially strengthen the evaluation and would allow one to assess the scope and contributions of this work; also see R3’s detailed concerns and questions on empirical evaluation, R2’s suggestion to follow the standard protocols, and R1’s suggestion to use PackNet and HAT as baselines for comparison; (2) lack of presentation clarity -- see R2’s concerns about how to improve, and R1’s suggestions on how to better position the paper. +The general consensus among the reviewers and the AC is that, in its current state, the manuscript is not ready for publication. It needs clarifications, more empirical studies, and polish to achieve the desired goal. +",ICLR2020, +
vGutMyhG4O,1576800000000.0,1576800000000.0,1,H1l0O6EYDH,H1l0O6EYDH,Paper Decision,Reject,"This paper presents an approach that utilizes conventional frequency-domain bases such as DWHT and DCT to replace the standard point-wise convolution, which can significantly reduce the computational complexity. The paper is generally well written and easy to follow. However, the technical novelty seems limited, as it is basically a simple combination of CNNs and traditional filters. Moreover, as reviewers suggested, it is the historical experience and current consensus in the community that learned representations significantly outperform traditional pre-defined features or filters as the training data expands. I do understand the scientific value of revisiting and challenging that belief, as commented by R1, but in order to provoke meaningful discussion, experiments on a large-scale dataset like ImageNet are definitely necessary. For these reasons, I think the paper is not ready for publication at ICLR and would like to recommend rejection.",ICLR2020, +
WnqIRMPcn,1576800000000.0,1576800000000.0,1,r1eCukHYDH,r1eCukHYDH,Paper Decision,Reject,"This work proposes a GAN architecture that aims to align the latent representations of the generator with different interpretable degrees of freedom of the underlying data (e.g., size, pose). + +Reviewers found this paper well motivated and the proposed method to be technically sound. However, they cast some doubts about the novelty of the approach, specifically with respect to DMWGAN and MADGAN. The AC shares these concerns and concludes that this paper would greatly benefit from an additional reviewing cycle that addresses the remaining concerns. +",ICLR2020, +
mjlR7RXOcYL,1610040000000.0,1610470000000.0,1,TuK6agbdt27,TuK6agbdt27,Final Decision,Accept (Poster),"This paper proposed a way to combine LSTMs with Fast weights for associative inference. + +While reviewers had concerns about the comparison with Ba et al. and the experimental results, the authors addressed all the concerns and convinced the reviewers. The revision strengthened the paper significantly. I recommend an accept.",ICLR2021, +
XI0nnh1rDpr,1610040000000.0,1610470000000.0,1,ptbb7olhGHd,ptbb7olhGHd,Final Decision,Reject,"One reviewer is positive, but that review is not of high quality. 
The other reviewers agree that this paper is interesting, but has too many limitations to be accepted by a highly competitive venue such as ICLR.",ICLR2021, +
lYMnW89R3pA,1642700000000.0,1642700000000.0,1,FndDxSz3LxQ,FndDxSz3LxQ,Paper Decision,Accept (Poster),"Dear Authors, + +The paper was nicely received and discussed during the rebuttal period. The current discussions mostly lie on the acceptance side. + +Some pros of the paper include: + +- Timely topic: This paper deals with the problem of distributed training of GNNs. +- New algorithm: this method captures the idea of transmitting only local averages but adds a centralized step on the server to account for global structural information lost in the subgraph partition. +- Theory: The authors further provide theoretical convergence guarantees. +- Clarity: The paper is fairly well written and the proposed result is simple and powerful. + +The current consensus is that the paper deserves publication. + +Best AC",ICLR2022, +
02YqasPKDg,1576800000000.0,1576800000000.0,1,Bklg1grtDr,Bklg1grtDr,Paper Decision,Reject,"This paper demonstrates a framework for optimizing designs in auction/contest problems. The approach relies on considering a multi-agent learning process and then simulating it. + +To a large degree there is agreement among reviewers that this approach is sensible and sound; however, it lacks substantial novelty. The authors provided a rebuttal which clarified the aspects that they consider novel; however, the reviewers remained mostly unconvinced. Furthermore, it would help if the improvement over past approaches were demonstrated in a more convincing way, for example with larger-scope experiments that also involve richer analysis. +",ICLR2020, +
nDzIMR8yGs,1576800000000.0,1576800000000.0,1,S1x63TEYvr,S1x63TEYvr,Paper Decision,Reject,"This paper proposes a novel approach, the Latent Question Reformulation Network (LQR-net), a multi-hop and parallel attentive network designed for question-answering tasks that require multi-hop reasoning capabilities. Experimental results on the HotPotQA dataset achieve competitive results and outperform the top system in terms of exact match and F1 scores. However, reviewers note the limited setting of the experiments on the unrealistic, closed-domain setting of this dataset and suggested experimenting with other data (such as Complex WebQuestions). Reviewers were also concerned about the scalability of the system due to the significant amount of computation. They also noted several previous studies were not included in the paper. The authors acknowledged and made changes according to these suggestions. They also included experiments on only the open-domain subset of HotPotQA in their rebuttal; unfortunately, the results are not as good as before. Hence, I suggest rejecting this paper.",ICLR2020, +
5ic6a2vjtv,1576800000000.0,1576800000000.0,1,Hkg0olStDr,Hkg0olStDr,Paper Decision,Reject,"This paper proposes a solution to the decentralized privacy-preserving domain adaptation problem. In other words, how to adapt to a target domain without explicit data access to other existing domains. In this scenario the authors propose MDDA, which consists of both a collaborator selection algorithm based on minimal Wasserstein distance as well as a technique for adapting through sharing discriminator gradients across domains. + +The reviewers gave split scores for this work, with two recommending weak accept and two recommending weak reject. 
However, both reviewers who recommended weak accept explicitly mentioned that their recommendation was borderline (an option not available for ICLR 2020). The main issues raised by the reviewers were a lack of algorithmic novelty and a lack of comparison to prior privacy-preserving work. The authors agreed that their goal was not to introduce a new domain adaptation algorithm, but rather to propose a generic solution to extend existing algorithms to the case of privacy-preserving and decentralized DA. The authors also provided extensive revisions in response to the reviewers' comments. Though the reviewers were convinced on some points (like the privacy-preserving arguments), there still remained key outstanding issues that were significant enough to cause the reviewers not to update their recommendations. + +Therefore, this paper is not recommended for acceptance in its current form. We encourage the authors to build on the revisions completed during the rebuttal phase and any outstanding comments from the reviewers. ",ICLR2020, +
rJxx4F2ZgN,1544830000000.0,1545350000000.0,1,r1l-e3Cqtm,r1l-e3Cqtm,metareview,Reject,"The proposed method compresses video sequences with an end-to-end approach, by extending a variational approach from images to videos. The problem setting is interesting and somewhat novel. The main limitation, as exposed by the reviewers, is that evaluation was done on very limited and small domains. It is not at all clear that this method scales well to non-toy domains or that it is in fact possible to get good results with an extension of this method beyond small-scale content. There were some concerns about unfair comparisons to classical codecs that were optimized for longer sequences (and I share those concerns, though they are somewhat alleviated in the rebuttal). + +While the paper presents an interesting line of work, the reviewers did present a number of issues that make it hard to recommend it for acceptance. However, as R1 points out, most of the problems are fixable, and I would advise the authors to take the suggested improvements (especially anything related to modeling longer sequences); once they are incorporated this will be a much stronger submission.",ICLR2019,4: The area chair is confident but not absolutely certain +
yNvVbsNy0f,1576800000000.0,1576800000000.0,1,HkxZVlHYvH,HkxZVlHYvH,Paper Decision,Reject,"This paper presents a way of adapting an HMC-based posterior inference algorithm. It's based on two approximations: replacing the entropy of the final state with the entropy of the initial state, and differentiating through the MH acceptance step. Experiments show it is able to sample from some toy distributions and achieves slightly higher log-likelihood on binarized MNIST than competing approaches. + +The paper is well written, and the experiments seem pretty reasonable. + +I don't find the motivations for the aforementioned approximations very convincing. It's claimed that encouraging the entropy of P_0 has a similar effect to encouraging the entropy of P_T, but it seems easy to come up with situations where the algorithm could ""cheat"" by finding a high-entropy P_0 which leads straight downhill to an atypically high-density region. Similarly, there was some reviewer discussion about whether it's OK to differentiate through the indicator function; while we differentiate through nondifferentiable functions all the time, it makes no sense to differentiate through a discontinuous function. (This is a big part of why adaptive HMC is hard.) 
This paper has some promising ideas, but overall the reviewers and I don't think this is quite ready. +",ICLR2020, +
mUBpUzTcGf,1576800000000.0,1576800000000.0,1,HJxWl0NKPB,HJxWl0NKPB,Paper Decision,Reject,"This paper extends state-of-the-art semi-supervised learning techniques (i.e., MixMatch) to collect new data adaptively and studies the benefit of getting new labels versus adding more unlabeled data. Active learning is incorporated in a natural and simple (albeit unsurprising) way, and the experiments are convincing that this approach has merit. + +While the approach works, reviewers were concerned about the novelty of the combination, given that it's somewhat obvious and straightforward to accomplish. 
Reviewers were also concerned that the space of both semi-supervised learning algorithms and active learning algorithms was not sufficiently exhaustively studied. As one reviewer points out: neither of these ideas is new or particular to deep learning. + +Due to the lack of novelty, this paper is not suited for a top-tier conference. ",ICLR2020, +
KWM6uzvTUb,1576800000000.0,1576800000000.0,1,B1e3OlStPB,B1e3OlStPB,Paper Decision,Accept (Spotlight),"This paper proposes a novel methodology for applying convolutional networks to spherical data through a graph-based discretization. The reviewers all found the methodology sensible and the experiments convincing. A common concern of the reviewers was the amount of novelty in the approach, in that it involves the combination of established methods, but ultimately they found that the empirical performance compared to baselines outweighed this.",ICLR2020, +
yPg59u0RdRf,1642700000000.0,1642700000000.0,1,5qwA7LLbgP0,5qwA7LLbgP0,Paper Decision,Reject,"This paper proposes a method of multi-agent reinforcement learning that separately deals with the risk associated with the uncertainties of the other agents and the risk associated with the uncertainties of the environment. This allows an agent, for example, to be agent-wise risk-seeking and environment-wise risk-averse. The proposed approach is largely heuristic, with little theoretical justification. The experimental results are promising but not sufficiently convincing, given the lack of formalism. Further improvement in clarity might compensate for the lack of formalism or theoretical justification.",ICLR2022, +
DBUcoBMeL5,1576800000000.0,1576800000000.0,1,HJeANgBYwr,HJeANgBYwr,Paper Decision,Reject,"This paper proposes a graph neural network based approach for scaling up imitation learning (e.g., of swarm behaviors). Reviewers noted key limitations in the discussion of related work, the size of the proposed contribution in terms of model novelty, and the evaluation / comparison to strong baselines. Reviewers appreciated the author replies, which resolved some concerns, but agree that the paper is overall not ready for publication.",ICLR2020, +
Tex24iaipr1,1642700000000.0,1642700000000.0,1,cuGIoqAJf6p,cuGIoqAJf6p,Paper Decision,Reject,"This paper considers transferability measures in both the supervised and unsupervised domains. It identifies instabilities in the way that the H-score is computed and proposes to correct the issue with a shrinkage-based covariance estimation. The proposed fix results in an 80% absolute gain over the original H-score and makes it competitive with the state-of-the-art LogME metric. The new shrinkage-based H-score is much faster to compute. + +Reviewers agree that the paper makes interesting and important contributions. In particular, the reviewers appreciate that the paper takes a deeper look at existing metrics and proposes valuable fixes instead of proposing yet another new metric. The paper demonstrates depth of statistical knowledge and proposes shrinkage operators to estimate high-dimensional covariance. + +There are a few shortcomings of the paper, however, that suggest the paper could benefit from another round of improvement. In particular, the paper is very dense with little motivation. Some of the choices in the paper could be motivated better. For instance, the hypothesis of a lack of robustness in estimating the H-score is not demonstrated empirically. 
Some more comments from the AC's independent assessment: + +(a) Further elaboration on the novelty is needed. Currently the main message appears to be that if we combine two existing approaches (AT and EntM or LS) then we get better results. This is perhaps not too surprising and more elaboration on the significance would be appreciated. + +(b) More comparisons in the experiments, including the SOTA performances and alternative defenses (some below). + +(c) The analysis in Section 6 adds more confusion than clarification. It is clear that EntM and LS would largely decrease M_f, but why would they also decrease the Lipschitz constant even more sharply? If this explanation is useful, why not directly regularize the Lipschitz constant and maximize the margin M_f? There is in fact a large body of work on this, see for example: + +1. Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation + +2. Parseval Networks: Improving Robustness to Adversarial Examples + +3. L2-Nonexpansive Neural Networks + +4. and the many references since. + +",ICLR2021, +y8aDfpoSc_,1576800000000.0,1576800000000.0,1,rJgDb1SFwB,rJgDb1SFwB,Paper Decision,Reject,"The problem of introducing interpretability into sepsis prediction frameworks is one that I find a very important contribution, and I personally like the ideas presented in this paper. However, there are two reviewers, who have experience at the boundary of ML and HC, who are flagging this paper as currently not focusing on the technical novelty, and explaining the HC application enough to be appreciated by the ICLR audience. As such my recommendation is to edit the exposition so that it more appropriate for a general ML audience, or to submit it to an ML for HC meeting. Great work, and I hope it finds the right audience/focus soon. ",ICLR2020, +lSINJIEfz7D,1610040000000.0,1610470000000.0,1,YUGG2tFuPM,YUGG2tFuPM,Final Decision,Accept (Poster),"The authors develop a novel strategy, Deep Partition Aggregation, to train models to be certifiably robust to data poisoning attacks based on flipping labels of a small subset of the training data or introducing poisoned input features. They improve upon existing certified defences against data poisoning and are the first to establish certified guarantees against general poisoning attacks. + +Most reviewers were in support of acceptance. Reviewer concerns were raised in the rebuttal phase but were convincingly addressed in the rebuttal phase. One reviewer did raise concerns on the weakness of experimental results on CIFAR-10, but the fact that this method has established the first certified defence in the general poisoning setting and that the results are stronger on other datasets certainly warrant acceptance. I would encourage the authors to clarify this in the final version. + +",ICLR2021, +6aOP6-FJt9D,1642700000000.0,1642700000000.0,1,L2a_bcarHcF,L2a_bcarHcF,Paper Decision,Reject,"The paper demonstrates that transformer architectures can be trained to compute solutions of linear algebra problems with high accuracy. This is an interesting direction and, as the the reviews and the discussion show it is a ""good data point and insightful"", as one reviewer puts it. I fully agree with this but also agree with one other reviewer in that this is ""yet another"" application of a known transformer architecture. The author should place the model into context and provide some perspective. Without, the motivation behind solving the specific set of linear algebra problems considered is a bit unclear. 
For instance, could a transformer now learn to solve corresponding ML problems? Moreover, the dimensions of the considered matrices are rather small, and the generalization to larger dimension appear to be tricky.",ICLR2022, +SkgFB16rM,1517250000000.0,1517260000000.0,606,ryHM_fbA-,ryHM_fbA-,ICLR 2018 Conference Acceptance Decision,Reject,"there are two separate ideas embedded in this submission; (1) language modelling (with the negative sampling objective by mikolov et al.) is a good objective to use for extracting document representation, and (2) CNN is a faster alternative to RNN's, both of which have been studied in similar contexts earlier (e.g., paragraph vectors, CNN classifiers and so on, most of which were pointed out by the reviewers already.) Unfortunately reading this manuscript does not reveal too clearly how these two ideas connect to each other (and are separate from each other) and are related to earlier approaches, which were again pointed out by the reviewers. in summary, i believe this manuscript requires more work to be accepted.",ICLR2018, +PhMx2p6_kBW,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"In this paper, the authors extend the performative prediction framework of Perdomo et al. (2020) to a multi-agent, game-theoretic setting, and they examine how and when multi-agent performative learning may lead to performative stability/optimality. + +The authors' results and contributions can be summarized as follows: +- They consider a multi-agent location-scale distribution map with parameters constrained in a simplex, and they study the dynamics of an exponentiated gradient descent algorithm (EGDA for short) inspired by Kivinen and Warmuth (1997). +- If the learning rate is small enough, the authors show that EGDA converges to a performatively stable point (under the same assumptions that guarantee existence of a convex potential). +- On the other hand, if the learning rate is large, the algorithm behaves chaotically. + +The reviewers' initial assessment was mixed, but after the authors' rebuttal, some concerns were partially addressed and the scores of the paper were upgraded to borderline positive. On a point-by-point basis, the authors' result on the convergence of EGDA with a small learning rate was appreciated by the reviewers, but it was not otherwise deemed significant enough relative to existing convergence results for gradient methods. Instead, most of the discussion centered on the authors' result on chaos (Theorem 4.6), which was viewed as the most significant contribution of the paper. However, continued discussion between committee members revealed that this result follows directly from Theorem 3.11 and Corollary 3.12 of the arxiv preprint ""A family of chaotic maps from game theory"" by T. Chotibut, F. Falniowski, M. Misiurewicz, and G. Piliouras , which is not discussed in the paper. [As was pointed out, the update map (7) of the paper coincides with the update rule (7) of the arxiv preprint, and the proof techniques are likewise identical.] + +This overlap with previous work was considered a ""big omission"" and it pushed the paper below the acceptance threshold. In the end, the paper was not supported by any of the reviewers, so a ""reject"" recommendation was made.",ICLR2022, +uUMV2qRsgw,1576800000000.0,1576800000000.0,1,SJgBra4YDS,SJgBra4YDS,Paper Decision,Reject,"The paper proposes a combination of a delay embedding as well as an autoencoder to perform representation learning. 
The proposed algorithm shows competitive performance with deep image prior, which is a convnet structure. The paper claims that the new approach is interpretable and provides explainable insight into image priors. + +The discussion period was used constructively, with the authors addressing reviewer comments, and the reviewers acknowledging this an updating their scores. + +Overall, the proposed architecture is good, but the structure and presentation of the paper is still not up to the standards of ICLR. The current presentation seems to over-claim interpretability, without sufficient theoretical or empirical evidence. +",ICLR2020, +r1ehP-7yxV,1544660000000.0,1545350000000.0,1,ByeTHsAqtX,ByeTHsAqtX,Summary review,Reject,"The paper is overally interesting and addresses an important problem, however reviewers ask for more rigorous empirical study and less restrictive settings.",ICLR2019,5: The area chair is absolutely certain +ryeg41TBM,1517250000000.0,1517260000000.0,273,HyZoi-WRb,HyZoi-WRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The authors analyze the IWAE bound as an estimator of the marginal log-likelihood and show how to reduce its bias by using the jackknife. They then evaluate the effect of using the resulting estimator (JVI) for training and evaluating VAEs on MNIST. This is an interesting and well written paper. It could be improved by including a convincing explanation of the relatively poor performance of the JVI-trained, JVI-evaluated models.",ICLR2018, +M3BC_J96GJQ,1642700000000.0,1642700000000.0,1,zrW-LVXj2k1,zrW-LVXj2k1,Paper Decision,Accept (Poster),"This paper has been independently assessed by three expert reviewers. The results place it at the borderline of acceptance decision: while one of the reviewers gave it a straight accept evaluation, two others assessed it as marginally rejectable, even after discussion with the authors. All of the reviewers agreed that the theoretical results provided should help promote the use of MLE estimators over perhaps more prevalently used in current practice TMO, and that is the main contribution of this work. The reviewers were concerned with the clarity of the presentation and with a confusing notation used. Some of these issues have been addressed in the authors' responses. All things considered, I conclude that this work can be of some interest to the ICLR audience, and as such it can be assessed as marginally acceptable for this conference: ""accept if needed"". I will recommend it as such for consideration by the Senior Area Chair and the Program Committee.",ICLR2022, +RqhNWo1ALJO,1642700000000.0,1642700000000.0,1,JSsjw8YuG1P,JSsjw8YuG1P,Paper Decision,Reject,"The paper uses several types of information to predict one specific +lab test response for patients. The predictions are made by combining +and tailoring mainly existing techniques. + +The reviewers raised a number of concerns, and the authors clarified +many of them and provided additional results. In particular the +following issues were discussed: Comparing to state-of-the-art methods +and methods having the same information available, specifics of +empirical evaluations and of methodological novelty, choice of the +particular data sets, and justifiability of conclusions. + +The main remaining weakness is the limited novelty, which should not +be interpreted as the contributions of the paper being trivial. 
+ +In contrast, the solid engineering work done by the authors in this +paper will be valuable in developing clinical decision support tools, +and the authors are encouraged to incorporate the new results and +feedback in future work and submissions.",ICLR2022, +KWM6uzvTUb,1576800000000.0,1576800000000.0,1,B1e3OlStPB,B1e3OlStPB,Paper Decision,Accept (Spotlight),"This paper proposes a novel methodology for applying convolutional networks to spherical data through a graph-based discretization. The reviewers all found the methodology sensible and the experiments convincing. A common concern of the reviewers was the amount of novelty in the approach, as in it involves the combination of established methods, but ultimately they found that the empirical performance compared to baselines outweighed this.",ICLR2020, +yPg59u0RdRf,1642700000000.0,1642700000000.0,1,5qwA7LLbgP0,5qwA7LLbgP0,Paper Decision,Reject,"This paper proposes a method of multi-agent reinforcement learning that separately deals with the risk associated with uncertainties of the other agents and the risk associated with the uncertainties of the environment. This allows for example to be agent-wise risk seeking and environment-wise risk averse. The proposed approach is largely heuristic with little theoretical justifications. The experimental results are promising but not sufficiently convincing, given the lack of formalism. Further improvement on clarity might complement the lack of formalism or theoretical justifications.",ICLR2022, +DBUcoBMeL5,1576800000000.0,1576800000000.0,1,HJeANgBYwr,HJeANgBYwr,Paper Decision,Reject,"This paper proposes a graph neural network based approach for scaling up imitation learning (e.g., of swarm behaviors). Reviewers noted key limitations in the discussion of related work, size of the proposed contribution in terms of model novelty, and evaluation / comparison to strong baselines. Reviewers appreciated the author replies which resolved some concerns but agree that the paper is overall not ready for publication.",ICLR2020, +Tex24iaipr1,1642700000000.0,1642700000000.0,1,cuGIoqAJf6p,cuGIoqAJf6p,Paper Decision,Reject,"This paper considers transferability measures both in the supervised and unsupervised domain. It identifies instabilities in the way that H-score is computed and proposes to correct the issue with a shrinkage-based covariance estimations. The proposed fix results in 80% absolute gain over the original H-score and makes it competitive with state-of-the-art LogME metric. The new shrinkage-based H-score is much faster to compute. + +Reviewers agree that the paper makes interesting and important contributions. In particular, the reviewers appreciate that the paper takes a deeper look at existing metrics and propose valuable fixes instead of proposing yet another new metric. The paper demonstrates depth of statistic knowledge and proposes shrinkage operators to estimate high dimensional covariance. + +There are a few shortcomings of the paper, however, that suggests that the paper can benefit of another round of improvement. In particular, the paper is very dense with little motivation. Some of the choices in the paper can be motivated better. For instance, the hypothesis of lack of robustness in estimating H-score is not demonstrated empirically. 
The reviewers also felt that the paper should extend experiments to other domains beyond image.",ICLR2022, +KFaHMBgQjm,1576800000000.0,1576800000000.0,1,HJxN0CNFPB,HJxN0CNFPB,Paper Decision,Reject,"This paper proposes a new type of Polynomial NN called Ladder Polynomial NN (LPNN) which is easy to train with general optimization algorithms and can be combined with techniques like batch normalization and dropout. Experiments show it works better than FMs with simple classification and regression tasks, but no experiments are done in more complex tasks. All reviewers agree the paper addresses an interesting question and makes some progress but the contribution is limited and there are still many ways to improve.",ICLR2020, +hIZHkAVf_t,1610040000000.0,1610470000000.0,1,lbc44k2jgnX,lbc44k2jgnX,Final Decision,Reject,"This paper proposes a new sampling method named Random Coordinate LMC (RC-LMC), which integrates the idea of randomized coordinate descent and Langenvine dynamic. The authors prove the total complexity of RC-LMC for log-concave probability distributions, which are better than that of LMC under different settings. The idea of this paper is very neat and the reviewers are in general positive about it. However, as pointed out by one of the reviewers and seconded by the other reviewers, the proof in the original submission is flawed, and the fix needs some substantial work. The new version needs to be carefully checked before publication, which is far beyond the review process of ICLR. Therefore, I encourage the authors to carefully revise the paper and submit it to the next conference. ",ICLR2021, +pEa4-UA8oN,1642700000000.0,1642700000000.0,1,0lSoIruExF,0lSoIruExF,Paper Decision,Reject,"This paper proposed to improve hybrid neighborhood-based recommender systems by incorporating learned user-item similarity. Overall, the scores are towards negative. The reviewers did acknowledge that the paper proposed a simple-to-implement method and reads well. However, the negatives are plenty: the lacking of a comprehensive literature review as well as more relevant state-of-the-art baselines in the experiments is a common concern among most reviewers. The novelty of the proposed approach is also rather limited as incorporating user-item similarity from user rating and item contents is a well-explore topic within the literature. Finally, using rating prediction as evaluation method ignores the missing-not-at-random nature of a recommender system. The authors didn't provide any response. Therefore, I vote for reject.",ICLR2022, +mLlPu6PjrRL,1610040000000.0,1610470000000.0,1,uFBBOJ7xnu,uFBBOJ7xnu,Final Decision,Reject,"It appears that this paper can benefit from additional detail and work before it becomes a stronger publication that is more convincing. The authors have done an impressive job responding to the reviewers and updating their paper, and multiple reviewers raised their score consequently. However, while multiple reviewers now recommend acceptance, there is no agreement on it. Even among the reviewers who recommended acceptance, there is a feeling on being on the fence specifically about the ability of the paper to make a convincing argument without considering a real life scenario and while only using toy settings. Indeed, this is a problematic aspect of the paper because the value of the paper lies in making that argument. 
Further, the paper would gain further from clarifying the writing further and connecting the paper more directly with the neuroscientific literature it aims to be connected to.",ICLR2021, +HyB171Trf,1517250000000.0,1517260000000.0,46,H1VGkIxRZ,H1VGkIxRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers agree that the method is simple, the results are quite good, and the paper is well written. The issues the reviewers brought up have been adequately addressed. There is a slight concern about novelty, however the approach will likely be quite useful in practice.",ICLR2018, +srZ7H-3QtN,1576800000000.0,1576800000000.0,1,BkgNqkHFPr,BkgNqkHFPr,Paper Decision,Reject,"This paper was assessed by three reviewers who scored it as 6/3/6. +The reviewers liked some aspects of this paper e.g., a good performance, but they also criticized some aspects of work such as inventing new names for existing pooling operators, observation that large parts of improvements come from the pre-processing step rather than the proposed method, suspected overfitting. Taking into account all positives and negatives, AC feels that while the proposed idea has some positives, it also falls short of the quality required by ICLR2020, thus it cannot be accepted at this time. AC strongly encourages authors to go through all comments (especially these negative ones), address them and resubmit an improved version to another venue. + +",ICLR2020, +6_oiua7zQI,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"The reviewers have some following concerns: + +1) There is lack of experimental result. The experiment on MNIST with small CNN architecture is definitely not sufficient to verify the efficiency of the proposed method. Moreover, the advantage of the proposed method is not very clear due to the choices of the parameters. The choice of the learning rates is quite sensitive. + +2) It is not clear why the authors could argue that $ \mathbb{E}(V_T) = \mathcal{O}(T)$ without any theoretical and empirical support. Even if this is correct, this term could dominate the first term unless $ \mathbb{E}(V_T) \leq \mathcal{O}(\sqrt{T})$, which is too strong. If assuming $\mathbb{E}(V_T) = \mathcal{O}(T)$, the convergence results are upper bounded by some constant (note that $\epsilon$ is a constant in this scenario, not arbitrarily small). Hence, the authors failed to show the convergence to a stationary point. + +There are some suggestions to improve the paper as follows: + +1) Show $\mathbb{E}(V_T) = \mathcal{O}(T)$ and revise the theory properly to make it rigorously by showing upper bounded by some function $R(T) \to 0, T \to \infty$ rather than showing the convergence to some fixed neighborhood. (Note that $\frac{C_4}{\sqrt{N}}$ is a fixed constant). + +2) Do more experiments on various datasets and network architectures to verify the efficiency of the proposed method and show the clear advantages compared to others. + +3) Provide convergence rate comparisons with other decentralized algorithms (e.g., as a table). It would be nice if the authors also provide the assumptions and the dependent constants so that the readers could really see the differences. + +4) Explicitly derive the convergence measure based on the standard one, that is, $\frac{1}{T} \sum_{t=1}^{T} \mathbb{E} [ \| \nabla f ( X_t ) \|^2 ] $ and add the dependency of $G_{\infty}^2$ to the bound. + +5) Revise the paper and implement all necessary comments from the reviewers consistently with the content. 
+",ICLR2021, +SJgJU6TbxN,1544830000000.0,1545350000000.0,1,H1lS8oA5YQ,H1lS8oA5YQ,metareview,Reject,"All in all, while the reviewers found that the problem at hand is interesting to study, the submission's contributions in terms of significance/novelty did not rise to the standards for acceptance. The reasoning is most succinctly discussed by R3 who argues that IFS and EFS are basically feature selection and applying them to feature attribution is not particularly novel from a methodological point of view. ",ICLR2019,4: The area chair is confident but not absolutely certain +rke9HKDNlE,1545010000000.0,1545350000000.0,1,rygo9iR9F7,rygo9iR9F7,incremental work,Reject,"The paper proposes a progressive pruning technique that achieves high pruning ratio. Reviewers have a consensus on rejection. Reviewer 1 pointed out that the experimental results are weak. Reviewer 2 is also concerned about the proposed method and experiments. Reviewer 3 is is concerned that this paper is incremental work. Overall, this paper does not meet the standard of ICLR. Recommend for rejection. ",ICLR2019,5: The area chair is absolutely certain +B07G94c34P8,1642700000000.0,1642700000000.0,1,VTNjxbFRKly,VTNjxbFRKly,Paper Decision,Accept (Poster),"The paper provides the theoretical justification for the ""label trick"" (using labels in graph-based semisupervised learning tasks). The authors performed a thorough evaluation of their analysis, which constitutes an experimental contribution. The authors provided a rebuttal that the AC finds to have reasonably addressed the reviewers' concerns. We recommend acceptance.",ICLR2022, +FVfa9gVOhs_,1610040000000.0,1610470000000.0,1,yuXQOhKRjBr,yuXQOhKRjBr,Final Decision,Reject,"The reviewers, including me, agreed that considering sampling diversity is interesting and reasonable when designing GNNs. However, the proposed method is too heuristic and empirical. Without the authors' feedback, I tend to reject this work.",ICLR2021, +BzbN8qReO-,1610040000000.0,1610470000000.0,1,7IElVSrNm54,7IElVSrNm54,Final Decision,Reject,"The paper studies the problem of satisfying group-based fairness constraints in the situation where some demographics are not available in the training dataset. The paper proposes to disentangle the predictions from the demographic groups using adversarial distribution-matching on a ""perfect batch"" generated by a clustered context set. + +Pros: +- The problem of satisfying statistical notions of fairness under ""invisible demographics"" is a new and well-motivated problem. +- Creative use of recent works such as DeepSets and GANs applied to the fairness problem. + +Cons: +- Makes a strong assumption that the clustering of the context set will result in a partitioning that has information about the demographics. This requires at the very least a well-behaved embedding of the data w.r.t. the demographic groups, and a well-tuned clustering algorithm (where optimal tuning is difficult in practice on unsupervised problems) -- but at any rate, as presented, the requirements for a ""perfect batch"" is neither clear nor formalized. +- Lack of theoretical guarantees. +- Various concerns in the experimental results (i.e. proposed method does not clearly outperform other baselines, high variance in experimental results, and other clarifications). + +Overall, the reviewers agreed the studied problem is new, interesting and relevant to algorithmic fairness; however, there were numerous concerns (see above) which were key reasons for rejection. 
+",ICLR2021, +rJGLH1arM,1517250000000.0,1517260000000.0,568,HyXBcYg0b,HyXBcYg0b,ICLR 2018 Conference Acceptance Decision,Reject,"The authors make an experimental study of the relative merits of RNN-type approaches and graph-neural-network approaches to solving node-labeling problems on graphs. They discuss various improvements in gnn constructions, such as residual connections. + +This is a borderline paper. On one hand, the reviewers feel that there is a place for this kind of empirical study, but on the other, there is agreement amongst the reviewers that the paper is not as well written as it could be. Furthermore, some reviewers are worried about the degree of novelty (of adding residual connections to X). + +I will recommend rejection, but urge the authors to clarify the writing and expand on the empirical study and resubmit. ",ICLR2018, +PIEAX-z6fo7,1610040000000.0,1610470000000.0,1,H8UHdhWG6A3,H8UHdhWG6A3,Final Decision,Accept (Poster),"The authors present a simple modification of existing byzantine resistant techniques for training in the presence of worst case failures/attacks. The paper studies two of the strongest attacks to date, that no other method, till now, has been able to address. The novelty is significant for the related byzantine ML literature. The authors further do a fantastic job in their experiments and sharing reproducible code. Some weak aspects of theory are in fact attributed to what the metrics and guarantees that the related literature studies. The novelty of this paper does not lie so much in the theory contribution, but more so on their experiments and presented intuition. I believe this will be a paper that people will build up on and the ideas presented here are of solid value and importane.",ICLR2021, +dWBjn-o9m,1576800000000.0,1576800000000.0,1,r1x63grFvH,r1x63grFvH,Paper Decision,Reject,"The present paper establishes uniform approximation theorems (UATs) for PointNet and DeepSets that do not fix the cardinality of the input set. + +Two nonexperts read the paper and came away not understanding what this exercise has taught us and why the weakening of the hypotheses was important. The authors made no attempt to argue these points in their rebuttals and so I went looking at the paper to find the answer in their revisions, but did not find it after scanning through the paper. I think a paper like this needs to explain what is gained and what obstructions earlier approaches met, and why the current techniques side step those. One of the reviewers felt that the fixed cardinality assumption was mild. I'm really not sure why the authors didn't attack this idea. Maybe it is mild in some technical sense? + +What I read of the paper seemed excellent in term of style and clarity. I think the paper simply needs to make a better case that it is not merely an exercise in topology. I think the result here is publishable on its own grounds, but for the paper to effectively communicate those findings, the authors should have revised it to address these issues. They chose not to and so I recommend ICLR take a pass. Once the reviewers revised the framing and scope/impact, provided it doesn't sound trivial, I think it'll be ready for publication. 
+ +",ICLR2020, +rJghoyXlgN,1544720000000.0,1545350000000.0,1,H1z-PsR5KX,H1z-PsR5KX,An insightful paper presenting analyses of recurrent machine translation models,Accept (Poster),"Strong points: + +-- Interesting, fairly systematic and novel analyses of recurrent NMT models, revealing individual neurons responsible for specific type of information (e.g., verb tense or gender) + +-- Interesting experiments showing how these neurons can be used to manipulate translations in specific ways (e.g., specifying the gender for a pronoun when the source sentence does not reveal it) + +-- The paper is well written + +Weak points + +-- Nothing serious (e.g., maybe interesting to test across multiple runs how stable these findings are). + +There is a consensus among the reviewers that this is a strong paper and should be accepted. + +",ICLR2019,5: The area chair is absolutely certain +BkxNE3NZgN,1544800000000.0,1545350000000.0,1,BkxgbhCqtQ,BkxgbhCqtQ,Not acceptable for ICLR in current form,Reject,The reviewers agree this paper is not good enough for ICLR.,ICLR2019,5: The area chair is absolutely certain +W2nWBXkTRNH,1642700000000.0,1642700000000.0,1,B7abCaIiN_v,B7abCaIiN_v,Paper Decision,Reject,"The submission proposes triangular dropout training to provide adaptive capacity of the network at inference time. The proposed approach is simple and sound. However, the experiments are lacking in terms of complexity of the task and up-to-date architectures (e.g., transformers or convolutional layers) to demonstrate the effectiveness of the method. +Therefore I recommend this paper for rejection.",ICLR2022, +rk4piGUul,1486400000000.0,1486400000000.0,1,SkBsEQYll,SkBsEQYll,ICLR committee final decision,Reject,"There is consensus among the reviewers that the novelty of the paper is limited, and that the experimental evaluation of the proposed method needs to be improved.",ICLR2017, +6drPZnD-CBe,1610040000000.0,1610470000000.0,1,784_F-WCW46,784_F-WCW46,Final Decision,Reject,"The paper provides empirical evidence that the sampling strategy used in point cloud GANs can drastically impact the generation quality of the network. Specifically, the authors show that discriminators that are not sensitive to sampling have clustering artifact errors, while those that are sensitive to sampling do not produce reasonable looking point clouds. They also provide a simple way (i.e. including AVG feature pooling) to improve generation quality for insensitive discriminator GAN setups. The reviewers agree that this is an interesting insight into the problem and this insight can help the community. + +Based on the reviewers' comments and subsequent discussions, it becomes clear that the paper would be stronger and more compelling if the underlying hypothesis (i.e. the idea of sampling spectrum) is more rigorously defined (e.g. ideally with a theoretical grounding) and the claims/analyses are tied in with this definition. Such a grounded and precise setup would help in analyzing future generation discriminators that may not simply fall into the two discrete groups defined in the paper (i.e. sampling over-sensitive and sampling-insensitive). The results have promise, so the authors are encouraged to take into consideration the reviewer discussions to produce a stronger future submission. ",ICLR2021, +U5sP0TIcNBt,1610040000000.0,1610470000000.0,1,X5ivSy4AHx,X5ivSy4AHx,Final Decision,Reject,"The paper introduces a new variant (SREDA-Boost) of a variance-reduced method SEDRA for nonconvex-strongly-concave min-max optimization. 
Given that SEDRA is already optimal in the worst case, the proposed modification is intended to improve practical performance of the method, by relaxing conditions needed at initialization and allowing larger step sizes. While the reviewers appreciated the main ideas of the paper, they shared concerns about the significance of the paper's technical contributions, which were ultimately not addressed by the authors in the rebuttal phase. ",ICLR2021, +p3IQR8uFSCH,1610040000000.0,1610470000000.0,1,GMgHyUPrXa,GMgHyUPrXa,Final Decision,Accept (Poster),"The paper initially received mixed ratings, with one reviewer strongly supporting the paper given that the idea of combining unrolled algorithms and NAS is new and interesting, and one reviewer not convinced by the significance of the results. His/her main concern was the use of synthetic data only, which is not realistic. This was a legitimate concern as the performance of sparse estimation algorithms can change drastically when there is correlation in the design matrix. See for instance, the benchmarks in +F. Bach, R. Jenatton, J. Mairal and G. Obozinski. Optimization with Sparsity-Inducing Penalties. + +The rebuttal addresses this concern in a satisfactory manner and the area chair is happy to recommend an accept.",ICLR2021, +Y8RaLJ2XJ4,1576800000000.0,1576800000000.0,1,BJge3TNKwH,BJge3TNKwH,Paper Decision,Accept (Spotlight),The paper addresses an important problem (preventing catastrophic forgetting in continual learning) through a novel approach based on the sliced Kramer distance. The paper provides a novel and interesting conceptual contribution and is well written. Experiments could have been more extensive but this is very nice work and deserves publication.,ICLR2020, +Xk6pUgqh0lT,1610040000000.0,1610470000000.0,1,IMPnRXEWpvr,IMPnRXEWpvr,Final Decision,Accept (Poster),"The paper is proposing a multi-task learning approach extending existing weighting approaches. An important and novel contribution of the paper is separating the magnitude and direction information in gradient based information. The joint gradient direction is searched by using angle bisectors of task gradients and magnitude is searched by simply finding a scaling which results in uniform loss scales. This approach solves issues like small gradient norm bias of MGDA, etc. The proposed method works well and authors show that it is conceptually relevant to most of the existing algorithms. These conceptual unification is a strong contribution of the paper. The paper is reviewed by three reviewers and received both accept and reject scores. Specifically, + +- R#2: Championed the paper and argued for its acceptance +- R#3: Argues that the novelty is limited and SOTA claim is problematic. +- R#4: Argues that the gap between the empirical performance of the proposed method and existing algorithms is small. + +Arguments on the empirical performance and the SOTA are irrelevant to the decision since ICLR does not require algorithms to be SOTA or performed significantly better. Hence, the remaining issues are: claim of the SOTA being true or not, and lack of novelty. I read the paper in detail and decided to accept it with the following comments about the reviews: +- The paper is clearly novel. Direction and magnitude are first time treated separately. Moreover, resulting unification of the existing approaches and theoretical derivations of the important connections of existing methods are also significant. +- The SOTA claim of the paper is technically correct but little misleading. 
I would recommend the authors simply rephrase it as ""the proposed method outperforms existing loss weighting methods under the same experimental settings"". The reason for this is the fact that, in principle, ""art"" includes every possible solution for that problem. Hence, claiming SOTA on the basis of a fair but limited evaluation is rather misleading. + +In addition to the reviewer comments, here are additional issues which should be addressed by the camera-ready deadline: +- I think the discussion about MGDA is a bit problematic, since removing the $\alpha \geq 0$ assumption simply removes the Pareto stationarity guarantee of the method. The resulting direction can increase some loss functions, and this disagrees with the main point of Pareto optimality. Hence, I would recommend the authors clarify this while making the connection. The Frank-Wolfe algorithm is also not really inefficient or unstable, since the problem is quadratic with linear constraints, and both stability and extremely quick convergence can be proved trivially. +- In addition to the previous point, the proposed method can actually increase some loss functions, as there is no consistency constraint enforced. This is an interesting observation, and the empirical results suggest that increasing the loss of some objectives might actually be valuable. I think this observation deserves some discussion even in the introduction. + +",ICLR2021, +OORew_JXfd,1576800000000.0,1576800000000.0,1,rkezdaEtvH,rkezdaEtvH,Paper Decision,Reject,"While there was some support for the ideas presented in this paper, it was on the borderline, and ultimately did not make the cut for publication at ICLR. + +Concerns were raised as to the significance of the contribution beyond that of past work.",ICLR2020, +nG3eCTQWnGr,1642700000000.0,1642700000000.0,1,E0zOKxQsZhN,E0zOKxQsZhN,Paper Decision,Reject,"This paper recognizes that several common sub-problems studied in RL, such as meta-RL and generalization in RL, can be cast as POMDPs. Using this observation, the authors evaluate how a straightforward approach to dealing with POMDPs---using a recurrent neural network---compares to more specialized approaches. The reviewers agree that the research question studied in this paper is very interesting. However, after careful deliberation, I share the view of reviewer 2WFY that the results insufficiently support the claims made in the paper. 
In particular, I view the main claim from the abstract ""We find that a careful architecture and hyperparameter decisions yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques in their respective domains."" as insufficiently supported. The main issue with the experiments is that only a small number of simple domains are considered. As Luisa points out in the public comments, variBAD dominates recurrent baselines when more complex tasks are considered, while on simpler domains such as the Cheetah-Vel domain considered in this paper, it performs similar to a recurrent model-free baseline. In the rebuttal the authors have added a more complex domain to address this, showing that a recurrent model-free baseline outperforms an off-policy version of variBAD. However, I view these results as inconclusive, as only a single complex domain is considered and they appear to contradict previous results with on-policy variBAD. For these reasons, I don't think the work in its current form is ready for publication at ICLR. But I want to encourage the authors to work out this direction further. In particular, adding more complex domains and also considering the on-policy variBAD method, can make this work stronger.",ICLR2022, +BoXGDxyOKbw,1610040000000.0,1610470000000.0,1,VCAXR34cp59,VCAXR34cp59,Final Decision,Reject,"This paper evaluates the extent to which disentangled representations can be recovered from pre-trained GANs with style-based generators by finding an orthogonal basis in the space of style vectors, and then training an encoder to map images to coordinates in the resulting latent space. To construct the orthogonal basis, the authors consider 3 recently proposed methods for controllable generation, along with a newly developed generalization of one of these methods. The authors evaluate metrics for disentanglement for 4 datasets, consider an abstract visual reasoning task, and compute unfairness scores. + +Reviewers expressed diverging opinions on this paper. R2 is in support of acceptance, R3 finds the paper borderline but is leaning towards acceptance, whereas R4 is critical. R2 and R4 engaged in a relatively detailed discussion, but maintained their scores. + +Having read the paper, the metareviewer feels this submission indeed has strengths and weaknesses. On the one hand, the main results are notable; it is worth reporting that disentangled representations can be recovered from pretrained GANs is a relatively straightforward manner. In this context, the metareviewer feels that some comments by R4 are more critical than is warranted. The authors do not necessarily have to show that GAN-based methods uniformly improve upon VAE-based methods, either in terms of disentanglement metrics or in terms of sensitivity to hyperparameters. The main claim in this submission is that GAN-based methods are mostly comparable to VAE-based methods, and this claim is both sufficiently notable and sufficiently supported by experimental results. + +At the same time, this submission is not without flaws. The writing is on the rough side, and as R4 notes the authors have removed all white space between paragraphs. The metareviewer also feels it is not satisfactory to show a box plot for GAN-based methods in Figure 2 and ask the reader to compare these plots to the violin plots for VAE-based methods in the Locatello paper. The authors need to find a way to make a more direct comparison here. 
R4's comments about the comparison in the abstract-reasoning setting are also well-taken –– here the baseline employs standard (entangled) models, so it is unclear what conclusions we should draw from this experiment. Similarly, the unfairness results once again appeal to an indirect comparison to results in the Locatello paper on this topic. + +On balance, the metareviewer is inclined to say that this submission, in its current form, falls just below the threshold for acceptance. These results are clearly of note to the community and worth reporting, but the presentation has enough flaws that another round of reviews is warranted based on a revised manuscript. The metareviewer hopes to see this paper appear at a conference in the (near) future. ",ICLR2021, +g1IPeNzfCxj,1610040000000.0,1610470000000.0,1,yZBuYjD8Gd,yZBuYjD8Gd,Final Decision,Reject,"This paper empirically studies the impact of different types of negatives used in recent contrastive self-supervised learning methods. Results were initially shown on MoCo v2, though after the rebuttal SimCLR was also added, and several interesting findings were reported, including that only the hardest 5% of the negatives are necessary and sufficient. While the reviewers saw the benefit of rigorously studying this aspect of recent advances in self-supervised learning, a number of issues were raised, including: 1) the limited scope of the conclusions, given that only two (after rebuttal) algorithms were used on one dataset, 2) limited connections drawn to existing work on hard negative mining (which is very common across machine learning, including metric learning and object detection), and 3) limited discussion of some of the methodological issues, such as the use of measures that are intrinsically tied to the model's weights (hence being less reliable early in training) and WordNet as a measure of semantic similarity. Though the authors provided lengthy rebuttals, the reviewers still felt some of these issues were not addressed. As a result, I recommend rejection in this cycle, and that the authors bolster some of these aspects for a submission to future venues. 
+ +I would like to emphasize that this type of work, which provides rigorous empirical investigation of various phenomena in machine learning, is indeed important and worth doing. Hence, the lack of a new method (e.g. to address the selection of negatives) was not the basis of the decision. While the paper clearly does a thorough job at investigating these issues for a limited scope (e.g. in terms of datasets), a larger contribution is expected for empirical papers such that 1) we can ensure the generality of the conclusions (across methods and datasets), 2) we have a conceptual framework for understanding the empirical results especially with respect to what is already known in adjacent areas (e.g. metric learning and object detection), and 3) we understand some of the methodological choices that were made and why they are sufficiently justified. ",ICLR2021, +HJnqQy6Hz,1517250000000.0,1517260000000.0,205,Hko85plCW,Hko85plCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This clearly written paper describes a simple extension to hard monotonic attention -- the addition of a soft attention mechanism that operates over a fixed length window of inputs that ends at the point selected by the hard attention mechanism. Experiments on speech recognition (WSJ) and on a document summarization task demonstrate that the new attention mechanism improves significantly over the hard monotonic mechanism. About the only ""con"" the reviewers noted is that the paper is a minor extension over Raffel et al., 2017, but the authors successfully argue that the strong empirical results render this simplicity a ""pro."" +",ICLR2018, +Bk9ZnMUux,1486400000000.0,1486400000000.0,1,SkhU2fcll,SkhU2fcll,ICLR committee final decision,Accept (Poster),"The reviews for this paper were quite mixed, with one strong accept and a marginal reject. A fourth reviewer with strong expertise in multi-task learning and deep learning was brought in to read the latest manuscript. Due to time constraints, this fourth review was not entered in the system but communicated through personal communication. + + Pros: + - Reviewers in general found the paper clear and well written. + - Multi-task learning in deep models is of interest to the community + - The approach is sensible and the experiments show that it seems to work + + Cons: + - Factorization methods have been used extensively in deep learning, so the reviewers may have found the approach incremental + - One reviewer was not convinced that the proposed method would work better than existing multi-task learning methods + - Not all reviewers were convinced by the experiments + - The fourth reviewer found the approach very sensible but was not excited enough to champion the paper + + The paper was highly regarded by at least one reviewer and two thought it should be accepted. The PCs also agree that this paper deserves to appear at the conference.",ICLR2017, +uNRxy6jLO-_,1610040000000.0,1610470000000.0,1,awMgJJ9H-0q,awMgJJ9H-0q,Final Decision,Reject,"The paper proposes a discretization of Wasserstein gradient flow with an euler scheme, and propose a way to estimate each step of the euler scheme using ratio estimators from samples regularized with gradient penalties. Statistical bounds are given to bound the estimated flows from the wasserstein flow. + +Reviewers have raised concerns regarding the assumptions under which results present in the paper hold, this was clarified by the authors (goedesic lambda convexity, log sobolev constant for the target density . 
lipchitizity of velocity fields). The paper needs a revision to incorporate that feedback and to be in shape for publication. + +Other concern were on earlier claims in the paper regarding the monge ampere equation and approximation of the optimal mapping this was addressed by the rebuttal. + +Other concerns were also on explaining the relation of the work to score based models and energy based models. + +Overall the paper needs to state in a clearer way the assumptions for the theoretical results and to acknowledge the limitations of those assumptions in analyzing the euler scheme. + +",ICLR2021, +DA3Y3Mcgc4,1642700000000.0,1642700000000.0,1,S5qdnMhf7R,S5qdnMhf7R,Paper Decision,Reject,"In general, the reviewers appreciated the elegant concept behind the paper and the good results. However, they also raised considerable reservations about the significance of a method that decreases the parameter count but not necessarily computational efficiency (FLOPS) or memory. While the additional analysis that the authors provided definitely helps to understand the limitations of the method, the reviewers were in the end quite divided on the significance of the results. In addition, all reviewers agreed that the writing was in somewhat rough shape and needed improvement. + +In summary, this is definitely a borderline paper, but given the current reviewer assessment, I would recommend that it is not quite ready for publication.",ICLR2022, +BJ-t7JprM,1517250000000.0,1517260000000.0,181,rknt2Be0-,rknt2Be0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper investigates emergence of language from raw pixels in a two-agent setting. The paper received divergent reviews, 3,6,9. Two ACs discussed this paper, due to a strong opinion from both positive and negative reviewers. The ACs agree that the score ""9"" is too high: the notion of compositionality is used in many places in the paper (and even in the title), but never explicitly defined. Furthermore, the zero-shot evaluation is somewhat disappointing. If the grammar extracted by the authors in sec. 3.2 did indeed indicate the compositional nature of the emergent communication, the authors should have shown that they could in fact build a message themselves, give it to the listener with an image and ask it to answer. On the other hand, ""3"" is also too low of a score. In this renaissance of emergent communication protocol with multi-agent deep learning systems, one missing piece has been an effort toward seriously analyzing the actual properties of the emergent communication protocol. This is one of the few papers that have tackled this aspect more carefully. The ACs decided to accept the paper. However, the authors should take the reviews and comments seriously when revising the paper for the camera ready.",ICLR2018, +5_4dOKek9Mt,1642700000000.0,1642700000000.0,1,mQxt8l7JL04,mQxt8l7JL04,Paper Decision,Accept (Poster),"This paper proposes an extension to learning a representation: it motivates, proposes and evaluates a new regularizer term that promotes smoothness via enforcing the representation to be geometry-preserving (isometry, conformal mapping of degree k). Comparisons with a standard VAE and FMVAE (Chen et al. 2020) are shown and experiments are provided on CelebA with several different attributes as target classification tasks. 
+ +The paper has received extensive reviews and the authors have successfully answered most of the concerns raised, mostly regarding comparisons to other techniques that try to introduce a regularization based on the properties of the Jacobian of the decoder network. +The appendix has been extended as a result of the rebuttal and the paper could be accepted. + +Notes: +I find the formulation based on the notion of the isometric decoder somewhat surprising as the encoder is a key object of interest that controls the nature of the representation. The authors should clarify the assumption 3 in 3.3 better by the consideration of potentially $dim(z) << dim(x)$, how the isometry of the decoder effects the encoder, + +Additionally, for the latent space flattening an ablation using SVD (merely a linear mapping for $i(\cdot)$) could be considered. + +Reviewer ZGHS has noted that they raise their grade to 6 in their comment, but this is still not currently reflected.",ICLR2022, +B1edwt0xg4,1544770000000.0,1545350000000.0,1,r1V0m3C5YQ,r1V0m3C5YQ,"recurrent models for polyphonic music composition, quality seems to be the issue",Reject,"This paper proposes novel recurrent models for polyphonic music composition and demonstrates the approach with qualitative and quantitative evaluations as well as samples. The technical parts in the original write-up were not very clear, as noted by multiple reviewers. During the review period, the presentation was improved. Unfortunately the reviewer scores are mixed, and are on the lower side, mainly because of the lack of clarity and quality of the results.",ICLR2019,3: The area chair is somewhat confident +gJ2UgpVVBO9,1642700000000.0,1642700000000.0,1,zfmB5vgfaCt,zfmB5vgfaCt,Paper Decision,Reject,"The papers studies a novel problem and proposes an interesting algorithm. That said, the reviewers question the motivation of the paper. That is whether this method presents a viable attack on existing MT systems. The attack is not black box and MT systems often have an output length threshold beyond which the output is trimmed. Given the motivational concerns, I recommend that the paper is revised and resubmitted to other venues.",ICLR2022, +KjAhxlicF6j,1642700000000.0,1642700000000.0,1,IY4IsjvUhZ,IY4IsjvUhZ,Paper Decision,Reject,"The paper analyses the loss landscape induced by AUC loss. Reviewers found critical issues with the paper, and the Authors have not provided feedback. As such I have to recommend rejecting the paper. I thank the Authors for submitting the paper to the ICLR conference. I hope the reviews will be helpful in improving the paper.",ICLR2022, +8y1munM40ZS,1642700000000.0,1642700000000.0,1,NdOoQnYPj_,NdOoQnYPj_,Paper Decision,Accept (Poster),"The article introduces a Bayesian approach for online learning in non-stationary environments. The approach, which bears similarities with weighted likelihood estimation methods, associate a binary weight to each past observation, indicating if this observation should be including or not to compute the posterior. The weights are estimated via maximum a posteriori. + +The paper is well written, the approach is novel and its usefulness demonstrated on a number of different experiments. The original submission missed some relevant references that have been added in the revision. 
The approach has some limitations, highlighted by the reviewers: +* it requires to solve a binary optimisation problem whose complexity scales exponentially with the size of the dataset; although the greedy procedure proposed by the authors seems to work fine on the examples shown, the approach may not be applicable to larger datasets +* it requires to store all the data +* it requires the traceability of the marginal likelihood + +Despite these limitations, there was a general agreement that this paper offers a novel and useful contribution, and I recommend acceptance. + +As noted by reviewer o4TK, I also think that the title is not very accurate. Bayesian methods naturally allow recursive updates of one's beliefs, and therefore have ""memory"". Maybe change the title for ""Bayes with augmented selective/adaptive memory""?",ICLR2022, +Hkl9GIxXe4,1544910000000.0,1545350000000.0,1,Bkxbrn0cYX,Bkxbrn0cYX,incremental but interesting contribution to life long learning for neural networks.,Accept (Poster),"Two of the reviewers raised their scores during the discussion phase noting that the revised version was clearer and addressed some of their concerns. As a result, all the reviewers ultimately recommended acceptance. They particularly enjoyed the insights that the authors shared from their experiments and appreciated that the experiments were quite thorough. All the reviewers mentioned that the work seemed somewhat incremental, but given the results, insights and empirical evaluation decided that it would still be a valuable contribution to the conference. One reviewer added feedback about how to improve the writing and clarity of the paper for the camera ready version.",ICLR2019,4: The area chair is confident but not absolutely certain +4NWuHZ8Dhry,1610040000000.0,1610470000000.0,1,w8iCTOJvyD,w8iCTOJvyD,Final Decision,Reject,"This work proposes new learning algorithms that fine-tune (""tailor"") a model at test-time using unsupervised objectives. This formulation allows for introducing an inductive bias into the model that might improve generalization on unseen data. The proposed algorithm is demonstrated on two example tasks. + +The reviewers like the topic and also find the proposed approach to be interesting. However, they are unconvinced by the current empirical evaluation of the method. Additional experimental evaluation could improve our understanding of the proposed method and help contrast it to previously proposed techniques. Given these reviews I recommend rejecting the paper at this time.",ICLR2021, +SlCZeX6Ydy,1576800000000.0,1576800000000.0,1,S1ekaT4tDB,S1ekaT4tDB,Paper Decision,Reject,"This paper proposes an alternative explanation of the emergence of oriented bandpass filters in convolutional networks: rather than reflecting observed structure in images, these filters would be a consequence of the convolutional architecture itself and its eigenfunctions. +Reviewers agree that the mathematical angle taken by the paper is interesting, however they also point out that crucial prior work making the same points exists, and that more thorough insights and analyses would be needed to make a more solid paper. +Given the closeness to prior work, we cannot recommend acceptance in this form.",ICLR2020, +SJe-DEG4xE,1544980000000.0,1545350000000.0,1,r1xdH3CcKX,r1xdH3CcKX,Interesting for ICLR but can benefit from further evaluation,Accept (Poster),"This paper proposes a unified approach for performing state estimation and future forecasting for agents interacting within a multi-agent system. 
The method relies on a graph-structured recurrent neural network trained on temporal and visual (pixel) information. + +The paper is well-written, with a convincing motivation and a set of novel ideas. + +The reviewers pointed to a few caveats in the methodology, such as quality of trajectories (AnonReviewer2) and expensive learning of states (AnonReviewer3). However, these issues do not discount much of the papers' quality. Besides, the authors have rebutted satisfactorily some of those comments. + +More importantly, all three reviewers were not convinced by the experimental evaluation. AnonReviewer1 believes that the idea has a lot of potential, but is hindered by the insufficient exposition of the experiments. AnonReviewer3 similarly asks for more consistency in the experiments. + +Overall, all reviewers agree on a score ""marginally above the threshold"". While this is not a particularly strong score, the AC weighted all opinions that, despite some caveats, indicate that the developed model and considered application fit nicely in a coherent and convincing story. The authors are strongly advised to work further on the experimental section (which they already started doing as is evident from the rebuttal) to further improve their paper.",ICLR2019,3: The area chair is somewhat confident +MCAKzZPaAv,1576800000000.0,1576800000000.0,1,SJlDDnVKwS,SJlDDnVKwS,Paper Decision,Reject,"Evolutionary strategies are a popular class of method for black-box gradient-free optimization and involve iteratively fitting a distribution from which to sample promising input candidates to evaluate. CMA-ES involves fitting a Gaussian distribution and has achieved state-of-the-art performance on a variety of black-box optimization benchmarks when the underlying function is cheap to evaluate. In this work the authors replace this distribution instead with a much more flexible deep generative model (i.e. NICE). They demonstrate empirically that this method is effective on a number of synthetic global optimization benchmarks (e.g. Rosenbrock) and three direct policy search reinforcement learning problems. The reviewers all believe the paper is above borderline for acceptance. However, two of the reviewers said they were on the low end of their respective scores (i.e. one wanted to give a 5 instead of a 6 and another a 7 instead of 8.) A major issue among the reviewers was the experiments, which they noted were simple and not very convincing (with one reviewer disagreeing). The synthetic global optimization problems do seem somewhat simple. In the RL problems, it's not obvious that the proposed method is statistically significantly better, i.e. the error bars are overlapping considerably. Thus the recommendation is to reject. Hopefully stronger experiments and incorporating the reviewer comments in the manuscript will make this a stronger paper for a future conference.",ICLR2020, +NLWSul354f,1610040000000.0,1610470000000.0,1,3YQAVD9_Dz3,3YQAVD9_Dz3,Final Decision,Reject,"All reviewers generally admit that the motivation of realizing search-free autoaugment is reasonable and important. However, they also raised many concerns regarding the experimental evaluation to validate the practical effectiveness of the method. In particular, unclear discussion with respect to ablation studies, and the lack of the baselines implemented by the authors were the central issues that obscure the essential effect of the core contribution of the work. 
The authors made great efforts to conduct additional experiments and did address some of those issues, however some experiments are not yet ready such as the baseline implementation on ImageNet and testing on large models. After the discussion phase, all reviewers decided to keep their initial scores toward rejection, and the AC agreed with their opinions. + +In summary, the paper focuses on an important problem and the proposed method is potentially very useful, but the paper in its current form should be further polished and completed before publication, thus I recommend rejection this time. + ",ICLR2021, +SypIQkTSf,1517250000000.0,1517260000000.0,149,BkeqO7x0-,BkeqO7x0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"this work adapts cycle GAN to the problem of decipherment with some success. it's still an early result, but all the reviewers have found it to be interesting and worthwhile for publication.",ICLR2018, +tNfWVsRN3i,1576800000000.0,1576800000000.0,1,SkeAaJrKDS,SkeAaJrKDS,Paper Decision,Accept (Poster),"This paper proposes Search with Amortized Value Estimates (SAVE) that combines Q-learning and MCTS. SAVE uses the estimated Q-values obtained by MCTS at the root node to update the value network, and uses the learned value function to guide MCTS. + +The rebuttal addressed the reviewers’ concerns, and they are now all positive about the paper. I recommend acceptance.",ICLR2020, +eBbmdRIusa6,1642700000000.0,1642700000000.0,1,jaLDP8Hp_gc,jaLDP8Hp_gc,Paper Decision,Accept (Poster),This paper receives positive reviews. The authors provide additional results and justifications during the rebuttal phase. All reviewers find this paper interesting and the contributions are sufficient for this conference. The area chair agrees with the reviewers and recommends it be accepted for presentation.,ICLR2022, +UWK--eoa4b,1576800000000.0,1576800000000.0,1,HJg2b0VYDr,HJg2b0VYDr,Paper Decision,Accept (Poster),"This paper proposes to perform sample selection for deep learning - which can be very computationally expensive - using a smaller and simpler proxy network. The paper shows that such proxies are faster to train and do not substantially harm the accuracy of the final network. + +The reviewers were all in agreement that the problem is important, and that the paper is comprehensive and well executed. I therefore recommend it should be accepted.",ICLR2020, +F_K9g5fHwp7,1610040000000.0,1610470000000.0,1,SkUfhuFsvK-,SkUfhuFsvK-,Final Decision,Reject,This paper presents a self-training idea for GCN models to help improve the node classification. The reviewers agreed that the technical contribution of the proposed approach is limited and the performance improvement seems marginal. ,ICLR2021, +S1s8B1aBf,1517250000000.0,1517260000000.0,576,Sk1NTfZAb,Sk1NTfZAb,ICLR 2018 Conference Acceptance Decision,Reject,"While the reviewers feel there might be some merit to this work, they find enough ambiguities and inaccuracies that I think this paper would be better served by a resubmission.",ICLR2018, +BJli5EK3y4,1544490000000.0,1545350000000.0,1,S1lg0jAcYm,S1lg0jAcYm,"Good contribution, still a slog to read",Accept (Poster),"This paper introduces a new way to estimate gradients of expectations of discrete random variables by introducing antithetic noise samples for use in a control variate. + +Quality: The experiments are mostly appropriate, although I disagree with the choice to present validation and test-set results instead of training-time results. 
If the goal of the method is to reduce variance, then checking whether optimization is improved (training loss) is the most direct measure. However reasonable people can disagree about this. + +I also think the toy experiment (copied from the REBAR and RELAX paper) is a bit too easy for this method, since it relies on taking two antithetic samples. I would have liked to see a categorical extension of the same experiment. + +Clarity: I think that this method will not have the impact it otherwise could because of the authors' fearless use of long equations and heavy notation throughout. This is unavoidable to some degree, but +1) The title of the paper isn't very descriptive +2) Why not follow previous work and use \theta instead of \phi for the parameters being optimized? +The presentation has come a long way, but I fear that few besides our intrepid reviewers will have the stomach. I recommend providing more intuition throughout. + +Originality: The use of antithetic samples to reduce variance is old, but this seems like a well-thought-through and non-trivial application of the idea to this setting. + +Significance: Ultimately I think this is a new direction in gradient estimators for discrete RVs. I don't think this is the last word in this direction but it's both an empirical improvement, and will inspire further work.",ICLR2019,3: The area chair is somewhat confident +TcfcFvVWIFf,1642700000000.0,1642700000000.0,1,UdxJ2fJx7N0,UdxJ2fJx7N0,Paper Decision,Accept (Poster),"The paper addresses the problem of non-convex non-concave min-max optimization under the perspective of application of smoothed algorithms between two opponents. +The paper examines a model where the max-player applied a zero-memory smooth (from differential perspective) algorithm and min-player SGD/SNAG or proximal methods providing results similar with the state-of-art. Convergence guarantees proposed were sound and experimental results on generative adversarial networks and adversarial training demonstrate the efficiency of the proposed algorithms.",ICLR2022, +NfYZZ7waM7x,1642700000000.0,1642700000000.0,1,wIK1fWFXvU9,wIK1fWFXvU9,Paper Decision,Reject,"All reviewers agree that the proposed idea looks interesting but the paper is seriously lacking in the definition of its scope: there is no quantitative result, experiments are quite limited, and there is not enough discussion of the limitations. With more work this could become a very interesting paper.",ICLR2022, +PIOVA1p81,1576800000000.0,1576800000000.0,1,HJg6VREFDH,HJg6VREFDH,Paper Decision,Reject,"This paper proposes a new way to stabilise GAN training. + +The reviews were very mixed but taken together below acceptance threshold. + +Rejection is recommended with strong motivation to work on the paper for next conference. This is potentially an important contribution. ",ICLR2020, +cCpxBE-eZjm,1610040000000.0,1610470000000.0,1,3u3ny6UYmjy,3u3ny6UYmjy,Final Decision,Reject,"While the authors appreciated the proposed contrastive training scheme and the strong related work summary, all authors agreed that the approach was severeley limited by being a pure selection-based method. Without the help of another model that proposes molecules, the approach can only select reactants from an existing set. As target molecules become more complicated, the modeller must make a choice: (a) use a much larger initial candidate set which hopefully encompases all molecules necessary to make the target molecule, or (b) use another model to propose new intermediate molecules. 
The authors went with (b) which harmed their novelty claim: a big reason why retrosynthesis is hard is because of the need to generate unseen molecules, and if this is left to an already proposed model, the current approach is not adding much methodological novelty. While their approach does improve upon existing work in the multi-step setting, there's even more recent work that has not been compared against (e.g., https://arxiv.org/pdf/2006.07038.pdf) so the improved performance may be outperformed. + +The fix is straightforward: modify the methodology to also propose intermediate molecules. This will fix the novelty complaint and strengthen the practicality argument: practitioners could directly use this approach to discover synthesis routes. The authors could slightly update the related work, add comparisons against recent methods, and take into account the other feedback given by the authors. The paper is very nicely written, the proposed changes are purely methodological, and not insurmountable in my opinion. I would urge the authors to make these changes which I believe will result in a very nice paper.",ICLR2021, +TwiVIEjxiFV,1610040000000.0,1610470000000.0,1,HO80-Z4l0M,HO80-Z4l0M,Final Decision,Reject,"The paper proposes to create models that address tail classes by computing a linear combination over models (concatenated weight vectors). Reviewers had grave concerns about the technical contribution, including justification of linear averaging of non-linear models, and about the experimental results, which improve on tail classes but hurt overall performance. As a result, the paper cannot be accepted to ICLR. ",ICLR2021, +r1JfhfI_g,1486400000000.0,1486400000000.0,1,HkIQH7qel,HkIQH7qel,ICLR committee final decision,Reject,"The program committee appreciates the authors' response to concerns raised in the reviews. Unfortunately, most reviewers are not leaning sufficiently towards acceptance. In particular, it is unfortunate that authors can not evaluate their model on the leaderboard due to copyright issues. The role of standard datasets and benchmarks is to allow for meaningful comparisons. Evaluation on non-standard splits defeats this purpose. Fortunately, sounds like authors are working on getting their model evaluated on the leaderboard. Resolving that and incorporating reviewers' feedback will help make the paper stronger.",ICLR2017, +rkr9N1TSM,1517250000000.0,1517260000000.0,411,ryG6xZ-RZ,ryG6xZ-RZ,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This is a fascinating paper, and representative of the sort of work which is welcome in our field and in our community. It presents a compiler framework for the development of DSLs (and models) for Deep Learning and related methods. Overall, reviewers were supportive of and excited by this line of work, but questioned its suitability for the main conference. In particular, the lack of experimental demonstrations of the system, and the disconnect between domain-specific technical knowledge required to appreciate this work and that of the average ICLR attendee were some of the main causes for concern. It is clear to me that this paper is not suitable for the main conference, not due to its quality, but due to its subject matter. 
I would be happy, however, to tentatively recommend it for acceptance to the workshop as this topic deserves discussion at the conference, and this would provide the basis for a useful bridge between the compilers community and the deep learning community.",ICLR2018, +HJBV2ML_g,1486400000000.0,1486400000000.0,1,ByldLrqlx,ByldLrqlx,ICLR committee final decision,Accept (Poster),"This is a well written paper that attempts to craft a practical program synthesis approach by training a neural net to predict code attributes and exploit these predicted attributes to efficiently search through DSL constructs (using methods developed in programming languages community). The method is sensible and appears to give consistent speedups over baselines, though its viability for longer programs remains to be seen. There is potential to improve the paper. One of the reviewers would have liked more analysis on what type of programs are difficult and how often the method fails, and how performance depends on training set size etc. The authors should improve the paper based on reviewer comments.",ICLR2017, +ZV7W_EdO0Dvf,1642700000000.0,1642700000000.0,1,B2pZkS2urk_,B2pZkS2urk_,Paper Decision,Reject,"In this paper the authors demonstrate the use of meta-learning in plastic recurrent neural networks with an evolutionary approach, avoiding gradients. They show that this approach can be used to develop networks that can solve problems like sequence prediction and simple navigation. + +The reviews for this paper all had scores below the acceptance threshold (3,5,3,3). The principal concerns were: + +(1) The lack of novelty. Other papers have taken very similar approaches (e.g. Najarro & Risi, 2020 or Miconi et al., 2019), and fundamentally this paper simply ties together different elements in one package. + +(2) Lack of demonstration of the approach beyond some very simple tasks. + +(3) Lack of connection to the related literature on neuro-evolution and ML. + +(4) General clarity and style of writing issues. + +The authors responded to the reviewers, but the responses did not convince the reviewers enough to increase their scores past threshold. Given this, a reject decision was reached.",ICLR2022, +xCndmfmgLog,1610040000000.0,1610470000000.0,1,T0tmb7uhRhD,T0tmb7uhRhD,Final Decision,Reject,"This paper proposes a model-agnostic FL method called FedKT that performs only one communication round and reduces the communication complexity of federated learning. The reviewers have the following concerns about the paper: +* Limited novelty because the proposed method is directly based on PATE +* Insufficient experiments + +The authors did a great job of responding to the reviewers' comments and also added some new experimental results in the updated version. But the reviewers still recommend significant revision of the paper and resubmission to a future venue. I hope the authors will find their constructive and detailed comments below helpful!",ICLR2021, +7uEbcwt6Ga,1576800000000.0,1576800000000.0,1,B1lDoJSYDH,B1lDoJSYDH,Paper Decision,Accept (Poster),The paper proposes an approach for N-D continuous convolution on unordered particle set and applies it to Lagrangian fluid simulation. All reviewers found the paper to be a novel and useful contribution towards the problem of N-D continuous convolution on unordered particles. I recommend acceptance. 
,ICLR2020, +BkHFXy6SG,1517250000000.0,1517260000000.0,185,B17JTOe0-,B17JTOe0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This work shows how activation patterns of units reminiscent of grid and border cells emerge in RNNs trained on navigation tasks. While the ICLR audience is not mainly focused on neuroscience, the findings of the paper are quite intriguing, and grid cells are sufficiently well-known and ""mainstream"" that this may interest many people.",ICLR2018, +KZy-KhXbv,1576800000000.0,1576800000000.0,1,BJedHRVtPB,BJedHRVtPB,Paper Decision,Accept (Poster),Three knowledgable reviewers give a positive evaluation of the paper. The decision is to accept.,ICLR2020, +zvP0ePNfrR6,1642700000000.0,1642700000000.0,1,o86_622j0sb,o86_622j0sb,Paper Decision,Reject,"In this paper, the authors propose to use segmentation priors for black-box attacks such that the perturbations are limited in the salient region. They also find that state-of-the-art black-box attacks equipped with segmentation priors can achieve much better imperceptibility performance with little reduction in query efficiency and success rate. Hence, the auithors propose the Saliency Attack, a new gradient-free black-box attack, that can further improve the imperceptibility by refining perturbations in the salient region. +The reviewers think that the proposed method is simple and important, and the authors have responded properly to some comments. +However, the reviewers still are not satisfied with the experimental evaluation and comparisons, as the authors can only try to compare with other ideas and test more models in the future. +In summary, I think the manuscript at its current staus cannot be accepted.",ICLR2022, +0seEwRk1kv,1610040000000.0,1610470000000.0,1,udbMZR1cKE6,udbMZR1cKE6,Final Decision,Reject,"Like the reviewers, I find this paper extremely borderline. On the one hand, it is clearly written, about a topic I find fascinating, and generally well motivated if not shockingly novel (i.e. removing some of the simplifying assumptions from Zhong et al. 2020, e.g. requiring grounding to be learned, use of real language rather than synthetically generated). On the other hand, I agree with the leitmotiv present amongst the reviews that the problem at the centre of the experimental setting is very, very simple (3 objects, 3 descriptions). I am mindful of the fact that access to computational resources is unevenly distributed, and am not expecting a paper like this to immediately scale their experiments to highly complex settings with photorealism, etc, but I can't help but feel that a more challenging task, with a deeper analysis of the problems presented by both grounding and the use of non-synthetic language, would both have been highly desirable to make this paper uncontroversially worth accepting. + +As a result, the decision is to not accept the paper in its present form. Work on this topic should definitely be presented at ICLR, but it's a shame this paper did not make a stronger case for itself.",ICLR2021, +r1gTKCvelV,1544740000000.0,1545350000000.0,1,rkxacs0qY7,rkxacs0qY7,Good work which can become more mature with further experiments,Accept (Poster),"This paper shows a promising new variational objective for Bayesian neural networks. The new objective is obtained by effectively considering a functional prior on the parameters. The paper is well-motivated and the mathematics are supported by theoretical justifications. + +There has been some discussion regarding the experimental section. 
On one hand, it contains several real and synthetic data which show the good performance of the proposed method. On the other hand, the reviewers requested deeper comparisons with state-of-the art (deep) GP models and more general problem settings. The AC decided that the paper can be accepted with the experiments contained in the new revision, although the authors would be strongly encouraged to address the reviewers’ comments in a “non-cosmetic manner (as R2 put it). +",ICLR2019,3: The area chair is somewhat confident +sXY7TqRFzBV,1610040000000.0,1610470000000.0,1,bhKQ7P7gyLA,bhKQ7P7gyLA,Final Decision,Reject,"The paper received borderline and negative reviews but has raised many questions and discussions, showing that the paper has some merit. Many concerns were however raised on various aspects of the paper such as mathematical rigor, clarity, and motivation of manifold regularization that is too disconnected from the robustness to local random perturbation which is encouraged by the method. The rebuttal addresses some of these comments and the reviewers have appreciated the detailed answer. Yet, it was not sufficient to change the reviewer's opinions. + +In its current form, the paper is not ready for publication and the area chair agrees with most of the reviewer's comments. He recommends a reject, but encourage the authors to take into account the feedback from the reviewer before resubmitting to a future venue.",ICLR2021, +xuka8WB5vug,1610040000000.0,1610470000000.0,1,ARFshOO1Iu,ARFshOO1Iu,Final Decision,Reject,"This paper introduces a self-training strategy for semi-supervised learning for few shot sequence learning. It builds on ideas from an existing work on robust deep learning that adaptively reweights examples for learning to reduce impact of noisy examples, here the noisy examples are introduced to the student network training by the teacher network. Two main novel points, one is on selectively constructing the validation set used for adaptive reweighting. Another idea is to move from the sentence level reweighting to token level reweighting. The paper shows strong results suggesting the proposed method can effectively learn under few-shot learning. +A primary concern from the reviewers is that the paper has limited novelty given that it primarily applies existing ideas to a slightly different problem. Another concern is that the system consists of many components, each of the choices could have other viable options. The ablation studies indicate these components are useful compared to when removed, but fail to explore possible alternative choices. One of the questions is whether token-level reweighting is necessary. It would have been nice to see an ablation study comparing against a baseline using sentence-level reweighting. + +",ICLR2021, +Sy0pMypHz,1517250000000.0,1517260000000.0,38,Hk3ddfWRW,Hk3ddfWRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents a sampling inference method for learning in multi-modal demonstration scenarios. Reference to imitation learning causes some confusion with the IRL domain, where this terminology is usually encountered. Providing a real application to robot reaching, while a relatively simple task in robotics, increases the difficulty and complexity of the demonstration. That makes it impressive, but also difficult to unpick the contributions and reproduce even the first demonstration. 
It's understandable at a meeting on learning representations that the reviewers wanted to understand why existing methods for learning multi-modal distributions would not work, and get a better understanding of the tradeoffs and limitations of the proposed method. The CVAE comparison added to the appendix during the rebuttal period just pushed this paper over the bar. The demonstration is simplified, so much easier to reproduce, making it more feasible others will attempt to reproduce the claims made here. +",ICLR2018, +djnJkyyO020,1610040000000.0,1610470000000.0,1,H92-E4kFwbR,H92-E4kFwbR,Final Decision,Reject,"I thank authors and reviewers for discussions. Reviewers found the paper (specially the CAT-r method proposed in the rebuttal period) interesting but there are some remaining concerns about the significance of the results and experiments. Given all, I think the paper still needs a bit of more work before being accepted. I encourage authors to address comments raised by the reviewers to improve their paper. + +- AC",ICLR2021, +BkPW4J6BM,1517250000000.0,1517260000000.0,293,HJewuJWCZ,HJewuJWCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper addresses the problem of learning a teacher model which selects the training samples for the next mini-batch used by the student model. The proposed solution is to learn the teacher model using policy gradient. It is an interesting training setting, and the evaluation demonstrates that the method outperforms the baseline. However, it remains unclear how the method would scale to larger datasets, e.g. ImageNet. I would strongly encourage the authors to extend their evaluation to larger datasets and state-of-the-art models, as well as include better baselines, e.g. from Graves et al.",ICLR2018, +1lD8hlE93H,1642700000000.0,1642700000000.0,1,gPvB4pdu_Z,gPvB4pdu_Z,Paper Decision,Accept (Spotlight),I recommend this paper to be accepted. All reviewers are in agreement that this paper is above the bar.,ICLR2022, +Y25bjTaexVW5,1642700000000.0,1642700000000.0,1,_Ko4kT3ckWy,_Ko4kT3ckWy,Paper Decision,Reject,"This paper proposes a scalable learning method for GNN that gradually increases the training data size by randomly adding vertexes generated from a graphon. Theoretical justification to the proposed method is given that bounds the difference between the gradients on the sampled network and on the graphon. A numerical experiment was conducted to support the validity of the proposed method. + +Unfortunately, this paper contains several issues as listed below: +1. Novelty: There are already some existing work to address the issue of scalability of training a graph neural network model. However, the relation to them is not appropriately exposed. +2. Experiments: Although the main purpose of this paper is to resolve the scalability of GNN, the numerical experiments are conducted only on a small scale dataset ($\sim$1k). +3. Practicality: There are several hyperparameters. However, the theory and methodology do not give a practical guideline to determine them (e.g., how many vertexes should be added at each epoch). +4. Correctness: The proof of the theorems would contain some flaws, which should be resolved by the authors. However, there was no response from the authors. 
+ +For these reasons, this paper would not be appropriate to appear in ICLR.",ICLR2022, +8mLSR1giU,1576800000000.0,1576800000000.0,1,HyxjOyrKvr,HyxjOyrKvr,Paper Decision,Accept (Poster),"The paper proposed a novel way to compress arbitrary networks by learning epitiomes and corresponding transformations of them to reconstruct the original weight tensors. The idea is very interesting and the paper presented good experimental validations of the proposed method on state-of-the-art models and showed good MAdd reduction. The authors also put a lot of efforts addressing the concerns of all the reviewers by improving the presentation of the paper, which although can still be further improved, and adding more explanations and validations on the proposed method. Although there's still concerns on whether the reduction of MAdd really transforms to computation reduction, all the reviewers agreed the paper is interesting and useful and further development of such work would be useful too. ",ICLR2020, +ByO8B1pBz,1517250000000.0,1517260000000.0,573,S1q_Cz-Cb,S1q_Cz-Cb,ICLR 2018 Conference Acceptance Decision,Reject,"While the reviewers considered the basic idea of adding supervision intermediate to differentiable programming style architectures to be interesting and worthy of effort, they were unsure if +1: the proposed abstractions for discussing ntm and nram are well motivated/more generally applicable +2: the methods used in this work to give intermediate supervision are more generally applicable + ",ICLR2018, +Ht1ipxli5mw,1642700000000.0,1642700000000.0,1,U_Jog0t3fAu,U_Jog0t3fAu,Paper Decision,Reject,"The paper studies federated learning with various sketching techniques used for communication. + +The main concerns from the reviewers are: + +1) the presentation can be improved; + +2) the novelty and related work section is not satisfactory since there have been papers on sketched federated learning; + +3) there is no numerical study to validate the efficacy of the method. + +I suggest the authors to take into consideration the feedback from the reviewers in the revision of the paper.",ICLR2022, +PHQU4kh_jG,1610040000000.0,1610470000000.0,1,JiNvAGORcMW,JiNvAGORcMW,Final Decision,Reject,"I thank the authors for their submission and participation in the author response period. The updated experiments are appreciated. However, after discussion all reviewers unanimously agree that the paper is not ready for publication and encourage resubmission to another venue. In particular, R2 and R3 have raised concerns regarding additional widely available baselines that need to be evaluated against and that the rebuttal has not addressed. I agree with this assessment, and thus recommend rejection.",ICLR2021, +rNMWQeTngF_,1610040000000.0,1610470000000.0,1,oGzm2X0aek,oGzm2X0aek,Final Decision,Reject,"This paper is rejected. + +The authors focus on offline RL for the sequential recommender system problem and propose an approach that: +* builds multiple models based on splits of the offline data using domain knowledge +* splits the policy into a context extraction system and context conditioned policy (similar to Rakelley et al.) + +While R1 and R4 appreciate the changes, they both feel that the paper is not ready for publication at this time. R1's main concerns is the generalizability of the proposed solution because it relies heavily on manually defined rules and domain expert knowledge. R4 was concerned with the definition and precision of robustness. How is robustness quantified? 
Finally, many of the baselines were not built for partially observed environments, so it is unsurprising that they perform poorly. Baselines with recurrent policies would strengthen the paper.
",ICLR2021,
_21m7u_xT3m,1642700000000.0,1642700000000.0,1,Nh7CtbyoqV5,Nh7CtbyoqV5,Paper Decision,Accept (Poster),"The paper proposes a new goal-conditioned hierarchical RL method aimed at improving performance on sparse reward tasks. Compared to prior work, the novelty lies in a new way of improving the stability of goal representation learning and in an improved exploration strategy for proposing goals while taking reachability into account.

The paper does a good job of motivating the main ideas around stability and combining novelty with reachability. Reviewers found the quantitative evaluation and the choice of baselines to be good, with the exception of not including Feudal Networks, which the authors explained was due to poor performance on the hard exploration tasks (something that has been observed in prior work). Reviewers also found the thoroughness of the ablations and the insightful visualizations to be highlights. Overall, reviewers were unanimous in recommending acceptance, which I support.",ICLR2022,
rJxg2eSZxN,1544800000000.0,1545350000000.0,1,BklUAoAcY7,BklUAoAcY7,Not quite enough for acceptance,Reject,The overall view of the reviewers is that the paper is not quite good enough as it stands. The reviewers also appreciate the contributions so taking the comments into account and resubmitting elsewhere is encouraged. 
,ICLR2019,5: The area chair is absolutely certain +MbIPg3Nmji,1576800000000.0,1576800000000.0,1,BkeaxAEKvB,BkeaxAEKvB,Paper Decision,Reject,"While there was some support for the ideas presented, the majority of reviewers felt that this submission is not ready for publication at ICLR in its present form. + +Concerns were raised as to the generality of the approach, thoroughness of experiments, and clarity of the exposition.",ICLR2020, +xlpLoXd8ubH,1610040000000.0,1610470000000.0,1,i7aDkDEXJQU,i7aDkDEXJQU,Final Decision,Reject,"This paper attempts to explain why popular UNMT training objective components (back-translation and denoising autoencoding) are effective. The paper provides experimental analysis and draws connections with ELBO and mutual information. Reviewers generally agree that the paper's goal is worthy: trying to form a better theoretical understanding of successful approaches to UNMT. +However, most reviewers raised serious concerns about the current draft and suggested another round of revision and resubmission. Specifically, reviewers were concerned that some of the analogies used to explain UNMT are underdeveloped. Further, reviewers pointed to issues with clarity that made some of the arguments hard to follow. Finally, one reviewer argued that many of the results are expected and agree with common understanding of UNMT in the literature, thus undermining their value to some extent. I tend to agree with reviewers that this paper is not ready for publication in its current form. Thus I recommend rejection. ",ICLR2021, +GfMkFffnqL_,1642700000000.0,1642700000000.0,1,sNuFKTMktcY,sNuFKTMktcY,Paper Decision,Accept (Poster),"The paper proposes a new goal-conditioned hierarchical RL method aimed at improving performance on sparse reward tasks. Compared to prior work the novelty lies in a new way of improving the stability of goal representation learning and in an improved exploration strategy for proposing goals while taking reachability into account. + +The paper does a good job of motivating the main ideas around stability and combining novelty with reachability. Reviewers found the quantitative evaluation and the choice of baselines to be good with the exception of not including Feudal Networks which the authors explained was due to poor performance on the hard exploration tasks (something that has been observed in prior work). Reviewers also found the thoroughness of the ablations and insightful visualizations to be highlights. Overall, reviewers were unanimous in recommending acceptance, which I support.",ICLR2022, +A_3Ru-_Cw,1576800000000.0,1576800000000.0,1,SJlyta4YPS,SJlyta4YPS,Paper Decision,Reject,"The authors address the problem of CTR prediction by using a Transformer based encoder to capture interactions between features. They suggest simple modifications to the basic Multiple Head Self Attention (MSHA) mechanism and show that they get the best performance on two publicly available datasets. + +While the reviewers agreed that this work is of practical importance, they had a few objections which I have summarised below: +1) Lack of novelty: The reviewers felt that the adoption of MSHA for the CTR task was straightforward. The suggested modifications in the form of Bilinear similarity and max-pooling were viewed as incremental contributions. +2) Lack of comparison with existing work: The reviewers suggested some additional baselines (Deep and Cross) which need to be added (the authors have responded that they will do so later). 
+3) Need to strengthen experiments: The reviewers appreciated the ablation studies done by the authors but requested more studies to convincingly demonstrate the effect of some components. One reviewer also pointed out that the authors should control for model complexity to ensure an apples-to-apples comparison (I agree that many papers in the past have not done this, but going forward I have a hunch that many reviewers will start asking for this). 
+
IMO, the above comments are important and the authors should try to address them in subsequent submissions.

Based on the reviewer comments and the lack of any response from the authors, I recommend that the paper in its current form cannot be accepted. ",ICLR2020,
V6thUKigye,1576800000000.0,1576800000000.0,1,Hyez1CVYvr,Hyez1CVYvr,Paper Decision,Reject,"The paper proposes a method for out-of-distribution (OOD) detection for neural network classifiers.

The reviewers raised several concerns about novelty, the choice of baselines, and the experimental evaluation. While the author rebuttal addressed some of these concerns, I think the paper is still not ready for acceptance as is.

I encourage the authors to revise the paper and resubmit to a different venue.",ICLR2020,
H1vHQJTHG,1517250000000.0,1517260000000.0,131,HkwVAXyCW,HkwVAXyCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper explores what might be characterized as an adaptive form of ZoneOut.
With the improvements and clarifications added to the paper during the rebuttal, the paper could be accepted.
",ICLR2018,
6QcgKYSKCrl,1642700000000.0,1642700000000.0,1,DhP9L8vIyLc,DhP9L8vIyLc,Paper Decision,Accept (Poster),"All reviewers were clear in their opinion that the paper deserves to be accepted. One reviewer also indicated a wish to increase the score from 6 to 7 but was not able to do that, so it isn't reflected in the final score. The reviewers appreciated the methodological contribution made by the paper.",ICLR2022,
mNXP2e_QOS,1610040000000.0,1610470000000.0,1,fV2ScEA03Hg,fV2ScEA03Hg,Final Decision,Reject,"The paper addresses learning with noisy labels by detecting and correcting samples with noisy labels. Reviewers had concerns about the empirical evaluations, specifically about comparing to additional methods, about hyperparameter tuning, and about the improvements being very small. There was also a concern that the analysis of the objective does not explicitly take into account the L2 regularization induced by weight decay. Based on these concerns, the paper is not ready yet for publication.
",ICLR2021,
aR1Fyn6GoLl,1610040000000.0,1610470000000.0,1,wE-3ly4eT5G,wE-3ly4eT5G,Final Decision,Reject,"The paper introduces variants of RL algorithms that can consume factored state representations. Under the assumption that actions only affect a few factors, these factored RL algorithms can learn more efficiently than their vanilla counterparts. Learning a factored dynamics model (to be used in a model-based algorithm) or representing factorized action-selection policies (to be optimized by a model-free RL algorithm) makes intuitive sense in the problem settings that the paper considers. However, the paper should clarify the implicit assumptions being made about how the reward decomposes across factors. For instance, the factored DQN approach seems to require a linear reward decomposition across the factors. 
+The factored DQN approach is also reminiscent of the Hydra algorithm on MsPacMan (https://papers.nips.cc/paper/2017/file/1264a061d82a2edae1574b07249800d6-Paper.pdf Section 4.2), which assigns an RL agent to each factor (""ghost"" in MsPacMan) to learn a factor-specific Q-function. The linear aggregator that they use is identical to the factored DQN in this paper.

The reviewers all rate the paper as borderline. All reviewers suggest that being able to learn the factor graph (or at least parts of it) will greatly widen the scope of applications where the approach can be fruitfully applied -- the paper acknowledges this as a compelling line of future work. The biggest weakness is originality -- the core message of the paper is just that, where factored representations of states/actions exist, RL algorithms must use them. This is not a surprising or novel message. The paper advocates for incorporating the factorization information in the most straightforward way (state masking, followed by action concatenation). Simple-in-retrospect is usually an excellent feature of an algorithm, not a bug; however, the proposal is literally the first idea a reader will likely think of. It might help to explore other ways of incorporating factorization information (e.g., rather than parameter sharing, have a separate network for each factor; rather than masking, have input layers of different widths to consume different numbers of parents in the DAG; etc.) and to verify that they are inferior to the factored NN.",ICLR2021,
AaQ2bkl9IUB,1610040000000.0,1610470000000.0,1,OLOr1K5zbDu,OLOr1K5zbDu,Final Decision,Reject,"This paper introduces a methodology for jointly optimizing neural network architecture, quantization policy, and hardware architecture. There are two key ideas:
- A heterogeneous sampling strategy to tackle the dilemma between exploding memory cost and biased search.
- Integrating a differentiable hardware search engine to support co-search in a differentiable manner.

The paper tackles an important research problem, and the experimental results are good.

There are two related issues with this paper:
1. Comparison to one-shot NAS: one-shot NAS methods only need to train the super-net once and can then be applied to multiple use-cases, while the proposed methodology needs to be executed for each use-case.
2. It is not clear whether differentiable search is needed for this joint optimization problem modulo existing tools.

Overall, my assessment is that the paper is somewhat borderline and with some more work will be ready for publication. ",ICLR2021,
eKIAp0vNQ9-,1610040000000.0,1610470000000.0,1,R2ZlTVPx0Gk,R2ZlTVPx0Gk,Final Decision,Accept (Poster),"This paper proposes a new method of learning ensembles of neural networks based on the Information Bottleneck theory, which increases the diversity in an ensemble by minimizing the mutual information between latent features of the different ensemble models. It shows promising results on classification, calibration, and uncertainty estimation. The paper is well written and the comments were properly addressed.",ICLR2021,
y6h5v6TsHq,1576800000000.0,1576800000000.0,1,Hke-WTVtwr,Hke-WTVtwr,Paper Decision,Accept (Spotlight),"This paper describes a new language model that captures both the position of words and their order relationships. This redefines word embeddings (previously thought of as fixed and independent vectors) to be functions of position. 
This idea is implemented in several models (CNN, RNN, and Transformer NNs) to show improvements on multiple tasks and datasets.

One reviewer asked for additional experiments, which the authors provided, and which still supported their methodology. In the end, the reviewers agreed this paper should be accepted.",ICLR2020,
H1e8v-B-x4,1544800000000.0,1545350000000.0,1,H1l7bnR5Ym,H1l7bnR5Ym,Interesting new model with good performance ,Accept (Poster),"The paper proposes a new method that builds on the Bayesian modelling framework for GANs and is supported by a theoretical analysis and an empirical evaluation that shows very promising results. All reviewers agree that the method is interesting and the results are convincing, but that the model does not really fit in the standard Bayesian setting due to a data dependency of the priors. I would therefore encourage the authors to reflect this by adapting the title and making the differences clearer in the camera-ready version.",ICLR2019,4: The area chair is confident but not absolutely certain
MIj-EvXKF2,1576800000000.0,1576800000000.0,1,B1eCk1StPH,B1eCk1StPH,Paper Decision,Reject,"The authors introduce a notion of stability to pruning and argue through empirical evaluation that pruning leads to improved generalization when it introduces instability. The reviewers were largely unconvinced, though for very different reasons. The idea that ""Bayesian ideas"" explain what's going on seems obviously wrong to me. The third reviewer seems to think there's a tautology lurking here, and that doesn't seem to be true to me either. It is disappointing that the reviewers did not re-engage with the authors after the authors produced extensive rebuttals. Unfortunately, this is a widespread pattern this year.

Even though I'm inclined to ignore aspects of these reviews, I feel that there needs to be a broader empirical study to confirm these findings. In the next iteration of the paper, I believe it may also be important to relate these ideas to [1]. It would be interesting to also compare on the networks studied in [1], which are more diverse.

[1] The Lottery Ticket Hypothesis at Scale (Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, and Michael Carbin) https://arxiv.org/abs/1903.01611",ICLR2020,
Gszzy4DxyK_,1610040000000.0,1610470000000.0,1,gRr_gt5bker,gRr_gt5bker,Final Decision,Reject,"This paper applies an existing tool (copulas) to MARL to represent dependencies between variables.

The reviewers appreciate the use of copulas for this problem. The experimental section shows promising results on several problems. I appreciate that the authors have answered and addressed many of the reviewers' points of concern. The paper is well written.

The reviewers seem to see this paper as a first step only, showing promising results but of moderate significance in itself. In particular, reviewer 3 would like to see more justification for the use of the copulas, and more experimental settings would make a stronger paper.
",ICLR2021,
zaXWpiLtzkh,1642700000000.0,1642700000000.0,1,tJhIY38d2TS,tJhIY38d2TS,Paper Decision,Reject,"Reviewers raised various concerns and the authors sent in no rebuttal. In view of the negative consensus, this paper was then a clear rejection case.",ICLR2022,
7NS65KG_OHo,1610040000000.0,1610470000000.0,1,M3NDrHEGyyO,M3NDrHEGyyO,Final Decision,Reject,"The paper is about a reinforcement learning algorithm that operates in a Constrained MDP and is provided with a baseline policy. 
+
Although the reviewers acknowledge that the paper has some merits (well written, clearly organized, significant empirical evaluation, reproducible experimental results), some concerns have been raised about the novelty of the proposed solution and of its theoretical analysis. The reviewers feel that the authors' responses have not properly addressed all their doubts.
The paper is borderline, and I think that it is not ready for publication in its current form.
I encourage the authors to update their paper following the reviewers' suggestions and try to submit it to one of the forthcoming machine learning conferences. ",ICLR2021,
eNIoRUovnm,1576800000000.0,1576800000000.0,1,HJeO7RNKPr,HJeO7RNKPr,Paper Decision,Accept (Poster),"This work proposes a CNN architecture for joint depth and camera motion estimation from videos. The paper presents a differentiable formulation of the problem to allow its end-to-end learning, and the reviewers unanimously find the proposed approach reasonable and agree that this is a solid paper. Some of the reviewers find the method itself to be too mechanical, but they all agree that this is a well-engineered solution.",ICLR2020,
HyxoXmXFxV,1545320000000.0,1545350000000.0,1,r1GkMhAqYm,r1GkMhAqYm,metareview,Reject,"The reviewers raise a number of concerns, including no methodological novelty, limited experimental evaluation, and a relatively uninteresting application with very limited real-world applicability. This set of facts has been assessed differently by the three reviewers, and the scores range from probable rejection to probable acceptance. I believe that the work as is would not generate wide interest among ICLR attendees, mainly because of the lack of methodological novelty and the relatively simplistic application. The authors' rebuttal failed to address these issues, and I cannot recommend this work for presentation at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain
WknBMp7_lr,1576800000000.0,1576800000000.0,1,HyezBa4tPB,HyezBa4tPB,Paper Decision,Reject,"This paper proposes adding a Dirichlet distribution as a wrapper on top of a black-box classifier in order to better capture uncertainty in the predictions. This paper received four reviews in total with scores (1,1,1,6). The reviewer who gave the weak accept found the paper well written, easy to follow, and intuitive. The other reviewers, however, were primarily concerned about the empirical evaluation of the method. They found the baselines too weak and weren't convinced that the method would work well in practice. The reviewers also cited a lack of comparison to the existing literature for their scores. One reviewer noted that while the method addresses aleatoric uncertainty, it doesn't provide any mechanism for epistemic uncertainty, which would be necessary for the applications motivating the work.

The authors did not provide a response and thus there was no discussion. ",ICLR2020,
VPhuNpvbXj,1642700000000.0,1642700000000.0,1,_kJXRDyaU0X,_kJXRDyaU0X,Paper Decision,Reject,"This paper studies imitation learning from a causal inference perspective. The authors propose a method to remove the effects of confounders on the expert action a using instrumental variable regression, which presumably leads to better estimation of P(a|s), and hence better imitation. The reviews were negative overall at the start. After the discussions, one reviewer stated that he would change his recommendation to accept, although his score is not changed on the review form. 
However, another reviewer is still not convinced that the causal formalism introduced in the paper improves over the existing RL literature.",ICLR2022,
rHaLOtbj72H,1642700000000.0,1642700000000.0,1,tJCwZBHm-jW,tJCwZBHm-jW,Paper Decision,Reject,"This paper proposes to transfer an image-pretrained model to a point cloud model by inflating 2D convolutional filters to 3D convolutional filters and finetuning the inflated image-pretrained model, so that 3D point cloud tasks can benefit from 2D image pretraining. Extensive experiments are conducted to validate the effectiveness of the proposed method. Even though the performance gain from the 2D pretraining is notable, the novelty of the paper is limited, since inflating 2D models to 3D has been studied for video action recognition, and a theoretical understanding of the proposed model is lacking. During the rebuttal period, the authors addressed most of the reviewers' concerns by conducting additional experiments. Even though the performance is compelling, all reviewers agree that the novelty of the paper is limited and that the discussion of why this method works is not convincing. Meanwhile, one reviewer points out that some claims made by the authors are not well supported. Besides, one reviewer points out that the paper might have a broader impact at a computer vision conference but only provides a limited contribution to the ICLR community. After an internal discussion with the reviewers, the AC agrees with the reviewers on their judgments and recommends rejecting the paper because of its limited novelty.",ICLR2022,
HQYAZOJyQ5a,1610040000000.0,1610470000000.0,1,RmB-88r9dL,RmB-88r9dL,Final Decision,Accept (Oral),"The paper designs a new way (in some sense, a new perspective) of using neural networks to model intervention variables when the goal is to estimate the ADRF. Basically, the idea is to emphasize the importance of the intervention variable by ensuring that it appears not just in every layer but also in every neuron of a neural network.

Reviewers mostly agree, to varying degrees, that this is a good paper, although there are some criticisms on, e.g., assuming away the confounders. However, I believe the authors address the criticisms of R4 satisfactorily.

Overall I find the idea new and interesting and the experimental results strong, hence I happily recommend accepting the paper.

I do have a few quibbles myself and some comments that may help the authors to further improve the paper.

1. Re: the design that models each parameter as a spline.
This is equivalent to introducing additional parameters (coefficients for the spline basis) and adding a fixed linear layer (the spline basis functions themselves) to every layer of the neural network. t is taken as an input in all layers, and this makes sure that the model prioritizes learning the impact of t.

2. If you use a B-spline basis (which comes with kernels of bounded support), then the proposed method is very similar to stratifying the data according to different bins of t, and then fitting a separate model for each t. The only difference is that the different bins are now smooth kernels and they overlap somewhat. As a side note, the authors should clearly write out how they are choosing the knots to specify the basis functions. Otherwise the paper will not be reproducible.

3. I am not sure how this method would compare to naive (non-deep) baselines. Maybe this was considered in a prior work? 
If not, then I tend to side with Reviewer 4 that the evaluations are mostly ablation studies and do not really compare to representative work in this domain. Given that there is a large body of work on this from before deep learning took over, it is important to somehow compare with the right baselines.

",ICLR2021,
phI1I6Oav,1576800000000.0,1576800000000.0,1,B1xGxgSYvH,B1xGxgSYvH,Paper Decision,Reject,"This paper provides a new theoretical framework for domain adaptation by exploring compression and adaptability.

Reviewers and AC generally agree that this paper discusses an important problem and provides new insight, but it is not a thorough theoretical work. The reviewers identified several key limitations of the theory, such as unrealistic conditions and approximations. Some important points still require more work to make the framework practical for algorithm design and computation. The presentation could also be improved.

Hence I recommend rejection.",ICLR2020,
s6kX83JFzL,1576800000000.0,1576800000000.0,1,HyeuP2EtDB,HyeuP2EtDB,Paper Decision,Reject,"The paper proposes an algorithm for zero-shot generalization in RL via learning a scoring function.

The reviewers had mixed feelings, and many were not from the area. A shared theme was doubts about the significance of the experimental setting, and also the generality of the approach.

As this is my field, I read the paper, and recommend rejection at this time. The proposed method is quite laborious and requires quite a few assumptions on the environments to work, as well as fine-tuning parameters for each considered task (number of regions, etc). I also agree that the evaluation is not convincing -- stronger baselines need to be considered, and the experiments need to better address the zero-shot transfer aspect that the paper is motivated by. I encourage the authors to take the review feedback into account and submit a future version to another venue.",ICLR2020,
uxae6IxaYO,1642700000000.0,1642700000000.0,1,5K7RRqZEjoS,5K7RRqZEjoS,Paper Decision,Accept (Poster),"The paper points out how set equivariant functions limit the types of functions that can be represented on multisets. The authors develop a new notion of multiset equivariance to address this limitation. The paper improves an existing multiset equivariant Deep Set Prediction Network through implicit differentiation, which is an area of rising interest. The reviewers and I note that the paper is well written.",ICLR2022,
S1xrFvOTk4,1544550000000.0,1545350000000.0,1,HJej3s09Km,HJej3s09Km,Too narrow,Reject,"I appreciate that the authors are refuting a technical claim in Poole et al.; however, the paper has garnered zero enthusiasm the way it is written. I suggest to the authors that they rewrite the paper as a refutation of Poole et al., and name it as such.",ICLR2019,4: The area chair is confident but not absolutely certain
z8RctS87so,1576800000000.0,1576800000000.0,1,SJe-HkBKDS,SJe-HkBKDS,Paper Decision,Reject,"The paper proposes a text normalisation model for Amharic text. The model uses word classification, followed by a character-based GRU attentive encoder-decoder model. The paper is very short and does not present reproducible experiments. It also does not conform to the style guidelines of the conference. There has been no discussion of this paper beyond the initial reviews, all of which reject it with a score of 1. It is not ready to publish, and the authors should consider a more NLP-focussed venue for future research of this kind. 
+
",ICLR2020,
_oFmbm-ZIln,1610040000000.0,1610470000000.0,1,C_p3TDhOXW_,C_p3TDhOXW_,Final Decision,Reject,"The meta-reviewer agrees with the reviewers that this is a marginal case, conditioned on the quality of the content and on comparisons to other works such as:
Constrained Reinforcement Learning With Learned Constraints (https://openreview.net/forum?id=akgiLNAkC7P)
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning (https://openreview.net/forum?id=Ysuv-WOFeKR)
PERIL: Probabilistic Embeddings for hybrid Meta-Reinforcement and Imitation Learning (https://openreview.net/forum?id=BIIwfP55pp)

We believe that the paper is not ready for publication yet. We would strongly encourage the authors to use the reviewers' feedback to improve the paper and resubmit to one of the upcoming conferences.
",ICLR2021,
QgT6ATXmNA9,1610040000000.0,1610470000000.0,1,EXkD6ZjvJQQ,EXkD6ZjvJQQ,Final Decision,Reject,"This paper derives CLT-type results for the minimum $\ell_2$ norm least squares estimator, allowing both n and p to grow.

Pros:
As one reviewer puts it: Asymptotic confidence intervals for different prediction risks are derived. These results seem new.

Cons:
It's not clear what has been gained by having these results, other than having them.

Reasoning:
Staring at Figure 1 for a while, what jumps out is how little the CI matters. Unless $p\approx n$, the band is essentially uniform around the first-order result derived elsewhere. The claim the authors seem to make at the bottom of page 1 is that, ""supposing I have 90 observations and 100 predictors, it may not be so bad to collect 8 more observations. Even though on average I'm worse off, perhaps not for my data?"" The flip side of this argument is ""why am I using min-norm OLS?"" I think that the authors are making the wrong argument in this paper. The point of analyzing this problem is not to understand what happens when $p\approx n$ but to understand why $p \gg n$ is good, and thereby try to justify parameter explosion in deep learning. I should be looking at the left side of Figure 1, not the center. Even the language ""more data hurt"" is the wrong statement. The point isn't to show that collecting data is bad but to justify adding parameters. We should say ""more parameters help"". If the authors' proof technique added to the understanding in that case, then this paper would be more convincing. As is, I find it hard to overrule the reviewers, who appear to be mainly on the fence with little enthusiasm.
",ICLR2021,
HkegL_4egV,1544730000000.0,1545350000000.0,1,rkgqCiRqKQ,rkgqCiRqKQ,Interesting idea but paper needs more work,Reject,"The authors study an inverse reinforcement learning problem where the goal is to infer an underlying reward function from demonstrations with bias. To achieve this, the authors learn the planners and the reward functions from demonstrations. As this is in general impossible, the authors consider two special cases, in which either the reward function is observed on a subset of tasks or the observations are assumed to be close to optimal. They propose algorithms for both cases and evaluate these in basic experiments. The problem considered is important and challenging. One issue is that in order to make progress the authors need to make strong and restrictive assumptions (e.g., assumption 3, the well-suited inductive bias). It is not clear if the assumptions made are reasonable. 
Experimentally, it would be important to see how the results change if the model for the planner changes, and to evaluate what the inferred biases would be. Overall, there is consensus among the reviewers that the paper is interesting but not ready for publication.
",ICLR2019,4: The area chair is confident but not absolutely certain
QmzjHxfEMbG,1610040000000.0,1610470000000.0,1,LxhlyKH6VP,LxhlyKH6VP,Final Decision,Reject,"This paper addresses an interesting problem: learning a generative neural network on a simulated ensemble of protein structures obtained using molecular simulation, in order to characterize the distinct structural fluctuations of a protein bound to various drug molecules. The main technical contribution is a geometric autoencoder architecture with separate latent spaces for representing intrinsic and extrinsic geometry. However, the reviewers think the benefit of modeling intrinsic and extrinsic geometry is not clearly explained and the experiments are not convincing at the moment. The paper can potentially be improved by addressing these two main issues. ",ICLR2021,
BkuaVyarz,1517250000000.0,1517260000000.0,453,B1ZZTfZAW,B1ZZTfZAW,ICLR 2018 Conference Acceptance Decision,Reject,"Overall I agree with the assessment of R1 that the paper touches on many interesting issues (deep learning for time series, privacy-respecting ML, simulated-to-real-world adaptation) but does not make a strong contribution to any of these. Especially with respect to the privacy-respecting aspect, there needs to be more analysis showing that the generative procedure does not leak private information (noting R1 and R3's comments). I appreciate the authors clarifying the focus of the work, and revising the manuscript to respond to the reviews. Overall it's a good paper on an important topic, but I think there are too many issues outstanding for acceptance at this point.",ICLR2018,
sHewdUot0AjP,1642700000000.0,1642700000000.0,1,zXM0b4hi5_B,zXM0b4hi5_B,Paper Decision,Accept (Spotlight),"All reviewers suggest acceptance of this paper, which reports the relationship between perceptual distances, data distributions, and contemporary unsupervised machine learning methods. I believe this paper will be of broad interest to different communities at ICLR.",ICLR2022,
2TUEz6rFkb,1576800000000.0,1576800000000.0,1,B1eWbxStPH,B1eWbxStPH,Paper Decision,Accept (Spotlight),"This paper studies Graph Neural Networks for quantum chemistry by incorporating a number of physics-informed innovations into the architecture. In particular, it considers directional edge information while preserving equivariance.

Reviewers were in agreement that this is an excellent paper with strong empirical results, a great empirical evaluation, and clear exposition. Despite some concerns about limited novelty in terms of GNN methodology (for instance, directional message passing has appeared in previous GNN papers, see e.g. https://openreview.net/forum?id=H1g0Z3A9Fm , in a different context), the AC ultimately believes this is a strong, high-quality work that will be of broad interest, and thus recommends acceptance. ",ICLR2020,
8Hgnvjzmk2F,1610040000000.0,1610470000000.0,1,ARQAdp7F8OQ,ARQAdp7F8OQ,Final Decision,Reject,"This paper conducts a comparison between a small set of models (4 in total) for unsupervised learning. Specifically, the authors focus on comparing Bayesian Confidence Propagating Neural Networks (BCPNN), Restricted Boltzmann Machines (RBM), a recent model by Krotov & Hopfield (2019) (KH), and auto-encoders (AE). 
The authors compare trained weight distributions, receptive field structures, and linear classification on MNIST using the learned representations. The first two comparisons are essentially qualitative, while on classification accuracy, the authors report similar accuracy levels across the models.

This paper received mixed reviews. Reviewers 4 and 5 felt it did not contribute enough for acceptance, while Reviewers 2 & 3 were more positive. However, as noted by a few of the reviewers, this paper does not appear to achieve much, and provides very limited analysis and experiments on the models. It isn't introducing any new models, nor does it make any clear distinctions between the models examined that would help the field to decide which directions to pursue. The experiments add little insight into the differences between the models that could be used to inform new work. Thus, the contribution provided here is very limited.

Moreover, the motivations in this paper are confused. In general, it is important for researchers at the intersection of neuroscience and machine learning to decide what their goal is when building and/or comparing models. Specifically, is the goal: (1) finding a model that may potentially explain how the brain works, or (2) finding better machine learning tools?

If the goal is (1), the performance on benchmarks is less important. However, clear links to experimental data, such that experimental predictions may be possible, are very important. That's not to say that a model must be perfectly biologically realistic to be worthwhile, but it must have sufficient grounding in biology to be informative for neuroscience. However, in this manuscript, as was noted by Reviewer 4, the links to biology are tenuous. The principal claim for biological relevance for all the models considered seems to be that the update rules are local. But this is a loose connection at best. There are many more models of unsupervised learning with far more physiological relevance that are not considered here (see e.g. Olshausen & Field, 1996, Nature; Zylberberg et al. 2011, PLoS Computational Biology; George et al., 2020, bioRxiv: https://doi.org/10.1101/2020.09.09.290601). It is true that some of these models use non-local information, but given the emerging evidence that locality is not actually even a strict property of real synaptic plasticity (see e.g. Gerstner et al., 2018, Frontiers in Neural Circuits; Williams & Holtmaat, 2018, Neuron; Banerjee et al., 2020, Nature), an obsession with rules that only use pre- and post-synaptic activity is not even clearly a desideratum for neuroscience.

If the goal is (2), then performance on benchmarks, and some comparison to the SotA, is absolutely critical. Yet this paper does none of this. Indeed, the performance achieved with the four models considered here is, as noted by Reviewer 4, very poor. In contrast, there have been numerous advances in unsupervised (or ""self-supervised"") learning in ML in recent years (e.g. Contrastive Predictive Coding, SimCLR, Bootstrap Your Own Latent, etc.), all of which achieve far better results than the four models considered here. Thus, the models being compared here cannot inform machine learning, as they do not appear to provide any technical advances. Of course, some models may combine goals (1) & (2), e.g. seeking increased physiological relevance while also achieving decent benchmark performance (see e.g. 
Sacramento et al., 2018, NeurIPS), but that is not really the situation faced here, as the models considered have little biological plausibility (as noted above) and achieve poor performance at the same time. + +Altogether, given these considerations, although this paper received mixed reviews, it is clearly not appropriate for acceptance at ICLR in the Area Chair's opinion.",ICLR2021, +oP8RDa9IWX,1610040000000.0,1610470000000.0,1,V8YXffoDUSa,V8YXffoDUSa,Final Decision,Reject,"This work provides evidence against the hypothesis that ResNets implement iterative inference, or that iterative convergent computation is a good inductive bias to have in these models. The reviewers indicate that they think this hypothesis is interesting and relevant to the ICLR community, but they do not find the current work sufficiently convincing. Both theoretically and experimentally the paper does not fully demonstrate the claim that iterative inference is not useful in ResNets, and the reviewers are unanimous in their recommendation to reject the paper until the evidence for this claim is strengthened. +",ICLR2021, +BycYLJaSf,1517250000000.0,1517260000000.0,832,ryb83alCZ,ryb83alCZ,ICLR 2018 Conference Acceptance Decision,Reject,"The authors propose a hierarchical VAE model with a discrete latent variable in the top-most layer for unsupervised learning of discriminative representations. While the reported results on the two flow cytometry datasets are encouraging, they are insufficient to draw strong conclusions about the general effectiveness of the proposed architecture. Also, as two of the reviewers stated the proposed model is very similar to several VAE models in the literature. This paper seems better suited for a more applied venue than ICLR.",ICLR2018, +Bylf8M_i1V,1544420000000.0,1545350000000.0,1,S1g_EsActm,S1g_EsActm,metareview: no rebuttal,Reject,All reviewers agree that the paper should be rejected and there is no rebuttal.,ICLR2019,5: The area chair is absolutely certain +rkeMK1HWg4,1544800000000.0,1545350000000.0,1,B1MbDj0ctQ,B1MbDj0ctQ,Not quite not enough for acceptance,Reject,The overall view of the reviewers is that the paper is not quite good enough as it stands. The reviewers also appreciates the contributions so taking the comments into account and resubmit elsewhere is encouraged. ,ICLR2019,5: The area chair is absolutely certain +ZjiBIEPyMOn,1642700000000.0,1642700000000.0,1,AjGC97Aofee,AjGC97Aofee,Paper Decision,Accept (Poster),This paper receives positive reviews. The authors provide additional results and justifications during the rebuttal phase. All reviewers find this paper interesting and the contributions are sufficient for this conference. The area chair agrees with the reviewers and recommends it be accepted for presentation.,ICLR2022, +n1yYBtQ9V4B,1610040000000.0,1610470000000.0,1,i3Ui1Csrqpm,i3Ui1Csrqpm,Final Decision,Reject,"This paper introduces a set of techniques that can be used to obtain smaller models on downstream tasks, when fine-tuning large pre-trained models such as BERT. Some reviewers have noted the limited technical novelty of the paper, which can be seen more as a combination of existing methods. This should not be a reason for rejection alone, but unfortunately, the results in the experimental section are also a bit weak (eg. see [1-4]), there are not very insightful analysis and it is hard to compare to existing work. For these reasons, I believe that the paper should be rejected. 
+

[1] DynaBERT: Dynamic BERT with Adaptive Width and Depth

[2] Training with quantization noise for extreme model compression

[3] MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

[4] SqueezeBERT: What can computer vision teach NLP about efficient neural networks?",ICLR2021,
rJAJQ1prf,1517250000000.0,1517260000000.0,54,r1gs9JgRZ,r1gs9JgRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"meta score: 8

The paper explores mixing 16- and 32-bit floating point arithmetic for NN training, with CNN and LSTM experiments on a variety of tasks.

Pros:
 - addresses an important practical problem
 - very wide range of experimentation, reported in depth

Cons:
 - one might say the novelty was minor, but the novelty comes from the extensive analysis and experiments",ICLR2018,
DDeV81eZWv,1642700000000.0,1642700000000.0,1,wRODLDHaAiW,wRODLDHaAiW,Paper Decision,Accept (Oral),"The paper introduces a novel control-based variational inference approach that learns latent dynamics in an *input-driven* state-space model. An optimal control solution (iLQR) is implicitly used as the recognition model, which is fast and compact. Reviewers unanimously agree on the high-quality writing and high significance of the work. This paper advances the horizon of nonlinear dynamical system models with unobserved input, an impactful contribution to the neuroscience and time series communities.",ICLR2022,
lqKzFYDjK,1576800000000.0,1576800000000.0,1,rylUOn4Yvr,rylUOn4Yvr,Paper Decision,Reject,"The paper proposes a gradient rescaling method to make deep neural network training more robust to label noise. The intuition of focusing more on easier examples is not particularly new, but the empirical results are promising. 
On the weak side, no theoretical justification is provided, and the method introduces extra hyperparameters that need to be tuned. Finally, more discussions on recent SOTA methods (e.g., Lee et al. 2019) as well as further comprehensive evaluations on various cases, such as asymmetric label noise, semantic label noise, and open-set label noise, would be needed to justify and demonstrate the effectiveness of the proposed method. ",ICLR2020, +rkGjVJpBM,1517250000000.0,1517330000000.0,421,HJjvxl-Cb,HJjvxl-Cb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The reviewers agree that the results are promising and there are some interesting and novel aspect to the formulation. However, two of the reviews have raised concerns regarding the exposition and the discussion of previous work. The paper benefits from a detailed description of soft Q-learning, PCL, and off-policy actor-critic algorithms, and how SAC is different from those. Instead of differentiating against previous work by saying soft Q-learning and PCL are not actor-critic algorithms, discuss the similarities and differences and present empirical evaluation.",ICLR2018, +kBOLbku1PW1,1642700000000.0,1642700000000.0,1,Tu6SpFYWTA,Tu6SpFYWTA,Paper Decision,Reject,"The paper presents a new approach for distinguishing synonyms and antonyms via an extension of a parasiamese neural network, called ""the repelling parasiamese network"". The strengths of the paper, as identified by reviewers, are a novel architecture for antonymy detection, a new dataset, and solid empirical results. However, there are major drawbacks identified by reviewers w5dj and hoTU. Specifically, there are clarity issues in writing, lack of a proper justification to the proposed architecture, insufficient details about the quality of datasets, insufficient contextualization in prior work. The scores are borderline, but unfortunately, the authors did not use the rebuttal opportunity to sufficiently address these questions/concerns raised by the reviewers. I thus recommend to reject the paper.",ICLR2022, +Bk50I1TrM,1517250000000.0,1517260000000.0,900,SJtChcgAW,SJtChcgAW,ICLR 2018 Conference Acceptance Decision,Reject,"Dear authors, + +While the reviewers appreciated the idea, the significant loss of accuracy was a concern. Even though you made significant changes to the submission, it is unfortunately unrealistic to ask the reviewers to do another review of a heavily modified version in such a short amount of time. + +Thus, I cannot accept this paper for publication but I encourage you to address the reviewers' concerns and resubmit at a later conference.",ICLR2018, +CFvGcuTA93K,1610040000000.0,1610470000000.0,1,lfJpQn3xPV-,lfJpQn3xPV-,Final Decision,Reject,"This is an empirical paper that proposed a few different settings for applying GNNs on temporal data, including what context window to use, code-start vs warm-start, incremental training vs static. This paper also proposed and released a few more temporal graph datasets, which could be useful. + +The consensus assessment of the reviewers is that the contributions of this paper are incremental, and the results are expected and not exciting enough. + +I want to in particular point out that the results highlighted in the paper, that a GNN with window size 1 is sufficient to recover 90% of the performance of the model on full graph, is probably not the correct message to communicate. 
This either indicates that the data and task used in the benchmarks do not require sophisticated long-horizon temporal information (which makes the comparison between any methods uninteresting), or it indicates that the metric is not sensitive enough to sufficiently distinguish models trained with different settings. + +I would recommend rejection and encourage the authors to improve this paper.",ICLR2021, +cUmx9f0LETj,1610040000000.0,1610470000000.0,1,7YctWnyhjpL,7YctWnyhjpL,Final Decision,Reject,"The paper is very interesting and novel, and all reviewers are of the same opinion. +The main concern, however, is on the experimental section that is limited to image classification benchmarks and that some critical comparisons are missing (e.g. clarify factors that play key role in improvement, more computation and therefore more free parameters, how about non discriminative tasks, etc). +The heterogeneity question is in my opinion only partially answered by the authors but I also feel proper handling of this matter would require a proper multi-task setup and different target for the work. +I also personally find applicability of the approach quite limited, I encourage the authors to further improve their work as I feel that with a proper revision would make a nice contribution for the community.",ICLR2021, +B1QWLJ6SM,1517250000000.0,1517260000000.0,720,rkaqxm-0b,rkaqxm-0b,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a neural compositional model for visual question answering. The overall idea may be exciting but the committee agrees with the evaluation of Reviewer 1: the experimental section is a bit thin and it only evaluates against an artificial dataset for visual QA that does not really need a knowledge base. It would have been better to evaluate on more traditional question answering settings where the answer can be retrieved from a knowledge base (WebQuestions, Free917, etc.), and then compare with state of the art on those.",ICLR2018, +gDoYF23FX,1576800000000.0,1576800000000.0,1,HyePberFvH,HyePberFvH,Paper Decision,Reject,"The paper studies the impact of rounding errors on deep neural networks. The +authors apply Monte Carlos arithmetics to standard DNN operations. +Their results indeed show catastrophic cancellation in DNNs and that the resulting loss of +significance in the number representation correlates with decrease in validation +performance, indicating that DNN performances are sensitive to rounding errors. + +Although recognizing that the paper addresses an important problem (quantized / +finite precision neural networks), the reviewers point out the contribution of +the paper is somewhat incremental. +During the rebuttal, the authors made an effort to improve the manuscript based +on reviewer suggestions, however review scores were not increased. + +The paper is slightly below acceptance threshold, based on reviews and my own +reading, as the method is mostly restricted to diagnostics and cannot yet be used +to help training low-precision neural networks.",ICLR2020, +79JlDwwqOlu,1642700000000.0,1642700000000.0,1,R2aCiGQ9Qc,R2aCiGQ9Qc,Paper Decision,Reject,"This paper focuses on investigating the relations between the heterophily and over-smoothness problem. However, the relationship is not clear. + +The over-smoothness problem considers the features and the adjacency matrix, while the heterophily incorporates the adjacency matrix and the labels. They have different views on the graph. It may not be treated as the same coin. 
Besides, the stacked aggregations lead to indistinguishable node representations and poor performance in the over-smoothing problem. The same phenomenon appears in the heterophily problem because the features in different classes are falsely mixed, leading to indistinguishable nodes [2]. They share the same phenomenon but have different origins. It may not be necessary to combine these two problems.

Besides, MADGap [1] was proposed to evaluate the over-smoothness problem. It is unreliable to use the accuracy and the degree to measure this problem. Therefore, in Section 3, the relations between node degrees and the homophily ratio cannot be used to infer the relations between the heterophily and over-smoothness problems.

As a result, the authors should carefully re-organize their paper and results.

A suggestion is to frame the submission as a new method for learning under heterophily instead of trying to draw such a close relationship with over-smoothing.

- [1] Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View. AAAI 2020
- [2] Beyond homophily in graph neural networks: Current limitations and effective designs. NeurIPS 2020",ICLR2022,
erHYQ9Z-ACF,1610040000000.0,1610470000000.0,1,1TIrbngpW0x,1TIrbngpW0x,Final Decision,Reject,"The reviewers agree that the idea of introducing structural biases in the attention mechanism is interesting, but the results and presentation right now are not convincing. Improvements are seen on only some datasets and the comparisons are not exact.
A reject.",ICLR2021,
xsCjYMc1d7B,1642700000000.0,1642700000000.0,1,vkZtFD0zga8,vkZtFD0zga8,Paper Decision,Reject,"This paper presents a method for video compression based on multi-head models.
The reviewers seem to agree that the results are interesting and worth publishing.
On the other hand, many concerns are raised about the quality of the writing, with grammatical mistakes and confusing parts. The motivation for the multi-head models, as well as their novelty, has been questioned in all reviews. 
Although the authors' rebuttal has led some reviewers to increase their score, it is still very concerning that the authors needed to explain the main point of the paper to each reviewer. I think that the authors should polish this paper, taking into account the reviewers' feedback, which would make a stronger paper, and submit it again to a future venue. I therefore recommend rejecting this paper.",ICLR2022,
CUeWaYfwR,1576800000000.0,1576800000000.0,1,rkx35lHKwB,rkx35lHKwB,Paper Decision,Reject,"This paper proposes a method for reinforcement learning with unseen actions. More precisely, the problem setting considers a partitioned action space. The actions available during training (known actions) are a subset of all the actions available during evaluation (known and unknown actions). The method can choose unknown actions during evaluation through an embedding space over the actions, which defines a distance between actions. The action embedding is trained by a hierarchical variational autoencoder. The proposed method and algorithmic variants are applied to several domains in the experiments section.

The reviewers discussed both strengths and weaknesses of the paper. The strengths described by the reviewers include the use of the hierarchical VAE and the explanatory videos. The primary weakness is the absence of sufficient detail when describing the solution. The solution description is not sufficiently clear to understand the details of the regularization metrics. The details of the regularization are essential when some actions are never seen in training. The reviewers also mentioned that the experimental analysis would benefit from more care.

This paper is not ready for publication, as the solution methods and experiments are not presented with sufficient detail.",ICLR2020,
XNoWy6eSwcX,1642700000000.0,1642700000000.0,1,qhAeZjs7dCL,qhAeZjs7dCL,Paper Decision,Accept (Poster),"All of the reviewers appreciate the clarity of exposition and the importance of the problem studied. That said, I agree with Reviewer P9Ys that the results are somewhat underwhelming. The baselines appear weak and are likely not well tuned on the Stanford car dataset. The key question that remains unanswered, in my opinion, is whether this is the most effective approach to using synthetic data to improve classification accuracy (e.g., in contrast to [Ravuri & Vinyals, 2019](https://arxiv.org/abs/1905.10887) and follow-up work). Nevertheless, I believe the community will benefit from this paper's contributions and this line of work.",ICLR2022,
EcIM5IbToRf,1642700000000.0,1642700000000.0,1,SuKTLF9stD,SuKTLF9stD,Paper Decision,Reject,"The work presents a theoretical analysis of data augmentation, presenting evidence that data augmentation enlarges the smaller singular values of the network Jacobian. Based on this theory, the authors present a method for selecting a subset of the training data to use with augmentation that decently approximates the performance of training with augmentation on the full dataset. Reviewers overall agreed that the theoretical analysis was interesting and did not find any flaws (though it is worth noting that the theory is restricted to additive perturbations). However, multiple reviewers found the presented experiments unconvincing and questioned the stated motivation. The AC agrees with the reviewers that most simple augmentations are not prohibitive in training speed. Certainly, training on less data with a fixed epoch budget would require less compute time, but this has nothing to do with augmentation and instead is a result of fewer steps taken in training. In the rebuttal, the authors argued that training on ImageNet is prohibitive with a single GPU (taking 2 weeks to do full training). 
However, given that the authors claim their method speeds up training by a factor of 6.3x, reducing ImageNet training from 2 weeks to 2 days would be a more convincing application of their method and would strengthen the work.",ICLR2022,
rtHpcMbbKO,1610040000000.0,1610470000000.0,1,mnj-9lYJgu,mnj-9lYJgu,Final Decision,Reject,"The reviewers and AC appreciate the improvements made to the paper and thank the authors for engaging with the reviewer questions.
There are now quite a few neuro-symbolic approaches, and they are all rather similar. This places a larger burden on the authors to provide a thorough and systematic experimental comparison and related work discussion. Reviewers also believe the clarity of the paper should still be improved. The revised paper already made good progress in addressing these concerns, yet the reviewers still believe the paper would strongly benefit from another round of revisions.",ICLR2021,
wyH122wlkj,1642700000000.0,1642700000000.0,1,gbe1zHyA73,gbe1zHyA73,Paper Decision,Accept (Poster),"The paper proposes a method for hybrid model-based/ML learning, where a model is decomposed into an interpretable parametric prior and a neural net residual. In this case, prediction error minimization does not identify the parametric component, and an alternating optimization method is proposed that augments the prediction error loss with component-specific losses. Empirical and theoretical results are obtained. Initial questions of several reviewers were addressed.",ICLR2022,
XNoWy6eSwcX,1642700000000.0,1642700000000.0,1,qhAeZjs7dCL,qhAeZjs7dCL,Paper Decision,Accept (Poster),"All of the reviewers appreciate the clarity of exposition and the importance of the problem studied. That said, I agree with Reviewer P9Ys that the results are somewhat underwhelming. The baselines appear weak and are likely not well tuned on the Stanford Cars dataset. The key question that remains unanswered in my opinion is whether this is the most effective approach to using synthetic data to improve classification accuracy (e.g., in contrast to [Ravuri & Vinyals, 2019](https://arxiv.org/abs/1905.10887) and follow-up work). Nevertheless, I believe the community will benefit from this paper's contributions and this line of work.",ICLR2022,
Syxg2653JV,1544490000000.0,1545350000000.0,1,rJEjjoR9K7,rJEjjoR9K7,Original work for domain generalization with strong experimental evidence,Accept (Oral),"The paper presents a new approach for domain generalization whereby the original supervised model is trained with an explicit objective to ignore so-called superficial statistics that are present in the training set but may not be present in future test sets. The paper proposes using a differentiable variant of the gray-level co-occurrence matrix to capture the textural information and then experiments with two techniques for learning feature invariance. All reviewers agree the approach is novel, unique, and potentially of high impact to the community.

The main issues center around reproducibility as well as the intended scope of problems this approach addresses. The authors have offered to include further discussions in the final version to address these points. Doing so will strengthen the paper and aid the community in building upon this work.
",ICLR2019,4: The area chair is confident but not absolutely certain
SJgKUwKWlE,1544820000000.0,1545350000000.0,1,SJggZnRcFQ,SJggZnRcFQ,meta review,Accept (Poster),"This paper considers the problem of learning symbolic representations from raw data. The reviewers are split on the importance of the paper. The main argument in favor of acceptance is that it bridges neural and symbolic approaches in the reinforcement learning problem domain, whereas most previous work that has attempted to bridge this gap has been in inverse graphics or physical dynamics settings. Hence, it makes for a contribution that is relevant to the ICLR community. The main downside is that the paper does not provide particularly surprising insights, and could become much stronger with more complex experimental domains.
It seems like the benefits slightly outweigh the weaknesses.
Despite that, the reviewers raise the question, and I agree, that the significance of the paper, especially the novelty of the method, do not meet ICLR standard. The future version of the paper should be developed more in terms of the novelty, evaluations, and related works. +",ICLR2021, +Kuss3NOqAceL,1642700000000.0,1642700000000.0,1,Azh9QBQ4tR7,Azh9QBQ4tR7,Paper Decision,Accept (Poster),"The authors propose a simple addition to adversarial training methods that improves model performance without significantly changing the complexity of training. The initial reviews raised some questions about whether experiments were sufficiently extensive, but these issues were resolved during the rebuttal and discussion period, resulting in a strong consensus that the paper should be published.",ICLR2022, +K7LkJ6BimXq,1610040000000.0,1610470000000.0,1,pAJ3svHLDV,pAJ3svHLDV,Final Decision,Reject,"The paper has good contributions to a challenging problem, leveraging a Faster-RCNN framework with a novel self-supervised learning loss. However reviewer 4 and other chairs (in calibration) considered that the paper does not meet the bar for acceptance. The other reviewers did not champion the paper either, hence i am proposing rejection. + +Pros: +- R1 and R3 agree that the proposed model improves over related models such as MONET. +- The value of the proposed self-supervised loss connecting bounding boxes and segmentations is well validated in experiments. + +Cons: +- R4 gives good suggestions that may be useful to reach a broader readership, namely introducing more of the concepts used in the paper., e.g. ""stick breaking, spatial broadcast decoder, multi-otsu thresholding"" so it becomes more self-contained. R4 also suggests improving the writing more generally. +- R4 still finds the proposed ""method quite complex yet derivative"" after the rebuttal. +- All reviewers complain about lack of experiments in real data, but the authors did revise their paper and add some coco results in the appendix. These could be part of the main paper in a future version.",ICLR2021, +X4C0a-q2ucI,1642700000000.0,1642700000000.0,1,huXTh4GF2YD,huXTh4GF2YD,Paper Decision,Reject,"This paper tackles the open-set recognition problem, specifically the subset that looks at rejecting test data that with unknown classes that are related to the training data. The proposed approach uses an existing distance-based classifier (based on LDA) combined with a new background class regularizer. Results, comparing to a few prior OSR methods, are shown across image/text datasets. + + The reviewers gave a mixed set of scores, with concerns about visualization/ablation studies and the lambda parameter with affect on classification accuracy (1wRX, ujMG, rop6), computation complexity and efficiency (1wRX), limited novelty and discussion of relationship to prior works (Mkdh, rop6), and limited comparison to state of art as only a few algorithms are compared to the proposed approach (Mkdh), and initialization method. Notably, the authors make a strong claim for the latter point that the method should only be compared to previous BCR methods (as opposed to softmax-based classifiers, for example); this seems to ignore whole classes of different methods that can approach the OSR problem. 
While it is true that comparing to previous BCR methods can directly show that your approach is superior to them within a similar class of algorithms (thereby showing that it is an improvement), putting the method within the context of the entire literature is absolutely necessary to discuss its relative impact on the field. For example, the improvements in AUROC are not that great (and in some cases worse) than even the methods you compare to, while OSCR is improved significantly, so it is not clear how it stacks up with respect to the current state of the art. Even if it doesn't beat the state of the art, you could still argue your contribution, but not presenting it at all prevents the holistic perspective that is necessary.

The authors provided thorough rebuttals, including additional ablations and experiments. However, after the review period the scores remain mixed (5,5,6,8) and the reviewers expressed remaining concerns about novelty and comparison to the current state of the art (not just BCR-based methods). As a result of these remaining concerns, I recommend rejection at this time.",ICLR2022,
yXunlvThf7,1610040000000.0,1610470000000.0,1,P__qBPffIlK,P__qBPffIlK,Final Decision,Reject,"This paper proposes a heuristic for removing privacy-sensitive attributes and replacing them with synthetically generated ones.
The technique is closely related to existing work and, as pointed out in the reviews, the experimental evaluation is insufficient for properly evaluating the approach.",ICLR2021,
rk2pjf8ug,1486400000000.0,1486400000000.0,1,HJpfMIFll,HJpfMIFll,ICLR committee final decision,Accept (Poster),The paper considers an important problem largely ignored by continuous word representation learning: polysemy. The approach is mathematically grounded and interesting and well explored.,ICLR2017,
f35uw_brz0k,1642700000000.0,1642700000000.0,1,Sqv6rs_TRV,Sqv6rs_TRV,Paper Decision,Reject,"The topic and ambition of this paper have been judged as important by all reviewers. Yet there is a consensus that the theoretical and experimental contribution is not strong enough to effectively argue for an important novel lead that would justify publication at ICLR. For these reasons, this paper cannot be endorsed for publication at ICLR 2022.",ICLR2022,
v32WGpka0,1576800000000.0,1576800000000.0,1,rkg-TJBFPB,rkg-TJBFPB,Paper Decision,Accept (Poster),"This paper tackles the problem of exploration in deep reinforcement learning in procedurally-generated environments, where the same state is rarely encountered twice. The authors show that existing methods do not perform well in these settings and propose an approach based on an intrinsic reward bonus to address this problem. More specifically, they combine two existing ideas for training RL policies: 1) using implicit rewards based on latent state representations (Pathak et al.
2017) and 2) using implicit rewards based on differences between subsequent states (Marino et al. 2019).

Most concerns of the reviewers have been addressed in the rebuttals. Given that it builds so closely on existing ideas, the main weakness of this work seems to be the novelty. The strength of this paper resides in the extensive experiments and analysis that highlight the shortcomings of current techniques and provide insight into the behaviour of trained agents, in addition to proposing a strategy which improves upon existing methods.

The reviewers all agree that the paper should be accepted. I therefore recommend acceptance.",ICLR2020,
R4XcLYBxVplJ,1642700000000.0,1642700000000.0,1,MEpKGLsY8f,MEpKGLsY8f,Paper Decision,Accept (Spotlight),"All reviewers believe that this paper is valuable, and the authors have made a significant, careful contribution.

Some suggestions from the area chair:
- ""in causality"" is not a standard technical term and also not non-technical idiomatic English, so it should be explained the first time it is used.
- The authors should briefly cite and discuss research on so-called positive and unlabeled (PU) learning. This seems like the special case where there is exactly one known class and one novel class. The distinction between sampling in causality and labeling in causality appears in the PU literature, though not under this name.
- The authors could also mention the obvious but surprising point that if data are generated by two clusters, then a classifier can be learned using exactly one labeled example--not even one from each class.
- I have read the reference EJ A'Court Smith. Discovery of remains of plants and insects. _Nature_, 1874, and I fail to see its relevance. It is only one paragraph. Work from the 1800s should not be cited merely to suggest a veneer of scholarliness.
- The writing uses italics for emphasis much too often.",ICLR2022,
5klWZJpK_9,1576800000000.0,1576800000000.0,1,r1e8WTEYPB,r1e8WTEYPB,Paper Decision,Reject,"This paper presents sparse attention mechanisms for image captioning. In addition to the recent sparsemax-based method, the authors propose to extend it by incorporating structural constraints from 2D images, which is called TVMAX. The proposed methods are shown to improve the quality of captioning, particularly in terms of fewer erroneous repetitions, and obtain better human evaluation scores.
Through the reviewer discussion, one reviewer updated their score to rejection. A major concern raised by the reviewers is that the motivation for introducing sparse attention is not clear, and the reason why it improves quality (particularly, why it can reduce repetition) is not convincing. While we understand it is plausible for long sequences as in the text domain, we are not convinced that it is really necessary for image captioning problems. Although the authors seem to have some ideas, we cannot see how they will be reflected in the paper, so I'd like to recommend rejection.
I recommend the authors polish the paper with a clearer description of the motivation and a high-level analysis of the method, as well as testing on other visual tasks to show its generality.
",ICLR2020,
3B_MQ3ThSnV,1642700000000.0,1642700000000.0,1,Y0cGpgUhSvp,Y0cGpgUhSvp,Paper Decision,Reject,"The paper provides a method to accelerate training by choosing a subset of points. After the initial submission, the reviewers raised a major concern about the practicality of the method.
In the rebuttal phase the authors provided additional experiments on a large dataset that addressed this concern. That being said, the reviews are still quite borderline. The biggest remaining concern is about the quality of the writing. Specifically, there are still requests to “fix the narrative” (NdhY, DHeZ). In addition, some details seem to remain vague regarding the positioning of the paper with respect to the active learning literature (BBTj).
Overall, the paper seems to have potential, especially with the new experiments. However, the changes it required compared to the originally submitted version are simply too extensive to be thoroughly reviewed in a rebuttal phase.",ICLR2022,
SyxnMfrXeN,1544930000000.0,1545350000000.0,1,BJfYvo09Y7,BJfYvo09Y7,accept; vision-enabled/memory-enabled/mocap-mimicking humanoid,Accept (Poster),"A hierarchical method is presented for developing humanoid motion control, using low-level control fragments, egocentric visual input, and recurrent high-level control. It is likely the first demonstration of 3D humanoids learning to do memory-enabled tasks using only proprioceptive and head-based ego-centric vision. The use of control fragments as opposed to mocap-clip-based skills allows for finer-grained repurposing of pieces of motion, while still allowing for mocap-based learning.

Weaknesses: It is largely a mashup of previously known results (R2). Caveat: this can be said for all research at some sufficient level of abstraction. The motions are jerky when transitions happen between control fragments (R2, R3). There are some concerns as to whether the method compares against other methods; the authors note that they are either not directly comparable, i.e., solving a different problem, or are implicitly contained in some of the comparisons that are performed in the paper.

Overall, the reviewers and AC are in broad agreement regarding the strengths and weaknesses of the paper.

The AC believes that the work will be of broad interest. Demonstrating memory-enabled, vision-driven, mocap-imitating skills is a broad step forward. The paper also provides a further datapoint as to which combinations of methods work well, and some of the specific features required to make them work.

The paper could acknowledge motion quality artifacts, as noted by the reviewers and in the online discussion. We suggest including [Peng et al. 2017] as some of the most relevant related HRL humanoid control work, as per the reviews & discussion.
",ICLR2019,4: The area chair is confident but not absolutely certain
3pzvKyF42LO,1610040000000.0,1610470000000.0,1,4RbdgBh9gE,4RbdgBh9gE,Final Decision,Accept (Poster),"This paper proposes an interesting unified framework for meta-learning with commentaries, which contain information helpful for learning about new tasks or new data points. The authors present three different instantiations, i.e., example weighting, example blending, and attention masking, and show their effectiveness with extensive experiments. The proposed method has the potential to be used for a wide variety of tasks.",ICLR2021,
GHn2B5XQvA,1576800000000.0,1576800000000.0,1,Skeq30NFPr,Skeq30NFPr,Paper Decision,Reject,"This paper takes results related to the convergence and implicit regularization of stochastic mirror descent, as previously applied within overparameterized linear models, and extends them to the nonlinear case. Among other things, conditions are derived for guaranteeing convergence to a global minimizer that is (nearly) closest to the initialization with respect to a divergence that depends upon the mirror potential. Overall the paper is well-written and likely at least somewhat accessible even for non-experts in this field.

That being said, two reviewers voted to reject while one chose accept; however, during the rebuttal period the accept reviewer expressed a somewhat borderline sentiment. As for the reviewers that voted to reject, a common criticism was the perceived similarity with reference (Azizan and Hassibi, 2019), as well as unsettled concerns about the reasonableness of the assumptions involved (e.g., Assumption 1). With respect to the former, among other similarities the proof technique from both papers relies heavily on Lemma 6. It was then felt that this undercut the novelty somewhat.

Beyond this though, even the accept reviewer raised an unsettled issue regarding the ease of finding an initialization point close to the manifold that nonetheless satisfies the conditions of Assumption 1. In other words, as networks become more complex such that points are closer to the manifold of optimal solutions, further non-convexity could be introduced such that the non-negativity of the stated divergence becomes more difficult to achieve. While the author response to this point is reasonable, it feels a bit like thoughtful speculation forged in the crunch time of a short rebuttal period, and possibly subject to change upon further reflection. In this regard a less time-constrained revision could be beneficial (including updates to address the other points mentioned above), and I am confident that this work can be positively received at another venue in the near future.",ICLR2020,
05eoG3uM9ST,1610040000000.0,1610470000000.0,1,NTP9OdaT6nm,NTP9OdaT6nm,Final Decision,Reject,"The paper proposes formulating safety constraints as formal language constraints, as a step toward bridging the gap between ML and software engineering, and enabling safe exploration in RL. The authors responded and improved the paper significantly during the rebuttal period.
Hence, I recommend accept.",ICLR2019,3: The area chair is somewhat confident +rJgOT2SYkV,1544280000000.0,1545350000000.0,1,S1M6Z2Cctm,S1M6Z2Cctm,New objective term enforcing consistent similarity between image patches across domains. Improvements made based on reviews.,Accept (Poster),"The proposed method introduces a method for unsupervised image-to-image mapping, using a new term into the objective function that enforces consistency in similarity between image patches across domains. Reviewers left constructive and detailed comments, which, the authors have made substantial efforts to address. + +Reviewers have ranked paper as borderline, and in Area Chair's opinion, most major issued have been addressed: + +- R3&R2: Novelty compared to DistanceGAN/CRF limited: authors have clarified contributions in reference to DistanceGAN/CRF and demonstrated improved performance relative to several datasets. +- R3&R1: Evaluation on additional datasets required: authors added evaluation on 4 more tasks +- R3&R1: Details missing: authors added details. + +",ICLR2019,3: The area chair is somewhat confident +ZB6tYwBy7dd,1642700000000.0,1642700000000.0,1,2yITmG7YIFT,2yITmG7YIFT,Paper Decision,Reject,"There appears to be to be a fundamental error in the paper, w.r.t. the application of the proposed approach to finite fields. As a result, the paper cannot be accepted in its current form.",ICLR2022, +SkgTh_R7gN,1544970000000.0,1545350000000.0,1,rJVoEiCqKQ,rJVoEiCqKQ,Area chair recommendation,Reject,"Strengths: +The method extends [21], which proposes an unordered set prediction model for multi-class classification. +The submission proposes a formulation to learn the distribution over unobservable permutation variables based on deep networks and uses a MAP estimator for inference. +While the failure of NMS to detect overlapping objects is expected, the experiments showing that perm-set prediction handles them well is interesting and promising. + +Weaknesses: + +Reviewer 1: ""I find the paper still too scattered, trying to solve diverse problems with a hammer without properly motivating / analyzing key details of this hammer. So I keep my rating."" +Reviewer 2: ""I'm glad that the authors are seeing good performance and seem to have an effective method for matching outputs to fixed predictions, however the quality of the paper is too poor for publication."" + +Points of contention: + + Although there was one reviewer who gave a high rating, they were not responsive in the rebuttal phase. The other two reviewers took into account the author responses, and a contributed comment by an unaffiliated reviewer, and both concluded that the paper still had serious issues. The main issues were: lack of clear methodology and poor clarity (AnonReviewer2), and poor organization and lack of motivation for modeling choices (AnonReviewer1).",ICLR2019,5: The area chair is absolutely certain +Hk6NIJ6rM,1517250000000.0,1517260000000.0,766,S1EfylZ0Z,S1EfylZ0Z,ICLR 2018 Conference Acceptance Decision,Reject,"The authors propose to detect anomaly based on its representation quality in the latent space of the GAN trained on valid samples. + +Reviewers agree that: +- The proposed solution lacks novelty and similar approaches have been tried before. +- The baselines presented in the paper are primitive and hence do not demonstrate the clear benefits over traditional approaches. 
+",ICLR2018, +puSotMhhn0,1576800000000.0,1576800000000.0,1,rkl44TEtwH,rkl44TEtwH,Paper Decision,Reject,"The submission presents a semi-parametric approach to motion synthesis. The reviewers expressed concerns about the presentation, the relationship to existing work, and the scope of the results. After the authors' responses and revision, concerns remain. The AC also notes that the submission is 10 pages long. The AC recommends rejecting the submission.",ICLR2020, +ziypwp6txdV,1642700000000.0,1642700000000.0,1,m4BAEB_Imy,m4BAEB_Imy,Paper Decision,Reject,"This paper deals with a problem of significant practical relevance: memory efficient neural networks. The authors propose some pruning methods for binary networks. However, several weaknesses were identified by the reviewers (novelty, lack of extensive experiments, problems with the presentation of the paper), and several valid points of concern were raised. These points of criticism were not adequately addressed, hence the paper in its current form cannot be recommended for publication.",ICLR2022, +ceU2EwYANL-,1610040000000.0,1610470000000.0,1,rWZz3sJfCkm,rWZz3sJfCkm,Final Decision,Accept (Poster),"This paper proposes an efficient approach for computing equivariant spherical CNNs, significantly reducing the memory and computation costs. Experiments validate the effectiveness of the proposed approach. + +Pros: +1. Speeding up equivariant spherical CNNs is a valuable topic in deep learning. +2. The proposed approach is effective, in all parameter size, memory footprint and computation time. +3. The theory underpinning the speedup method is sound. + +Cons: +1. The readability should be improved. Two of the reviewers complained that the paper is hard to read and only Reviewer #2 reflected that it is ""easy"" to read (but only under the condition that the readers are familiar with the relevant mathematics), and this situation is improved after rebuttal. Nonetheless, this should be further done. +2. The experiments are a bit limited. This may partially be due to limited benchmark datasets for spherical data, but for the existing datasets used for comparison, Esteves et al. (2020) is not compared on all of them. Esteves et al. (2020) is only reported on spherical MNIST, which has very close performance to the proposed one. This worries the AC, who is eager to see whether on QM7 and SHREC’17 the results would be similar. + +After rebuttal, three of the reviewers raised their scores. So the AC recommended acceptance.",ICLR2021, +ByaZQJTBz,1517250000000.0,1517260000000.0,81,HJIoJWZCZ,HJIoJWZCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The general consensus is that this method provides a practical and interesting approach to unsupervised domain adaptation. One reviewer had concerns with comparing to state of the art baselines, but those have been addressed in the revision. + +There were also issues concerning correctness due to a typo. Based on the responses, and on the pseudocode, it seems like there wasn't an issue with the results, just in the way the entropy objective was reported. + +You may want to consider reporting the example given by reviewer 2 as a negative example where you expect the method to fail. 
This will be helpful for researchers using and building on your paper.",ICLR2018, +r1e-mHg0yE,1544580000000.0,1545350000000.0,1,Sklv5iRqYX,Sklv5iRqYX,Theoretical contribution limiting worst case performance in the multi-domain setting for adversarial based adaptation methods,Accept (Poster),"This paper extends the single source H-divergence theory for domain adaptation to the case of multiple domains. Thus, drawing on the known connection between H-divergence and learning the domain classifier for adversarial adaptation, the authors propose a multi-domain adversarial learning algorithm. The approach builds upon the gradient reversal version of adversarial adaptation proposed by Ganin et al 2016. + +Overall, multi-domain learning and limiting the worst case performance on any single domain is an interesting problem which has been relatively underexplored. Though this work does not have the highest performance on all datasets across competing methods, as noted by reviewers, it proposes a useful theoretical result which future research may build on. I would encourage the reviewers to compare against and discuss the missing prior work cited by Rev 3. ",ICLR2019,5: The area chair is absolutely certain +HkLFLJpBz,1517250000000.0,1517260000000.0,830,SkZ-BnyCW,SkZ-BnyCW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agreed that while this is a well-written paper, it is low on novelty and does not make a substantial enough contribution. They also pointed out that although the reported MNIST results are highly competitive, possibly due to the use of a powerful ResNet decoder, the CIFAR10/ImageNet results are underwhelming.",ICLR2018, +JO5rta5HQPT,1642700000000.0,1642700000000.0,1,vcUmUvQCloe,vcUmUvQCloe,Paper Decision,Accept (Poster),"The reviewers think the topic is important and challenging. The results are novel, and the experimental section provides a nice illustration how the joint Shapley values can be used. However, the paper can be improved by including more real world applications and experiments.",ICLR2022, +bY2gZ6bPQcFQ,1642700000000.0,1642700000000.0,1,9Vimsa_gGG5,9Vimsa_gGG5,Paper Decision,Reject,"The paper proposes an initialization method to initialize residual networks in an expressive subspace of weights. Although the reviewers highlighted some positive aspects, they found the contribution to be limited compared to prior work. Some reviewers also raised some concerns regarding the experimental results not backing up the claims made in the paper. The authors did not respond, so I can therefore not recommend acceptance. This will hopefully provide useful feedback for a potential revision.",ICLR2022, +BklsDJHXgE,1544930000000.0,1545350000000.0,1,BylBns0qtX,BylBns0qtX,Interesting but not good enough.,Reject,"This paper shows experiments in favor of learning and using heteroscedastic noise models for differentiable Bayes filter. Reviewers agree that this is interesting and also very useful for the community. However, they have also found plenty of issues with the presentation, execution and evaluations shown in the paper. Post rebuttal, one of the reviewer increased their score, but the other has reduced the score. Overall, the reviewers are in agreement that more work is required before this work can be accepted. + +Some of existing work on variational inference has not been included which, I agree, is problematic. Simple methods have been compared but then why these methods were chosen and not the other ones, is not completely clear. 
The paper definitely can improve on this aspect, clearly discussing relationships to many existing methods and then picking important methods to clearly bring some useful insights about learning heteroscedastic noise. Such insights are currently missing in the paper. + +Reviewers have given many useful feedback in their review, and I believe this can be helpful for the authors to improve their work. In its current form, the paper is not ready to be accepted and I recommend rejection. I encourage the authors to resubmit this work. +",ICLR2019,5: The area chair is absolutely certain +HJj2Vk6SG,1517250000000.0,1517260000000.0,443,r1kP7vlRb,r1kP7vlRb,ICLR 2018 Conference Acceptance Decision,Reject,"The pros and cons of this paper can be summarized as follows: + +Pros: +* It seems that the method has very good intuitions: consideration of partial rewards, estimation of rewards from modified sequences, etc. + +Cons: +* The writing of the paper is scattered and not very well structured, which makes it difficult to follow exactly what the method is doing. If I were to give advice, I would flip the order of the sections to 4, 3, 2 (first describe the overall method, then describe the method for partial rewards, and finally describe the relationship with SeqGAN) +* It is strange that the proposed method does not consider subsequences that do not contain y_{t+1}. This seems to go contrary to the idea of using RL or similar methods to optimize the global coherence of the generated sequence. +* For some of the key elements of the paper, there are similar (widely used) methods that are not cited, and it is a bit difficult to understand the relationship between them: +** Partial rewards: this is similar to ""reward shaping"" which is widely used in RL, for example in the actor-critic method of Bahdanau et al. +** Making modifications of the reference into a modified reference: this is done in, for example, the scheduled sampling method of Bengio et al. +** Weighting modifications by their reward: A similar idea is presented in ""Reward Augmented Maximum Likelihood for Neural Structured Prediction"" by Norouzi et al. + +The approach in this paper is potentially promising, as it definitely contains a lot of promising insights, but the clarity issues and fact that many of the key insights already exist in other approaches to which no empirical analysis is provided makes the contribution of the paper at the current time feel a bit weak. I am not recommending for acceptance at this time, but would certainly encourage the authors to do clean up the exposition, perhaps add a comparison to other methods such as RL with reward shaping, scheduled sampling, and RAML, and re-submit to another venue.",ICLR2018, +nBQojDFYksn,1610040000000.0,1610470000000.0,1,AJTAcS7SZzf,AJTAcS7SZzf,Final Decision,Reject,"The work focuses on a new method for sampling hyper-parameter based on an ""Population-Based Training"" schedule that tend to sample more often configurations that gave good performances in the past. The authors have conducted experiments to verify the superior of their method, especially for the effectiveness and generalisability. + +Pros: +- simple method that can be implemented without much effort, +- good empirical performances on Imagenet, +- paper well organised and written. 
+ +Cons: +- lack of explanation about the DensNet121 performance degradation [partially addressed in the rebuttal], +- additional simple experiments in Section 4.4 were recommended to evaluate the generality of the method [addressed in Table 5], +- empirical validation seems not sufficient [partially addressed in the rebuttal], +- similarity with respect to prior art, such as the focal loss [partially addressed in the rebuttal], +- clarification of the randomisation strategy in experiments [addressed in the rebuttal]. + +Despite most of the issues being addressed, the reviewers decided that this paper would benefit more work to be accepted for the conference this year.",ICLR2021, +HyAthz8ul,1486400000000.0,1486400000000.0,1,H1oRQDqlg,H1oRQDqlg,ICLR committee final decision,Invite to Workshop Track,"This paper presents an idea with a sensible core (augmenting amortized inference with per-instance optimization) but with an overcomplicated and ad-hoc execution. The reviewers provided clear guidance for how this paper could be improved, and thus I invite the authors to submit this paper to the workshop track.",ICLR2017, +gJhbcOs794L,1610040000000.0,1610470000000.0,1,KubHAaKdSr7,KubHAaKdSr7,Final Decision,Reject,"All reviewers noted the significance of the problem tackled by this paper and felt that it is going in the right direction. However, they also all noted that the paper was not finalized and polished well enough to be granted publication: details missing, typos, clarifications needed. The reviewers acknowledged the large amount of work that went into improving the paper during the discussion period. R1 even increased their score to reflect that. + +Still the paper still needs some work to be accepted at ICLR. In particular, we encourage the authors to improve on 2 axes. +1. Clarifying motivations and contribution: it is still unclear if the main point of the paper is to propose new methods around FTM & constrained updates, etc. or around proposing a new benchmark for catastrophic forgetting, lifelong learning. +2. Reorganizing experimental section: the experimental section should be organized to support #1. Reviewers made a lot of suggestions, like moving Table 4 from the appendix, that should be further refined + +We hope that this will allow to increase the clarity and impact of this research work.",ICLR2021, +H1gq27WIgE,1545110000000.0,1545350000000.0,1,SkxxIs0qY7,SkxxIs0qY7,"Novel approach with promising results for generative modeling, but with incorrect claims and insufficiently analysed heuristic shortcuts",Reject,"The paper proposes an original and interesting alternative to GANs for optimizing a (proxy to) Jensen-Shannon divergence for discrete sequence data. Experimental results seem promising. Official reviewers were largely positive based on originality and results. However, as it currently stands, the paper still makes false claims that are not well explained or supported, in particular its repeated central claim to provide a ""low-variance, bias-free algorithm"" to optimize JS. Given that these central issues were clearly pointed out in a review from a prior submission of this work to another venue (review reposted on the current OpenReview thread on Nov. 6), the AC feels that the authors had had plenty of time to look into them and address them in the paper, as well as occasions to reference and discuss relevant related work pointed in that review. The current version of the paper does neither. 
The algorithm is not unbiased for at least two reasons pointed out in discussions: a) in practice a parameterized mediator will be unable to match the true P+G, at best yielding a useful biased estimate (not unlike how GAN's parameterized discriminator induces bias). b) One would need to use REINFORCE (or similar) to get an unbiased estimate of the gradient in Eq. 13, a key detail omitted from the paper. From the discussion thread it is possible that authors were initially confused about the fact that this fundamental issue did not disappear with Eq. 13 (they commented ""most important idea we want to present in this paper is HOW TO avoid incorporating REINFORCE. Please refer to Eq.13, which is the key to the success of this.""). But rather, as guessed by a commentator, that a heuristic implementation, not explained in the paper, dropped the REINFORCE term thus effectively trading variance for bias. +On December 4th authors posted a justification confirming heuristically dropping the REINFORCE terms when taking the gradient of Eq. 13, and said they could attach detailed analysis and experiment results in the camera-ready version. However if one of the ""most important idea"" of the paper is how to avoid REINFORCE (as still implied and highlighted in the abstract), the AC finds it worrisome that the paper had no explanation of when and how this was done, and no analysis of the bias induced by (unreportedly) dropping the term. + +The approach remains original, interesting, and potentially promising, but as it currently stands, AC and SAC agreed that inexact theoretical over-claiming and insufficient justification and in-depth analysis of key heuristic shortcuts/tradeoffs (however useful) are too important for their fixing to be entrusted to a final camera-ready revision step. A major revision that clearly adresses these issues in depth (both in how the approach is presented and in supporting experiments) will constitute a much more convincing, sound, and impactful research contribution. + +",ICLR2019,4: The area chair is confident but not absolutely certain +KA6q3c-zbnV,1610040000000.0,1610470000000.0,1,GzHjhdpk-YH,GzHjhdpk-YH,Final Decision,Reject,"This paper present novel formulations to address the problem of unbalanced Gromov. The Conic formulation is very interesting but stays theoretical until optimization algorithms are available. The Unbalanced Gromov is a nice extension of Gromov and comes with relatively efficient solvers. Some very limited numerical experiment show the proposed UGW used between 2D distributions (two moons) and graphs. + +The paper had some mixed reviews with reviewers acknowledging the novelty of the approach (albeit an extension similar to unbalanced OT) and of the theoretical results. The detailed a very well written response to the reviewers comment has been appreciated. But all reviewers also noted a lack of numerical experiments outside of the very simple illustrations in the paper. This paper is a very nice contribution to the theory of optimal transport but fails at illustrating its relevance to the ML community. Despite acknowledging the theoretical contributions of the paper, the AC recommends a reject but strongly encourages the authors to complete the experimental section with some ML applications or at least proof of concepts (graph classification, domain adaptation, ...). 
+",ICLR2021, +Skg0kCeTyE,1544520000000.0,1545350000000.0,1,HJlfAo09KX,HJlfAo09KX,ICLR 2019 decision,Reject,This paper shows local convergence results for gradient descent on one hidden layer network with Gaussian inputs and sigmoid activations. Later it shows global convergence by using spectral initialization. All the reviewers agree that the results are similar to existing work in the literature with little novelty. There are also some concerns about the correctness of the statements expressed by some reviewers. ,ICLR2019,4: The area chair is confident but not absolutely certain +rycZhzIug,1486400000000.0,1486400000000.0,1,HkvS3Mqxe,HkvS3Mqxe,ICLR committee final decision,Reject,"Unfortunately this paper is not competitive enough for ICLR. The paper is focusing on efficiency where for the results to be credible it is of utmost importance to present experiments on large scale data and state-of-the-art models. + I am afraid the reviewers were not able to take into account the latest drafts of the paper that were submitted in January, but those submissions came very late.",ICLR2017, +bah78GtxCgg,1642700000000.0,1642700000000.0,1,GugZ5DzzAu,GugZ5DzzAu,Paper Decision,Accept (Poster),"The reviewers have agreed that the paper is in borderline. Although the reviewers are not really convinced about the authors’ responses, they still acknowledge that the paper is interesting and developed some new techniques for the analysis of distributed optimization. + +The following concerns are raised by the reviewers from their discussions: + +1) The paper is heavily based on existing work. +2) The theoretical advantages are based on the regime Hessian variance is 0 or small, but it is not clear if and when the Hessian variance is small for more complicated models, which means we will not know if the theory will be helpful in practice. Although the authors provide some experimental results in the rebuttal showing that $(L_+)^2 / (L_{\pm})^2$ can be large at initial iterations, it is still not clear how long will this advantage keep during training and how much the advantage is. +3) Reviewer wjjy increased the score from 5 to 6, considering that the additional result truly suggests that the implicit setting can hold in some case at the beginning of iteration, which makes the submission a complete story for him/her now, to some extent. But if we are more strict on the evaluation, the experiment result also suggests that the implicit assumption will not hold anymore over iterations, because the ratio is approaching 1 quickly, i.e., $ L_\pm$ is about the same order as $ L_+$, so there is a mismatch between theory and practice, which even brings out the risk that the paper will fail from the beginning because Sec 4.1 will not make sense anymore. +4) On the theory side, two main contributions of this paper is relaxation on the compressors used in MARINA and a new assumption to refine the analysis. These two contributions seem rather limited if only used to analyze this specific algorithm -- it's unclear what the authors mean in practice or how they correlated to the MARINA. For example, it is still hard for us to compare or understand MARINA with another algorithm as we wouldn't know if the improvement from MARINA is due to a better design, or this additional assumption. The reviewer also finds the authors statement that ""their analysis focuses on MARINA because it is SOTA"" confusing. 
Different from NLP and vision community where standard benchmarks are usually used to evaluate new models, He/she is confused by what it means for a newly proposed optimization algorithm to be SOTA. +5) Since the paper proposes a specific algorithm named PermK, it's quite reasonable to question how it relates to some previously proposed sparsification methods with similar design such as (Jianqiao et al., 2017). However, the authors insist their main contribution is in theory, and the small-scale experiments comparing with TopK and RandK are sufficient. The reviewer disagrees about this. As communication compression is usually need in larger scales (at least beyond MNIST), and TopK/RandK are not SOTA baselines of sparsification. + +The authors are expected to address them for the clarifications in the final version.",ICLR2022, +BJlL2Dz-gV,1544790000000.0,1545350000000.0,1,B1x0enCcK7,B1x0enCcK7,questions regarding usefulness of the problem formulation given non impressive empirical outcomes,Reject,"The paper presents a novel problem formulation, that of generating 3D object shapes based on their functionality. They use a dataset of 3d shapes annotated with functionalities to learn a voxel generative network that conditions on the desired functionality to generate a voxel occupancy grid. However, the fact that the results are not very convincing -resulting 3D shapes are very coarse- raises questions regarding the usefulness of the proposed problem formulation. +Thus, the problem formulation novelty alone is not enough for acceptance. Combined with a motivating application to demonstrate the usefulness of the problem formulation and results, would make this paper a much stronger submission. Furthermore, the authors have greatly improved the writing of the manuscript during the discussion phase.",ICLR2019,5: The area chair is absolutely certain +SkeA7uWQeV,1544910000000.0,1545350000000.0,1,ryGvcoA5YX,ryGvcoA5YX,Meta-Review,Accept (Poster),"This paper presents a promising model to avoid catastrophic forgetting in continual learning. The model consists of a) a data generator to be used at training time to replay past examples (and removes the need for storage of data or labels), b) a dynamic parameter generator that given a test input produces the parameters of a classification model, and c) a solver (the actual classifier). The advantages of such combination is that no parameter increase or network expansion is needed to learn a new task, and no previous data needs to be stored for memory replay. + +There is reviewer disagreement on this paper. AC can confirm that all three reviewers have read the author responses and have significantly contributed to the revision of the manuscript. + +All three reviewers and AC note the following potential weaknesses: (1) presentation clarity needed substantial improvement. Notably, the authors revised the paper several times while incorporating the reviewers suggestions regarding presentation clarity. R2 has raised the final rating from 4 to 5 while retaining doubts about clarity. +(2) weak empirical evidence: evaluation with more than three tasks and using more recent/stronger baseline methods would substantially strengthen the evaluation (R2, R3). AC would like to report the authors added an experiment with five tasks and provided a verbal comparison with ""Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence"", ECCV-2018 by reporting the authors results on the MNIST dataset. 
+(3) as noted by R2, an ablation study of different model components could strengthen the evaluation. The authors included such ablation study in Table 4 of the revised paper. +(4) reproducibility of the model could be difficult (R1). In their response, the authors promised to make the code publicly available. + +AC can confirm that all three reviewers have contributed to the final discussion. Given the effort of the reviewers and authors in revising this work and its potential novelty, the AC decided that the paper could be accepted, but the authors are strongly urged to further improve presentation clarity in the final revision if possible. +",ICLR2019,3: The area chair is somewhat confident +xReF5YaNbB,1576800000000.0,1576800000000.0,1,H1eCw3EKvH,H1eCw3EKvH,Paper Decision,Accept (Poster),"In my opinion, the main strength of this work is the theoretical analysis and some observations that may be of great interest to the NLP community in terms of better analyzing the performance of RL (and ""RL-like"") methods as optimizers. The main weakness, as pointed out by R3, the limited empirical analysis. + +I would urge the authors to take R3's advice and attempt insofar as possible to broaden the scope of the empirical analysis in the final. I believe that this is important for the paper to be able to make its case convincingly. + +Nonetheless, I do think that the paper makes a significant contribution that will be of interest to the community, and should be presented at ICLR. Therefore, I would recommend for it to be accepted.",ICLR2020, +HkQCEy6rM,1517250000000.0,1517260000000.0,463,r1RQdCg0W,r1RQdCg0W,ICLR 2018 Conference Acceptance Decision,Reject,"There is a very nice discussion with one of the reviewers on the experiments, that I think would need to be battened down in an ideal setting. I'm also a bit surprised at the lack of discussion or comparison to two seemingly highly related papers: + +1. T. G. Dietterich and G. Bakiri (1995) Solving Multiclass via Error Correcting Output Codes. +2. Hsu, Kakade, Langford and Zhang (2009) Multi-Label Prediction via Compressed Sensing. +",ICLR2018, +H1dC71aSG,1517250000000.0,1517260000000.0,251,BJuWrGW0Z,BJuWrGW0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"PROS: + +1. Interesting and clearly useful idea +2. The paper is clearly written. +3. This work doesn't seem that original from an algorithmic point of view since Reed & De Freitas (2015) and Cai et. al (2017) among others have considered using execution traces. However the application to program repair is novel (as far as I know). +4. This work can be very useful for an educational platform though a limitation is the need for adding instrumentation print statements by hand. + +CONS: + +1. The paper has some clarity issues which the authors have promised to fix. + +---",ICLR2018, +LVHv3MD7V6M,1642700000000.0,1642700000000.0,1,BRFWxcZfAdC,BRFWxcZfAdC,Paper Decision,Accept (Poster),"This paper discusses the problem of cross-domain lossy compression on the basis of its reformulation as an entropy-constrained optimal transport. Two average distortion measures (without and with common randomness) are defined (Definitions 2 and 3), and some of their properties are investigated, as summarized in Theorems 1-3. The authors also demonstrated in Section 2.2 that in the Bernoulli-Hamming case the common randomness can indeed improve the performance under some conditions. 
Results of the numerical experiments on super-resolution and denoising are presented to illustrate the principles derived from the theoretical considerations. + +This paper received 5 reviews, with score/confidence being 8/3, 6/3, 6/2, 8/3, 3/4, which exhibit a relatively large spread across the borderline. Upon reading the reviews and the author responses, as well as the paper itself, I think that this paper proposes an interesting framework of optimal transport with an entropy bottleneck, as well as architectural designs supported by the theoretical development, with potential image-processing applications. The authors have provided further numerical results in their response. + +My main concern is that the arguments in this paper are somewhat confusing in that they borrow several notions and terms from the context of lossy compression and rate-distortion theory in the field of information theory, and use them with quite different meanings without explicitly stating so. (It seems to me that this would have been one major reason for the negative evaluation by Reviewer LfvG.) Examples are: +1. **Target distribution:** In rate-distortion theory the target distribution $p_Y$ is not fixed, whereas it is fixed in this paper. +2. **Rate constraint:** In rate-distortion theory the rate constraint is imposed in terms of the mutual information $I(X;Y)$ between the source $X$ and the target $Y$, in the form $I(X;Y)\le R$. The justification for this particular form of the rate constraint, rather than any other form, is that it is compatible with the operational achievability/converse arguments via explicit construction of encoder/decoder pairs. In this paper, on the other hand, the authors consider a Markov chain $X\to Z\to Y$ and impose the rate constraint on the entropy $H(Z)$ of the intermediate random variable $Z$. Under the Markov assumption one has $I(X;Y)\le H(Z)$, so that the rate constraint in this paper is stronger than that in rate-distortion theory, and one would have no control over how tight or loose the adopted constraint $H(Z)\le R$ is against $I(X;Y)\le R$. In relation to this, the expression ""identify the tradeoff between the compression rate and minimum achievable distortion"" (page 2, line 12) would be at best misleading, as the arguments in this paper might be suboptimal, not necessarily providing the theoretically best achievable results. +3. **Extension versus single-shot:** In rate-distortion theory one usually considers the $n$th extension of a source and a block encoder/decoder pair with blocklength $n$. On the other hand, this paper considers what is called the ""single-shot"" setting, in which one does not consider extensions of sources. There are some pieces of work on lossy compression in the single-shot setting [C1][C2], so I would be interested in how such pieces of work and the development in this paper are related, an issue not explored at all in this paper. + +As a result, although the quantities $D_{\mathrm{ncr}}$ and $D_{\mathrm{cr}}$ defined in Definitions 2 and 3 look quite like the distortion-rate functions of rate-distortion theory, they are actually not distortion-rate functions at all. Although the authors, perhaps carefully, did not call them distortion-rate functions, there should still be some explicit explanation of the difference between their framework and the standard one in information theory.
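+For concreteness, the inequality invoked in item 2 above follows from the data-processing inequality alone; as a two-line sketch, assuming nothing beyond the Markov chain $X\to Z\to Y$ and a discrete $Z$:
+$$I(X;Y)\ \le\ I(X;Z)\ \le\ H(Z),$$
+so any code with $H(Z)\le R$ automatically satisfies $I(X;Y)\le R$, while the converse gives no guarantee; this is the sense in which the adopted constraint is (possibly strictly) stronger.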
+ +[C1] Nir Elkayam and Meir Feder, ""One shot approach to lossy source coding under average distortion constraints,"" IEEE International Symposium on Information Theory, 2389-2393, 2020. [link](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9173943) + +[C2] ibid., ""One shot approach to lossy source coding under average distortion constraints,"" [Online]. Available: https://arxiv.org/abs/2001.03983 + +Another concern of mine, from the viewpoint of the theory of optimal transport, is regarding the optimal transport map. It is known that under fairly general conditions the optimal transport *plan* exists. However, the optimal transport *map* is not guaranteed to exist, and even if it exists it can be highly irregular (see, e.g., Villani, 2009). The descriptions in this paper, such as ""computing the optimal transport map is not straightforward"" and ""learn approximately optimal mappings"" on page 4, would be too naive in that they assume both the existence and the approximability of the optimal mapping. + +Despite these concerns, most reviewers agree that this paper presents an interesting piece of work. I would therefore like to recommend acceptance of this paper, and would appreciate it if the authors considered appropriately addressing the above concerns of mine in the final version.",ICLR2022,
+ + +",ICLR2021, +HJgCiGL_e,1486400000000.0,1486400000000.0,1,r1xUYDYgg,r1xUYDYgg,ICLR committee final decision,Invite to Workshop Track,"A summary of the reviews and discussion is as follows: + + Strengths + Code for matrix library sushi2 and DL library sukiyaki2 are on Github, including live demos -- work is reproduceable (R2) + Work/vision is exciting (R2) + + Weaknesses + Projects preliminary (documentation, engineering of convolutions, speed, etc.) (R2) + Perhaps not the right fit for ICLR? (R3) AC comment: ICLR specifically lists *implementation issues, parallelization, software platforms, hardware* as one of the topics of interest + Doesn’t advance the state-of-the-art in performance (e.g. no new algorithm or UI/UX improvement) (R3) + + The authors responded to the pre-review questions and also the official reviews; they updated their demo and paper accordingly. + + Looking at the overall sentiment of the reviews, the extensive feedback from the authors, and the openness of the project I feel that it is a valuable contribution to the community. + + However, given that the paper doesn't clearly advance the state of the art, the PCs believe it would be more appropriate to present it as part of the Workshop Track.",ICLR2017, +r1xEZFR7gE,1544970000000.0,1545350000000.0,1,Syx9rnRcYm,Syx9rnRcYm,not a well polished paper,Reject,"The paper compared between different CNNs for UAV trail guidance. The reviewers arrived at a consensus on rejection due to lack of new ideas, and the paper is not well polished. ",ICLR2019,4: The area chair is confident but not absolutely certain +ez98Jn1B73,1642700000000.0,1642700000000.0,1,LUpE0A3Q-wz,LUpE0A3Q-wz,Paper Decision,Reject,"This paper proposes a federated averaging Langevin dynamics (FA-LD) for numerical mean prediction with uncertainty quantification under the setting of federated learning. Convergence analysis for the proposed method under the smoothness and strong-convex assumptions is also provided, and the results are summarized in Theorems 5.7-5.10, each of which bounds the Wasserstein-2 distance $W_2(\mu_k,\pi)$ between the model distribution $\mu_k$ and the target distribution $\pi$ under different settings. + +This paper received 5 reviews in total, with scores 6, 5, 3, 5, and 3. Some reviewers evaluated positively the novelty of the idea of using the Langevin dynamics in the federated setting, which I would also like to acknowledge. Upon reading the paper by myself, however, I find that the mathematical formulations are in some places not correct. What I think problematic is the third equation in equation (3): The right-hand side is a function of $N$ variables $\\\{\theta_k^c\\\}$, and they undergo different local updates at different clients when $k\not\equiv 0\mod K$ (i.e., the synchronization does not take place). Also $\nabla\tilde{f}^c$ is in general a nonlinear function of its argument. Therefore, the right-hand side cannot be written in general as a function of a single variable $\theta_k$ which is defined as $\theta_k=\sum_{c=1}^Np_c\theta_k^c$, making this equation incorrect. This problem would affect various parts of the arguments to follow in this paper, such as the first two equations in equation (16) on page 14, the two inline equations just after equation (16), equation (18), the second equality in the inline equation in page 15, line 1, and the third line in equation (25) on page 18, to mention a few. Thus I have to question the validity of the theoretical development in this paper. 
+ +Another point I would like to mention is that I did not understand the definition of Schemes I and II in Section 5.4. It is not stated at all that $\mathcal{S}_k$ is a random quantity here. Furthermore, the conditions ""with/without replacement"" are not described at all. Still another point to mention is that I did not understand the claim in page 7, lines 30-31. Does it mean: If one knows the number $T_\epsilon$ of steps to achieve the precision $\epsilon$, then one should set the number $K$ of local steps per synchronization should be set of the order of $\sqrt{T_\epsilon}$. But $T_\epsilon$ depends on $K$, so that it would be unnatural to assume that one knows $T_\epsilon$ irrespective of $K$ in the first place. + +Because of these, I would judge that this paper is not yet ready for presentation in its current form. I would therefore not be able to recommend acceptance of this paper. + +Minor points: +- Citation style: The authors use throughout the paper what are called the *narrative citations* even though there are occasions where what are called the *parenthetical citations* (the author name and publication date are both enclosed in parentheses) should be used. +- page 3, line 7: is (the -> an) unbiased stochastic gradient; There are several unbiased estimators for the gradient, and what is mentioned here is only one instance of them. +- page 3, lines 23-24: The aggregation should take place not on each client but on the central server. +- page 3, line 36: a(n) energy function; a(n) unbiased estimate +- page 5, lines 17-20: The contents of Assumptions 5.1 and 5.2 are not assumptions but definitions. +- page 6, line 2: to obtain (the -> a) lower bound +- page 6, line 18: $\mathcal{D}^2$ is undefined. +- page 8, line 39: (a -> the) probability $p_c$ if it is meant to be the one defined in page 3, line 8. Otherwise, use of the same symbol to represent different quantities should be avoided. +- page 14, line 25: mod ($E$ -> $K$) =0 +- page 15, line 30: $H_\rho^2$ -> $H_\rho$",ICLR2022, +xS_-Mv8nvg,1576800000000.0,1576800000000.0,1,BJl07ySKvS,BJl07ySKvS,Paper Decision,Accept (Poster),"The paper consider the problem of program induction from a small dataset of input-output pairs; the small amount of available data results a large set of valid candidate programs. +The authors propose to train an neural oracle by unsupervised learning on the given data, and synthesizing new pairs to augment the given data, therefore reducing the set of admissible programs. +This is reminiscent of data augmentation schemes, eg elastic transforms for image data. + +The reviewers appreciate the simplicity and effectiveness of this approach, as demonstrated on an android UI dataset. +The authors successfully addressed most negative points raised by the reviewers in the rebuttal, except the lack of experimental validating on other datasets. + +I recommend to accept this paper, based on reviews and my own reading. +I think the manuscript could be further improved by more explicitly discussing (early in the paper) the intuition why the authors think this approach is sensible: +The additional information for more successfully infering the correct program has to come from somewhere; as no new information is eg given by a human oracle, it was injected by the choice of prior over neural oracles. +It is essential that the paper discuss this. 
",ICLR2020, +B1czEyprf,1517250000000.0,1517260000000.0,310,HyWrIgW0W,HyWrIgW0W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Dear authors, + +Based on the comments and your rebuttal, I am glad to accept your paper at ICLR.",ICLR2018, +HJeAUO-Ql4,1544920000000.0,1545350000000.0,1,Bkg2viA5FQ,Bkg2viA5FQ,A straightforward but solidly useful contribution.,Accept (Poster),"The paper generalizes the concept of ""hindsight"", i.e. the recycling of data from trajectories in a goal-based system based on the goal state actually achieved, to policy gradient methods. + +This was an interesting paper in that it scored quite highly despite all three reviewers mentioning incrementality or a relative lack of novelty. Although the authors naturally took some exception to this, AC personally believes that properly executed, contributions that seem quite straightforward in hindsight (pun partly intended) can be valuable in moving the field forward: a clean and didactic presentation of theory backed by well-designed and extensive empirical investigation (both of which are adjectives used by reviewers to describe the empirical work in this paper) can be as valuable, or moreso, than a poorly executed but higher-novelty works. To quote AnonReviewer3, ""HPG is almost certainly going to end up being a widely used addition to the RL toolbox"". + +Feedback from reviewers prompted extensive discussion and a direct comparison with Hindsight Experience Replay which reviewers agreed added significant value to the manuscript, earning it a post-rebuttal unanimous rating of 7. It is therefore my pleasure to recommend acceptance.",ICLR2019,5: The area chair is absolutely certain +BJgNCp3-eN,1544830000000.0,1545350000000.0,1,ryeh4jA9F7,ryeh4jA9F7,Paper decision,Reject,"Reviewers mostly recommended to reject after engaging with the authors, with one reviewer slightly suggesting to accept, but with confidence 1. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit.",ICLR2019,4: The area chair is confident but not absolutely certain +zYGganLx2C,1576800000000.0,1576800000000.0,1,SyecdJSKvr,SyecdJSKvr,Paper Decision,Reject,"After reading the author's rebuttal, the reviewer still hold that the main contribution is just the simple combination of already known losses. And the paper need to pay more attention on the clarity of the paper.",ICLR2020, +_h9uIK631Zh,1642700000000.0,1642700000000.0,1,lsQCDXjOl3k,lsQCDXjOl3k,Paper Decision,Reject,"This paper modifies the conditional diffusion model guided by a classifier, as introduced by Dhariwal & Nichol 2021, by replacing the explicit classifier with an implicit classifier. This implicit classifier is derived under Bayes' rule and combined with the conditional diffusion model. This combination can be realized by mixing the score estimates of a conditional diffusion model and an unconditional diffusion model. A trade-off between sample quality and diversity, in terms of the IS and FID scores, can be achieved by adjusting the mixing weight. The paper is clearly written and easy to follow. However, the reviewers do not consider the modification to be that significant in practice, as it still requires label guidance and also increases the computational complexity. 
From the AC's perspective, the practical significance could be enhanced if the authors can generalize their technique beyond assisting conditional diffusion models.",ICLR2022, +rT4dCu9Ps1M,1610040000000.0,1610470000000.0,1,WznmQa42ZAx,WznmQa42ZAx,Final Decision,Accept (Spotlight),"This paper proposes a method for interpretable graph neural networks. +The idea is intuitively well motivated: after training the model, discard spurious edges that are not critical to making predictions in the graph, and only retain salient edges. +Experiments on synthetic and real datasets show that the proposed method is effective at dropping only the edges that are not useful for the task at hand; while leading to only small performance degradation. The paper is well-written. Overall, the paper brings together prior ideas in a useful way, and is well-executed.",ICLR2021, +H1lLShElgV,1544730000000.0,1545350000000.0,1,B1l08oAct7,B1l08oAct7,Meta-review,Accept (Oral),"The manuscript proposes deterministic approximations for Bayesian neural networks as an alternative to the standard Monte-Carlo approach. The results suggest that the deterministic approximation can be more accurate than previous methods. Some explicit contributions include efficient moment estimates and empirical Bayes procedures. + +The reviewers and ACs note weakness in the breadth and complexity of models evaluated, particularly with regards to ablation studies. This issue seems to have been addressed to the reviewer's satisfaction by the rebuttal. The updated manuscript also improves references to related prior work. + +Overall, reviewers and AC agree that the general problem statement is timely and interesting, and well executed. We recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +U2Aen5a7Th5,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"This paper analyzes problems of existing threshold meta-learners and attentional meta-learners for few-shot learning in polythetic classifications. The threshold meta-learners such as prototypical networks require exponential number of embedding dimensionality, and the attentional meta-learners are susceptible to misclassification. The authors proposed a simple yet effective method to address these problems, and demonstrated its effectiveness in their experiments. This paper discusses meta-learning from a very unique perspective as commented by a reviewer, and clearly explained problems of widely-used meta-learning methods. However, this paper focus on prototypical networks and matching networks even though there have been proposed many meta-learning methods. Some existing methods seem not to have the problems of prototypical networks and/or matching networks. In addition, the practical benefits of the proposed approach are not well demonstrated. Although the additional experiments in the author response addressed some concerns of the reviewers, they are not enough to demonstrate the effectiveness of the proposed method.",ICLR2022, +v3Hv6oWhw1D,1642700000000.0,1642700000000.0,1,3r034NfDKnL,3r034NfDKnL,Paper Decision,Reject,"The Authors study the emergence of systematic generalization in neural networks. The paper studies a timely topic and presents a set of concrete results. For example, reviewer ZgRW emphasizes that a key strength of the paper is constructing simple datasets where systematicity emerges. 
I think indeed it is valuable, as systematicity is sometimes poorly defined and understood, so building a theoretical testbed might be very helpful. + +However, the reviewers found important issues, which the rebuttal was unable to address. Perhaps the key issue (raised, e.g., by reviewer 9QCY) is that the results do not clearly generalize to more practically relevant settings. What is somewhat missing is a clear set of guidelines or implications for how to improve systematicity in more practically relevant neural networks. + +Based on this and the other issues raised by the reviewers, unfortunately, I have to recommend rejecting the paper. Thank you for your submission, and I hope that the review process will help you improve the work.",ICLR2022,
When is the algorithm appropriate to use, and when is it not? + +To make the paper stronger, in the next version of the paper should: +- move the theory in the main text (Appendix C). +- provide the analysis of the algorithm and its limitations.",ICLR2022, +rIcDsAYwjb,1576800000000.0,1576800000000.0,1,rygjmpVFvB,rygjmpVFvB,Paper Decision,Accept (Poster),The authors propose a way to generate unseen examples in GANs by learning the difference of two distributions for which we have access. The majority of reviewers agree on the originality and practicality of the idea.,ICLR2020, +LONJGohbcm,1610040000000.0,1610470000000.0,1,gMRZ4wLqlkJ,gMRZ4wLqlkJ,Final Decision,Reject,"This paper proposes a meta-learning based few-shot federated learning approach to reduce the communication overhead incurred in aggregating model updates. The use of meta-learning also gives some generalization benefits. The reviewers think that the paper has the following main issues (see reviews for more details): +* Limited technical novelty - the paper seems to simply combine meta-learning with federated learning +* Not clear whether the communication overhead is actually reduced because the meta-learning phase can require significant communication and computation. +* The experimental evaluation, in particular, the data distribution, could have been more realistic. + +I hope that the authors can use the reviewers' feedback to improve the paper and resubmit to a future venue. + +",ICLR2021, +6uPQy0dT9b,1576800000000.0,1576800000000.0,1,H1ldzA4tPr,H1ldzA4tPr,Paper Decision,Accept (Spotlight),"This paper proposes using object-centered graph neural network embeddings of a dynamical system as approximate Koopman embeddings, and then learning the linear transition matrix to model the dynamics of the system according to the Koopman operator theory. The authors propose adding an inductive bias (a block diagonal structure of the transition matrix with shared components) to limit the number of parameters necessary to learn, which improves the computational efficiency and generalisation of the proposed approach. The authors also propose adding an additional input component that allows for external control of the dynamics of the system. The reviewers initially had concerns about the experimental section, since the approach was only tested on toy domains. The reviewers also asked for more baselines. The authors were able to answer some of the questions raised during the discussion period, and by the end of it all reviewers agreed that this is a solid and novel piece of work that deserves to be accepted. For this reason I recommend acceptance.",ICLR2020, +wudeaXlvBi,1576800000000.0,1576800000000.0,1,HylKvyHYwS,HylKvyHYwS,Paper Decision,Reject,"The paper addresses the setting of learning with rejection while incorporating the ideas from learning with adversarial examples to tackle adversarial attacks. While the reviewers acknowledged the importance to study learning with rejection in this setting, they raised several concerns: (1) lack of technical contribution -- see R1’s and R2’s related references, see R3’s suggestion on designing c(x); (2) insufficient empirical evidence -- see R3’s comment about the sensitivity experiment on the strength of the attack, see R1’s suggestion to compare with a baseline that learns the rejection function such as SelectiveNet; (3) clarity of presentation -- see R2’s suggestions how to improve clarity. 
+Among these, (3) did not have a substantial impact on the decision, but would be helpful to address in a subsequent revision. However, (1) and (2) make it very difficult to assess the benefits of the proposed approach, and were viewed by AC as critical issues. +AC can confirm that all three reviewers have read the author responses and have revised the final ratings. AC suggests, in its current state the manuscript is not ready for a publication. We hope the reviews are useful for improving and revising the paper. +",ICLR2020, +wjUCrAnnAH4,1610040000000.0,1610470000000.0,1,mgVbI13p96,mgVbI13p96,Final Decision,Reject,"The authors provided a comprehensive rebuttal to the reviewers' feedback that addressed most of the concerns. AnonReviewer3 raised some major concerns that were partially resolved in a revision. The paper has received a split recommendation from the reviewers but within the review and discussion periods, there was no strong support towards accepting the paper. Although the paper has received some positive feedback, some of the reviewers' concerns were not fully addressed. I'd recommend the authors to address all the comments and add clarifying notes to the paper to avoid such misunderstandings if they decide to resubmit the paper to another venue. ",ICLR2021, +-8aaQQ3viHd,1642700000000.0,1642700000000.0,1,bVvMOtLMiw,bVvMOtLMiw,Paper Decision,Accept (Poster),"This paper has been independently reviewed by four expert reviewers. Two of them recommended straight acceptance, one of them assesses this work as marginally acceptable after increasing their score as a result of the author's rebuttal, and the last reviewer considers this paper marginally below the acceptance threshold. While the reviewers agree on the importance of the targeted problem and relative novelty of the presented work, the main points of criticism involve empirical evaluations - its methodology, experimental design, missing relevant and important comparisons. Since the authors have addressed most of those concerns in their rebuttal, I am leaning towards recommending acceptance of this work for ICLR.",ICLR2022, +eVwBeykihp,1610040000000.0,1610470000000.0,1,H8hgu4XsTXi,H8hgu4XsTXi,Final Decision,Reject,"This paper proposes a regularization term that enforces the orthogonality between (i) a residual between the observed outcome and its estimator and (ii) the treatment and propensity score. The method empirically performs competitively. However, there seems to exist a gap between the proposed method and the assumptions made to provide theoretical guarantees (e.g., R3, R2). R4 was also concerned about the issue and adjusted his/her score accordingly. Even though the authors provide a detailed discussion on most of the reviewers' concerns, some of the problems remain unresolved. Further, unlike other papers submitted to ICLR, the authors did not actually update the paper such that we could check whether the revisions were adequately made. As such, I believe this paper is not quite ready for publication in its current form. +",ICLR2021, +bEkRPhOMkG6,1642700000000.0,1642700000000.0,1,SC6JbEviuD0,SC6JbEviuD0,Paper Decision,Reject,"This paper studies the ""shortcut"" learning phenomenon in CNNs and proposes a simple and effective strategy (white paper) to alleviate specific shortcut patterns (e.g. ""black squares"" in the image). The proposed scheme is verified empirically and shown to improve over some existing solutions. 
All reviewers appreciate the simplicity of the idea, which allows its quick implementation and reproduciblity. However, reviewers y5Su and C42n believe the notion of shortcuts as studied in this paper are not only very limited, but also artificial. Consequently, they raise doubts about practical relevance/significance of the method for real world datasets with natural shortcuts. Based on these concerns, I suggest authors to identify a real setting (non-artificial data) where, alongside their synthetic shortcuts, can show the practical effectiveness of the proposed can.",ICLR2022, +PGOgSU_E8R,1642700000000.0,1642700000000.0,1,FRxhHdnxt1,FRxhHdnxt1,Paper Decision,Accept (Spotlight),"After much back and forth about prior work, 3 reviewers score this paper as an 8 and one scores it as a 3. +Other reviewers have written to the 3 and told them they believe that their review is now too harsh, in light of clarifications w.r.t. related work. I tend to agree, though I must admit that I am not an expert on this topic. +Given that there is almost unanimous support for accepting and it's possible that the one hold-out has not seen some of the extra information, I recommend acceptance. +Given the praise from the other three reviewers, I moreover recommend a spotlight.",ICLR2022, +SyDHVyTHG,1517250000000.0,1517260000000.0,344,SkHl6MWC-,SkHl6MWC-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"R1 thought the proposed method was novel and the idea interesting. However, he/she raised concerns with consistency in the experimental validation, the trade-off between accuracy and running time, and the positioning/motivation, specifically the claim about interpretability. The authors responded to these concerns, and R1 upgraded their score. R2 didn’t raise major concerns or strengths. R3 questioned the novelty of the work and the experimental validations. All reviewers raised concerns with the writing. Though I think the work is interesting, issues raised about experiments and writing make me hesitant to go against the overall recommendation of the reviewers, which is just below the bar. I think this is a paper that could make a good workshop contribution. +",ICLR2018, +KndSbWQ73F,1642700000000.0,1642700000000.0,1,SVey0ddzC4,SVey0ddzC4,Paper Decision,Reject,"The paper presents several related results. The initial main result consists in relating GPCA to GCN, showing that GPCA can be understood as a first order approximation of some specific instance of GCN where the W matrix is directly defined on data. This result is then exploited to define a supervised version of GPCA. As a follow-up the authors propose a novel GPCA-based network (GPCANet) and a GPCANet initialisation for GNNs. The paper is well written and easy to read. Empirical results are reported to verify the above mentioned connection between GPCA and GCN, as well as the performances of GPCANet and the proposed initialisation for GNNs. Overall, while the mentioned connection was never explicitly reported in the literature, its existence is not surprising and thus its significance seems to be limited. Also the performances of GPCANet do not seem to be significant from a statistical point of view. The novel initialisation procedure for GNNs seems to be interesting and promising, although the used datasets may not make evident its full power. 
Authors rebuttal and discussion did not change the reviewers' initial assessment.",ICLR2022, +S1gy4rlzx4,1544840000000.0,1545350000000.0,1,SJf6BhAqK7,SJf6BhAqK7,Good but not good enough,Reject,"All reviewers wrote strong and long reviews with good feedback but do not believe the work is currently ready for publication. +I encourage the authors to update and resubmit. +",ICLR2019,5: The area chair is absolutely certain +UO8v0jRsvU,1576800000000.0,1576800000000.0,1,Bkf4XgrKvS,Bkf4XgrKvS,Paper Decision,Reject,"This paper presents a differentiable coarsening approach for graph neural network. It provides the empirical demonstration that the proposed approach is competitive to existing pooling approaches. However, although the paper shows an interesting observation, there are remaining novelty as well as clarity concerns. In particular, the contribution of the proposed work over the graph kernels based on other forms of coarsening such as the early work of Shervashidze et al. as well as higher-order WL (pointed out by Reviewer1) remains unclear. We believe the paper currently lacks comparisons and discussions, and will benefit from additional rounds of future revisions. +",ICLR2020, +N5TpEAKcyI,1576800000000.0,1576800000000.0,1,HkxWXkStDB,HkxWXkStDB,Paper Decision,Reject,"The paper in its current form was just not well enough received by the reviewers to warrant an acceptance rating. It seems this work may have promise and the authors are encouraged to continue with this line of work. +",ICLR2020, +N1BQ4uK8fyo,1610040000000.0,1610470000000.0,1,OOsR8BzCnl5,OOsR8BzCnl5,Final Decision,Accept (Poster),"The paper introduces a new idea for multi-view classification: using a Dirichlet distribution over the views to model uncertainty. + +The paper appears to be clear, well written and sound. +Also, the experimental comparison is thorough. + +The authors have given pertinent responses to the reviewers' questions, including w.r.t comparing against Bayesian/deep CCA in terms of accuracy. + +Overall, this is a good paper. +",ICLR2021, +RVcn__WBUFa,1642700000000.0,1642700000000.0,1,Kvbr8NicKq,Kvbr8NicKq,Paper Decision,Reject,"The paper focuses on the strong adversarial attack, i.e., an attack that can generate strong adversarial examples and thus can better evaluate the adversarial robustness of given deep learning models. One review gave a score of 8 while the other 3 reviewers gave negative scores. The main issue lies in the limited experiments, as a potential substitute for AA, the proposed MM should be widely tested against different defenses, just as done in the AA paper. The writing of the paper is somehow is not rigorous including many incorrect statements and unsupported claims which should be well addressed in the revision. Thus, it cannot be accepted to ICLR for its current version.",ICLR2022, +18u10xwdKpm,1610040000000.0,1610470000000.0,1,B5bZp0m7jZd,B5bZp0m7jZd,Final Decision,Reject,"# Quality: +While the paper presents an interesting approach, Reviewer 2 raised relevant questions about the assumption of the theoretical justification that needs to be thoroughly addressed. +Moreover, as noted by Reviewer4, the quality of the paper would also benefit from a more clear connection to existing model-based reinforcement learning literature, besides [Pan et al.]. For example, how much of the proposed approach and results can be applied in other algorithms? + +# Clarity: +While the paper is generally well written and only minor suggestions from the reviewers should be implemented. 
+ +# Originality: +The proposed approach is a small but novel improvement over existing algorithms (to the best of the reviewers and my knowledge). + +# Significance of this work: +The paper deal with a relevant and timely topic. However, it is currently very difficult to gauge the significance of this work, and it unclear if the results can be extended beyond toy benchmarks and to other RL algorithms. Several reviewers suggested additional experiments to strengthen the paper. + +# Overall: +This paper deal with an interesting topic and presents new interesting results. However, the current manuscript is just below the acceptance threshold. Extending the experimental evaluation and improving the clarity of the paper would crucially increase the quality of the paper. +",ICLR2021, +mTLUj-1oO3,1576800000000.0,1576800000000.0,1,S1xGCAVKvr,S1xGCAVKvr,Paper Decision,Reject,"This paper proposes an improved (over Andrychowicz et al) meta-optimizer that tries to to learn better strategies for training deep machine learning models. The paper was reviewed by three experts, two of whom recommend Weak Reject and one who recommends Reject. The reviewers identify a number of significant concerns, including degree of novelty and contribution, connections to previous work, completeness of experiments, and comparisons to baselines. In light of these reviews and since the authors have unfortunately not provided a response to them, we cannot recommend accepting the paper.",ICLR2020, +H1nVhG8dx,1486400000000.0,1486400000000.0,1,BJmCKBqgl,BJmCKBqgl,ICLR committee final decision,Reject,The Area Chair recommends to reject this paper given the reviewers concern about the limited significance of this work and the lack of comparisons. We encourage the authors to take into account the reviewers feedback and resubmit.,ICLR2017, +BkgtE3dgxN,1544750000000.0,1545350000000.0,1,Hygvln09K7,Hygvln09K7,"Useful idea, requires more thorough experiments",Reject,"The paper introduces an interesting idea of using different rates of learning for low level vs high level computation for meta learning. However, the experiments lack the thoroughness needed to justify the basic intuition of the approach and design choices like which layers to learn fast or slow need to be further ablated.",ICLR2019,5: The area chair is absolutely certain +2KZEQ3E4bRX,1642700000000.0,1642700000000.0,1,gEZrGCozdqR,gEZrGCozdqR,Paper Decision,Accept (Oral),"This paper examines the extent to which a large language model (LM) can generalize to unseen tasks via ""instruction tuning"", a process that fine-tunes the LM on a large number of tasks with natural language instructions. At test time, the model is evaluated zero-shot on held out tasks. The empirical results are good, and the 137B FLAN model generally out performs the 175B untuned GPT-3 model. + +All reviewers voted to accept with uniformly high scores, despite two commenting on the relative lack of novelty. The discussion period focused on questions raised by two reviewers regarding the usefulness of fine-tuning with instructions vs. multi-task fine-tuning without instructions. The authors responded with an ablation study demonstrating that providing instructions at during tuning led to large gains. 
+ +Overall the paper's approach and detailed experiments will be useful for other researchers working in this fast moving area in NLP.",ICLR2022, +BkxJXB4Vx4,1544990000000.0,1545350000000.0,1,SkgzYiRqtX,SkgzYiRqtX,an interesting direction but not ready for publication yet,Reject,"+ experiments on an interesting task: inferring relations which are not necessarily explicitly mentioned in a sentence but need to be induced relying on other relations ++ the idea to frame the relation prediction task as an inference task on a graph is interesting + +- the paper is not very well written, and it is hard to understand what exactly the contribution is. E.g., the authors contrast with previous work saying that previous work was relying on pre-defined graphs rather than inducing them. However, here they actually rely on predefined full graphs as well (i.e. full graphs connecting all entities). (See questions from R1) + +- the idea of predicting edge embeddings from the sentence is an interesting one. However, I do not see results studying alternative architectures (e.g., fixed transition matrices + gates / attention), or careful ablation studies. It is hard to say if this modification is indeed necessary / beneficial. (See also R3, agreeing that experiments look preliminary) + +- Extra baselines? E.g., what about layers of multi-head self-attention across entities? (as in Transformer). What about the number of parameters for the proposed model? Is there chance that it works better simply because it is a larger model? (See also R3) + +- evaluation only one dataset (not clear if any other datasets of this kind exist though) + +Overall, though I find the direction and certain aspects of the model quite interesting, the paper is not ready for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +S1lihjEWgE,1544800000000.0,1545350000000.0,1,HJGciiR5Y7,HJGciiR5Y7,Latent model for images - presents convincing results on image impainting+,Accept (Poster),The reviewers are in general impressed by the results and like the idea but they also express some uncertainty about how the proposed actually is set up. The authors have made a good attempt to address the reviewers' concerns. ,ICLR2019,4: The area chair is confident but not absolutely certain +SyVHUy6Sz,1517250000000.0,1517260000000.0,772,ry4SNTe0-,ry4SNTe0-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper aims to combine Wasserstein GAN with Improved GAN framework for semi-supervised learning. + +The reviewers unanimously agree that: + - the paper lacks novelty and such approaches have been tried before. + - the approach does not make sufficient gains over the baselines and stronger baselines are missing. + - the paper is not well written and experimental results are not satisfactory.",ICLR2018, +Bkl_fNFWgN,1544820000000.0,1545350000000.0,1,BkfxKj09Km,BkfxKj09Km,Radically different scores,Reject,"Reviewer ratings varied radically (from a 3 to an 8). However, the reviewer rating the paper as 8 provided extremely little justification for their rating. The reviewers providing lower ratings gave more detailed reviews, and also engaged in discussion with the authors. 
Ultimately neither decided to champion the paper, and therefore, I cannot recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +ByxXmZmR3m,1541450000000.0,1545350000000.0,1,r1l73iRqKm,r1l73iRqKm,Interesting dataset and evaluation framework,Accept (Poster),"The paper proposes a new dataset for studying knowledge grounded conversations, that would be very useful in advancing this field. In addition to the details of the dataset and its collection, the paper also includes a framework for advancing the research in this area, that includes evaluation methods and baselines with a relatively new approach. +The proposed approach for dialogue generation however is a simple extension of previous work by (Zhang et al) to user transformers, hence is not very interesting. The proposed approach is also not compared to many previous studies in the experimental results. +One of the reviewers highlighted the weakness of the human evaluation performed in the paper. Moving on, it would be useful if further approaches are considered and included in the task evaluation. + +A poster presentation of the work would enable participants to ask detailed questions about the proposed dataset and evaluation, and hence may be more appropriate. +",ICLR2019,4: The area chair is confident but not absolutely certain +nWaWsvkUPp,1610040000000.0,1610470000000.0,1,pXmtZdDW16,pXmtZdDW16,Final Decision,Reject,"This work develops an approach to embed random graphs (some even with dependent edges, hence going beyond classical models such as Erdos-Renyi G(n,p)) using GNNs, and uses these to develop approximation algorithms for solving NP-hard scheduling problems, which typically involve some notion of minimizing weighted completion time (or equivalently, the reward incentivizes early completion, where the age of a job is a linear function of time). This is then used to schedule multiple identical robots to solve a given set of spatially-distributed tasks. The problems considered---Multi-Robot Reward Collection (MRRC) model vehicle-routing, rideshare etc., and are well-motivated. + +This paper takes as motivation earlier work on “structure2vec” by Dai et al. (2016) that uses GNNs to (approximately) solve other NP-hard graph problems: specifically, the random structure2vec developed here is used for an RL approach that learns near-optimal solutions for the MRRC problems considered. + +While the paper’s contributions were appreciated in general, its clarity, the fact that the (1 – 1/e) bound of Theorem 2 follows from classical work of Nemhauser et al. (1978), and the fact that real-life examples were not considered, were considered weaknesses. The authors are encouraged to work on these aspects of the paper. +",ICLR2021, +B1XXQyprz,1517250000000.0,1517260000000.0,101,rkr1UDeC-,rkr1UDeC-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"meta score: 7 + +The paper introduces an online distillation technique to parallelise large scale training. Although the basic idea is not novel, the presented experimentation indicates that the authors' have made the technique work. Thus this paper should be of interest to practitioners. + +Pros: + - clearly written, the approach is well-explained + - good experimentation on large-scale common crawl data with 128-256 GPUs + - strong experimental results + +Cons: + - the idea itself is not novel + - the range of experimentation could be wider (e.g. different numbers of GPUs) but this is expensive! 
+ +Overall the novelty is in making this approach work well in practice, and demonstrating it experimentally.",ICLR2018, +rpTjZ3w0sNm,1642700000000.0,1642700000000.0,1,v-v1cpNNK_v,v-v1cpNNK_v,Paper Decision,Accept (Poster),"This paper proposes an efficient training-free NAS method, NASI, which exploits Neural Tangent Kernels (NTK)’s ability to estimate the performance of candidate architectures. Specifically, the authors provide a theoretical analysis showing that NAS can be realizable at initialization, and propose an efficient approximation of the trace norm of NTK that has a similar form to gradient flow, to alleviate the prohibitive cost of computing NTK. Since the method is training-free, NASI is also label- and data-agnostic. The experimental validation shows that NASI either outperforms or performs comparably to existing training-based and training-free NAS methods, while being significantly more efficient. + +The below is the summary of pros and cons of the paper, after : + +Pros +- The idea of using NTK to predict the performance of candidate neural architectures is both novel and promising, and the proposed analysis and efficient approximation are non-trivial. +- The paper provides sufficient theoretical proof of its claims, including the assumptions made. +- The method is highly efficient in terms of search cost, and the searched architectures obtain good performance on benchmark datasets. +- The method is data/label free and thus allows transfer architectures across tasks. +- The paper is well-written. + +Cons +- There is no result on ImageNet obtained by directly applying NASI on it. + +The initial reviews were split, due to other concerns regarding whether the proposed method finds good architectures, missing comparison against certain training-free baselines, and some unclear descriptions. However, they were addressed away by the authors during the rebuttal period which led to a consensus to accept the paper. + +In sum, this is a strong paper that proposes a novel idea for training-free NAS, and the proposed method seems to be both effective, efficient, and generalizes well across tasks. One remaining concern is the computational cost of running the method on larger datasets, such as ImageNet, and I suggest the authors report the results and the running time in the final paper. + +Another suggestion is to include discussion of, or comparison to other efficient NAS methods based on meta-learning, such as MetaD2A [Lee et al. 21], which is not training-free but is more efficient than the proposed NASI. + +[Lee et al. 21] Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets, ICLR 2021",ICLR2022, +BsUuisw5oWB,1642700000000.0,1642700000000.0,1,dEelotBE6e2,dEelotBE6e2,Paper Decision,Reject,"The paper presents a new defense against backdoor attacks based on the discovery of homogeneous populations in the training data and subsequent filtering of poisoned data due to its difference from the said populations. The method has a solid theoretical foundation which, however, requires strong assumptions on attacks and benign data. Due to these assumptions the theoretical guarantees alone cannot ensure that the defense is robust against adaptive attacks. 
The experimental validation of the proposed method is limited to one benchmark datasets (CIFAR), additional results are briefly presented in the response but not elaborated on.",ICLR2022, +_Wjx5v6z0uj,1642700000000.0,1642700000000.0,1,hR_SMu8cxCV,hR_SMu8cxCV,Paper Decision,Accept (Spotlight),"This is a strong empirical paper that studies scaling laws for NMT in terms of several new aspects, such as the model quality as a function of the encoder and decoder sizes, and how the composition of data affects scaling, etc. The extensive empricial results offer new insights to the questions and provide valuable guidance for future research on deep NMT. The datasets used in the study are non-public, which may make it hard to reproduce the evaluation.",ICLR2022, +IhTob4YxVb,1610040000000.0,1610470000000.0,1,r7L91opmsr,r7L91opmsr,Final Decision,Reject,"This paper explores the use of partial rejection control (PRC) for improved SMC-based variational bounds. While an unbiased SMC variant with PRC has been previously introduced by Kudlicka et al. (2020), this work introduces innovations that can help apply such ideas to variational inference. These bounds result in improvements in empirical performance. + +This paper was heavily discussed, with significant engagement by both the authors and the reviewers. Most reviewers recommended acceptance of this paper, with one reviewer (R4) recommending against acceptance. R4's central concerns regard the novelty of the proposed approach and its positioning relative to the existing SMC literature. The authors argued vigorously in the comments that this paper should be judged as a contribution to the VI literature and not the SMC literature. Unfortunately, I will recommend that this paper is rejected. It is my opinion that R4's concerns were not fully addressed. + +On the one hand, I agree with the authors that there is significant value to be had in exploring variants of SMC for VI. Indeed, some prior art, like FIVO and IWAE, contributed little to the Monte Carlo literature. I believe that these were good contributions. + +On the other hand, I am concerned that the current draft does not clearly circumscribe its contributions. I read the sections that disuss the works of Schmon et al. (2019) and Kudlicka et al. (2020), and the writing did not leave me with a clear enough sense of the differences. I also read the abstract and introduction of the paper. The introduction of the paper positions this work clearly within the VI literature, but does not clearly discuss prior SMC art, e.g., it does not cite Kudlicka et al. (2020). Despite citing rejection control for SMC, the writing of the abstract and introduction left me with the impression that this work was the first to introduce *unbiased, partial* rejection control for SMC. I believe that impressions matter and that the machine learning community should be generous to adjacent communities when assigning credit. + +I realize that my decision is a matter of taste. I also want to say that I am confident that the authors have a clear sense of where their contribution sits, and I suspect that it is a valuable contribution. However, I cannot recommend the draft in its current form. If this is a contribution to the VI literature, as the authors argue, then the authors should not hesitate to give full credit to prior SMC art. 
My reading of the current draft still leaves me confused about which aspects of the SMC estimator are actual contributions.",ICLR2021, +O5-NRxbiO,1576800000000.0,1576800000000.0,1,BJlguT4YPr,BJlguT4YPr,Paper Decision,Accept (Poster),"This paper proposes an approach to representing a symbolic knowledge base as a sparse matrix, which enables the use of differentiable neural modules for inference. This approach scales to large knowledge bases and is demonstrated on several tasks. + +Post-discussion and rebuttal, all three reviewers are in agreement that this is an interesting and useful paper. There was intiially some concern about clarity and polish, but these have been resolved upon rebuttal and discussion. Therefore I recommend acceptance. ",ICLR2020, +r1lOSyTSz,1517250000000.0,1517260000000.0,593,HJNGGmZ0Z,HJNGGmZ0Z,ICLR 2018 Conference Acceptance Decision,Reject,"Paper reviewed by three experts who have provided detailed feedback. All three recommend rejection, and this AC sees no reason to overrule their recommendation. ",ICLR2018, +IJsTAlj1y7,1610040000000.0,1610470000000.0,1,cKnKJcTPRcV,cKnKJcTPRcV,Final Decision,Reject,"The paper proposes a learning framework for Hypergraphs. The proposed method can be viewed as generalisation of GraphSAGE to hyper graphs. Though the paper emphasises that there is significant differences between Hypergraphs and Graphs and hence new methods are required. However, the proposed methods are not significantly different than that used for Graphs. Thus the novelty seems to be limited and hence it is difficult to strongly argue for acceptance. +",ICLR2021, +B1FtnzLOx,1486400000000.0,1486400000000.0,1,BysvGP5ee,BysvGP5ee,ICLR committee final decision,Accept (Poster),"The reviewers agree that this is a well executed paper, and should be accepted and will make a positive contribution to the conference. In any final version please try to make a connection to the other paper at this conference with the same aims and execution.",ICLR2017, +HyxwAYPh1V,1544480000000.0,1545350000000.0,1,Hygm8jC9FQ,Hygm8jC9FQ,"Close, but a few foundational issues, and too many major changes to re-evaluate",Reject,"This paper introduces an autoencoder architecture that can handle sequences of data, and attempts to automatically disentangle multiple static and dynamic factors. + +Quality: The main idea is relatively well-motivated. However the motivation for the particular technical choices made seems a little lacking, and the complexity of the proposed model put a lot of strain on the experiments. A lot of important updates were made by the authors in the rebuttal period, however I feel the number of changes are a lot to ask the reviewers to re-evaluate. + +Clarity: The English of the paper isn't great, including the title (should be ""Using an ..."" or ""Using the ...""). The intro is clear enough, but belabors a relatively simple point about how an image model can't model factors in video. There were some concerning parts where major issues seemed to be glossed over. E.g. ""FHVAE model uses label information to disentangle time series data, which is different setup with our FAVAE model."" As far as I understand, they both are trained from unsupervised data. + +Originality: This paper does a good job of citing related work, but seems incremental in relation to the FHVAE. 
But the main problem is that the proposed method makes a lot of changes from a standard time-series VAE, and the limited number of experiments means it's hard to say what the important factor in this model's performance is.

Significance: Ultimately it's hard to say what the takeaway from this paper is. The authors motivated and evaluated a new model, but the work wasn't done in a systematic enough way to draw strong conclusions. The conclusions that were asserted seem specious and overly general, e.g. "" Since dynamic factors have the same time dependency, these models cannot disentangle dynamic factors."". Why not? Why can't a dynamic model learn the time-scales of each of its factors automatically?
",ICLR2019,3: The area chair is somewhat confident
CjciZGGTfv,1576800000000.0,1576800000000.0,1,SJl9PTNYDS,SJl9PTNYDS,Paper Decision,Reject,All the reviewers recommend rejecting the paper. There is no basis for acceptance.,ICLR2020,
BJZnoGI_l,1486400000000.0,1486400000000.0,1,SJNDWNOlg,SJNDWNOlg,ICLR committee final decision,Reject,"The paper conducts a detailed evaluation of different CNN architectures applied to visual instance retrieval. The authors consider various deep neural network architectures, with a focus on architectures pre-trained for image classification.

 An important concern of the reviewers is the relevance of the evaluation given the recent impressive experimental results of deep neural networks trained end-to-end for visual instance retrieval by Gordo et al. ""End-to-end Learning of Deep Visual Representations for Image Retrieval"". Another concern is the novelty of the proposed evaluation given the evaluation of the performance for visual instance retrieval of deep neural networks pre-trained for image classification performed in Paulin et al. ""Convolutional Patch Representations for Image Retrieval: An Unsupervised Approach"".

 A revision of the paper, following the reviewers' suggestions, will generate a stronger submission to a future venue.",ICLR2017,
SkxH3a1JxE,1544650000000.0,1545350000000.0,1,SkGQujR5FX,SkGQujR5FX,Good paper but needs more revisions,Reject,The paper needs more revisions and a better presentation of the empirical study.,ICLR2019,5: The area chair is absolutely certain
IftRR9FsGJs,1610040000000.0,1610470000000.0,1,zg4GtrVQAKo,zg4GtrVQAKo,Final Decision,Reject,"This paper proposes a controllable text generation model conditioned on desired structures, converting a text into structure information such as part of speech (POS) and participial construction (PC). It proposes a ""Structure Aware Transformer"" (SAT) to generate text and claims better PPL and BLEU compared with GPT-2. Reviewers pointed out the limited novelty of this paper - SAT is essentially a transformer run on multiple sequences of structure information, with sums of structure embeddings as input embeddings - and the proposed method essentially infuses structure information as features, rather than ""controlling"" text generation. Some references are also missing, most prominently:

1. Zhang X, Yang Y, Yuan S, et al. Syntax-infused variational autoencoder for text generation[J]. arXiv preprint arXiv:1906.02181, 2019.
2. Casas N, Fonollosa J A R, Costa-jussà M R. Syntax-driven Iterative Expansion Language Models for Controllable Text Generation[J]. arXiv preprint arXiv:2004.02211, 2020.
3. Wu S, Zhou M, Zhang D. Improved Neural Machine Translation with Source Syntax[C]//IJCAI. 2017: 4179-4185.

Unfortunately, no answers are provided by the authors to the questions asked by the reviewers, which makes me recommend rejection.",ICLR2021,
Tnz_Ol2xLb,1576800000000.0,1576800000000.0,1,S1lslCEYPB,S1lslCEYPB,Paper Decision,Reject,"This paper centers on an unbiased variant of the Mutual Information Neural Estimation (procedure), using the so-called ""eta trick"" applied to the Donsker-Varadhan lower bound on the KL divergence. The paper's contribution is mainly theoretical, though experiments are presented on synthetic Gaussian-distributed data as well as CIFAR10 and STL10 classification experiments (from learned representations).

R1's criticism of the theoretical contributions centers on fundamental limitations on finite sample estimation of the MI, contending that the bounds simply aren't meaningful in high-dimensional settings, and that the empirical work centers on synthetic data and self-generated baselines rather than comparisons to reported numbers in the literature; they were unswayed by the author response, which contended that these criticisms were based on pessimistic worst-case analysis and that ""mild assumptions on the mutual information and function class"" could render better finite-sample bounds. Some of R3's concerns were addressed by the author rebuttal and associated updates, but R3 remained critical of the presentation, in particular regarding the dual function, and downgraded their score.

Because R2 disclosed that they were outside of their area of strong expertise, a 4th reviewer was sought (by this stage, the paper was the revised version). 
Concerns about clarity persisted, with R4 remarking that a section was ""a collection of different remarks without much coherence, some of which are imprecisely stated"". R4 felt variance and sample complexity should be dealt with experimentally, though this was not directly addressed in the author response. R4 also remarked that the plots were difficult to read and questioned the utility of supervised representation learning benchmarks for assessing the quality of MI estimation, given recent evidence in the literature.

The theoretical contributions of this submission are slightly outside the bounds of my own expertise, but the consensus among three expert reviewers appears to be that the clarity of exposition leaves much to be desired, and I concur with their assessment that the empirical investigation is insufficiently rigorous and does not draw clear comparisons to existing work in this area. I therefore recommend rejection.",ICLR2020,
HyouBy6HG,1517250000000.0,1517260000000.0,602,SyqAPeWAZ,SyqAPeWAZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper addresses the question of how to solve image super-resolution, building on a connection between sparse regularization and neural networks.
Reviewers agreed that this paper needs to be rewritten, taking into account recent work in the area and significantly improving the grammar. The AC thus recommends rejection at this time. ",ICLR2018,
B1lAH2tmlE,1544950000000.0,1545350000000.0,1,BkxAUjRqY7,BkxAUjRqY7,"Proposes measure for transferability with various selling points, but just ok ",Reject,"The paper proposes an information theoretic quantity to measure the performance of transferred representations, with an operational appeal, easier computation, and empirical validation.

The relation of the proposed measure to test accuracy is not considered. The operational meaning holds exactly only in the special case of linear fine tuning layers. The paper seems to import heavily from previous works.

Reviewers found it difficult to understand whether the proposed method makes sense, noted that the computation of relevant quantities might be difficult in general, and felt that the comparison with mutual information was not clear. The revision addresses these points, adding experiments and explanations. Yet, none of the reviewers gives the paper a rating beyond marginally above the acceptance threshold.

All reviewers found the paper interesting and relevant, but none of them found the paper particularly strong. This is a borderline case of a sound and promising paper, which nonetheless seems to be missing a clear selling point.

I would suggest that developing the program laid out in the conclusions could make the contributions more convincing, in particular the development of more scalable algorithms and the application of the proposed measure to the design of hierarchies for transfer learning. ",ICLR2019,4: The area chair is confident but not absolutely certain
H1A42zIde,1486400000000.0,1486400000000.0,1,r1VdcHcxx,r1VdcHcxx,ICLR committee final decision,Accept (Poster),"The reviewers believe this paper is of significant interest to the ICLR community, as it demonstrates how to get the popular batch normalization method to work in the recurrent setting. The fact that it has already been cited a variety of times also speaks to its interest within the community. The extensive experiments are convincing that the method works. 
One common criticism is that the authors don't sufficiently address the added computational cost of the method, either in the text or empirically. Plots showing loss as a function of wall-clock time instead of training iteration would be more informative to readers deciding whether to use batch norm.

 Pros:
 - Gets batch normalization to work on recurrent networks (which had been elusive to many)
 - The experiments are thorough and demonstrate the method reduces training time (as a function of training iterations)
 - The paper is well written and accessible

 Cons:
 - The contribution is relatively incremental (several tweaks to an existing method)
 - The major disadvantage to the approach is the added computational cost, but this is conspicuously not addressed.",ICLR2017,
Byl2Zatlx4,1544750000000.0,1545350000000.0,1,S1fcnoR9K7,S1fcnoR9K7,meta-review,Reject,"The paper proposes a new optimization approach for neural nets where, instead of a fixed learning rate (often hard to tune), there is one learning rate per unit, randomly sampled from a distribution. Reviewers think the idea is novel, original and simple. Overall, reviewers found the experiments unconvincing in practice. I found the paper really borderline, and decided to side with the reviewers in rejecting the paper.",ICLR2019,4: The area chair is confident but not absolutely certain
9cFGoP8uZA,1610040000000.0,1610470000000.0,1,1cEEqSp9kXV,1cEEqSp9kXV,Final Decision,Reject,"This work proposes a method to discover neighboring local optima around an existing one. Reviewers all found the idea interesting but argued that the paper needed more work. In particular, some of the claims are too informal or not sufficiently supported, and the reviewers found the key sections difficult to follow. The paper should be resubmitted after improving the presentation of the results.",ICLR2021,
Byxx2T6glN,1544770000000.0,1545350000000.0,1,HkeKVh05Fm,HkeKVh05Fm,Meta Review,Reject,"The authors present a method for fine grained entity tagging, which could be useful in certain practical scenarios.

I found the labeling of the CoNLL data with the fine grained entities a bit confusing. The authors did not talk about the details of how the coarse grained labels were changed to fine grained ones. This detail is important and is missing from the paper. Moreover, there are concerns about the novelty of the work, both in terms of the task definition and the model (see the review of Reviewer 1, e.g.).

There is consensus amongst the reviewers in that their feedback about the paper is lukewarm.

",ICLR2019,4: The area chair is confident but not absolutely certain
J9xLvtkRFz,1576800000000.0,1576800000000.0,1,H1xFWgrFPS,H1xFWgrFPS,Paper Decision,Accept (Spotlight),"This paper presents an idea for interpolating between two points in the decision-space of a black-box classifier in the image-space, while producing plausible images along the interpolation path. The presentation is clear and the experiments support the premise of the model.
While the proposed technique can be used to help in understanding how a classifier works, I have strong reservations about calling the generated samples ""explanations"". In particular, there is no reason for the true explanation of how the classifier works to lie in the manifold of plausible images. This constraint is more of a feature to please humans rather than to explain the geometry of the decision boundary. 
+I believe this paper will be well-received and I suggested acceptance, but I believe it will be of limited usefulness for robust understanding of the decision boundary of classifiers.",ICLR2020, +S1g6zjske4,1544690000000.0,1545350000000.0,1,ByeSdsC9Km,ByeSdsC9Km,Strong paper,Accept (Poster),"All reviewers recommend acceptance. The problem is an interesting one. THe method is interesting. +Authors were responsive in the reviewing process. + +Good work. I recommend acceptance :)",ICLR2019,5: The area chair is absolutely certain +HJgm-a_WlE,1544810000000.0,1545350000000.0,1,HkezXnA9YX,HkezXnA9YX,meta-review,Accept (Poster),"This paper generated a lot of discussion. Paper presents an empirical evaluation of generalization in models for visual reasoning. All reviewers generally agree that it presents a thorough evaluation with a good set of questions. The only remaining concerns of R3 (the sole negative vote) were lack of surprise in findings and lingering questions of whether these results generalize to realistic settings. The former suffers from hindsight bias and tends to be an unreliable indicator of the impact of a paper. The latter is an open question and should be worked on, but in the opinion of the AC, does not preclude publication of this manuscript. These experiments are well done and deserve to be published. If the findings don't generalize to more complex settings, we will let the noisy process of science correct our understanding in the future. ",ICLR2019,4: The area chair is confident but not absolutely certain +DBZYrOO62zt,1610040000000.0,1610470000000.0,1,rEaz5uTcL6Q,rEaz5uTcL6Q,Final Decision,Reject,"This paper presents an approach to tackle visual reasoning by combining MONET and transformers. All reviewers agree that there is some performance improvement shown. But there are several concerns including clarity/writing (multiple reviewers point it), experiments (baselines) and most importantly missing insights from experiments (why it works). While some of the concerns have been handled in rebuttal, the paper still falls short on primary concern of insights/why it works (which reviewers argue is critical for a paper on reasoning). AC agrees that the paper is not yet ready for publication.",ICLR2021, +oowa-riGCq,1642700000000.0,1642700000000.0,1,WvOGCEAQhxl,WvOGCEAQhxl,Paper Decision,Accept (Spotlight),"This article introduces an interesting variant of the work of Nakkiran & Bansal (2020). It shows empirically that the test error of deep models can be approximated from the disagreement on the unlabelled test data between two different trainings on the same data. The authors then show theoretically that a calibration property can explain such behaviour, and they report experiments showing that the relationship does exist in practical situations. All reviewers agree on the practical and theoretical value of the article, which is very well organised and written. The ideas developed here are likely to lead to further work in the future, and they clearly deserve to be published at ICLR. + +I agree with one of the reviewers that the title is somewhat misleading, as the reader expects an analysis based on SGD. 
The title could be shortened to ""Assessing Generalization via Disagreement"", and the experimental restriction to SGD could be mentioned in the abstract.",ICLR2022, +_gwHmX90oq7,1610040000000.0,1610470000000.0,1,UFWnZn2v0bV,UFWnZn2v0bV,Final Decision,Reject,"Reviewers agree that the idea of layer wise regularization is interesting and is in line with many efforts in the optimization realm to specialize in the training procedure and the learning rate to each layer. Given the depth of some state of the art neural networks, efficiency is at stake and the idea brought up in this paper naturally falls into that. While the theoretical result in Theorem1 is sound and clear, an extended result on the impact of such « merge » and « layer skipping » on the overall predictions of the algorithm can be well appreciated. The overall goal of network compression should remain to reduce drastically the network size, and thus the training time (energy consumption etc...), while keeping a relatively good prediction accuracy (at least of the same order). Being able to back this with theory (and of course experiments) is crucial. Reviewers also pointed out that the empirical evaluations were not sufficient for ICLR. For example, there are no enough comparisons with existing algorithms and there should be more experimental results based on real datasets. Although the rebuttals did help clarify some of the issues raised by the reviewers, overall this paper does not seem to meet the bar to be accepted. + +",ICLR2021, +jyRG0R4BC0G,1610040000000.0,1610470000000.0,1,KBWK5Y92BRh,KBWK5Y92BRh,Final Decision,Reject,"This paper proposes a new NAS methods that when doing architecture search, returns flat minima using based on a notion of distance defined for two cells (Eq. (2)). Authors then evaluation the effectiveness of the proposed methods against prior work on several benchmarks. + +As authors have discussed in the paper, the idea of using flatness notion in architecture search is not new and has been first proposed by Zela et al 2020. This paper is building on Zela et al 2020 but the proposed algorithm is novel and different than Zela et al 2020. Even though the introduced algorithm is interesting, there are several concerns/areas of improvements: + +1- The proposed method's performance is highly dependent to the notion of distance defined in eq. (2). However, the current choice is not well-motived and does not seem like a well-thought-out choice. See for example the issue raised by R1. I think authors need to spend more time on this choice. One other option is to meta-learn the vector representation of each operation. + +2- All reviewers agree that the improvements marginal and in some cases not statistically significant. Authors have responded by arguing that this is typical for this area of research. I don't find this answer satisfying. For example, consider P-DARTS (Chen et al., 2019). P-DARTS improves over NA-DARTS (the proposed method) on CIFAR-10 and ImageNet and on CIFAR-100 they are on par given the standard deviation of NA-DARTS (see Tables 4 and 5). Moreover, the search cost of P-DART is 0.27% of NA-DARTS (Table 4). So P-DARTS has clear advantage over NA-DARTS. + +Given the above issues, I recommend rejecting the paper. I hope authors would take feedbacks from the reviewing process into account to improve the paper and resubmit. 
+",ICLR2021, +r1Majz8dl,1486400000000.0,1486400000000.0,1,SJRpRfKxx,SJRpRfKxx,ICLR committee final decision,Accept (Poster),The paper describes a model for video saliency prediction using a combination of spatio-temporal ConvNet features and LSTM. The proposed method outperforms the state of the art on the saliency prediction task and is shown to improve the performance of a baseline action classification model.,ICLR2017, +daJhnGNaPJS,1610040000000.0,1610470000000.0,1,1s1T7xHc5l6,1s1T7xHc5l6,Final Decision,Reject,"This paper introduces an approach based on filter transform for designing networks equivariant to different transformation groups. +Especially, the authors rely on the haramonic analysis view of steerable CNNs given in Weiler & Cesa (2019) to design an equivariant filter bank by computing simple transforms over base filters. + +The reviewers finds the paper technically solid but difficult to read and with a limited contribution. +The AC carefully reads the paper and discussions. Although the connection between steerable CNNs and filter transform are interesting, the AC considers that the main contributions of the paper should be consolidated, especially the positioning with respect to Weiler & Cesa (2019). \ +Therefore, the AC recommends rejection.",ICLR2021, +Hl0JagHbkA4,1642700000000.0,1642700000000.0,1,PtuQ8bk9xF5,PtuQ8bk9xF5,Paper Decision,Reject,"This paper presents a SLAM based approach for the ALFRED benchmark. The presented method, Affordance aware Multimodal Neural SLAM has two key advantages over past works: It uses a multimodal exploration strategy and it predicts an affordance aware semantic map. It also obtains a very large performance improvement over the ALFRED benchmark. The reviewers for this paper were quite impressed by the large improvements obtained by this technique. However, there were two major concerns across the reviews: (1) Are the design choices made in this paper heavily engineered towards ALFRED ? (2) Does the work make too many assumptions about the setting (unrealistic assumptions that may not really hold in more realistic environments or the real world) ? The authors have provided a detailed response and answered many questions posed to them, but the reviewers continue to have concerns about the generalizability of the proposed method. Another point of concern pointed out by a reviewer is whether it is reasonable in a realistic setting to perform exploration with a knowledge of the downstream task. This point has not really been answered satisfactorily by the authors. My takeaway is that the method presented by the authors clearly works on ALFRED. But it contains several design choices that are largely ALFRED specific and in some cases unrealistic. This provides fewer benefits to readers looking for more general insights that can be valuable across a suite of tasks. As a result of this, and in spite of the large gains, I recommend rejecting this paper.",ICLR2022, +p4aynpOYq-,1576800000000.0,1576800000000.0,1,HJe7bxBYvr,HJe7bxBYvr,Paper Decision,Reject,"This paper tackles the problem of safe exploration in RL. The proposed approach uses an imaginative module to construct a connectivity graph between all states using forward predictions. The idea then consists in using this graph to plan a trajectory which avoids states labelled as ""unsafe"". + +Several concerns were raised and the authors did not provide any rebuttal. 
A major point is that the assumption that the approach has access to what are unsafe states, which is either unreasonable in practice or makes the problem much simpler. Another major point is the uniform data collection about every state-action pairs. This can be really unsafe and defeats the purpose of safe exploration following this phase. These questions may be due to a miscomprehension, indicating that the paper should be clarified, as demanded by reviewers. Finally, the experiments would benefit from additional details in order to be correctly understood. + +All reviewers agree that this paper should be rejected. Hence, I recommend reject.",ICLR2020, +SkxH3a1JxE,1544650000000.0,1545350000000.0,1,SkGQujR5FX,SkGQujR5FX,Good paper but needs more revisions,Reject,The paper needs more revisions and better presentation of empirical study.,ICLR2019,5: The area chair is absolutely certain +IftRR9FsGJs,1610040000000.0,1610470000000.0,1,zg4GtrVQAKo,zg4GtrVQAKo,Final Decision,Reject,"The paper got mixed ratings. However, keeping in mind the low confidence of some of the reviewers, the paper needed an additional look. The AC himself went over the paper. The paper presents an interesting formalism for private information retrieval. As reviewers have pointed out the formalism is based on several existing ideas on utility privacy tradeoff. +The use of GANs for enforcing privacy is also not new. The rebuttal did not convince some of the reviewers about novelty which seems reasonable given the area and literature in it. + +Overall, the paper needs to consolidate all ideas of Adversarial training for privacy and compare and contrast with the proposed approach to make it compelling for publication. ",ICLR2021, +SkatSkprf,1517250000000.0,1517260000000.0,618,B16_iGWCW,B16_iGWCW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a boosting method and uses it to train an ensemble of convnets for image classification. The paper lacks conceptual and empirical comparisons with alternative boosting and ensembling methods. In fact, it is not even clear from the experimental results whether or not the proposed method outperforms a simple baseline model that averages the predictions of T independently trained convolutional networks.",ICLR2018, +8f7VPlJDGog,1642700000000.0,1642700000000.0,1,BM7RjuhAK7W,BM7RjuhAK7W,Paper Decision,Reject,"This paper proposes a method to learn representations in MBRL by exploiting sparsity in the model to improve data efficiency. The key idea is to build a representation for which the model is invariant. +The idea is quite interesting, but one weakness of the current draft is that there is a disconnect between the presented theory (linear case) and the relevant experimental setup (non-linear). +The paper is overall well written but would still benefit from a revision to improve clarity as pointed out by the reviewers. +The experimental results are inconclusive due to the choice of weak baselines.",ICLR2022, +_LTk68-QdC8,1642700000000.0,1642700000000.0,1,t98k9ePQQpn,t98k9ePQQpn,Paper Decision,Accept (Poster),"All reviewers agreed that the idea proposed by the paper is interesting and is well-motivated for handling long-tailed recognition problems. +As suggested by the reviewers, it seems important that the limitations the paper be addressed in the final version of the paper.",ICLR2022, +bmvxDR4rC72,1610040000000.0,1610470000000.0,1,CGQ6ENUMX6,CGQ6ENUMX6,Final Decision,Accept (Poster),"The authors use Empowerment for morphology optimisation, a quite novel idea. 
After initial unclarities and various improvements on the submission, the reviewers unanimously voted for acceptance of the paper. ",ICLR2021, +bMQlw5p_HFv,1642700000000.0,1642700000000.0,1,7Bc2U-dLJ6N,7Bc2U-dLJ6N,Paper Decision,Reject,"The reviewers have the following concerns: +1. The theoretical results for the proposed method are weak. Theorem 4.2 cannot be considered as a convergence result, because the bound depends on some random variables $r_{T,i}$. The reviewers agree that a proper analysis would require some knowledge on the lower bound of these variables. Although there is some empirical explanation for this, the lower bounded assumption of $r_{T,i}$ is not theoretically justified. The authors acknowledge that this is the main challenge for the present algorithm. In addition, the analysis requires bounded gradient and bounded function value, which is also strong for nonconvex settings. +2. The empirical performance is not strong. In most experiments, the proposed method is not better than the baseline AEGD. The novelty and contribution of SGEM over AEGD is quite limited, since the idea of adding momentum is not new. + +The suggestions to improve this paper are as follows +1. Since the lower bounded assumption on $r_{T,i}$ is not standard and hard to verify, the authors might consider analyzing a theoretical guarantee for it. On the other hand, they could verify more experiments with various data sets to have some sense whether this assumption may be true or not. Next, please try to relax the strong assumptions as discussed. +2. It is better if the authors can show the performance of SGEM for convex settings, and for other deep learning tasks (e.g. NLP) as suggested by the reviewers. + +The authors should consider to improve the paper based on the reviewers' comments and suggestions and resubmit this paper in the future venues.",ICLR2022, +MN0AJAVxpZ3,1642700000000.0,1642700000000.0,1,YevsQ05DEN7,YevsQ05DEN7,Paper Decision,Accept (Poster),"The theory and results presented in this paper provide a new method to avoid collapse in contrastive learning. All but one reviewer recommend acceptance. The lone negative reviewer is concerned with the limited experiments, but the other reviewers, and the AC, find the experimentation convincing enough to warrant acceptance.",ICLR2022, +BkG8my6SM,1517250000000.0,1517260000000.0,139,Hk6WhagRW,Hk6WhagRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All reviewers agree the paper proposes an interesting setup and the main finding that ""prosocial agents are able to learn to ground symbols using RL, but self-interested agents are not"" progresses work in this area. R3 asked a number of detail-oriented questions and while they did not update their review based on the author response, I am satisfied by the answers. ",ICLR2018, +pRNmH6d0iG,1610040000000.0,1610470000000.0,1,Xa3iM4C1nqd,Xa3iM4C1nqd,Final Decision,Reject,"The authors propose a framework which combines pretext tasks and data augmentation schemes with the goal of improving robustness of image representations. The authors show that this approach empirically can lead to increased accuracy on both corrupted and uncorrupted data simultaneously. Furthermore, the authors propose a regularization procedure which can be used to maintain a robust representation during transfer to arbitrary downstream tasks. + +The studied problem is significant and highly relevant to the ICLR community. The reviewers agreed that the work is timely, and appreciated the clarity of exposition. 
At the same time, the reviewers remained in disagreement in terms of novelty and significance of the results -- the proposed method is seen as a clear incremental application of existing techniques in the self-supervised setting. The authors argue that it was not clear that such augmentations would improve the robustness as well as accuracy, but these methods were developed and optimised to improve robustness. In fact, in [1] the authors conclude that “...today's supervised and self-supervised training objectives end up being surprisingly similar” as well as point out that SimCLR is more robust than competing self-supervised methods. Hence, establishing that there is indeed some empirical benefit is a step in the right direction, but not sufficient to meet the bar of acceptance. Furthermore, given the recent trend of scaling-up existing approaches, in particular in terms of the neural architectures and the batch sizes, the computational costs of the proposed regulariser in Eq (4) coupled with the additional hyperparameter to optimize make the approach less practical and general than claimed. In addition, the reviewers pointed out the lack of comparison to a proper baseline, as well as the issue with hyperparameter selection for the baseline after the author response. Finally, given access to additional data augmentations and one more hyperparameter to tune the results should substantially outperform the baselines. + +For the reasons outlined above and the incremental nature of the work, I will recommend rejection. That being said, this was a borderline case, and I urge the authors to carefully revise the manuscript with the received feedback. + +[1] https://arxiv.org/abs/2010.08377 +",ICLR2021, +pwRaydtWoNw,1642700000000.0,1642700000000.0,1,NMEceG4v69Y,NMEceG4v69Y,Paper Decision,Accept (Oral),"The authors propose a new MLP-Mixer-like architecture called Cycle MLP which has two main advantages with respect to MLP-Mixer: (i) it’s applicable to varying input image sizes, and (ii) linear computational complexity. The authors present competitive results on image classification, object detection and segmentation. + +The reviewers felt that both (i) and (ii) are key issues in the current MLP-Mixer-based models. The reviewers also appreciated the simplicity of the idea and the execution of the empirical evaluation. During the rebuttal and discussion phase the authors provided compelling evidence for the issues pointed out in the initial review. + +Given that MLP-Mixer-based architectures are becoming increasingly popular, I believe that these contributions will be of great interest to the ICLR community and I will recommend acceptance.",ICLR2022, +wUcxxqypMK,1576800000000.0,1576800000000.0,1,B1gcblSKwB,B1gcblSKwB,Paper Decision,Reject,"This paper proposes a regularization scheme for reducing meta-overfitting. After the rebuttal period, the reviewers all still had concerns about the significance of the paper's contributions and the thoroughness of the empirical study. As such, this paper isn't ready for publication at ICLR. See the reviewer's comments for detailed feedback on how to improve the paper. ",ICLR2020, +O2Y3nSvTamj,1642700000000.0,1642700000000.0,1,t5EmXZ3ZLR,t5EmXZ3ZLR,Paper Decision,Accept (Spotlight),"Four reviewers have evaluated this submission with one score 6 and three scores 8. Overall, reviewers like the work and note that *a rigorous and principled approach is taken by this work*. 
AC agrees and advocates an accept.",ICLR2022, +AQbKiZ4_GV,1610040000000.0,1610470000000.0,1,-2FCwDKRREu,-2FCwDKRREu,Final Decision,Accept (Oral),"This paper proposed using the state bisimulation metric to learn invariant representations for reinforcement learning. The method is generic, effective, and is supported by both theoretical and experimental results. All reviewers and I think this is a strong contribution to the area.",ICLR2021, +Y4xPi753NiR,1610040000000.0,1610470000000.0,1,uxYjVEXx48i,uxYjVEXx48i,Final Decision,Reject,"Reviewers agree that this work is promising. The paper is well-grounded in the literature and different aspects of the considered methods are investigated through a variety of experiments. Unfortunately, this paper does not provide sufficient details to allow the reader to understand what has been done nor how to adequately build from it. For example, details in the Appendix lack sufficient formalization of the equations or concepts used to train the preference-based agents. The paper would benefit from clarifications of the method, procedures, and equations used. Beyond that, a major concern lied within the evaluation of the simulated patients across different initializations. Provided that one of the proposed contributions of this paper is a robust simulation platform for RL research within healthcare, it would be important to report convincing results on the patient physiologies admitted by the simulator and characterizing the behaviors of policies learned using this simulator. Finally, issues regarding the structure of the paper, including the split between the main paper and the Appendix, should be resolved before this paper can be published. Notably, the authors should consider elevating important material from the Appendix into the main paper.",ICLR2021, +H1gHyxfmgE,1544920000000.0,1545350000000.0,1,Syez3j0cKX,Syez3j0cKX,"An interesting contribution, but lacking in depth.",Reject,"The manuscript centers on a critique of IRGAN, a recently proposed extension of GANs to the information retrieval setting, and introduces a competing procedure. + +Reviewers found the findings and the proposed alternative to be interesting and in one case described the findings as ""illuminating"", but were overall unsatisfied with the depth of the analysis, and in more than one case complained that too much of the manuscript is spent reviewing IRGAN, with not enough emphasis and detailed investigation of the paper's own contribution. Notational issues, certain gaps in the related work and experiments were addressed in a revision but the paper still reads as spending a bit too much time on background relative to the contributions. Two reviewers seemed to agree that IRGAN's significance made at least some of the focus on it justifiable, but one remarked that SIGIR may be a better venue for this line of work (the AC doesn't necessarily agree). + +Given the nature of the changes and the status of the manuscript following revision, it does seem like a more comprehensive rewrite and reframing would be necessary to truly satisfy all reviewer concerns. I therefore recommend against acceptance at this point in time.",ICLR2019,4: The area chair is confident but not absolutely certain +wL6Kx4bkX4,1576800000000.0,1576800000000.0,1,HygtiTEYvS,HygtiTEYvS,Paper Decision,Reject,"The submission proposes to improve generalization in RL environments, by addressing the scenario where the observations change even though the underlying environment dynamics do not change. 
The authors address this by learning an adaptation function which maps back to the original representation. The approach is empirically evaluated on the Mountain Car domain. + +The reviewers were unanimously unimpressed with the experiments, the baselines, and the results. While they agree that the problem is well-motivated, they requested additional evidence that the method works as described and that a simpler approach such as fine-tuning would not be sufficient. + +The recommendation is to reject the paper at this time.",ICLR2020, +M4aLpvyyN7q,1642700000000.0,1642700000000.0,1,KjR-3lBYB3y,KjR-3lBYB3y,Paper Decision,Reject,"This paper proposed a long-term object-based memory system for robots. The proposed method builds on existing ideas of data association filters and neural-net attention mechanisms to learn transition and observation models of objects from labelled trajectories. The proposed method was compared with baseline algorithms in a set of experiments. + +The initial reviews raised multiple concerns about the paper. Reviewers nrGQ and V7qP commented on the conceptual gap between the problem proposed in the introduction and the extent of the experiments. Reviewer qPet understood the paper to be a form of object re-identification and was concerned about the limited comparisons with related work. The author response clarified their goal of estimating the states of the objects in the world, which they state is different from the goals of long-term tracking and object reidentification mentioned by the reviewers. The authors also clarified the relationship to other work in slot-attention and data association filters. + +The ensuing discussion among the reviewers indicated that the paper's contribution remained unclear even after the author response. Two reviewers noted the paper did not clearly communicate the problem being solved (all reviewers had a different view of the problem in the paper). These reviewers wanted a better motivation for the problem being addressed in this paper. The third reviewer remained unconvinced that the problem in the paper was different from long-term object tracking. + +Three knowledgeable reviewers indicate reject as the contributions of the paper were unclear to all of them. The paper is therefore rejected.",ICLR2022, +Uhkf1TuTTnI,1610040000000.0,1610470000000.0,1,jN5y-zb5Q7m,jN5y-zb5Q7m,Final Decision,Accept (Poster),"This paper proposes information-theoretic quantification of epistemic uncertainty in autoregressive models. + +This is a difficult problem that receives much less attention than the unstructured case. The paper is well-written, contributes novel and tractable-to-estimate measures which are analysed formally and empirically with convincing experiments on ASR and NMT. + +The reviewers and myself are overall pleased by this submission. The discussion phase went well and most concerns have been resolved. +",ICLR2021, +OfmfzXwcTTYP,1642700000000.0,1642700000000.0,1,ZumkmSpY9G4,ZumkmSpY9G4,Paper Decision,Reject,"This paper presents an approach for online continual learning where only a single pass over each task's data is allowed. Instead of the oft-used softmax classification setting in continual learning, the paper proposes to use the generative setting based on the nearest class mean (NCM). The paper claims that it avoids the logits bias problem in the softmax classifier and helps combat catastrophic forgetting. 
+ +While the reviewers found the basic idea interesting, there were concerns about novelty and lack of clarity regarding the reasons for improved performance. In particular, there are several aspects from existing work that are leveraged in this paper (e.g, replay, metric learning loss, combination of generative and discriminative classification, etc) but the paper lacks in establishing which of these components affect the performance and in what ways. + +The authors and reviewers engaged in detailed discussions; however, the reviewers were still unsatisfied and did not change their assessment. Based on my own reading of the paper as well as going through the reviews and discussions, I too concur with their assessment. It would be a stronger paper if the paper could shed more light on the above aspects as well as address the other concerns raised by the reviewers. However, in the current shape, it is not ready for publication.",ICLR2022, +UVjzdwO1h1,1576800000000.0,1576800000000.0,1,SJetQpEYvB,SJetQpEYvB,Paper Decision,Accept (Poster),"This paper presents a method to learn representations of programs via code and execution. + +The paper presents an interesting method, and results on branch prediction and address pre-fetching are conclusive. The only main critiques associated with this paper seemed to be (1) potential lack of interest to the ICLR community, and (2) lack of comparison to other methods that similarly improve performance using other varieties of information. I am satisfied by the authors' responses to these concerns, and believe the paper warrants acceptance.",ICLR2020, +_6CAwQPQ47,1610040000000.0,1610470000000.0,1,SlrqM9_lyju,SlrqM9_lyju,Final Decision,Accept (Poster),"The paper has been actively discussed, both during and after the rebuttal phase. I am thankful for the active communication that took place between the authors and some of the reviewers. + +The paper was, overall, found quite clear, with an interesting methodology (especially the introduction of a forecasting step) and a solid large-scale experimental evaluation. As a result, it is recommended for acceptance. + +However, several concerns remained after the rebuttal phase and we strongly encourage the authors to try to improve the following aspects of their submission: +* Clarify as much as possible (notably in the light of the ablation studies further added in the paper) the importance & impact of the BO component (which cast some doubts among the some reviewers on its necessity to get good performance) +* Transparently discuss the choice of, _and the robustness with respect to_, the “hyper-hyperparameters” of the proposed method (e.g., k, tau, tau’, kappa, tau_max, mini-batch size of validation set,...). Such an in-depth discussion is essential to fully demonstrate the practical value of the method. +",ICLR2021, +NlZ1JWpOen,1576800000000.0,1576800000000.0,1,HJx8HANFDH,HJx8HANFDH,Paper Decision,Accept (Poster),"This paper proposes techniques to improve training with batch normalization. The paper establishes the benefits of these techniques experimentally using ablation studies. The reviewers found the results to be promising and of interest to the community. However, this paper is borderline due in part due to the writing (notation issues) and because it does not discuss related work enough. 
We encourage the authors to properly address these issues before the camera ready.",ICLR2020, +0qOsCs8fAcn,1610040000000.0,1610470000000.0,1,d_Ue2glvcY8,d_Ue2glvcY8,Final Decision,Reject,"This paper proposes a controllable text generation model conditioned on desired structures, converting a text into structure information such as part of speech (POS) and participial construction (PC). It proposes a “Structure Aware Transformer” (SAT) to generate text and claims better PPL and BLEU compared with GPT-2. Reviewers pointed out that limited novelty of this paper - SAT is essentially a transformer run on multiple sequences of structure information, with sums of structure embeddings as input embeddings -  the proposed method essentially infuses structure information as features, rather than “controlling” text generation. Some references are also missing, most prominently: + +1. Zhang X, Yang Y, Yuan S, et al. Syntax-infused variational autoencoder for text generation[J]. arXiv preprint arXiv:1906.02181, 2019. +2. Casas N, Fonollosa J A R, Costa-jussà M R. Syntax-driven Iterative Expansion Language Models for Controllable Text Generation[J]. arXiv preprint arXiv:2004.02211, 2020. +3. Wu S, Zhou M, Zhang D. Improved Neural Machine Translation with Source Syntax[C]//IJCAI. 2017: 4179-4185. + +Unfortunately, no answers are provided by the authors to the questions asked by the reviewers, which makes me recommend rejection.",ICLR2021, +ryHFSJpHG,1517250000000.0,1517260000000.0,611,HknbyQbC-,HknbyQbC-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents AdvGAN: a GAN that is trained to generate adversarial examples against a convolutional network. The motivation for this method is unclear: the proposed attack does not outperform simpler attack methods such as Carlini-Wagner attack. In white-box settings, a clear downside for the attacker is that it needs to re-train its GAN everytime the defender changes its convolutional network. + +More importantly, the work appears preliminary. In particular, the lack of extensive quantitative experiments on ImageNet makes it difficult to compare the proposed approach to alternative attacks methods such as (I-)FGSM, DeepFool, and Carlini-Wagner. The fact that AdvGAN performs well on MNIST is nice, but MNIST should be considered for what it is: a toy dataset. If AdvGANs are, as the authors state in their rebuttal, fast and good at generating high-resolution images, then it should be straightforward to perform comprehensive experiments with AdvGANs on ImageNet (rather than focusing on a small number of images on a single target, as the authors did in their revision)?",ICLR2018, +T1UVwsA1OCt,1642700000000.0,1642700000000.0,1,M_o5E088xO5,M_o5E088xO5,Paper Decision,Reject,"The paper presents a simple and intuitive method to prune the missing value in the learning and inference steps of the neural networks, leading to similar prediction performance as other methods to impute missing value. It has some really useful insights, but could benefit from one more round of revision for a strong publication: +1. improving the writing so that its sets up the right expectations on the contributions of the paper; +2. providing discussions on its connections (and differences) with zero-imputation and missing-indicator methods; +3. thoroughly investigating the experiment results to illustrate the advantages of the proposed method. + +The recommendation of reject is made based on the technical aspect of the paper. 
+----------------------------- +During the rebuttal phase, the authors misused the interactive and transparent (for the better or worse) openreview system by writing inappropriate comments with personal accusations to the reviewers who write negative reviews. We would like to extend the apologies to the reviewers for this unpleasant experience and thank the reviewers for their engagement and work, as well as their fair assessment of the paper.",ICLR2022, +rJhC7kpSf,1517250000000.0,1517260000000.0,255,ryTp3f-0-,ryTp3f-0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster)," +PROS: +1. well-written and clear +2. added extra comparison to dagger which shows success +3. SOTA results on open ai benchmark problem and comparison to relevant related work (Shi 2017) +4. practical applications +5. created new dataset to test harder aspects of the problem + +CONS: +1. the algorithmic novelty is somewhat limited +2. some indication of scalability to real-world tasks is provided but it is limited",ICLR2018, +ByxHsYL3J4,1544480000000.0,1545350000000.0,1,HJxfm2CqKm,HJxfm2CqKm,More focus is needed on what is novel in this work,Reject,"This paper provides further insight into using RL for active learning, particularly by formulating AL as an MDP and then using RL methods for that MDP. Though the paper has a few insights, it does not sufficiently place itself amongst the many other similar strategies using an MDP formulation. I recommend better highlighting what is novel in this work (e.g., more focus on the reward function, if that is key). Additionally, avoid general statements like “To this end, we formalize the annotation process as a Markov decision process”, which suggests that this is part of the contribution, but as highlighted by reviewers, has been a standard approach. ",ICLR2019,4: The area chair is confident but not absolutely certain +HyHzVypHM,1517250000000.0,1517260000000.0,306,BJk59JZ0b,BJk59JZ0b,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers agree that the formulation is novel and interesting, but they raised concerns regarding the motivation and the complexity of the approach. I find the authors' response mostly satisfying, and I ask them to improve the paper by incorporating the comments. + +Detailed comments: +The maximum-entropy objective used in Eq. (13) reminds me of maximum-entropy RL objective in previous work including [Ziebart, 2010], [Azar, 12], [Nachum, 2017], and [Haarnoja, 2017].",ICLR2018, +S1YPnGU_g,1486400000000.0,1486400000000.0,1,By5e2L9gl,By5e2L9gl,ICLR committee final decision,Accept (Poster),"The authors present a novel layer-wise optimization approach for learning convolutional neural networks with piecewise linear nonlinearities. The proposed approach trains piecewise linear ConvNets layer by layer, reduces the sub-problem into latent structured SVM. + + Reviewers mainly expressed concerns about the experimental results, which the authors have diligently addressed in their revised versions. While the reviewers haven't updated explicitly their reviews, I believe the changes made should have been sufficient for them to do so. + + Thus, I recommend this paper be accepted.",ICLR2017, +hODE9LtIBl,1576800000000.0,1576800000000.0,1,BJg73xHtvr,BJg73xHtvr,Paper Decision,Reject,"This paper proposes using non-Euclidean spaces for GCNs, leveraging the gyrovector space formalism. The model allows products of constant curvature, both positive and negative, generalizing hyperbolic embeddings. + +Reviewers got mixed impressions on this paper. 
Whereas some found its methodology compelling and its empirical evaluation satisfactory, it was generally perceived that this paper will greatly benefit from another round of reviewing. In particular, the authors should improve readability of the main text and provide a more thorough discussion on related recent (and concurrent) work. ",ICLR2020, +0EeKRYwNi9Q,1642700000000.0,1642700000000.0,1,87Ks7PvYVJi,87Ks7PvYVJi,Paper Decision,Reject,"This paper studies the offline multi-agent RL problem. The finding is that the dataset collected by one agent could be very different for other agents. The authors provide two solutions to this problem. Although being interesting, the reviewers found that the there are many imprecise math statements, and some of the methods are not well motivated. Hence, the overall recommendation is a reject.",ICLR2022, +intYQZqp3z,1576800000000.0,1576800000000.0,1,rJgQkT4twH,rJgQkT4twH,Paper Decision,Accept (Poster),"This paper presents a case study of training a video classifier and subsequently analyzing the features to reduce reliance on spurious artifacts. The supervised learning task is zebrafish bout classification which is relevant for biological experiments. The paper analyzed the image support for the learned neural net features using a previously developed technique called Deep Taylor Decomposition. This analysis showed that the CNNs when applied to the raw video were relying on artifacts of the data collection process, which spuriously increased classification accuracies by a ""clever Hans"" mechanism. By identifying and removing these artifacts, a retrained CNN classifier was able to outperform an older SVM classifier. More importantly, the analysis of the network features enabled the researchers to isolate which parts of the zebrafish motion were relevant for the classification. + +The reviewers found the paper to be well-written and the experiments to be well-designed. The reviewers suggested a some changes to the phrasing in the document, which the authors adopted. In response to the reviewers, the authors also clarified their use of ImageNet for pre-training and examined alternative approaches for building saliency maps. + +This paper should be published as the reviewers found the paper to be a good case study of how model interpretability can be useful in practice. ",ICLR2020, +JAKSSGaZrln,1642700000000.0,1642700000000.0,1,figzpGMrdD,figzpGMrdD,Paper Decision,Accept (Poster),"This paper presents a comparison and analysis of continual learning methods for pretrained language models. The authors categorise continual learning methods into three categories, those that use cross task regularisation, those that employ some form of experience replay of previous training examples, and those that dynamically alter the network architecture for each task. Evaluation results from representative examples of these three paradigms are then presented and analysed. In general methods that incorporate experience reply appear to perform the best, while analysis of the predictive power of individual layers of the pretrained models suggests that some network layers are more robust to catastrophic forgetting than others, and that this also varies across architectures (BERT, ALBERT, etc.). + +In general the reviewers agree that this is a well conducted study that provides an interesting contribution to an important area of research. 
They also generally agree that the many of the results are unsurprising given the properties of the algorithms explored and prior work in this area. The main point of difference then becomes how valuable it is to present a thorough study of existing algorithms that confirms our assumptions. I believe that the current work raises enough interesting questions to make it a useful contribution to researchers working in continual learning. In particular the results analysing the relative differences in catastrophic forgetting across different layers in models suggests interesting avenues for follow on work.",ICLR2022, +B1VAnzIOg,1486400000000.0,1486400000000.0,1,rkEFLFqee,rkEFLFqee,ICLR committee final decision,Accept (Poster),"Here is a summary of the reviews and discussion: + + Strengths + Idea of decoupling motion and content is interesting in the context of future frame prediction (R3, R2, R1) + Quantitative and Qualitative results are good (R1) + + Weaknesses + Novelty in light of previous multi-stream networks (R3, R2) + Not clear if this kind of decoupling works well or is of broader interest beyond future frame prediction (R3) AC comment: I donÕt know if this is too serious a concern; to me the problem seems important enough -- making an advancement that improves a single relevant problem is a contribution + Separation of motion and content already prevalent in other applications, e.g., pose estimation (R2) + UCF-101 results are not so convincing (R3) + Clarity could be improved, not written with general audience in mind (R2) + Concern about static bias in the UCF dataset of the learned representations (R1, R2, R3) AC comment: The authors ran significant additional experiments to respond to this point + + The authors provided a comprehensive rebuttal which convinced R1 to upgrade their score. After engaging R2 and R3 in discussion, both said that the paper had improved in its subsequent revisions and would be happy to see the paper pass. The AC agrees with the opinion of the reviewers that the paper should be accepted as a poster.",ICLR2017, +dlmpwJYA6n6,1610040000000.0,1610470000000.0,1,onxoVA9FxMw,onxoVA9FxMw,Final Decision,Accept (Poster),"With the advent of non-recurrent sequence-processing models, it has become costumary to augment input tokens with positional embeddings providing implicit positional information. Despite their potentially crucial role in modern architectures, such positional embeddings are rarely addressed in analytical studies. The current paper provides a systematic investigation of positional embeddings, characterized in terms of properties such as monotonicity, translation invariance, and symmetry. These properties are studies for different positional embeddings using language models fine-tuned on two separated benchmarks, with an emphasis on visual analysis. + +The authors provided an impressive rebuttal, adding many of the experiments required by the reviewers. The latter are still somewhat split about the paper. I lean towards the positive side. I find that some of the criticism, while valid, is not really granting a rejection, especially after the authors' clarifications. In particular, one reviewer assumed that the authors claim that symmetry should be a property of an ideal positional embedding, whereas the authors are rather studying whether it is an important property of them, in light of the previous literature. Some claims about the results being ""interesting"" or ""surprising"" enough might depend on what the reader is looking for in the paper. 
I think that many readers in the ""black box NLP"" community will find the methods and analyses presented in this paper interesting and useful. +",ICLR2021, +SJxElUillN,1544760000000.0,1545350000000.0,1,r1NDBsAqY7,r1NDBsAqY7,reject,Reject,"a major issue or complaint from the reviewers seems to come from perhaps a wrong framing of this submission. i believe the framing of this work should have been a better language model (or translation model) with word discovery as an awesome side effect, which i carefully guess would've been a perfectly good story assuming that the perplexity result in Table 4 translates to text with blank spaces left in (it is not possible tell whether this is the case from the text alone.) even discounting R1, who i disagree with on quite a few points, the other reviewers also did not see much of the merit of this work, again probably due to the framing issue above. + +i highly encourage the authors to change the framing, evaluate it as a usual sequence model on various benchmarks and resubmit it to another venue.",ICLR2019,4: The area chair is confident but not absolutely certain +rylyDskZeN,1544780000000.0,1545350000000.0,1,HJMXTsCqYQ,HJMXTsCqYQ,clear rejection; no rebuttal ,Reject,"This paper proposes to use constrained Bayesian optimization to improve the chemical compound generation. Unfortunately, the reviewers raises a range of critical issues which are not responded by authors' rebuttal. ",ICLR2019,5: The area chair is absolutely certain +_sEZifimy9B,1642700000000.0,1642700000000.0,1,0bXmbOt1oq,0bXmbOt1oq,Paper Decision,Reject,"While a lot of previous work on emergent communications studies discrete protocols, this work explores a continuous and audio-based channel for learning multi-agent communication. Reviewers have commented positively on the novelty of the topic. At the same time, there are a number of concerns raised with respect to experimental design and implementation (6auy) and general approach of the topic which, as reviewers t576 and 42Xh point, doesn't really go deep into the analysis and understanding of the particular experimental setup and findings. So, unfortunately as the papers stands I cannot recommend acceptance at this time. However, given that continuous communication in emergent communication is a somewhat overlooked topic, I would encourage the authors to use the reviewers' feedback and strengthen their manuscript.",ICLR2022, +ry-shzUde,1486400000000.0,1486400000000.0,1,Hk4_qw5xe,Hk4_qw5xe,ICLR committee final decision,Accept (Oral),"The paper provides a detailed analysis of the instability issues surrounding the training of GANs. They demonstrate how perturbations can help with improving stability. Given the popularity of GANs, this paper is expected to have a significant impact.",ICLR2017, +wG6iOQRTFu0,1642700000000.0,1642700000000.0,1,VjoSeYLAiZN,VjoSeYLAiZN,Paper Decision,Reject,"The paper proposes a new neural network architecture for hyperspectral image reconstruction. The paper received borderline/negative reviews. Significant concerns were raised about the novelty and significance of the contribution. Unfortunately, the authors did not upload a rebuttal, preventing the reviewers from changing their opinion about the paper. There is therefore no reason to overturn their recommendation.",ICLR2022, +NomiQkLTyD,1610040000000.0,1610470000000.0,1,5mhViEOQxaV,5mhViEOQxaV,Final Decision,Reject,"The paper first aims to propose a new controllable Pareto multi-task learning framework to find pareto-optimal solutions. 
But after the revision according to the comments, the paper claims to find finite Pareto stationary solutions. However, the paper still cannot prove that the proposed method can find the Pareto stationary solutions. Even if it can find the Pareto stationary solutions, it cannot guarantee finding the Pareto front, which conflicts with the experiments and claims. There are major flaws in the paper.",ICLR2021,
r1eg0LElg4,1544730000000.0,1545350000000.0,1,Byx1VnR9K7,Byx1VnR9K7,"Incremental solution, missing baselines",Reject,"The paper considers the problem of imitating multi-modal expert demonstrations using a variational auto-encoder to embed demonstrated trajectories into a structured latent space. The problem is important, and the paper is well written. The model is shown to work well on toy examples. However, as pointed out by the reviewers, given that multi-modal imitation has been studied before, the approach should have been compared both in theory and in practice to existing methods and baselines (e.g., InfoGAIL). 
Furthermore, the technical contribution is somewhat limited, as it uses an existing model on a new application domain.",ICLR2019,4: The area chair is confident but not absolutely certain
cSTpfeXilD,1642700000000.0,1642700000000.0,1,8qWazUd8Jm,8qWazUd8Jm,Paper Decision,Reject,"The authors propose a new set of metrics for evaluation of generative models based on the well-established precision-recall framework, and an additional dimension quantifying the degree of memorization. The authors evaluated the proposed approach in several settings and compared it to a subset of the classic evaluation measures in this space. The reviewers agreed that this is an important and challenging problem relevant to the generative modeling community at large. The paper is well-written and the proposed method and motivation are clearly explained.

The initial reviews were borderline, and after the discussion phase we have 2 borderline accepts, one strong accept, and one strong reject. After reading the manuscript, the rebuttal, and the discussion, I feel that the work should not be accepted on the grounds of insufficient empirical validation. Establishing a new evaluation metric is a very challenging task -- one needs to demonstrate the pitfalls of existing metrics, as well as how the new metric is capturing the missing dimensions in a thorough empirical validation. While the former was somewhat shown in this work (and in many other works), the latter was not fully demonstrated. The primary reason is the use of a non-standard benchmark to evaluate the utility of the proposed metrics. I agree that covering a broader set of tasks and models makes sense in general, but it shouldn’t be done at the cost of existing, well-understood benchmarks. I expected to see a thorough comparison with [1], one of the most practical metrics used today which can be easily extended to all settings considered in this work (notwithstanding the drawbacks outlined in [2]). What are the additional insights? What is [1] failing to capture in practical instances? Does the rank correlation change with respect to modern models across classic datasets (beyond MNIST and CIFAR10)? This would remove confounding variables and significantly strengthen the paper.

My final assessment is that this work is borderline, but below the acceptance bar for ICLR. I strongly suggest that the authors showcase the additional improvements over methods such as [1] in practical and well-understood settings commonly used to benchmark generative models (e.g. on images). The experiments suggested by the reviewers are a step in the right direction, but not sufficient.

[1] Improved Precision and Recall Metric for Assessing Generative Models. 
Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, Timo Aila. NeurIPS ’19 + +[2] Evaluating generative models using divergence frontiers. +Josip Djolonga, Mario Lučić, Marco Cuturi, Olivier Frederic Bachem, Olivier Bousquet, Sylvain Gelly. AISTATS ‘20",ICLR2022, +vskvz-0-Av,1576800000000.0,1576800000000.0,1,BkeMXR4KvS,BkeMXR4KvS,Paper Decision,Reject,"The reviewers were confused by several elements of the paper, as mentioned in their reviews and, despite the authors' rebuttal, still have several areas of concerns. + +I encourage you to read the reviews carefully and address the reviewers' concerns for a future submission.",ICLR2020, +rynOB1prf,1517250000000.0,1517260000000.0,603,HklZOfW0W,HklZOfW0W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper addresses the problem of learning neural graph representations, based on graph filtering techniques in the vertex domain. + +Reviewers agreed on the fact that this paper has limited interest in its current form, and has serious grammatical issues. The AC thus recommends rejection at this time. ",ICLR2018, +BkQFJznBL54,1642700000000.0,1642700000000.0,1,GWQWAeE9EpB,GWQWAeE9EpB,Paper Decision,Accept (Poster),"DictFormer is a method to reduce the redundancy in transformers so they can deployed on edge devices. In the method, a shared dictionary across layers and unshared coefficients are used in place of weight multiplications. The author proposed a l1 relaxation to train the non-differentiable objective to achieve both higher performance and lower parameter counts. + +All reviewers ended up giving the paper a score of 6 after increasing their scores during discussions. While the results are strong (better performance at much lower parameter counts), the paper is not clearly written. Several reviewers noted that the paper is difficult to understand and has a few unresolved points. For example, the method also ended up performing better than the base transformer model that DictFormer is supposed to compress. There seems to be a lack of understanding about what part of the model delivered the improvements. One reviewer said that this is potentially a great paper that deserves to be better explained. The basic concept of sharing a dictionary across layers should be simple enough to explain well and deserve a better explanation than eq 5. + +The authors promise to release the code, which would be necessary for a full dissemination of this work. I recommend accept.",ICLR2022, +23d-TyanYYnD,1642700000000.0,1642700000000.0,1,dUV91uaXm3,dUV91uaXm3,Paper Decision,Accept (Spotlight),"This paper has a deep analysis of the over-smoothing phenomenon in BERT from the perspective of graph. Over-smoothing refers to token uniformity problem in BERT, different input patches mapping to similar latent representation in ViT and the problem of shallower representation better than deeper (overthinking). The authors build a relationship between Transformer blocks and graphs. Namely, self-attention matrix can be regarded as a normalized adjacency matrix of a weighted graph. They prove that if the standard deviation in layer normalization is sufficiently large, the outputs of the transformer stack will converge to a low-rank subspace, resulting in over-smoothing. + +In this paper, they also provide theoretical proof why higher layers can lead to over-smoothing. Empirically , they investigate the effects of the magnitude of the two standard deviations between two consecutive layers on possible over-smoothing in diverse tasks. 
+ +In order to overcome over-smoothing, they propose a series of hierarchical fusion strategy that adaptively fuses presentation from different layers, including concatenation fusion, max fusion and self-gate fusion into post-normalization. These strategies reduce similarities between tokens and outperforms BERT baseline on a few datasets (GLUE, SWAG and SQuAD). + +Overall I agree with reviewers that this is a good contribution.",ICLR2022, +rJx6yBXrxN,1545050000000.0,1545350000000.0,1,r1gGpjActQ,r1gGpjActQ,needs more work,Reject," + ++ sufficiently strong results + ++ a fast / parallelizable model + + +- Novelty with respect to previous work is not as great (see AnonReviewer1 and AnonReviewer2's comments) + +- The same reviewers raised concerns about the discussion of related work (e.g., positioning with respect to work on knowledge distillation). I agree that the very related work of Roy et al should be mentioned, even though it has not been published it has been on arxiv since May. + +- Ablation studies are only on smaller IWSLT datasets, confirming that the hints from an auto-regressive model are beneficial (whereas the main results are on WMT) + +- I agree with R1 that the important modeling details (e.g., describing how the latent structure is generated) should not be described only in the appendix, esp given non-standard modeling choices. R1 is concerned that a model which does not have any autoregressive components (i.e. not even for the latent state) may have trouble representing multiple modes. I do find it surprising that the model with non-autoregressive latent state works well however I do not find this a sufficient ground for rejection on its own. However, emphasizing this point and discussing the implication in the paper makes a lot of sense, and should have been done. As of now, it is downplayed. R1 is concerned that such model may be gaming BLEU: as BLEU is less sensitive to long-distance dependencies, they may get damaged for the model which does not have any autoregressive components. Again, given the standards in the field, I do not think it is fair to require human evaluation, but I agree that including it would strengthen the paper and the arguments. + + +Overall, I do believe that the paper is sufficiently interesting and should get published but I also believe that it needs further revisions / further experiments. + + +",ICLR2019,4: The area chair is confident but not absolutely certain +SkahhML_l,1486400000000.0,1486400000000.0,1,S1Y0td9ee,S1Y0td9ee,ICLR committee final decision,Invite to Workshop Track,"The authors present a novel architecture, called Shift Aggregate Extract Network (SAEN), for learning representations on social network data. SAEN decomposes input graphs into hierarchies made of multiple strata of objects. The proposed approach gives very promising experimental results on several real-world social network datasets. + + The idea is novel and interesting. However, the exposition of the framework and the approach could be significantly improved. The authors clearly made an effort to revise the paper and improve the clarity. Yet, the paper would still certainly benefit from a major revision, to clarify the exposition and spell out all the details of the framework. + + A extensive use of the space in the supplement could potentially help overcoming the space limitation in the main part of the paper which can make exposition of new frameworks challenging in general. This would allow an extensive explanation of the proposed framework and related concepts. 
+ + A major revision of the paper will generate a stronger submission, which we invite the authors to submit to the workshop track.",ICLR2017, +EJM4RzhmHNn,1610040000000.0,1610470000000.0,1,PGmqOzKEPZN,PGmqOzKEPZN,Final Decision,Reject,"This paper discusses the likelihood ratio estimation using the Bregman divergence. The authors consider the 'train-loss hacking', which is an overfitting issue causing minus infinity for the divergence. They introduce non-negative correction for the divergence under the assumption that we have knowledge on the upper bound of the density ratio. Some theoretical results on the convergence rate are given. The proposed method shows favorable performance on outlier detection and covariate shift adaptation. + +The proposed non-negative modification of Bregman divergence is a reasonalbe solution to the important problem of density ratio estimation. The experimental results as well as theoretical justfication make this work solid. However, there are some concerns also. The paper assumes knowledge on the upper bound of density ratio and uses a related parameter essentially in the method. Assuming the upper bound is a long standing problem in estimating density ratio, and it is in practice not necesssarily easy to obtain. Also, there is a room on improvements on readability. + +Although this paper has some signicance on the topic, it does not reach the acceptance threshold unfortunately because of the high competition in this years' ICLR. ",ICLR2021, +DA8TJOjW4U,1576800000000.0,1576800000000.0,1,Sygg3JHtwB,Sygg3JHtwB,Paper Decision,Reject,The paper is rejected based on unanimous reviews.,ICLR2020, +BJZPXk6BM,1517250000000.0,1517260000000.0,153,ByJHuTgA-,ByJHuTgA-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"this submission demonstrates an existing loop-hole (?) in rushing out new neural language models by carefully (and expensively) running hyperparameter tuning of baseline approaches. i feel this is an important contribution, but as pointed out by some reviewers, i would have liked to see whether the conclusion stands even with a more realistic data (as pointed out by some in the field quite harshly, perplexity on PTB should not be considered seriously, and i believe the same for the other two corpora used in this submission.) that said, it's an important paper in general which will work as an alarm to the current practice in the field, and i recommend it to be accepted.",ICLR2018, +ByeJ418Vl4,1545000000000.0,1545350000000.0,1,SJgs1n05YQ,SJgs1n05YQ,novel approach to combine model-based and model-free RL - needs more in-depth analysis of results,Reject,"The paper presents LEAPS, a hybrid model-based and model-free algorithm that uses a Bayesian approach to reason/plan over semantic features, while low level behavior is learned in a model-free manner. The approach is designed for human-made environments with semantic similarity, such as indoor navigation, and is empirically validated in a virtual indoor navigation task, House3D. Reviewers and AC note the interesting approach to this challenging problem. The presented approach can provide an elegant way to incorporate domain knowledge into RL approaches. + +The reviewers and AC note several potential weaknesses. The reviewers are concerned about the very low success rate, and critiqued the use of success rate as a key metric itself, given that random search with a sufficiently high cut-off could solve the task. 
The authors added additional results in a metric that incorporates path length, and provided clarifying details. However, key concerns remained given the low success rates. The AC notes that e.g., results in the top and middle row of figure 4 show very similar results for LEAPS and the reported baselines. Further, ""figure 5"" shows no confidence / error bars, and it is not possible to assess whether any differences are statistically significant. Overall, the question of whether something substantial has been learned should be addressed with a detailed error analysis of the proposed approach and the baselines, to provide insight into whether and how the approaches solve the task. At the moment, the paper presents a potentially valuable approach, but does not provide convincing evidence and conceptual insights into the approach's effectiveness.",ICLR2019,5: The area chair is absolutely certain
L5XHW4H5x_Q,1610040000000.0,1610470000000.0,1,BUPIRa1D2J,BUPIRa1D2J,Final Decision,Reject,"The paper proposes a new variant of capsule networks, where iterative routing is replaced by an attention-based procedure inspired by Induced Set Attention from Set Transformers. The method is competitive on several classification benchmarks and improves generalization to unseen views on SmallNORB. 
Scalability of the method is only studied in limited detail (I do appreciate Figure 2). The best indication in the direction of scalability is that the model can be trained on ImageNet (which is great), but it performs worse than the ResNet-50 used as a backbone and it is not explained why (even after one of the reviewers asked about it) and how expensive computationally the model is. +4. I share the concerns of R1 regarding the use of the term “MoG”. It is a mathematical term, so one would expect mathematical precision when using it. + 4a. It is unclear how the mixing probabilities \phi are learned (IIUC they get no gradient, as described by R1) and if they are in some way actually learned, it is unclear how it is guaranteed that they sum to one. + 4b. MoG usually comes with the standard procedure of fitting it to data (EM), which IIUC the authors are not following here. This should be clearly explained. +5. A relatively more minor concern: again, as pointed out by R1, the use of “self-” in “self-attention” does not seem accurate. Self-attention assumes inputs to the attention procedure attend to themselves in some sense. As one consequence, the output sequence has the same length as the input sequence. ISAB from Set Transformer can be seen as a factorized version of self-attention where first inducing points attend to the inputs and then the inputs attend to the inducing points, so the output of the whole block is still the same length as the input. But in the proposed model this second step of going back to the inputs is absent and the length of the output sequence is generally different from the length of the input sequence. + +Note: +I partially share the doubts R1 raised on the positioning of the method as “capsules” as opposed to “attention”, but I believe it is not the authors’ fault that the definition of what capsules are is historically vague and that this term has been used in many different ways in the past. I would strongly recommend to discuss this point in the updated version of the paper and I hope the capsules community manages to get more clarity on what exactly capsules are. But I do not count this point as a weakness here. + +Based on all this evidence, I recommend rejection at this point. The paper has its merit, but it has unfortunate gaps both on the experimental and the presentation sides, as listed above. Some of these have been mentioned during the discussion phase, but the authors have not quite addressed them. There is no mechanism to ensure these are fixed in the final version, so resubmission to a different venue is the only option. +",ICLR2021, +r1lM-wD-xV,1544810000000.0,1545350000000.0,1,B1MXz20cYQ,B1MXz20cYQ,Meta-review,Accept (Poster),"Important problem (explainable AI); sensible approach, one of the first to propose a method for the counter-factual question (if this part of the input were different, what would the network have predicted). Initially there were some concerns by the reviewers but after the author response and reviewer discussion, all three recommend acceptance (not all of them updated their final scores in the system). ",ICLR2019,4: The area chair is confident but not absolutely certain +qzVsKznI6qT,1642700000000.0,1642700000000.0,1,J_PHjw4gvXJ,J_PHjw4gvXJ,Paper Decision,Accept (Poster),"To solve imbalance classification problem, this paper proposes a method to learn example weights together with the parameters of a neural network. 
The authors proposed a novel mechanism of learning with a constraint, which allows accurate training of the weights and model at the same time. Then they combined this new learning mechanism and the method by Hu et al. (2019), and demonstrated its usefulness in extensive experiments. + +I would like to thank the authors for their detailed feedback to the initial reviews, which clarified most of the unclear points in the manuscript. Overall the paper is well written and the effectiveness was demonstrated in experiments. Since the contribution is valuable to ICLR2022, I suggest its acceptance.",ICLR2022, +B1eGWH0RyE,1544640000000.0,1545350000000.0,1,B1GMDsR5tm,B1GMDsR5tm,worthwhile improvement of an existing method,Accept (Poster),"The paper investigates a novel initialisation method to improve Equilibrium Propagation. In particular, the results are convincing, but the reviewers remain with small issues here and there. + +An issue with the paper is the biological plausibility of the approach. Nonetheless publication is recommended. ",ICLR2019,4: The area chair is confident but not absolutely certain +GCR8QdVVfHZ,1642700000000.0,1642700000000.0,1,BsDYmsrCjr,BsDYmsrCjr,Paper Decision,Reject,"There were concerns that the paper has a fairly limited novelty, being based on the combination of two known ideas: bucketing and 2-party secure median for distributed learning. Also, the scale of experiments is quite limited. Other issues include the lack of comparison to relevant related work, some doubts on correctness, and issues with independence and scalability that weren't fully resolved. Overall the reviewers felt that the paper shoud not be accepted in its current form.",ICLR2022, +kEGZgNbQIpR,1610040000000.0,1610470000000.0,1,4TSiOTkKe5P,4TSiOTkKe5P,Final Decision,Accept (Poster),"This paper is concerned with finding causal relations from temporal processes and extends the Convergent Cross Mapping (CCM) method. It focuses on finding information of chaotic dynamical systems from short, noisy and sporadic time series, and the idea of using the latent space of neural ODEs to replace the delay embeddings in CCM seems interesting. All reviewers like the idea. Please try to make the paper more self-contained and provide some of the justifications suggested by the reviewers.",ICLR2021, +SJgmqUmyeN,1544660000000.0,1545350000000.0,1,rkzfuiA9F7,rkzfuiA9F7,"A robust extension of prototypical networks, but needs a clear motivation for this property.",Reject,"The reviewers all like the idea, and though the performance is a little better when compared to prototypical networks, the reviewers felt that the contribution over and above prototypical networks was marginal and none of them was willing to champion the paper. There is merit in that there is increased robustness to outliers, and future iterations of the paper should work to strengthen this aspect. + +As a quick nitpick: based on my reading, and on Figure 3, it looks like there might be a typo in the definition of X_k (bottom of page 4). Right now it is defined in terms of the original data space x, when I think it should be defined in terms of the embedding space f(x). Overall this paper is a good contribution to the few-shot learning area.",ICLR2019,3: The area chair is somewhat confident +QWC3NcpZUro,1642700000000.0,1642700000000.0,1,HFPTzdwN39,HFPTzdwN39,Paper Decision,Accept (Poster),"This paper proposes a method for inspecting and interpreting the visual representations learned by self-supervised methods. 
+The method is conceptually simple and intuititive, the authors assume that concept labels for the images are available, and then go on to learn a mapping between the learned image vectors and the human-provided descriptions of the images. The key insight is to learn a reverse mapping, i.e., to map label vectors to representation vectors. Specifically, feature vectors are quantized using k-means to obtain clusters; images are labeled (automatically) with a diverse set of concepts from expert models trained with supervision on +external data sources, and a linear model is trained to map concepts to clusters, measuring the mutual information between the representation and human-interpretable concepts. + +Reviewers raised some questions regarding the relation of the approach to topic models, the difference between reverse probing and linear probing, implementation details and computation. The authors addressed reviewers comments convincingly with additional experiments and/or explanations.",ICLR2022, +BJg_4zIQxN,1544930000000.0,1545350000000.0,1,Sklqvo0qt7,Sklqvo0qt7,"Interesting contribution, but not quite ready",Reject,"I enjoyed reading the paper myself and agree with some of the criticisms raised by the reviewers, but not all of them. In particular, I don't think it's a major issues that this work studies an explicit regularization scheme BECAUSE the state of our understanding of generalization in deep learning is so embarrassingly poor!! + +Unlike a lot of work, this work is engaging with the *approximation* error and developing risk bounds (called ""generalization error"" here ... not my favorite term for the risk!) rather than just controlling the generalization gap. The simple proof in the bounded noiseless case was nice to see. On the other hand, not being familiar with the work of Klusowski and Barron (2016), I'm not willing to overrule the reviewers on judgments that this work is not novel enough. I would suggest the authors take control of this and paint a more detailed picture of how these two bodies of work relate, including how the proof techniques and arguments overlap. + +Some other comments: + +1. your theorem requires \lambda > 4, but then you're using \lambda = 0.1. this seems problematic to me. + +2. your ""nonvacuous upper bound"" is path-norm/sqrt(n) ... but do the numbers in the table include the constants? looking at the constants that are likely to show up, (4Bn sqrt(2 log 2d), they are easily contributing a factor greater than 10 which would make these bounds vacuous as well. you need to explain how you are calculating these numbers more carefully. + +3. several times Arora et al and Neyshabur et al are cited when reference is being made to numerical experiments to show that existing bounds are vacuously large. But Dziugaite and Roy, who you cite for the term ""nonvacuous"", made an earlier analysis of path-norm bounds in their appendix and point out that they are vacuous. + +4. the paper does not really engage with the fact that you are unlikely to be exactly minimizing the functional J. any hope of bridging this gap? + +5. the experiments in general are a bit too vaguely described. also, you control squared error but then only report classification error. 
would be interested to see both.",ICLR2019,4: The area chair is confident but not absolutely certain +By2qL1TSG,1517250000000.0,1517260000000.0,847,B1bgpzZAZ,B1bgpzZAZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper provides a method for eliminating options in multiple-answer reading comprehension tasks, based on the contents of the text, in order to reduce the ""answer space"" a machine reading model must consider. While there's nothing wrong with this, conceptually, reviewers have questioned whether or not this is a particularly useful process to include in a machine reading pipeline, versus having agents that understand the text well enough to select the correct answer (which is, after all, the primary goal of machine reading). Some reviewers were uncomfortable with the choice of dataset, suggesting SQuAD might be a better alternative), and why I am not sure I agree with that recommendation, it would be good to see stronger positive results on more than one dataset. At the end of the day, it is the lack of convincing experimental results showing that this method yields substantial improvements over comparable baselines which does the most harm to this well written paper, and I must recommend rejection.",ICLR2018, +ctV0cFEYcE,1576800000000.0,1576800000000.0,1,r1l-VeSKwS,r1l-VeSKwS,Paper Decision,Reject,"I had a little bit of difficulty with my recommendation here, but in the end I don't feel confident in recommending this paper for acceptance, with my concerns largely boiling down to the lack of clear description of the overall motivation. + +Standard adversarial attacks are meant to be *imperceptible* changes that do not change the underlying semantics of the input to the human eye. In other words, the goal of the current work, generating ""semantically meaningful"" perturbations goes against the standard definition of adversarial attacks. This left me with two questions: + +1. Under the definition of semantic adversarial attacks, what is to prevent someone from swapping out the current image with an entirely different image? From what I saw in the evaluation measures utilized in the paper, such a method would be judged as having performed a successful attack, and given no constraints there is nothing stopping this. + +2. In what situation would such an attack method would be practically useful? + +Even the reviewers who reviewed the paper favorably were not able to provide answers to these questions, and I was not able to resolve this from my reading of the paper as well. I do understand that there is a challenge on this by Google. In my opinion, even this contest is somewhat ill-defined, but it also features extensive human evaluation to evaluate the validity of the perturbations, which is not featured in the experimental evaluation here. + +While I think this work is potentially interesting, it seems that there are too many open questions that are not resolved yet to recommend acceptance at this time, but I would encourage the authors to tighten up the argumentation/evaluation in this regard and revise the paper to be better accordingly!",ICLR2020, +3swumqPnT4R,1642700000000.0,1642700000000.0,1,zBhwgP7kt4,zBhwgP7kt4,Paper Decision,Reject,"There wasn't enough enthusiasm to push this paper over the bar, based on no reviewer championing the paper (the one score above 6 was consulted and thought this was a fair assessment). 
The reviewers appreciated the contributions of the paper but felt that in terms of technical depth, there was a lot of overlap with prior work, and the statements of the results themselves were good but not exciting enough to convince the reviewers. Some suggestions for further improvement that came up were to try to extend this to update time for low rank approximation, which was an application that other work that built off of Cohen et al did, see, e.g., https://arxiv.org/abs/1805.03765 . Regarding presentation, it would be great if in a re-submission the authors handle the presentation concerns of some of the reviewers regarding the experiments.",ICLR2022, +Z_u4iTa8UUs,1642700000000.0,1642700000000.0,1,dNigytemkL,dNigytemkL,Paper Decision,Accept (Poster),"This paper investigates the linear mode connectivity of the loss landscape of neural networks, i.e. whether a convex combination of two parameters of local optima on the SGD paths has low loss values (i.e. low barrier) up to some permutations. To probe this question, this paper empirically studies the loss gap, named as “barrier”, between two local minima and their convex combinations or linear interpolation. Before permutations, such barriers are typically non-zero; yet, after taking into account of permutation invariance of models, such barriers could be reduced along to zero with the width increasing, a main conjecture formulated in the paper. To support this conjecture, the authors proposed a simulated annealing algorithm to search for such permutations, demonstrating that the barrier reduces after such permutations. + +The reviewers unanimously accept the paper, if the authors make the proposed improvement in the final version. In particular, a reader points out a paper by Singh and Jaggi, 'Model Fusion via Optimal Transport', NeurIPS 2020, that supports the same conjecture with a constructive algorithm to find optimal permutations or matching using optimal transport. This should be included in the final version as the authors replied.",ICLR2022, +SJ917yTHG,1517250000000.0,1517260000000.0,50,B1n8LexRZ,B1n8LexRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents a learned inference architecture which generalizes HMC. It defines a parameterized family of MCMC transition operators which share the volume preserving structure of HMC updates, which allows the acceptance ratio to be computed efficiently. Experiments show that the learned operators are able to mix significantly faster on some simple toy examples, and evidence is presented that it can improve posterior inference for a deep latent variable model. This paper has not quite demonstrated usefulness of the method, but it is still a good proof of concept for adaptive extensions of HMC. + +",ICLR2018, +7zUvtZNC7lWh,1642700000000.0,1642700000000.0,1,RQ428ZptQfU,RQ428ZptQfU,Paper Decision,Accept (Poster),"Four knowledgeable referees recommend Accept. I also think the paper provides a unique contribution to the field of deep survival models and I, therefore, recommend Accept",ICLR2022, +rkeGvgxPg4,1545170000000.0,1545350000000.0,1,HyzMyhCcK7,HyzMyhCcK7,A novel and promising approach to quantized deep nets,Accept (Poster),"A novel approach for quantized deep neural nets is proposed, which is more principled than commonly used straight-through gradient method. A theoretical analysis of the algorithm's converegence is presented, and empirical results show advantages of the proposed approach. 
",ICLR2019,4: The area chair is confident but not absolutely certain +05Kh1ixCXTn,1642700000000.0,1642700000000.0,1,morSrUyWG26,morSrUyWG26,Paper Decision,Reject,"The reviews are of good quality. The responses by the authors are commendable, but reviewers remain of the opinion that the scientific contribution of the paper is limited, no matter how strong the software engineering contribution may be.",ICLR2022, +B1xOH5nWe4,1544830000000.0,1545350000000.0,1,BkgtDsCcKQ,BkgtDsCcKQ,Paper decision,Accept (Poster),Reviewers are in a consensus and recommended to accept after engaging with the authors. Please take reviewers' comments into consideration to improve your submission for the camera ready.,ICLR2019,4: The area chair is confident but not absolutely certain +H1-h4kTrG,1517250000000.0,1517260000000.0,434,H15RufWAW,H15RufWAW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes an implicit model of graphs, trained adversarially using the Gumbel-softmax trick. The main idea of feeding random walks to the discriminator is interesting and novel. However, +1) The task of generating 'sibling graphs', for some sort of bootstrap analysis, isn't well-motivated. +2) The method is complicated and presumably hard to tune, with two separate early-stopping thresholds that need to be tuned +3) There is not even a mention of a large existing literature on generative models of graphs using variational autoencoders.",ICLR2018, +Wav_8y5zd6B,1642700000000.0,1642700000000.0,1,1wVvweK3oIb,1wVvweK3oIb,Paper Decision,Accept (Poster),"This is a borderline paper. The most enthusiastic reviewer does not have much confidence in the score. The other reviewers think the paper has some value after the rebuttal, but also feel there is little technical novelty. The proposed applications of the approach are interesting. + +After reading the reviews, rebuttal, and the paper, I agree that there is little technical novelty. The idea of adding node-label noise to a GNN to improve GNN expressiveness dates back to (Murphy et al., 2019) and has been also explored by (Dasoulas et al., 2019), (Vignac et al., 2020), (Loukas, 2020) among others [one of which is suggested by a reviewer] (this literature is entirely missing from the paper). The paper has some novelty in proposing a regularization method for tackling the node-level noise by augmenting the loss function with a denoising term. The oversmoothing justification is not properly investigated (whether the proposed solution really solves the issue in practice). + +If there is space in the borderline decision boundary, this paper could be a worthwhile inclusion. + +Dasoulas, G., Santos, L.D., Scaman, K. and Virmaux, A., 2019. Coloring graph neural networks for node disambiguation. arXiv preprint arXiv:1912.06058. +Loukas, A., 2020. How hard is to distinguish graphs with graph neural networks?. arXiv preprint arXiv:2005.06649. +Vignac, C., Loukas, A. and Frossard, P., 2020. Building powerful and equivariant graph neural networks with structural message-passing. arXiv preprint arXiv:2006.15107. +Murphy, R., Srinivasan, B., Rao, V. and Ribeiro, B., 2019, May. Relational pooling for graph representations. In International Conference on Machine Learning (pp. 4663-4673). PMLR.",ICLR2022, +B1gJuct8eV,1545140000000.0,1545350000000.0,1,rygp3iRcF7,rygp3iRcF7,reject,Reject,"although the idea is a straightforward extension of the usual (flat) attention mechanism (which is positive), it does show some improvement in a series of experiments done in this submission. 
the reviewers however found the experimental results to be rather weak and believe that there may be other problems in which the proposed attention mechanism could be better utilized, despite the authors' effort at improving the result further during the rebuttal period. this may be due to a less-than-desirable form the initial submission was in, and when the new version with perhaps a new set of more convincing experiments is reviewed elsewhere, it may be received with a more positive attitude from the reviewers.",ICLR2019,3: The area chair is somewhat confident +HyCfpz8_g,1486400000000.0,1486400000000.0,1,BJAFbaolg,BJAFbaolg,ICLR committee final decision,Accept (Poster),"Infusion training is a new, somewhat heuristic, procedure for training deep generative models. It's an interesting novel idea and a good paper, which has also been improved after the authors have been responsive to reviewer feedback.",ICLR2017, +r7d_Zq8Zh,1576800000000.0,1576800000000.0,1,rkgHY0NYwr,rkgHY0NYwr,Paper Decision,Accept (Poster),"The work presents a novel and effective solution to learning reusable motor skills. The urgency of this problem and the considerable rebuttal of the authors merits publication of this paper, which is not perfect but needs community attention.",ICLR2020, +Hybb2G8_x,1486400000000.0,1486400000000.0,1,rky3QW9le,rky3QW9le,ICLR committee final decision,Reject,"This paper learns affine transformations from images jointly with object features. The motivation is interesting and sound, but the experiments fail to deliver and demonstrate the validity of the claims advanced -- they are restricted to toy settings. What is presented as logical next steps for this work (extending to higher scale multilayer convolutional frameworks, beyond toy settings) seems necessary for the paper to hold its own and deliver the promised insights.",ICLR2017, +xZyM4u9x8d,1576800000000.0,1576800000000.0,1,SJeHwJSYvH,SJeHwJSYvH,Paper Decision,Reject,"This paper provides and analyzes an interesting approach to ""de-biasing"" a predictor from its training set. The work is valuable, however unfortunately just below the borderline for this year. I urge the authors to continue their investigations, for instance further addressing the reviewer comments below (some of which are marked as coming after the end of the feedback period).",ICLR2020, +#NAME?,1610040000000.0,1610470000000.0,1,Aoq37n5bhpJ,Aoq37n5bhpJ,Final Decision,Reject,"The paper presents a personalized federated learning approach using a mixture of global and local models. Four reviewers evaluated this paper; one of the reviewers is luke-warm (6) while the rest of the reviewers pretty negative to this work (3, 3, 3). The reviewers pointed out many weaknesses, especially about novelty, motivation, contribution, presentation, etc. Most importantly, although the idea of a ""mixture of experts"" makes sense, it is not clear what the real technical contribution of this paper is in terms of federated learning. + +Considering all the comments by the reviewers, I believe that this paper is not ready yet for publication. The authors need to improve the novelty and technical soundness of the proposed direction to convince the readers including reviewers. ",ICLR2021, +14ovF4aP1MQ,1610040000000.0,1610470000000.0,1,2G9u-wu2tXP,2G9u-wu2tXP,Final Decision,Reject,Reviewers raised several concerns about the paper guided by unfounded heuristics as well as the artificiality of the tasks involved. 
Rebuttal only answered a few of them and did not convince the reviewers which has been clearly stated in the response. We hope that the authors will improve the paper for future submission based on the reviews. ,ICLR2021, +wCdCtZ9WQ_,1576800000000.0,1576800000000.0,1,rJxAo2VYwr,rJxAo2VYwr,Paper Decision,Accept (Poster),"This paper considers black box adversarial attacks based on perturbations of the intermediate layers of a neural network classifier, obtained by training a binary classifier for each target class. + +Reviewers were happy with the novelty of the approach as well as the presentation, described the presentation as rigorous and were pleased with the situation of this method relative to the literature. R3 had concerns about evaluation, success rate, and that the procedure was ""cumbersome"". Some of their concerns were addressed in rebuttal, but remained steadfast that the method was too cumbersome to be practical. + +I agree with R1 & R2 that this approach is novel and interesting and disagree with R3 that it is too impractical. The paper could be stronger with the addition of adversarial training experiments (and I disagree with the authors that ""there are currently no whitebox attacks that do well at attacking AT models"", this is very much not the case), but I concur with R1 & R2 that this is interesting work that may stimulate further exploration, enough so to warrant acceptance.",ICLR2020, +S1eUsqmXlN,1544920000000.0,1545350000000.0,1,Hye-LiR5Y7,Hye-LiR5Y7,A simple approach for transfer learning but limited experimental evaluation,Reject,"The paper proposes an approach for transfer learning by assigning weights to source samples and learning these jointly with the network parameters. Reviewers had a few concerns about experiments, some of which have been addressed by the authors. The proposed approach is simple which is a positive but it is not evaluated on any of the regular transfer learning benchmarks (eg, the ones used in Kornblith et al., 2018 ""Do Better ImageNet Models Transfer Better?""). The tasks used in the paper, such as CIFAR noisy -> CIFAR and SVHN0-4 -> MNIST5-9, are artificially constructed, and the paper falls short of demonstrating the effectiveness of the approach on real settings. + +The paper is on the borderline with current scores and the lack of regular transfer learning benchmarks in the evaluations makes me lean towards not recommending acceptance. ",ICLR2019,5: The area chair is absolutely certain +rH6CD7BOXHL,1642700000000.0,1642700000000.0,1,_gZ8dG4vOr9,_gZ8dG4vOr9,Paper Decision,Reject,"This paper presents an empirical study which shows that pruning FBNets with larger capacity results in a model with higher accuracy than one searched via neural architecture search. The below are pros and cons of the paper mentioned by the reviewers: + +Pros +- The observation that optimized architectures such as FBNets can benefit from pruning is interesting. +- The paper is well-written and easy to follow. + +Cons +- It is trivially known that training larger model and then pruning it will yield a better performing model, than training a smaller model from scratch. +- The authors do not propose a novel pruning technique for optimized CNN architectures, and use existing pruning techniques for all experiments. +- The experimental validation is only done with FBNets on ImageNet, and it does not show when pruning starts to break down. 
+ +All reviewers unanimously voted for rejection, especially since the main “findings” of this paper that compact architectures can be further pruned down for improved accuracy/efficiency tradeoff, and that pruning a larger compact model results in models that outperform smaller models trained from scratch, have been already shown in many of the previous works on neural pruning. In fact, compact networks such as MobileNets and EfficientNets are the standard architectures for measuring the effectiveness of pruning techniques, and thus the contribution of this work reduces down to showing that the same results can be obtained with FBNets. This could be of interest to some practitioners, but is definitely not sufficient to warrant publication.",ICLR2022, +4B-9IdtLzW,1576800000000.0,1576800000000.0,1,ryxUkTVYvH,ryxUkTVYvH,Paper Decision,Reject,"This work performs fast controllable and interpretable face completion, by proposing a progressive GAN with frequency-oriented attention modules (FOAM). The proposed FOAM encourages GANs to highlight more to finer details in the progressive training process. This paper is well written and is easy to understand. While reviewer #1 is overall positive about this work, the reviewer #2 and #141 rated weak reject with various concerns, including unconvincing experiments, very common framework, limited novelty, and the lack of ablation study. The authors provided response to the questions, but did not change the rating of the reviewers. Given the various concerns raised, the ACs agree that this paper can not be accepted at its current state.",ICLR2020, +Sy5_2zLOg,1486400000000.0,1486400000000.0,1,rkFBJv9gg,rkFBJv9gg,ICLR committee final decision,Accept (Poster),"There was some question as to weather ICLR is the right venue for this sort of dataset paper, I tend to think it would be a good addition to ICLR as people from the ICLR community are likely to be among the most interested. The problem of note identification in music is indeed challenging and the authors revised the manuscript to provide more background on the problem. No major issues with the writing or clarity. Experiments and models explored are not extremely innovative, but help create a solid dataset introduction paper.",ICLR2017, +6vcI7JEIiB72,1642700000000.0,1642700000000.0,1,MsHnJPaBUZE,MsHnJPaBUZE,Paper Decision,Accept (Poster),"The paper extends the original work on flooding to individual instance level to prevent overfitting. Even though the technique is a intuitive extension, the reviewers appreciate its simplicity and effectiveness, and consider the extension necessary. Most reviewers' concerns were addressed through rebuttal.",ICLR2022, +uzs-mPa4_,1576800000000.0,1576800000000.0,1,rkgO66VKDS,rkgO66VKDS,Paper Decision,Accept (Poster),"Main content: Paper is about training low precision networks to a high-accuracy. + +Discussion: +reviewer 2: impressive results, main questions are around some clarity in the experiments tried, but sounds like authors addressed most of this in rebuttal. +reviewer 1: well written paper, but authors think some technical details could be clarified. +reviewer 3: well written but experimental section could be improved. +Recommendation: all reviewers are in consensus, well written paper but some experiments/technical details could be improved. 
I vote poster.",ICLR2021,
gQpucrwCor,1576800000000.0,1576800000000.0,1,BJeFQ0NtPS,BJeFQ0NtPS,Paper Decision,Reject,"The paper proposed a non-autoregressive attention-based encoder-decoder model for text-to-spectrogram generation using attention distillation. It is shown to bring a good speedup over conventional autoregressive models. The paper further adopted a VAE for the vocoder, which trains from scratch although it performs worse than existing methods (e.g., ClariNet).

The main concerns for this paper come from the unclear presentation:
* As the reviewer pointed out, there are some misleading claims: the speedup gains were obtained without considering the full context (i.e., not including the whole inference time).
* The paper failed to clearly present the architectures developed/used in the paper and their differences from those used in the literature. The reviewers suggested the use of diagrams to aid the presentation.
* The two contributions are presented in an unbalanced way. Due to the complexities involved, it would be better to explain things in more detail.
The authors acknowledged the reviewers' comments during the rebuttal, but did not make any changes to the paper.",ICLR2020,
38-eqZ408C3,1610040000000.0,1610470000000.0,1,ZHkbzSR56jA,ZHkbzSR56jA,Final Decision,Reject,"The authors present BASGD, an asynchronous version of SGD that attempts to be robust against Byzantine failures/attacks.
The paper is overall well written and clearly presents the results. Some novelty is present, as there has been limited work on asynchronous algorithms for Byzantine ML.

However, several concerns have been raised by the reviewers, with which I agree, and they have not been fully addressed:
1) the tradeoff between asynchrony and robustness, as BASGD cannot handle the case of a buffer being a straggler, which limits some of the novelty in this work;
2) issues with the definition of privacy leakage have not been fully addressed;
3) some reviewers mentioned the theoretical results being of limited importance, but arguably this is true for other related work in this area. Perhaps a general criticism is valid as to what the operational value of the proposed guarantees is. That is, convergence does not exclude a model that has undesirable properties, e.g., bad prediction accuracy for a small subset of tasks;
4) finally, the motivation of the paper's system model (e.g., storing gradients as opposed to instances) is of unclear practical relevance, as was raised by multiple reviewers.

Overall, the consensus was that the paper does have merits; however, some of the most major concerns were not properly addressed. This paper can potentially be improved for a future venue.

",ICLR2021,
pxeXWoyC8kd,1610040000000.0,1610470000000.0,1,uHjLW-0tsCu,uHjLW-0tsCu,Final Decision,Reject,"The authors propose a low-bit floating-point quantization method to reduce energy and time consumption in deep learning training. Dynamic quantization and MLS tensor arithmetic are used to enhance the effectiveness of MLS. The motivation is clear, and efficient training is an important problem to address. However, the effectiveness of the proposed method is not well justified and the experimental results are less convincing. In addition, the clarity of the paper still needs to be further improved. 
",ICLR2021,
lgHm7Jq98b7,1642700000000.0,1642700000000.0,1,dzZQEvQ6dRK,dzZQEvQ6dRK,Paper Decision,Reject,"The paper provides additional empirical evidence that self-supervised learning methods can help disentangle factors of variation in a dataset. That said, the paper could benefit from better framing and perhaps a comparison with existing work (e.g., https://arxiv.org/abs/2102.08850 and https://arxiv.org/abs/2007.00810). Furthermore, the authors acknowledge that there was a bug in their code, which I believe should at least lead to softening the claims about group disentanglement. Accordingly, please consider revising the paper and re-submitting to other venues.",ICLR2022,
J_XftsjELqX,1642700000000.0,1642700000000.0,1,q2DCMRTvdZ-,q2DCMRTvdZ-,Paper Decision,Reject,"All reviewers recommended rejection, and I agree.
I encourage the authors to follow the reviewers' recommendation and resubmit.",ICLR2022,
rklvOfKleN,1544750000000.0,1545350000000.0,1,B1xf9jAqFQ,B1xf9jAqFQ,Accept,Accept (Poster),"The authors obtain nice speed improvements by learning to skip and jump over input words when processing text with an LSTM. At some points the reviewers considered the work incremental, since similar ideas have already been explored, but in the end two of the reviewers endorsed the paper with strong support.",ICLR2019,4: The area chair is confident but not absolutely certain
x4Kyjmgyef,1576800000000.0,1576800000000.0,1,Hyes70EYDB,Hyes70EYDB,Paper Decision,Reject,"This work focuses on how one can design models with robust interpretations. While this is an interesting direction, the paper would benefit from a more careful treatment of its technical claims.

",ICLR2020,
jNIh1FEtsqa,1610040000000.0,1610470000000.0,1,c1xAGI3nYST,c1xAGI3nYST,Final Decision,Reject,"This paper is rejected.

All of the reviewers found the empirical results strong. However, R3 and R4 pointed out concerns with the positioning of the work relative to prior work and noted that the approach is conceptually similar to previous work. The authors have tried to address these concerns in their rebuttal. Both reviewers appreciate the changes, but still have remaining concerns that I agree with. Based on these concerns, it is unclear if the strong empirical results are mostly due to using the NVAE architecture rather than to a methodological improvement over previous methods. For a resubmission, the authors should work on positioning their paper in the context of prior work and on the comparisons requested by R3 and R4.
",ICLR2021,
fe5ZowCsSfv,1610040000000.0,1610470000000.0,1,chPj_I5KMHG,chPj_I5KMHG,Final Decision,Accept (Poster),"This paper presents a new approach to grounding language-based RL tasks via an intermediate semantic representation, in an architecture called language-goal-behavior (LGB). The architecture permits learning a mapping from internal goals to behavior (GB) separately from learning a mapping from language to internal goals (LG), before flexibly combining all three (LGB). The architecture is studied in a specific implementation called DECSTR. The architecture has multiple desirable attributes, including support for intrinsic motivation, decoupling of skill acquisition from language grounding, and strategy switching. The experiments demonstrate the utility of different components of the architecture with a variety of ablation results.

The reviews initially found the paper to be poorly organized, with required content described only in the appendix (R1, R2, R4), with unclear main contributions (R1, R2, R4), and with results restricted to demonstrations (R3). Despite these reservations, the reviewers found the content to be potentially relevant, though narrow in scope.

The authors substantially revised the paper. They improved its organization, clarified the contributions, separated the architecture from the specific examples, and improved the experimental baselines. After reading the revised paper, the reviewers agreed that the paper's organization and insights were improved, making the new paper's contribution and insight clear. The experimental baselines were also improved, providing more support for the potential utility of the proposed method.

Three reviewers indicated acceptance of this paper for its contribution of a novel approach to grounding language and behavior with an intermediate semantic representation. No substantial concerns were raised about the content of the revised paper. The paper is therefore accepted.",ICLR2021,
1Vrqs0PFcLg,1642700000000.0,1642700000000.0,1,TYw3-OlrRm-,TYw3-OlrRm-,Paper Decision,Accept (Poster),"This paper studies the problem of training tiny networks by proposing a new training method called Network Augmentation (NetAug). The main challenge in training tiny networks lies in underfitting, which regularization techniques such as data augmentation and dropout may aggravate for tiny networks. To overcome this hurdle, the proposed method first embeds or augments the tiny network as a subnet of a larger network, mostly by enlarging the width; the gradients from the larger network are then used as additional or auxiliary supervision. With this training strategy, the tiny model can perform better than under the conventional training scheme on ImageNet and several downstream tasks. The proposed method is simple to implement and complementary to other techniques such as knowledge distillation and pruning. While there are many works studying how to improve the accuracy of large models, there are relatively few works focusing on tiny network training. Although existing works share a similar idea to NetAug for large-model training, which slightly hurts the novelty of this work, the majority of reviewers still like the idea and suggest accepting the paper.",ICLR2022,
6M645r7aLL1,1642700000000.0,1642700000000.0,1,gccdzDu5Ur,gccdzDu5Ur,Paper Decision,Reject,"The manuscript proposes a framework for imposing priors on the feature extraction in deep visual processing models. The core contribution of this manuscript is the systematic formulation and investigation of how different, distinct feature priors lead to complementary feature representations that can be combined to provide more robust data representations. The manuscript draws on early work on co-training and also on the more recent work on self-supervision and self-training. Experiments are performed with classical shape- and texture-biased models, and show that diverse feature priors are able to robustly create a set of complementary data views.

Positive aspects of the manuscript include:
1. The topic of this paper, creating and combining robust, generalizable and diverse feature representations, is of high relevance;
2. Positive results from co-training of groups of image classification models designed to focus on shape but not texture, or vice versa.

There are also several major concerns, including:
1. 
The ensemble results presented in section 3.2 are generated using very primitive ensembling techniques; +2. The absence-of-spurious-correlation-in-unlabelled data assumption be presented more cautiously; +3. Definition of feature prior; +4. Analysis on another domain aside from image classification. + +During the rebuttal period, the Authors provided additional experiments using a more sophisticated method (“stacking”), and additional discussion of where spurious correlations are likely to occur. The manuscript has high rating variance. Some reviewers think that the manuscript lacks the technical novelty, and the results presented are the results of an empirical study. The focus of this manuscript is on two natural feature priors (i.e., shape and texture). It would strengthen the manuscript if the Authors can provide further analysis to emphasise the generality of the proposed framework that it could accommodate any two feature priors as long as they are sufficiently diverse.",ICLR2022, +QLm8QXyaL,1576800000000.0,1576800000000.0,1,SJeUm1HtDH,SJeUm1HtDH,Paper Decision,Reject,"This paper investigates using sound to improve classification, motion prediction, and representation learning all from data generated by a real robot. + +All the reviewers were intrigued by the work. The paper provides experiments on real robots (never a small task), and a data-set for the community, and a sequence of illustrative experiments. Because the paper combines existing techniques, its main contribution is the empirical demonstrations of the utility of using sound. Overall, it was not quite enough for the reviewers. The main issues were: (1) motion prediction is perhaps expected given the physical setup, (2) lack of comparison with other approaches, (3) lack of diversity in the demonstrations (10 objects, one domain). + +The authors added two new experiments with a different setup, further demonstrating their claims. In addition the authors highlighted that the novelty of this task means there are no clear baselines (to which r3 agreed). The new experiments are briefly described in the response (and visuals on a website), but the authors did not update the paper. The new experiments could potentially significantly strength the paper. However, the terse description in the response and the supplied visuals made it difficult for the reviewers to judge their contribution. + +Overall, this is certainly a very interesting direction. The results on real world data demonstrate promise, even if they are not the benchmarking style the community is used too. ",ICLR2020, +auVSjphEk0W,1642700000000.0,1642700000000.0,1,OzyXtIZAzFv,OzyXtIZAzFv,Paper Decision,Accept (Poster),"The paper is an interesting take on representation learning, using (prior) tasks to determine which information is important. The problem setting is somewhat difficult to pin down, so that that finding the correct comparisons is not obvious and opinions differ on many details of the setup. However, this is not a fault of the paper; it is a general problem the further one moves away from clean settings like classical supervised learning. + +There was a lengthy and detailed back-and-forth between the authors and reviewers, where the authors clarified most of the points raised, extended their results, resulting in one reviewer switching from reject to accept.",ICLR2022, +Aw2AJfLYECsL,1642700000000.0,1642700000000.0,1,xspalMXAB0M,xspalMXAB0M,Paper Decision,Reject,"The paper proposes a boosting algorithm for RL based on online boosting. 
The main advantage of the result is that the sample complexity does not explicitly depend on number of states. Post rebuttal, some of the reviewers have changed their opinion on the paper. However, overall the reviewers still seem to be on the fence about this paper. Seems like the paper combines the techniques from Hazan Singh’21 along with a frank-wolfe algorithm to deal with non-convex sets but the reviewers seem to view this as not as significant a new contribution. + +I see the paper as being interesting but do agree with some of the comments of the reviewers and am leaning to a reject.",ICLR2022, +VIiMnHntt8,1642700000000.0,1642700000000.0,1,GOr80bgf52v,GOr80bgf52v,Paper Decision,Reject,"This paper presents a GNN-based attention mechanism and tests it on a robotic stacking task. + +While all the reviewers agree that this work is novel and interesting, they also are unanimous (even after the rebuttal) in pointing to the insufficient experimental evaluation of the proposed method. + +I encourage the authors to incorporate the feedback of all the reviewers.",ICLR2022, +O0GsLp6GR7i,1642700000000.0,1642700000000.0,1,qpcG27kYK6z,qpcG27kYK6z,Paper Decision,Reject,"This paper addresses the problem of learning representation of 3D point clouds and introduces an interesting approach of concentric spherical GNN with the property of rotationally equivariant. It shows some promising results on point cloud classification under SO(3) transformations and on predicting electronic state density of graphene allotropes. The reviews suggest that, while it does not suffer from any major flaws, the paper has a fairly large number of minor issues that add up to make it subpar for publication. The proposed approach have several hyperparameters, but the authors do not seem to be up front about how the parameters are selected except for stating that they use ""standard tuning techniques"" --- this is not a satisfactory answer and appears to be dodging the question. Many technical details and specific choices could use more thorough explanation and analysis. The distinction of the proposed approach in relation to the large body of existing literature could be more clearly spelled out. Collectively, these issues made the contribution of this paper less clear.",ICLR2022, +pHPodim7hg,1642700000000.0,1642700000000.0,1,T0B9AoM_bFg,T0B9AoM_bFg,Paper Decision,Accept (Poster),"This paper investigates a tighter bound for mutual information and proposes some novel estimators of MI from the importance sampling perspective. The proposed approach provides a unifying framework for mutual information bounds that can deduce many existing approaches. The theoretical and experimental analyses well justify the proposed approach and shows the bounds are much tighter than the existing ones. +Overall, this paper is well written. The relevant literatures are exhaustively reviewed and well compared with the proposed method. The experimental results show remarkable superiority of the proposed method. The proposed framework would shed light on the literatures and open up a new direction of the relevant researches. +For those reasons, I would like to recommend this paper to be accepted by ICLR2022 conference.",ICLR2022, +wiiGLtM9HMZ,1610040000000.0,1610470000000.0,1,I4pQCAhSu62,I4pQCAhSu62,Final Decision,Reject,"This paper proposes Feature Contractive Learning (FCL), a training framework that takes a more nuanced view of robustness, refining it to the sensitivity of the feature. 
There are some differing opinions among the reviewers, with some applauding the simplicity of this new take on robustness while others are unsure of its underlying definitions and relationship to adversarial robustness. The authors claimed to have clarified some of these points in their rebuttal / revision, but unfortunately, there was not much follow-up discussion by the reviewers. Ultimately, there are still enough lingering issues that rejection is warranted.",ICLR2021, +AzZUG1xaI,1576800000000.0,1576800000000.0,1,B1eB5xSFvr,B1eB5xSFvr,Paper Decision,Accept (Poster),"The paper provides a language for optimizing through physical simulations. The reviewers had a number of concerns related to paper organization and insufficient comparisons to related work (jax). During the discussion phase, the authors significantly updated their paper and ran additional experiments, leading to a much stronger paper.",ICLR2020, +4NGYY_v5o,1576800000000.0,1576800000000.0,1,rygRP2VYwB,rygRP2VYwB,Paper Decision,Reject,All the reivewers find the similarity between this paper and the references in terms of the algorithm and the proof. The theoretical results may not better than the existing results.,ICLR2020, +SJlterMflE,1544850000000.0,1545350000000.0,1,SJl2niR9KQ,SJl2niR9KQ,"An interesting contribution, although some concerns regarding the claims",Accept (Poster),"The paper describes the use of differentiable physics based rendering schemes to generate adversarial perturbations that are constrained by physics of image formation. + +The paper puts forth a fairly novel approach to tackle an interesting question. However, some of the claims made regarding the ""believability"" of the adversarial examples produced by existing techniques are not fully supported. Also, the adversarial examples produced by the proposed techniques are not fully ""physical"" at least compared to how ""physical"" adversarial examples presented in some of the prior work were. + +Overall though this paper constitutes a valuable contribution. ",ICLR2019,4: The area chair is confident but not absolutely certain +Qzh8cMDsNcD,1642700000000.0,1642700000000.0,1,vtDzHJOsmfJ,vtDzHJOsmfJ,Paper Decision,Reject,"The paper considers learning classifiers under a fairness constraint which enforces the loss to be equal on certain subgroups. Reviewers found the work to be well-motivated, but raised concerns on the lack of discussion and comparison to relevant prior work. Notable examples in the fairness literature are Donini et al., ""Empirical Risk Minimization under Fairness Constraints"", Celis et al., ""Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees"", while in the more broader constrained optimization literature, Kumar et al. ""Implicit Rate-Constrained Optimization of Non-decomposable Objectives"". The authors are encouraged to incorporate reviewers' detailed comments for a revised version of this work.",ICLR2022, +S1eO6vY4lV,1545010000000.0,1545350000000.0,1,rJVorjCcKQ,rJVorjCcKQ,A very interesting new contribution to privacy and neural networks,Accept (Oral),"The authors propose a new method of securely evaluating neural networks. + +The reviewers were unanimous in their vote to accept. 
The paper is very well written, the idea is relatively simple, and so it is likely that this would make a nice presentation.",ICLR2019,5: The area chair is absolutely certain +iRXjHFh1W0,1610040000000.0,1610470000000.0,1,6htjOqus6C3,6htjOqus6C3,Final Decision,Reject,"The paper is in general well written and easy to follow, and the considered approach of controlling beta is sensible. However, all reviewers identify shortcomings in the empirical analysis of the proposed method (missing comparison with stronger baselines, convergence issues of the considered baselines, considered datasets, etc.). Furthermore, compared to the ControlVAE the contribution of the paper seems limited; and the empirical evaluation is insufficient to claim superior results in general. The authors did not address most of the concerns raised by the reviewers in their rebuttal. The authors can improve their paper substantially by performing the experimental results proposed by the reviewers and clarifying differences to the ControlVAE—but in its current form the paper does not meet the standard of ICLR. +",ICLR2021, +I7m8My9x4w9,1642700000000.0,1642700000000.0,1,iRCUlgmdfHJ,iRCUlgmdfHJ,Paper Decision,Accept (Oral),"This paper makes an important evaluation study to further the understanding of the representation capacity of DNNs. The novelty is in using the multi-order interaction proposed in Zhang et al. (2020) to understand the complexity of interactions in DNNs. The authors discovered an interesting representation bottleneck phenomenon, i.e., in a normally trained DNN, low-order and high-order interaction patterns are easy to be learned, while middle-order interaction patterns are difficult to be learned. They also propose two novel loss functions, which allow the model to encode interactions of specific orders, including middle-order interaction. All reviews are positive.",ICLR2022, +r1l1Pn-VlN,1544980000000.0,1545350000000.0,1,rJeZS3RcYm,rJeZS3RcYm,"A method that is appealing for its simplicity, but reviewer concerns regarding fairness of comparison persist.",Reject,"The paper considers a procedure for the generation of adversarial examples under a black box setting. The authors claim simplicity as one of the main selling points, with which reviewers agreed, while also noting that the results were impressive or ""promising"". There were concerns over novelty and some confusion over the contribution compared to Guo et al, which I believe has been clarified. + +The highest confidence reviewer (AnonReviewer2), a researcher with significant expertise in adversarial examples, raised issues of inconsistent threat models (and therefore unfair comparisons regarding query efficiency), missing baselines. A misunderstanding about comparison against a concurrent submission to ICLR 2019 was resolved on the basis that the relevant results are mentioned but not originally presented in the concurrent submission. + +While I disagree with AnonReviewer2 that results on attacking a particular image from previous work (when run against the Google Cloud Vision API) would be informative, the reviewer has remaining unaddressed concerns about the fairness of comparison (comparing against results reported in previous work rather than re-run in the same setting), and rightly points out that as many variables should be controlled for as possible when making comparisons. Running all methods under the same experimental setting with the same *collection* of query images is therefore appropriate. 
+ +The authors have not responded to AnonReviewer2's updated post-rebuttal review, and with the remaining sticking point of fairness of comparison with respect to query efficiency I must recommend rejection at this point in time, while noting that all reviewers considered the method promising; I thus would expect to see the method successfully published in the near future once issues of the experimental protocol have been solidified.",ICLR2019,3: The area chair is somewhat confident +5BQ8h4NdkF6,1642700000000.0,1642700000000.0,1,Xa8sKVPnDJq,Xa8sKVPnDJq,Paper Decision,Reject,"This paper proposed a compositional approach to (conditionally) steer pre-trained music transformers to the direction intended by the user. Overall the scores are mostly negative. The reviewers pointed out some interesting aspects of the paper (e.g., using hard binary constraints as opposed to the soft ones, the contrastive approach). However, one common issue shared by all the reviewers is the clarity of the presentation, which led to many reviewers being confused about various aspects of the paper especially the empirical evaluation. The authors did provide a detailed response to address some of the concerns, but to fully address all the points I anticipate it would require quite substantial change to the paper. A couple reviewers also raised the concerns regarding the limited contribution of the paper. Finally, there appears to be some disagreement between the authors and reviewers regarding how to interpret the listening test results. I hope the authors can take the comments into consideration to further improve this paper for the next submission.",ICLR2022, +VWSBwKM9kz,1576800000000.0,1576800000000.0,1,rJeXDANKwr,rJeXDANKwr,Paper Decision,Reject,"This paper introduces a neural architecture search method that is geared towards yielding good uncertainty estimates for out-of-distribution (OOD) samples. + +The reviewers found that the OOD prediction results are strong, but criticized various points, including the presentation of the OOD results, novelty as a NAS paper, missing citations to some recent papers, and a lack of baselines with simpler ensembles. +The authors improved the presentation of their OOD results and provided new experiments, which causes one reviewer to increase his/her score from a weak reject to an accept. The other reviewers appreciated the rebuttal, but preferred not to change their scores from a weak reject and a reject, mostly due to lack of novelty as a NAS paper. + +I also read the paper, and my personal opinion is that it would definitely be very novel to have a good neural architecture search for handling uncertainty in deep learning; it is by no means the case that ""NAS for X"" is not interesting just because there are now a few papers for ""NAS for Y"". As long as X is relevant (which uncertainty in deep learning definitely is), and NAS finds a new state-of-the-art, I think this is great. For such an ""application"" paper of the NAS methodology, I do not find it necessary to introduce a novel NAS method, but just applying an existing one would be fine. The problem is more that the paper claims to introduce a new method, but that that method is too similar to existing ones, without a comparison; actually just using an existing NAS method would therefore make the contribution and the emphasis on the application domain clearer. 
+I have one small question to the authors about a part that I did not understand: to optimize WAIC (Eq 1), why is it not optimal to just set the parameterization \phi such that the variance is minimized, i.e., return a delta distribution p_\phi that always returns the same architecture (one with a strong prediction)? Surely, that's not what the authors want, but wouldn't that minimize WAIC? I hope the authors will clarify this in a future version. + +In the private discussion of reviewers and AC, the most positive reviewer emphasized that the OOD results are strong, but admitted that the mixed sentiment is understandable since people who do not follow OOD detection could miss the importance and context of the results, and that the paper could definitely improve its messaging. The other reviewers' scores remained at 1 and 3, but the reviewers indicated that they would be positive about a future version of the paper that fixed the identified issues. My recommendation is to reject the paper and encourage the authors to continue this work and resubmit an improved version to a future venue.",ICLR2020, +huD4-g75u7E,1610040000000.0,1610470000000.0,1,whAxkamuuCU,whAxkamuuCU,Final Decision,Reject,"This paper proposed a new type of models that are invariant to entities by exploring the symbolic property of entities. This problem is important in language modeling since it gives intrinsically more proper representation of sentences, which can better generalize to new entities. However I still suggest to reject this paper for the following reasons +1. The description of model is not clear enough which can certainly use a serious round of revision. +2. The experiments on bAbi is not convincing enough since it is an overly simple and toyish data-set with many ways to hack +3. Similar entity-invariant idea has been explored long time ago by (https://arxiv.org/pdf/1508.05508.pdf) which attempted to represent entities as “variables” + +",ICLR2021, +bKpM13gmQWX,1610040000000.0,1610470000000.0,1,jGeOQt3oUl1,jGeOQt3oUl1,Final Decision,Reject,"The paper studies three aspects of the representational capabilities of normalizing flows, with a particular focus on affine coupling layers. Normalizing flows are valuable generative-modelling tools, so advancing our understanding of their theoretical properties is an important research direction. + +Reviewers #2 and #4 found the contribution of the paper significant without expressing major concerns, and so recommended acceptance. + +Reviewer #3 reviewed the paper very thoroughly, and expressed some concerns mainly about the experimental evaluation. Most of their concerns were addressed in the rebuttal, so they recommended weak acceptance, recognizing the merits of the paper but also pointing out the potential for improvement. + +Reviewer #1 was the most critical: they expressed major concerns regarding the significance of the contributions and the overall clarity of the exposition. Despite a long exchange between the reviewer and the authors, a consensus was not reached, so the concerns remain. + +The discussion so far has led me to believe that there are potentially valuable theoretical contributions in the paper, however it's clear that there is significant room for improvement in getting the contributions across. Given the strong concerns expressed, the lack of consensus, and the clear potential for improvement, I'm unable to recommend acceptance of the paper in its current form. 
However, I do believe that the work has potential, and I hope that the discussion here will help improve the paper for a future submission.",ICLR2021, +46GQdLlT11,1610040000000.0,1610470000000.0,1,Ip195saXqIX,Ip195saXqIX,Final Decision,Reject,"The paper received four negative reviews. The overall idea was found to be interesting, but several concerns were raised. There is a general consensus that the experimental part and the results are not convincing. Several comments have also been made regarding the clarity and motivation, which needs to be strengthened. R4 also mentions references from the sparse estimation literature that would help for positioning the paper. The rebuttal did address some of these points, but it was not sufficient to change their opinion. + +Overall, the area chair agrees with the reviewers and follows their recommendation.",ICLR2021, +Co4Zd1yVnKF,1642700000000.0,1642700000000.0,1,6HN7LHyzGgC,6HN7LHyzGgC,Paper Decision,Accept (Poster),"The paper considers the important problem of performance degradation under distribution shift and proposes a simple yet effective method to alleviate this problem. They do so by considering feature statistic to be non-deterministic and rather a multivariate Gaussian distribution. The model can be integrated into networks without additional parameters and experiments show that it works better than BN as well as if the assumed distribution was uniform. The latter was added during rebuttal period. + +There were two main concerns regarding distinguishing the work from AdaIN and baseline that were addressed during rebuttal and some parts of the paper were re-written to address repetition.",ICLR2022, +zCWmSMEbxAq,1610040000000.0,1610470000000.0,1,MpStQoD73Mj,MpStQoD73Mj,Final Decision,Reject,"This paper introduces a framework for automatic differentiation with weighted finite-state transducers (WFSTs), which would allow user-specified graphs in structured output prediction tasks and easy plug-and-play of graphs through the composition operation (demonstrated with variants of CTC). The authors demonstrated their framework on the OCR and ASR domains, which are important application scenarios. All reviewers agree the work is useful and can potentially be significant. However, the reviewers think the paper needs more discussions of similar/parallel work and the key differences from them, and clear description of the novelty in terms of either machine learning insights or algorithmic implementations. We understand that this may be an implementation-heavy work, but the level of details provided in the current version does not convince the reviewers that the proposed approach is already efficient and can scale up. This could be shown by fair comparison with existing approaches (e.g., hard-coded error back-propagation implementation with a fixed graph) in runtime and accuracy. ",ICLR2021, +9dFBtRxa83,1576800000000.0,1576800000000.0,1,ByeUBANtvB,ByeUBANtvB,Paper Decision,Accept (Poster),"Initial reviews of this paper cited some concerns about a lack of comparison to SOTA and baselines, and also some debate over claims of what is (or is not) ""biologically plausible."" However, after extensive back-and-forth between the authors and reviewers these issues have been addressed and the paper has been improved. There is now consensus among authors that this paper should be accepted. 
I would like to thank the reviewers and authors for taking the time to thoroughly discuss this paper.",ICLR2020, +zdEeV_fdD2L,1610040000000.0,1610470000000.0,1,Hr-cI3LMKb8,Hr-cI3LMKb8,Final Decision,Reject,"This paper proposes to employ affinity cycle consistency(ACC) for extracting active (or shared) factors of variation across groups. Experiments shows how ACC works in various scenarios. + +Pros: +- The problem is important and relevant. +- The paper is well written. +- The proposed method is simple and effective. + +Cons: +- The experimental section is weak: + It lacks an ablation to validate the contribution of ACC and discussion on + why the method works and the scalability of the proposed method to more complex cases. +- The novelty is limited because the proposed ACC is similar to previous work temporal cycle consistency(TCC). +- The paper missed some implementation details and could be difficult to reproduce without code + provided. + +Reviewers raised the concerns listed in Cons. The authors conducted additional experiments and added more discussions on the experimental results in the revised paper. The authors also explained that ACC is more general than TCC. However, the reviewers were not convinced by the rebuttal and kept their original ratings. + +Due to the two main weaknesses -- limited novelty and weak experimental analysis, I recommend reject.",ICLR2021, +fMDzd3WUZQ,1610040000000.0,1610470000000.0,1,DktZb97_Fx,DktZb97_Fx,Final Decision,Accept (Oral),"All of the reviewers agree that this paper is well-written, and provides sound theoretical analyses and comprehensive empirical evaluations. Overall, this paper makes a useful contribution in the direction of individual fairness. The authors have also addressed the concerns raised by the reviewers in their response.",ICLR2021, +0uvZfmQaJf,1576800000000.0,1576800000000.0,1,ByxT7TNFvH,ByxT7TNFvH,Paper Decision,Accept (Poster),"The paper proposes a using pixel-adaptive convolutions to leverage semantic labels in self-supervised monocular depth estimation. Although there were initial concerns of the reviewers regarding the technical details and limited experiments, the authors responded reasonably to the issues raised by the reviewers. Reviewer2, who gave a weak reject rating, did not provide any answer to the authors comments. We do not see any major flaws to reject this paper.",ICLR2020, +1tyvaTgUQtf,1642700000000.0,1642700000000.0,1,c-4HSDAWua5,c-4HSDAWua5,Paper Decision,Accept (Poster),"The SketchODE submission is a continuously-valued model for chirographic drawing data such as handwritten digits or sketches. It relies on variational sequence-to-sequence model where the latent code z is a global encoding of the drawing dynamical, and contains a neural controlled differential equation encoder to encode a discrete 2D drawing sequence s, and an augmented neural ODE decoder (conditional on the latent code z) to model both the first-order dynamics both of the drawing velocity and of the pen state (effectively modelling second order dynamics on the pen position). The model enables to sample sketches by sampling latent codes, as well as to interpolate between two latent codes, and is evaluated on VectorMNIST (a new task), QuickDraw sketches, and DiDi schematics, where it is compared to discrete RNN-based Seq2Seq and two more recent baselines. 
+ +Reviewers praised the idea of using continuously-valued Neural ODEs for drawing, compelling properties of the model for conditional generation or interpolation, the new VectorMNIST dataset, and the writing. Reviewers had some concerns: overstating the novelty and contribution to general continuous seq2seq given that the evaluation was done only on chirographic drawing tasks (Q3GY, zrrF), some experimental details such as missing ablations, examples from QuickDraw or Didi, or comparisons with transformers (Q3GY, zrrF), clarifications on computational complexity (zrrF, S7jh), situating the work with respect to applications of Neural ODEs to physics (zrrF); most of these concerns were addressed in the rebuttal. Reviewer KvGm had the most concerns about the experimental section, but has increased their score after the discussion with the authors. + +There was no discussion among the reviewers, only between the authors and reviewers zrrF and KvGm. After the authors' rebuttal, the scores became 8. 8, 6 and 5, and thus I believe that the paper meets the conference acceptance bar.",ICLR2022, +aIBORe2xpPj,1610040000000.0,1610470000000.0,1,8xeBUgD8u9,8xeBUgD8u9,Final Decision,Accept (Poster),"I agree with the reviewers, and I find the careful analysis of CL approaches relying on regularization for RNN useful and insightful. I do feel that a lot of the interesting content is still in the appendix (from a quick skim and looking at the plots in the appendix) but I think something like this can potentially be unavoidable. + +I do like the separation between sequence length and memory requirements. I think making observations about different types of recurrent architectures is hard, but I think the paper does a good job to raise some interesting questions. + +A note that I would make (that I haven't seen raised through a quick look in the paper) is that is not clear how the Fisher Information Matrix should be computed in case of a recurrent model (which is a problem in general). E.g. a typical thing is to compute it as for a feed-forward model (using the gradients coming from BPTT) which is feasible computationally, but actually that is problematic as you first sum gradients before taking their outer-product rather than summing the outer-products corresponding of the different terms in the gradient. I'm wondering if that plays a role here as well. + +Overall I think the paper does careful analysis and ablation studies and raises some interesting observation of how one should approach CL algorithms for RNN models. + +",ICLR2021, +8oA8I98Gsy3,1642700000000.0,1642700000000.0,1,GDUfz1phf06,GDUfz1phf06,Paper Decision,Reject,"This paper proposes two methods to learn the architecture of normalizing flows models; Their framework is inspired by (Liu et al., 2019) which uses ensembles/mixtures with learnable weights for architecture search. The application of these ideas to NFs requires a trivial modification to respect the invertibility constraint; which consists in building a mixture model over all possible sequences of compositions of transformations from a fixed set. + +The paper proposes to use an upper-bound to the forward KL instead of the fKL directly. The reasoning is that this will lead to a ""pure"" model after optimization, that is, the mixture weights will be in {0, 1}. Mathematically, this simply corresponds to treating the mixture as a latent-variable model and performing MAP-inference over discrete latent variables, assuming that all mixture components have the same prior weights in the mixture. 
+ +The experimental results across various datasets are very mixed, and the family of transformations considered in the experiments is quite restricted.",ICLR2022, +BkaQUk6Sz,1517250000000.0,1517260000000.0,752,ryserbZR-,ryserbZR-,ICLR 2018 Conference Acceptance Decision,Reject,"Authors present a method for disease classification and localization in histopathology images. Standard image processing techniques are used to extract and normalize tiles of tissue, after which features are extracted from pertained networks. A 1-D convolutional filter is applied to the bag of features from the tiles (along the tile dimension, kernel filter size equal to dimensionality of feature vector). The max R and min R values are kept as input into a neural network for classification, and thresholding of these values provides localization for disease / non-disease. + +Pro: + - Potential to reduce annotation complexity of datasets while producing predictions and localization + +Con: +- Results are not great. If anything, results re-affirm why strong annotations are necessary. +- Several reviewer concerns regarding novelty of proposed method. While authors have made clear the distinctions from prior art, the significance of those changes are debated. + +Given the current pros/cons, the committee feels the paper is not ready for acceptance in its current form.",ICLR2018, +B18psGIOe,1486400000000.0,1486400000000.0,1,Bk67W4Yxl,Bk67W4Yxl,ICLR committee final decision,Reject,"The authors suggest alternate feedforward architectures (residual layers) for training a Go policy more efficiently. The authors refer to AlphaGo, but the proposed approach does not use reinforcement learning, which is not stated clearly in the introduction of the paper. The contribution is very incremental, there are no new ideas, and the presentation is muddled.",ICLR2017, +E6x0CDtFbll,1610040000000.0,1610470000000.0,1,Vd7lCMvtLqg,Vd7lCMvtLqg,Final Decision,Accept (Poster),"This paper proposes a method to cope with large vocabulary sizes. The idea is to find a small number of anchor words and to express every other word as a sparse nonnegative linear combination of them. They give an end-to-end method for training, and give a statistical interpretation of their algorithm as a Bayesian nonparametric prior (in particular an Indian restaurant process). They give extensions that allow them to deduce the optimal number of anchors which allows them to avoid needing to tune this hyperparameter. Finally they give a variety of experiments, particularly in language and recommendation tasks. The results on language are particularly impressive, and in the author response period, at the behest of a reviewer, they were able to extend the experiments to the Amazon Review dataset which contains 233M reviews on 43.5 M items by 15.2 M users. + +This paper is a nice combination of a simple but powerful idea, and a range of experiments demonstrating its utility. Other papers have proposed related ideas, but here the main novelty is in (1) using a small number of anchors that can incorporate domain knowledge and (2) using a sparse linear transformation to express other words in this basis. One reviewer did not find the Bayesian nonparametric interpretation to be fruitful, since it does not lead to techniques for handling growing datasets (e.g. if the ideal number of anchors changes over time). 
",ICLR2021, +KRpUC4DdkBm,1642700000000.0,1642700000000.0,1,ATUh28lnSuW,ATUh28lnSuW,Paper Decision,Accept (Poster),"The paper proposes a novel approach to graph representation learning. In particular, a graph auto-encoder is proposed that aims to better capture the topological structure by utilising a neighbourhood reconstruction and a degree reconstruction objective. An optimal-transport based objective is proposed for the neighbourhood reconstruction that optimises the 2-Wasserstein distance between the decoded distribution and an empirical estimate of the neighbourhood distribution. An extensive experimental analysis is performed, highlighting the benefits of the proposed approach on a range of synthetic datasets to capture structure information. The experimental results also highlight its robustness across 9 different real-world graph datasets (ranging from proximity-oriented to structure-oriented datasets). + +Strengths: +- The problem studied is well motivated and the method proposed is well placed in the literature. +- The method is intuitive and the way that the neighbourhood information is reconstructed appears novel. +- The empirical comparisons are extensive. + + +Weaknesses: +- Some of the choices in matching neighborhoods seem a bit arbitrary and not sufficiently justified. +- The scalability of the proposed method is questionable. The method has a high complexity of O(Nd^3) (where N is the number of nodes and d is the average node degree). The authors address this problem by resorting to the neighborhood sampling method (without citing the prior art), which is only very briefly discussed in the paper. +- The reviewers have also expressed concerns about the fixed sample size q. The question of how the neighbour-sampling is handled when a node has less than q neighbours remains unanswered.",ICLR2022, +rkRJ3ATVJ,1576800000000.0,1576800000000.0,1,SJezGp4YPr,SJezGp4YPr,Paper Decision,Accept (Poster),"This paper takes steps towards a theory of convergence for TD(0) with non-linear function approximation. The paper provides two theoretical results. One result bounds the error when training the sum of linear and homogenous parameterized functions. The second result shows global convergence when the environment dynamics are sufficiently reversible and the differentiable function approximation is sufficiently well-conditioned. The paper provides additional insight using a family of environments with partially reversible dynamics. + +The reviewers commented on several aspects of this work. The reviewers wrote that the presentation was clear and that the topic was relevant. The reviewers were satisfied with the correctness of the results. The reviewers liked the result that state value function estimation error is bounded when using homogeneous functions. They also noted that the deep networks in common use are not homogeneous so this result does not apply directly. The result showing global convergence of TD(0) with partial reversibility was also appreciated. Finally, the reviewers liked the family of examples. + +This paper is acceptable for publication as the presentation was clear, the results are solid, and the research direction could lead to additional insights.",ICLR2020, +B1O5Ly6HG,1517250000000.0,1517260000000.0,845,S16FPMgRZ,S16FPMgRZ,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes methods for replacing parts of neural networks with tensors, the values of which are efficiently estimated through factorisation methods. 
The paper is well written and clear, but the two main objections from reviewers surround the novelty and evaluation of the method proposed. I am conscious that the authors have responded to reviewers on the topic of novelty, but the case could be made more strongly in the paper, perhaps by showing significant improvements over alternatives. The evaluation was considered weak by reviewers, in particular due to the lack of comparable baselines. + +Interesting work, but I'm afraid on the basis of the reviews, I must recommend rejection.",ICLR2018, +BJlP8EEUgN,1545120000000.0,1545350000000.0,1,ByfXe2C5tm,ByfXe2C5tm,"Important contribution, but ultimately weak evaluation",Reject,"This paper combines Prolog-like reasoning with distributional semantics, applied to natural language question answering. Given the importance of combining neural and symbolic techniques, this paper provides an important contribution. Further, the proposed method complements standard QA models as it can be easily combined with them. + +The reviewers and AC note the following potential weaknesses: +(1) The evaluation consisted primarily on small subsets of existing benchmarks, +(2) the reviewers were concerned that the handcrafted rules were introducing domain information into the model, and (3) were unconvinced that the benefits of the proposed approach were actually complementary to existing neural models. + +The authors addressed a number of these concerns in the response and their revision. They discussed how OpenIE affects the performance, and other questions the reviewers had. Further, they clarified that the rule templates are really high-level/generic and not ""prior knowledge"" as the reviewers had initially assumed. The revision also provided more error analysis, and heavily edited the paper for clarity. Although these changes increased the reviewer scores, a critical concern still remains: the evaluation is not performed on the complete question-answering benchmark, but on small subsets of the data, and the benefits are not significant. This makes the evaluation quite weak, and the authors are encouraged to identify appropriate evaluation benchmarks. + +There is disagreement in the reviewer scores; even though all of them identified the weak evaluation as a concern, some are more forgiving than others, partly due to the other improvements made to the paper. The AC, however, agrees with reviewer 2 that the empirical results need to be sound for this paper to have an impact, and thus is recommending a rejection. Please note that paper was incredibly close to an acceptance, but identifying appropriate benchmarks will make the paper much stronger.",ICLR2019,3: The area chair is somewhat confident +NkxBbfNDRK,1610040000000.0,1610470000000.0,1,_adSMszz_g9,_adSMszz_g9,Final Decision,Reject,"This paper introduces a new model, called Memformer, that combines the strength of transformer networks and recurrent neural networks. While the reviewers found the idea interesting, they also raised issues regarding the experimental section. In particular, they found the results unconvincing, because of weak baselines, non standard experimental settings (eg. using reporting perplexity results on BPE tokens), or evaluating on only one dataset. These concerns were not well addressed by the rebuttal. 
For these reasons, I recommend to reject the paper.",ICLR2021, +CSCdKIKFBUC,1610040000000.0,1610470000000.0,1,IrM64DGB21,IrM64DGB21,Final Decision,Accept (Poster),"The reviewers appreciated the author replies, the additional experiments (more runs but also more ablations/baselines), and the updated paper. Also R2 is now largely satisfied (but seems to have been too late to post a public reply or to raise the score of the review). + +The paper provides important insights in model-based RL and its connections to planning, by studying MuZero with systematic ablations. Hence a valuable contribution to the community. All (major) cons have been addressed in the revision.",ICLR2021, +oja_hsXH00,1610040000000.0,1610470000000.0,1,xoPj3G-OKNM,xoPj3G-OKNM,Final Decision,Reject,"All reviewers agree that the contributions of this paper are not significant, and the paper does not compare well with many of the existing works. Authors did not respond.",ICLR2021, +iZ1SL5uF9AY,1642700000000.0,1642700000000.0,1,E9z2A1-O7e,E9z2A1-O7e,Paper Decision,Reject,"The paper uses a transformer model to generate CNN models and use it for few shot learning. + +Although the reviewers appreciate the ideas and the good benchmarking results presented in the paper they are find the paper somewhat incremental compared to previous work in the hyper network literature. This also despite the authors thorough rebuttal with additional results. This shows that the authors could have done a better job in presenting their work. + +Rejection is therefore recommended with a strong encouragement to rework the paper to counter future reviewers having similar reservations.",ICLR2022, +jB_WnSCRH,1576800000000.0,1576800000000.0,1,HyerxgHYvH,HyerxgHYvH,Paper Decision,Reject,"This paper proposes to train and compose neural networks for the purposes of arithmetic operations. All reviewers agree that the motivation for such a work is unclear, and the general presentation in the paper can be significantly improved. As such, I cannot recommend this paper in its current state for publication. +",ICLR2020, +D88GreM8KY,1576800000000.0,1576800000000.0,1,BJeGA6VtPS,BJeGA6VtPS,Paper Decision,Reject,"This paper presents a very creative threat model for neural networks. The proposed attack requires systems-level intervention by the attacker, which prompts the reviewers to question how realistic the attack is, and whether it is well motivated by the authors. After conversing with the reviewers on this topic, they have not changed their mind about these issues. As an AC, I think the threat model is both interesting and potentially realistic in some scenarios, however I agree with the reviewers that the motivation for the threat model could be more powerful. For example the authors could focus more on realistic types of malicious behaviors that a developer could embed into a neural network. I also think there's lots of opportunities for a range of applications that exploit the type of ""two nets in one"" behavior that the authors study. Despite the interesting ideas in this paper, the post-rebuttal scores are not strong enough to accept it. I encourage the authors to address some of these presentation issues, and resubmit this interesting paper to another venue.",ICLR2020, +H1bA3fUdl,1486400000000.0,1486400000000.0,1,HJWHIKqgl,HJWHIKqgl,ICLR committee final decision,Accept (Poster),"This paper presents two ways that MMDs can be used to aid the GAN training framework. 
The relation to current literature is clearly explained, and the paper has illuminating side-experiments. The main con is that it's not clear if MMD-based training will be competitive in the long run with more flexible, but harder-to-use, neural network based approaches. However, this paper gives us a conceptual framework to evaluate new proposals for related GAN training procedures.",ICLR2017, +eM9lMdsXtRk,1642700000000.0,1642700000000.0,1,Zca3NK3X8G,Zca3NK3X8G,Paper Decision,Reject,"This paper proposes an architecture of a policy network (WaveCorr) that is particularly effective for portfolio management tasks. A key observation that leads to the design of WaveCorr is that the dependency across asset should be treated differently from the dependency across time. The proposed WaveCorr has the property that it is ""permutation invariant"" with respect to assets, which means that the class of functions that can be represented by WaveCorr is invariant to permutation of assets. WaveCorr is shown to achieve the state-of-the-art performance in a portfolio management task. + +A major point of discussion was the definition of ""permutation invariance"". The reviewers and AC understood the difference between the permutation invariance defined in this paper and that studied in the prior work (the output of a network is insensitive to the permutation of the particular values of the input). With the definition in this paper, however, a fully connected layer is permutation invariant, but the Corr layer proposed in the paper appears to have more structure. It is unclear exactly what properties of the Corr layer leads to the performance improvement.",ICLR2022, +7A-2b_L_bH,1576800000000.0,1576800000000.0,1,ByeadyrtPB,ByeadyrtPB,Paper Decision,Reject,"The paper received 6, 3, 1. The main criticism is the lack of quantitative evaluation/comparison. The rebuttal did not convince the last reviewer who strongly argues for a comparison. The authors are encouraged to add additional results and resubmit to a future venue.",ICLR2020, +Hkl9yDCSxE,1545100000000.0,1545350000000.0,1,SklEEnC5tQ,SklEEnC5tQ,A solid contribution to regularize GANs,Accept (Poster),"This paper proposes distributional concavity regularization for GANs which encourages producing generator distributions with higher entropy. + +The reviewers found the contribution interesting for the ICLR community. R3 initially found the paper lacked clarity, but the authors took the feedback in consideration and made significant improvements in their revision. The reviewers all agreed that the updated paper should be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain +Gq8qthZNNbq,1642700000000.0,1642700000000.0,1,Ab0o8YMJ8a,Ab0o8YMJ8a,Paper Decision,Reject,"A method for pruning neural networks is proposed. Reviewers raised several concerns, including poor technical presentation and insufficient experimental validation with respect to both baseline methods and ablation studies. All reviewer ratings lean toward reject and the authors did not provide a response.",ICLR2022, +xoX29kfjfC,1610040000000.0,1610470000000.0,1,FoM-RnF6SNe,FoM-RnF6SNe,Final Decision,Reject,"The reviewers agree that the paper, in its current form, is not strong enough to allow for publication. There are specific weaknesses that need to be tackled: a better correlation study; a clearer relationship to existing literature (and improvement on the novelty); clearer, more precise use of descriptions. 
+ +The authors are encouraged to continue with their work and submit a more mature manuscript.",ICLR2021, +xxgBRUhXbaM,1610040000000.0,1610470000000.0,1,jN8TTVCgOqf,jN8TTVCgOqf,Final Decision,Reject,"The paper considers using local spectral graph clustering methods such at the PPR-Nibble method for graph neural networks. These local spectral methods are widely used in social networks, and understanding neural networks from them is interesting. + +In many ways, the results are interesting and novel, and they deserve to be more widely known, but there are several directions to make the work more useful to the community. These are outlined in the reviewer comments, which the authors answered partially but not completely satisfactorily. Much of this has to do with explaining how/where these (these very fundamental and ubiquitous) methods are useful in a particular application (GNNs here, and node embeddings below). An example of a paper that successfully did this is ""LASAGNE: Locality And Structure Aware Graph Node Embedding, E. Faerman, et al. Proc. 2018 Conference on Web Intelligence."" (That is mentioned not since it is directly relevant to this paper, but since it provides an example of how to present the use of a method such as PPR-Nibble for the community.",ICLR2021, +soD47Jxqn,1576800000000.0,1576800000000.0,1,rJe9fTNtPS,rJe9fTNtPS,Paper Decision,Reject,"This paper proposes a load-balanced hashing called AHash that balances the load of hashing bins to avoid empty bins that appear in some minwise hashing methods. +Reviewers found the work interesting and well-motivated. Authors addressed some clarity issues in their rebuttal. However the impact appeared quite limited, and the experimental validation limited to few realistic experiments that did not alleviate this concern. +We thus recommend rejection.",ICLR2020, +7I3Pmk9_cag,1642700000000.0,1642700000000.0,1,_X90SIKbHa,_X90SIKbHa,Paper Decision,Accept (Poster),"The reviewers found this work well-motivated and the additional experiments conducted during the response phase were greatly appreciated. Anderson's acceleration appears to be a simple device that may be of great value to this field, and therefore this work is a timely contribution. The presented theoretical results justify the authors' modifications, although at times it felt more comparisons would be welcome: (a) Section 3.2 could have compared to a lot of three-term recurrences that lead to the optimal dependence on the condition number, including Chebyshev's polynomial and conjugate gradient, as well as the results in Brezinski et al. (2018); (b) Section 3.3 would benefit from some comparison with ""Evans, Claire, Sara Pollock, Leo G. Rebholz, and Mengying Xiao (2020). “A Proof That Anderson Acceleration Improves the Convergence Rate in Linearly Converging Fixed-Point Methods (But Not in Those Converging Quadratically)”. SIAM Journal on Numerical Analysis, vol. 58, no. 1, pp. 788–810."" (c) the results in Section 3.4 seem to be a bit preliminary and it would be great if the authors could compare to standard rates of SGD. + +Overall we believe this work will generate more interest on memory-based optimization techniques in deep learning, and we encourage the authors to thoroughly polish their draft by incorporating the reviewers' comments and the responses during the discussion phase.",ICLR2022, +5YQBxE-nFI0,1610040000000.0,1610470000000.0,1,jQUf0TmN-oT,jQUf0TmN-oT,Final Decision,Reject,"This paper attempts to jointly search for the sensor and the neural network architecture. 
More specifically, the proposed approach jointly optimizes the parameters governing the PhlatCam sensor and the backend CNN model. In terms of the approach, the paper follows a well known DARTS formulation for the differentiable architecture search. A very straightforward solution was proposed for the problem. + +Although all the reviewers place that the paper is marginally above the acceptance threshold, none of them strongly support the paper and the reviewers point out that the paper is limited in terms of the setting and data. The problem formulation of the paper itself is interesting, but the AC agrees with the reviewers that the paper is limited and lacks enough technical contributions to warrant the acceptance to ICLR. +",ICLR2021, +xf91Fi33Qj,1576800000000.0,1576800000000.0,1,r1e_FpNFDr,r1e_FpNFDr,Paper Decision,Accept (Poster),"The authors present several theorems bounding the generalization error of a class of conv nets (CNNs) with high probability by + + O(sqrt(W(beta + log(lambda)) + log(1/delta)]/sqrt(n)), + +where W is the number of weights, beta is the distance from initialization in operator norm, lambda is the margin, n is the number of data, and the bound holds with prob. at least 1-delta. (They also present a bound that is tighter when the empirical risk is small.) + +The bounds are ""size free"" in the sense that they do not depend on the size of the *input*, which is assumed to be, say, a d x d image. While there is dependence on the number of parameters, W, there is no implicit dependence on d here. + +The paper received the following feedback: + +1. Reviewer 3 mostly had clarifying questions, especially with respect to (essentially independent) work by Wei and Ma. Reviewer 3 also pressed the authors to discuss how the bounds compared in absolute terms to the bounds of Bartlett et al. The authors stated that they did not have explicit constants to make such a comparison. Reviewer 3 was satisfied enough to raise their score to a 6. + +2. Reviewer 1 admitted they were not experts and raised some issues around novelty/simplicity. I do not think the simplicity of the paper is a drawback. The reviewers unfortunately did not participate in the rebuttal, despite repeated attempts. + +3. Reviewer 2 argued for weak reject, despite an interaction with the authors. The reviewer raised the issue of bounds based on control of the Lipschitz constant. The conversation was slightly marred by a typo in the reviewers original comment. I don't believe the authors ultimately responded to the reviewer's point. There was another discussion about simultaneously work and compression-based bounds. I would agree with the authors that they need not have cited simultaneous work, especially since the details are quite different. Ultimately, this reviewer still argued for rejection (weakly). + +After the rebuttal period ended, the reviewers raised some further concerns with me. I tried to assess these on my own, and ended up with my own questions. + +I raise these in no particular order. Each of them may have a simple resolution. In that case, the authors should take them as possible sources of confusion. Addressing them may significantly improve the readability of the paper. + +i. Lemma A.3. The order of quantification is poorly expressed and so I was not confident in the statement. In particular, the theorem starts \forall \eta >0 \exists C, .... but then C is REINTRODUCED later, subsequent to existential quantification over M, B, and d and so it seems there is dependence. 
If there is no dependence, this presentation is sloppy and should be fixed.

ii. Lemma A.4, the same dependence of C on M, B and d holds here and this is quite problematic for the later applications. If this constant is independent of these quantities, then the order of quantifiers has been stated incorrectly. Again, this is sloppy if it is wrong. If it's correct, then we need to know how C grows.

Based on other claims by the authors, it is my understanding that, in both cases, the constant C does not depend on M, B, or d. Regardless, the authors should clarify the dependence. If C does in fact depend on these quantities, and the conclusions change, the paper should be retracted.

iii. Proof of Lemma 2.3. I'd remind the reader that the parametrization maps the unit ball to G.

iv. The bound depends on control of operator norms and empirical margins. It is not clear how these interact and whether, for margin parameters necessary to achieve small empirical margin risk, the bounds pick up dependence on other aspects of the learning problem (e.g., depth). I think the only way to assess this would be to investigate these quantities empirically, say, by varying the size and depth of the network on a fixed data set, trained to achieve the same empirical risk (or margin).

I'll add that I was also disappointed that the authors did not attempt to address any of the issues by a revision of the actual paper. In particular, the authors promise several changes that would have been straightforward to make in the two weeks of rebuttal. Instead, the reviewers and I are left to imagine how things would change. I see at least two promises:

A. To walk back some of the empirical claims about distance from initialization that are based on somewhat flimsy empirical evaluations. I would add to this the need to investigate how the margin and operator norms depend on depth empirically.

B. To attribute Dziugaite and Roy for establishing the first bounds in terms of distance from initialization, though their bounds were numerical. I think a mention of simultaneous work would also be generous, even if not strictly necessary.
",ICLR2020,
r1xhtEzHlV,1545050000000.0,1545350000000.0,1,HJfwJ2A5KX,HJfwJ2A5KX,"A layer-wise geometric margin distribution is used to calibrate the generalization ability, with extensive experimental support yet lacking a theory.",Accept (Poster),"The paper suggests a new measurement of layer-wise margin distributions for generalization ability. Extensive experiments are conducted, though a solid theory to explain the phenomenon is lacking. The majority of reviewers suggest acceptance (9,6,5). Therefore, it is proposed as a probable accept.",ICLR2019,4: The area chair is confident but not absolutely certain
BJg4rnmtxV,1545320000000.0,1545350000000.0,1,rkxkHnA5tX,rkxkHnA5tX,metareview,Reject,"The reviewers raised a number of major concerns, including the incremental novelty of the proposed method (if any), insufficient explanation, and, most importantly, the insufficient and inadequate experimental evaluation presented. The authors did not provide any rebuttal. 
Hence, I cannot suggest this paper for presentation at ICLR.",ICLR2019,5: The area chair is absolutely certain
Vfh0M9AG5p,1576800000000.0,1576800000000.0,1,Hkl4EANFDH,Hkl4EANFDH,Paper Decision,Reject,"The paper improves learned Bloom filters by utilizing the complete spectrum of the score regions. 

The paper is nicely written, with strong motivation and theoretical analysis of the proposed model. The evaluation could be improved: all the experiments are conducted only on small datasets, which makes it hard to assess the practicality of the proposed method. The paper could lead to a strong publication in the future if the issue with the evaluation can be addressed.
",ICLR2020,
BkTK7kTBM,1517250000000.0,1517260000000.0,192,r1Dx7fbCW,r1Dx7fbCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),Well motivated and well received by all of the expert reviewers. The AC recommends that the paper be accepted.,ICLR2018,
yt5UcfQVh,1576800000000.0,1576800000000.0,1,SJx0oAEYwH,SJx0oAEYwH,Paper Decision,Reject,"The paper proposes a filtration based on the covers of data sets and demonstrates its effectiveness in recommendation systems and explainable machine learning. The paper is theory focused, and the discussion was mainly centered around one very detailed and thorough review. The main concerns raised in the reviews and reiterated at the end of the rebuttal cycle were lack of clarity, relatively incremental contribution, and limited experimental evaluation. Due to my limited knowledge of this particular field, I base my recommendation mostly on R1's assessment and recommend rejecting this submission.",ICLR2020,
d-0NsuthcE,1576800000000.0,1576800000000.0,1,Skgy464Kvr,Skgy464Kvr,Paper Decision,Accept (Poster),"This paper presents a mechanism for capsule networks to defend against adversarial examples, and a new attack, the reconstruction attack. The differing success of these attacks on capsnets and convnets is used to argue that capsnets find features that are more similar to what humans use.

Reviewers generally liked the paper, but took issue with the strength of the claim (about the usefulness of the examples) and argued that the paper might not be as novel as it claims.

Still, this seems like a valuable contribution that should be published.",ICLR2020,
xlhu-6ZScfF,1642700000000.0,1642700000000.0,1,i2baoZMYZ3,i2baoZMYZ3,Paper Decision,Reject,"The paper presents a reinforcement learning technique for problems with continuous actions. The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations. This discretization returns a fixed number of actions for each input state. By discretizing the action space, any discrete action deep RL technique can be readily applied to the continuous control problem. Experiments reported in the paper show that the proposed approach outperforms several RL baselines such as SAC.

The key criticism from the reviewers relates to the incremental nature of this paper's contribution. While the precise equation proposed in this paper for learning discrete actions from demonstrations may be novel, there have been several very similar techniques in the literature. For example, Gaussian Mixture Models (GMMs), a closely related model, have been widely studied in the context of learning policies from demonstrations. 
+ +In summary, the reviewers are not convinced that the paper contains sufficiently novel ideas for an ICLR publication.",ICLR2022, +Sylupyz-l4,1544790000000.0,1545350000000.0,1,rkxusjRctQ,rkxusjRctQ,lack of experiments to obvious geometric baselines and previous learning-based methods for localization ,Reject,"The paper proposes a method that learns mapping implicitly, by using a generative query network of Eslami et al. with an attention mechanism to learn to predict egomotion. The empirical findings is that training for egomotion estimation alongside the generative task of view prediction helps over a discriminative baseline, that does not consoder view prediction. The model is tested in Minecraft environments. +A comparison to some baseline SLAM-like method, e.g., a method based on bundle adjustment, would be important to include despite beliefs of the authors that eventually learning-based methods would win over geometric methods. For example, potentially environments with changes can be considered, which will cause the geometric method to fail, but the proposed learning-based method to succeed. + +Moreover, there are currently learning based methods for the re-localization problem that the paper would be important to compare against (instead of just cite), such as ""MapNet: An Allocentric Spatial Memory for Mapping Environments"" of Henriques et al. and ""Active Neural Localization"" of Chaplot et al. . In particular, Mapnet has a generative interpretation by using cross-convolutions as part of its architecture, which generalize very well, and which consider the geometric formation process. The paper makes a big distinction between generative and discriminative, however the architectural details behind the egomotion estimation network are potentially more or equally important to the loss used. This means, different discriminative networks depending on their architecture may perform very differently. Thus, it would be important to present quantitative results against such methods that use cross-convolutions for egomotion estimation/re-localization. ",ICLR2019,5: The area chair is absolutely certain +S1hmIJ6SM,1517250000000.0,1517260000000.0,751,H1uP7ebAW,H1uP7ebAW,ICLR 2018 Conference Acceptance Decision,Reject,"Authors apply dense nets and LSTM to model dependencies among labels and demonstrate new state-of-art performance on an X-Ray dataset. + +Pros: +- Well written. +- New improvement to state-of-art + +Cons: +- Novelties are not strong. One combination of existing approaches are used to achieve state-of-art on what is still a relatively new dataset. (All Reviewers) + +- Using LSTM to model dependencies would be affected by the selected order of the disease states. In this sense, LSTM seems like the wrong architecture to use to model dependencies among labels. This may be a drawback in comparison to other methods of modeling dependencies, but this is not thoroughly discussed or evaluated. (Reviewer 1 & 3) + +- There is a large body of work on multi-task learning with shared information, which have not been evaluated for comparison. Because of this, the contribution of the LSTM to model dependencies between labels in comparison to other available approaches cannot be verified. (Reviewer 1 & 3) + +- Top AUC performance on this dataset does not carry much significance on its own, as the dataset is new (CVPR 2017), and few approaches have been tested against it. + +- Medical literature not cited to justify with evidence the discovered dependencies among disease states. 
(Reviewer 1)

",ICLR2018,
A3xXDp_zGL,1576800000000.0,1576800000000.0,1,SkgvvCVtDS,SkgvvCVtDS,Paper Decision,Reject,"This paper presents a learning method for speeding up LP solving, and applies it to the TSP problem.

Reviewers and AC agree that the idea is quite interesting and promising. However, I think the paper is far from being ready to publish in various aspects:

(a) much more editorial effort is necessary
(b) the TSP application of small scale is not super appealing

Hence, I recommend rejection.",ICLR2020,
5Lr0vxUl0Ei,1610040000000.0,1610470000000.0,1,IU8QxEiG4hR,IU8QxEiG4hR,Final Decision,Reject,"This paper addresses the problem of estimating a ""bird's-eye-view"" overhead semantic layout of a scene given an input pair of stereo images of the scene. The authors present an end-to-end trainable deep network that fuses features derived from the stereo images and projects these features into an overhead coordinate frame, which is passed through a U-Net style model to generate the final top view semantic segmentation map. The model is trained in a fully supervised manner. Experiments are performed on the CARLA and KITTI datasets.

While R2 was positive, they still had some concerns after reading the rebuttal and the other reviews. Specifically, they were not convinced about the value of the IPM module. This concern was also shared by R4, especially in light of the relationship to Roddick et al. BMVC 2019. R1 had concerns about the experiments, specifically the quantitative comparisons to MonoLayout. The authors addressed these comments, but it is still not clear if the differences can be attributed to the number of classes, how they are weighted, or the training split used. R3 had questions about the utility of BEV predictions in general. However, as stated by R2, there is a lot of value in approaching the problem in this way.

In conclusion, while there were some positive comments from the reviewers, there were also several significant concerns. With no reviewer willing to champion the paper, there is not enough support to justify accepting the paper in its current form.
",ICLR2021,
ts9yuuqh4,1576800000000.0,1576800000000.0,1,S1eALyrYDH,S1eALyrYDH,Paper Decision,Accept (Talk),"This paper proposes an RNA structure prediction algorithm based on an unrolled inference algorithm. The proposed approach overcomes limitations of previous methods, such as dynamic programming (which does not work for molecular configurations that do not factorize), or energy-based models (which require a minimization step, e.g. by using MCMC to traverse the energy landscape and find minima).

Reviewers agreed that the method presented here is novel in this application domain, has an excellent empirical evaluation setup with strong numerical results, and has the potential to be of interest to the wider deep learning community. The AC shares these views and recommends an enthusiastic acceptance. ",ICLR2020,
a5525C6CH5,1610040000000.0,1610470000000.0,1,LxBFTZT3UOU,LxBFTZT3UOU,Final Decision,Reject,"This paper proposes an empirical method to automatically schedule the learning rate in stochastic optimization methods for deep learning, based on line-search over a locally fitted model. The idea looks interesting and promising, but the reviewers have serious concerns about the lack of principled support and insufficient empirical evidence. 
Therefore, I recommend rejection of the paper and encourage the author(s) to strengthen the idea and contribution with further theoretical and empirical study.",ICLR2021,
9Z-Kd5_r6D0,1610040000000.0,1610470000000.0,1,hTUPgfEobsm,hTUPgfEobsm,Final Decision,Reject,"Most of the reviewers had serious problems with clarity to start out.
The authors have addressed some, but not all, of these problems.

More importantly, there were issues of significance and experimental evaluation.
I concur with R4 on the experimental evaluation.
I think if you're going to explicitly specialize toward disentangling affine transform parameters, that's fine, but then you're in application-paper land, and I think there needs to be more of an attempt to show that it will work ""in the wild"".
For this reason, and for the general reason that reviewers unanimously voted to reject, I am recommending rejection.",ICLR2021,
1-h2ZEooC9t,1642700000000.0,1642700000000.0,1,QdcbUq0-tYM,QdcbUq0-tYM,Paper Decision,Reject,"The paper addresses the learning of robot controllers for changing or unknown environments. It makes use of differentiable physics for online system identification and of reinforcement learning for offline policy training. A universal controller is trained on a distribution of simulation parameters in order to ensure its robustness. Differentiable physics is used to estimate the simulation parameters from the recent observation history. These parameters are fed to the controller so as to modulate the policy. This approach is evaluated on three benchmarks (2 + 1 added during the rebuttal).

The main originality of the paper is the use of differentiable physics for the identification of the parameters in the context of varying environments. The topic is of interest and in line with the recent developments for robotics. However, the novelty is limited, and all evaluators were concerned about the limited experimental contribution. The authors have added a new experiment during the rebuttal, but this was not considered sufficient to change the evaluation. Overall, this is considered a promising contribution, but the experimental setting should be largely improved with additional problems and comparisons with SOTA methods from the recent RL literature.",ICLR2022,
jUclfT2jm,1576800000000.0,1576800000000.0,1,BJgMFxrYPB,BJgMFxrYPB,Paper Decision,Accept (Poster),"This paper presents a framework for navigation that leverages learning spatial affordance maps (i.e., what parts of a scene are navigable) via a self-supervision approach in order to deal with environments with dynamics and hazards. They evaluate on procedurally generated VizDoom levels and find improvements over frontier and RL baseline agents.

Reviewers all agreed on the quality of the paper and strength of the results. Authors were highly responsive to constructive criticism and the engagement/discussion appears to have improved the paper overall. After seeing the rebuttal and revisions, I believe this paper will be a useful contribution to the field and I'm happy to recommend acceptance.
",ICLR2020,
rym52GLug,1486400000000.0,1486400000000.0,1,BkmM8Dceg,BkmM8Dceg,ICLR committee final decision,Reject,"The reviewers found the idea interesting and practical but had concerns about the novelty of the approach and the claims and theory presented in the paper. In particular, it seems that the reviewers feel that the authors' claim to present novel theory is unjustified (i.e. the theorems presented are not novel). 
Also, claims regarding the advantages of this approach relative to related literature (in particular Dieleman et al. and Cohen & Welling) seem to be unsubstantiated. This work would likely be better received just through rewriting the manuscript with, as stated by AnonReviewer3, ""a more thorough and balanced appraisal of the merits and demerits, as well as the novelty, relative to each of the previous works"" and weakening the claims of novel theory.",ICLR2017,
AYB0xheCUL,1576800000000.0,1576800000000.0,1,Hke1gySFvB,Hke1gySFvB,Paper Decision,Reject,"This paper introduces the idea of ""empathy"" to improve learning in communication emergence. The reviewers all agree that the idea is interesting and well described. However, this paper clearly falls short of delivering the detailed and sufficient experiments and results needed to demonstrate whether and how the idea works.

I thank the authors for submitting this research to ICLR and encourage following up on the reviewers' comments and suggestions for a future submission. ",ICLR2020,
rk4wBkaSz,1517250000000.0,1517260000000.0,583,HklpCzC6-,HklpCzC6-,ICLR 2018 Conference Acceptance Decision,Reject,The experimental work was seen as one of the main weaknesses.,ICLR2018,
q4bvbQHGe9w,1642700000000.0,1642700000000.0,1,8kpSWDgzsh0,8kpSWDgzsh0,Paper Decision,Reject,"In this paper, the authors consider linear quadratic network games (also known as graphical games) and they discuss a number of conditions and procedures to learn the underlying graph of the game from observations of best-response trajectories (or possibly infinite sets thereof) in the game.

The reviewers' initial assessment was overall negative, with two reviewers recommending rejection and one giving a borderline positive recommendation. The authors' rebuttal did not address the concerns of the reviewers recommending rejection, and the authors did not provide a revised paper for the reviewers to see how the authors would implement the suggested changes, so the overall negative assessment remained.

After my own reading of the paper, I concur with the majority view that the paper has several weaknesses that do not make it a good fit for ICLR (especially regarding the lack of precision in the theorems and the statement of the relevant assumptions), so I am recommending rejection.",ICLR2022,
xIlozHBxMR,1610040000000.0,1610470000000.0,1,8znruLfUZnT,8znruLfUZnT,Final Decision,Reject,"The paper proposes a deep learning approach to blind image denoising based on deep unrolling. In particular, the proposed network is derived from convolutional sparse coding algorithms, which are unrolled, untied across layers and learned from data. The paper proposes a frequency domain regularization scheme in which the filters consist of a single analytically defined low-pass filter and a large collection of filters which are constrained to reside in the mid-to-high frequency ranges. It also proposes to tie the thresholds in the soft-thresholding stages of the learned network to estimates of the noise variance, making the proposed scheme more robust to variations in the noise level.

Pros and Cons:

[+] Having a single low-pass dictionary atom reduces redundancy (and potentially coherence) in the learned dictionary. This type of regularization may also reduce the time/data required to learn.

[+/-] Using noise estimators and a noise adaptive threshold renders the model more robust to variations in the noise level. 
This is important, since in most denoising applications the noise level is not known a-priori. As the reviewers note, the idea of tuning thresholds in an unrolled sparse coding method based on the noise level is not a novelty of the paper; the novelty here is coupling this with a wavelet-based estimate of the noise level. + +[-] All three reviewers raise concerns regarding the novelty of the work compared to existing convolutional sparse coding-based neural networks. The structure of the network is similar; the main difference is the frequency restriction for learned atoms, which is enforced by prefiltering the learned atoms with a high-pass filter. + +[-] The paper is not entirely clear in its motivation and argumentation. Reducing the coherence of the learned dictionary makes sense from the perspective of certain worst case results from sparse approximation. However, the coherence is a worst case quantity; moreover, certain approaches to coherence control (e.g., using large stride) control coherence at the expense of the expressiveness of the dictionary, and hence may not actually improve its ability to provide sparse reconstructions of natural signals. The proposed frequency domain regularization is a sensible approach to controlling coherence, since low-frequency atoms will tend to be highly coherent, but would benefit from a crisper analytical motivation. + +[-] Reviewers found the experiments lacking in some regards. In particular, the paper only evaluates its proposals on synthetic experiments with Gaussian noise. While this is in line with some previous work on deep learning based denoising, more extensive and realistic experiments would have bolstered the paper's argument. + +Overall, the paper makes a sensible proposal regarding the adaptivity to unknown noise levels, and introduces a potentially useful frequency-domain restriction on the learned filters in a CSC network. However, the reviewers did not find that the paper made a clear argument for the significance of these proposals, and raised other concerns regarding the clarity and experiments. The consensus of the reviewers is to recommend rejection. ",ICLR2021, +1ZIKu8ItqWK,1610040000000.0,1610470000000.0,1,6Lhv4x2_9pw,6Lhv4x2_9pw,Final Decision,Reject,"The paper considers an interesting application of Bayesian neural nets to the geophysics domain; however, the paper does not make a novel contribution from the machine learning perspective, and the improvements on top of the previously proposed approach by Ahamed & Daub (2019) seem to be quite modest. Overall, the paper does not seem to be ready for publication at ICLR. + ",ICLR2021, +r1xq2DCXe4,1544970000000.0,1545350000000.0,1,HJlQfnCqKX,HJlQfnCqKX,"A layer-wise geometric margin distribution is used to calibrate the generalization ability, with extensive experimental support yet lacking a theory.",Accept (Poster),"The paper suggests a new measurement of layer-wise margin distributions for generalization ability. Extensive experiments are conducted. Though there lacks a solid theory to explain the phenomenon. The majority of reviewers suggest acceptance (9,6,5). Therefore, it is proposed as probable accept.",ICLR2019,4: The area chair is confident but not absolutely certain +r1yInMLug,1486400000000.0,1486400000000.0,1,H1eLE8qlx,H1eLE8qlx,ICLR committee final decision,Reject,"Most of the reviewers agreed that the proposed budgeted options framework was interesting, but there were a number of serious concerns raised about the work. 
Many of the reviewers found the assumptions of the approach to be somewhat odd, and while the particular formulation in the paper was generally assessed as novel, it has connections to a number of previous works that were not explored in detail. Finally, the experimental evaluation is conducted on simple tasks with few comparisons, so it is very difficult to draw concrete conclusions about how well the method works.",ICLR2017,
SkP9EkpHG,1517250000000.0,1517260000000.0,413,BJypUGZ0Z,BJypUGZ0Z,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper proposes to use simple regression models for predicting the accuracy of a neural network based on its initial training curve, architecture, and hyper-parameters; this can be used for speeding up architecture search. While this is an interesting direction and the presented experiments look quite encouraging, the paper would benefit from more evaluation, as suggested by reviewers, especially within state-of-the-art architecture search frameworks and/or large datasets.",ICLR2018,
DK7PRAgClCm,1642700000000.0,1642700000000.0,1,metRpM4Zrcb,metRpM4Zrcb,Paper Decision,Accept (Spotlight),"The authors propose a memory-based continual learning method that decomposes the models' parameters and that shares a large number of the decomposed parameters across tasks. In other words, only a small number of parameters are task-specific, and the memory usage of storing models from previous tasks is hence a fraction of the memory usage of previous approaches. The authors take advantage of their method to propose specific ensembling approaches and demonstrate the strong performance of their methods using several datasets.

In the rebuttal, the authors were very reactive and provided many useful additional results, including a comparison of the computational cost of their method vs. others, results using two new datasets (CUBS & Flowers), and additional results on mini-ImageNet. They also answered, through additional experiments, several reviewer questions including the robustness to different first tasks in the sequence.

Overall, the reviewers after the rebuttal/discussion period agree that this is a strong contribution: a novel and fairly simple method with some theoretical justification, thorough empirical evaluation, and a well-written and easy to follow manuscript. It also opens a few interesting avenues, some of which the authors have already explored in their paper (e.g., ensembling).",ICLR2022,
H5890rkeS_H,1610040000000.0,1610470000000.0,1,ufZN2-aehFa,ufZN2-aehFa,Final Decision,Accept (Poster),"The authors present a Bayesian approach for context aggregation in neural-process-based models. The article is well written, and provides a nice and comprehensive framework. The reviewers raised some issues regarding the lack of comparisons to proper baselines. The authors provided additional comparisons in the revised version. The comparisons were found satisfactory by some reviewers, who increased their scores. Based on the revised version, I recommend acceptance.",ICLR2021,
HkhmN1pBG,1517250000000.0,1517260000000.0,323,BkSDMA36Z,BkSDMA36Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Despite not having amazing scores, this is a solid paper.
It created a lot of discussion and was found to be reproducible. 
We should accept it to let the ICLR community partake in the discussion and learn about this method of n-gram embeddings.
",ICLR2018,
rJr5ryTBM,1517250000000.0,1517260000000.0,625,BJ78bJZCZ,BJ78bJZCZ,ICLR 2018 Conference Acceptance Decision,Reject,"RDA improves on RWA, but even so, the model is inferior to the other standard RNN models. As a result, R1 and R3 question the motivation for the use of this model -- something the authors should motivate.",ICLR2018,
SyxVaYC4JE,1543990000000.0,1545350000000.0,1,ryxOIsA5FQ,ryxOIsA5FQ,Insufficient Novelty,Reject,"This work proposes a method for both instance and feature based transfer learning.
The reviewers agree that the approach in its current form lacks sufficient technical novelty for publication. The paper would benefit from experiments on larger datasets and from more analysis into the different aspects of the proposed model.
",ICLR2019,5: The area chair is absolutely certain
QbBlFJuB40h,1610040000000.0,1610470000000.0,1,mfJepDyIUcQ,mfJepDyIUcQ,Final Decision,Reject,"# Quality:
The paper does a good job of presenting the proposed algorithm, which seems interesting and solid.
However, the paper fails to place the proposed approach in the larger context of the existing literature.
In addition, only qualitative results are presented, without any comparison.
As such, it is impossible to really understand the goodness of the proposed approach.

# Clarity:
The paper is generally well-written and clear.

# Originality:
The proposed approach is novel to the best of the reviewers' and my knowledge.

# Significance of this work:
The paper deals with a very relevant and timely topic. However, as stated by the authors themselves, the paper is not concerned with high-dimensional systems, which is what would really differentiate this work compared to existing literature. In addition, the paper has no quantitative results nor comparisons against previous literature, and does not evaluate on any of the standard benchmarks.

# Overall:
There is disagreement from the reviewers regarding the acceptance of this paper, and the overall score is very borderline. After thoroughly reading the paper, I agree with the evaluation of Reviewers 2 and 3 regarding the lack of comparisons and thus lean towards rejection.

",ICLR2021,
S1gBXyaSM,1517250000000.0,1517260000000.0,125,HyrCWeWCb,HyrCWeWCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper adapts (Nachum et al 2017) to continuous control via TRPO. The work is incremental (not in the dirty sense of the word popular amongst researchers, but rather in the sense of ""building atop a closely related work""), nontrivial, and shows empirical promise. The reviewers would like more exploration of the sensitivity of the hyper-parameters.",ICLR2018,
BJRQEJTSz,1517250000000.0,1517260000000.0,325,HyRnez-RW,HyRnez-RW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),The authors did a good job addressing reviewer concerns and analyzing and testing their model on interesting datasets with convincing results.,ICLR2018,
Efq6imNGVhS,1642700000000.0,1642700000000.0,1,CLpxpXqqBV,CLpxpXqqBV,Paper Decision,Accept (Poster),"This paper proposes augmenting standard forward prediction techniques used for representation learning with backward prediction as well, termed ""learning via retracing"". The paper implements this idea in a Cycle-Consistency World Model (CCWM) and demonstrates that CCWM improves performance of a Dreamer agent across a number of Control Suite tasks. The paper also proposes a way to detect ""irreversible"" transitions and exclude them from the backwards prediction step.

This paper generated mixed opinions, and the reviewers did not come to a consensus on whether it should be accepted or rejected. In particular, Reviewer VSAG maintained it should be accepted, while Reviewer NEVM maintained it should be rejected. 
This seems like a reasonable thing to do, but it's probably not a substantial enough contribution given that similar things have been done for various other generative models. Experiments show improvement in samples compared with a regular GAN, but don't compare against various other techniques that have been proposed for fixing mode dropping. For these reasons, as well as various issues pointed out by the reviewers, I don't recommend acceptance. +",ICLR2018, +BYqCF49Ktck,1610040000000.0,1610470000000.0,1,Ig-VyQc-MLK,Ig-VyQc-MLK,Final Decision,Accept (Poster),"The paper analyses several approaches to pruning at initialization, compared to after training. There was a large gap in reviewers appreciation of the paper, but I think that the pros outdo the cons as the paper show a lot of insights overall. I recommend accepting the paper.",ICLR2021, +JWRbBs2-kH,1576800000000.0,1576800000000.0,1,SJgs8TVtvr,SJgs8TVtvr,Paper Decision,Reject,"The paper proposes a VAE with a mixture-of-experts decoder for clustering and generation of high-dimensional data. Overall, the reviewers found the paper well-written and structured , but in post rebuttal discussion questioned the overall importance and interest of the work to the community. This is genuinely a borderline submission. However, the calibrated average score currently falls below the acceptance threshold, so I’m recommending rejection, but strongly encouraging the authors to continue the work, better motivating the importance of the work, and resubmitting.",ICLR2020, +JAITdEPnGG,1610040000000.0,1610470000000.0,1,TBIzh9b5eaz,TBIzh9b5eaz,Final Decision,Accept (Poster),"This paper proposes O-RAAC, an offline RL algorithm that minimizes the Conditional Value-at-Risk (CVaR) of the learned policy's return given a dataset by a behavior policy. The reviews are generally positive with most agreeing that the paper presents interesting empirical results. + +The experiments are limited to simpler domains, and could be extended to include harder tasks from other continuous control domains. Some examples could be domains such as in Robosuite (http://robosuite.ai/) or Robogym (https://github.com/openai/robogym). These environments have higher dimensional systems with more clearer safety settings. +Agreeably, asking for comparisons with unpublished results may be unfair, however, it would be recommended to authors to include additional comparisons with latest methods in Offline/Batch-RL, including the ones which don't guarantee risk, such as CQL, BRAC, CSC. + +Further, The theoretical properties of the proposed algorithm are largely unclear. It would help to analyze the effect of both convergence rates, and fixed points, further what is the effect of addition of risk, does the algorithm converge to a suboptimal solution or get there slower. Finally empirical reporting of cumulative number of failures (discrete count) during training as well as during evaluation would be very useful to practitioners. + +Other relevant and concurrent papers to potentially take note of: +Distributional Reinforcement Learning for Risk-Sensitive Policies (https://openreview.net/forum?id=19drPzGV691) +Conservative Safety Critics for Exploration (https://openreview.net/forum?id=iaO86DUuKi) + +I would recommend acceptance of the paper. I would strong encourage release of sufficiently documented and easy to use implementation. Given the fact that the main argument is empirical utility of the method, it would be limit the impact of this work if readers cannot readily build on O-RAAC. 
+",ICLR2021, +5G8P29mA0Z,1576800000000.0,1576800000000.0,1,SyxrxR4KPS,SyxrxR4KPS,Paper Decision,Accept (Spotlight),"This paper is somewhat unorthodox in what it sets out to do: use neuroscience methods to understand a trained deep network controlling an embodied agent. This is exciting, but the actual training of the virtual rodent and the performance it exhibits is also impressive in its own right. All reviewers liked the papers. The question that recurred among all reviewers was what was actually learned in this analysis. The authors responded to this convincingly by listing a number of interesting findings. + +I think this paper represents an interesting new direction that many will be interested in.",ICLR2020, +mRjBB3rwc3,1576800000000.0,1576800000000.0,1,B1liIlBKvS,B1liIlBKvS,Paper Decision,Reject,"There has been a long discussion on the paper, especially between the authors and the 2nd reviewer. While the authors' comments and paper modifications have improved the paper, the overall opinion on this paper is that it is below par in its current form. The main issue is that the significance of the results is insufficiently clear. While the sender-receiver game introduced is interesting, a more thorough investigation would improve the paper a lot (for example, by looking if theoretical statements can be made).",ICLR2020, +P8eZ5kF-Vx,1610040000000.0,1610470000000.0,1,UOOmHiXetC,UOOmHiXetC,Final Decision,Reject,"This paper proposes a modification to MCTS in which a sequence of nodes (obtained by following the policy prior) are added to the search tree per simulation, rather than just a single node. This encourages deeper searches that what is typically attained by vanilla MCTS. STS results in slightly improved performance in Sokoban and much larger improvements Google Research Football. + +R4 and R1 both liked the simplicity of the idea, with R1 also praising the paper for the thoroughness of its evaluation. I agree that the idea is interesting and worth exploring, and am impressed by the scope of the experiments in the paper as well as the additional ones linked to in the rebuttal. However, R1 and R5 explicitly noted they had many points of confusion, and across the reviews there seemed to be many questions regarding the difference between STS and other variants of MCTS. I also needed to read parts of the paper multiple times to fully understand the approach. If this many experts on planning and MCTS are confused, then I think readers who are less familiar with the area will definitely struggle to understand the main takeaways. While I do think the clarifications and new experiments provided in the rebuttal help, my overall sense is that the paper at this stage is not written clearly enough to be ready for publication at ICLR. I would encourage the authors to try to synthesize their results and organize them more succinctly in future versions of the paper. + +One comment about a point of confusion that I had: I noticed the PUCT exploration parameter was set to zero for Sokoban, and one for GRF (with an explanation given that many values were tried, though these values are unspecified). 
As the exploration parameter is normally considered to be the thing that controls whether MCTS acts more like BFS ($c = \infty$) or DFS ($c = 0.0$), I would encourage the authors to more explicitly report which values they tried and to be clearer about the advantage of STS's multi-step expansions over low values of the exploration parameter.",ICLR2021, +6eU6H4lngg1,1642700000000.0,1642700000000.0,1,lQI_mZjvBxj,lQI_mZjvBxj,Paper Decision,Accept (Poster),"This manuscript proposes and analyzes a distillation approach to address heterogeneity in distributed learning. The main paper focuses on a relatively simple two-agent kernel regression setting, and the insights developed are extended (and partially analyzed) for a multiagent setting. + +There are four reviewers, all of whom agree that the method addresses an interesting and timely issue. However, reviewers are mixed on the paper score. While all reviewers agree that the setting is somewhat stylized, a subset of reviewers highlights that the results give some deep insight that might drive future analysis and implementation in the area. Other concerns raised include potential issues with the communication overhead and the simplicity of the kernel regression setting vs real-world deep learning. There are initial concerns about whether the failure case is realistic, which the authors address. Extensions to the multi-agent setting and a partial analysis are also addressed by the authors and partially satisfy the reviewers. Nevertheless, after reviews and discussion, the reviewers are mixed at the end of the discussion. + +The area chair finds, first, that the paper is much improved, and much more applicable in the updated form than in the original version, and indeed, the insights from the simple model may be informative for practice. However, the concerns raised about the distance between theory and practice are valid. The final opinion remains borderline. Authors are encouraged to address the highlighted technical concerns in any future submission of this work. In particular, the muti0agent setting should probably be central in the discussion of this work. More ambitious empirical evaluation showing that the theory translates to practice )even if there is a gap) would also help.",ICLR2022, +mLoVcgqenVf,1610040000000.0,1610470000000.0,1,0PtUPB9z6qK,0PtUPB9z6qK,Final Decision,Accept (Poster),"This paper introduces a generative model termed generalized energy-based model (GEBM). + +The goal is modelling complex distributions supported on low-dimensional manifolds, while offering more flexibility in refining the distribution of mass on those manifolds. The key idea is presented as parametrizing the base measure (called a generator in the paper) and the density with respect to this base measure separately. Figure 1 of the paper sketches the idea on a very clear toy example. + +The pros: +* Flexibility: Decomposing the full problem as learning the support and learning the density on this support +* Theoretical justification +* Introducing the KALE objective +* Comparative empirical results with GANs show the additional benefits. Empirically, the framework outperforms GAN with the same complexity. +* Clear written paper + +The lack of a comparison with GANs has been raised as a concern. The authors have satisfactorily answered key questions and +others raised during rebuttal and added several new references. 
They have also improved the narrative and included an additional experiment to contrast GEBM and GANs in response to AnonReviewer2, also provided more detail +on how the energy function (class) is chosen. +",ICLR2021, +HyeABcBxg4,1544740000000.0,1545350000000.0,1,BklMjsRqY7,BklMjsRqY7,Meta Review,Accept (Poster),"The authors present a theoretical and practical study on low-precision training of neural networks. They introduce the notion of variance retention ratio (VRR) that determines the accumulation bit-width for +precise tailoring of computation hardware. Empirically, the authors show that their theoretical result extends to practical implementation in three standard benchmarks. + +A criticism of the paper has been certain hyperparameters that a reviewer found to be chosen rather arbitrarily, but I think the reviewers do a reasonable job in rebutting it. + +Overall, there is consensus that the paper presents an interesting framework and does both practical and empirical analysis, and it should be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain +r1lEchveeN,1544740000000.0,1545350000000.0,1,HygsfnR9Ym,HygsfnR9Ym,Novel take on model-based improvement on model-free RL,Accept (Poster),"The paper presents ""recall traces"", a model based approach designed to improve reinforcement learning in sparse reward settings. The approach learns a generative model of trajectories leading to high-reward states, and is subsequently used to augment the real experience collected by the agent. This novel take on combining model-based and model-free learning is conceptually well motivated and is empirically shown to improve sample efficiency on several benchmark tasks. + +The reviewers noted the following potential weaknesses in their initial reviews: the paper could provide a clearer motivation of why the proposed approach is expected to lead to performance improvements, and how it relates to learning (and uses of) a forward model. Details of the method, e.g., model parameterization is unclear, and the effect of hyperparameter choices is not fully evaluated. + +The authors provided detailed replies to all reviewer suggestions, and ran extensive new experiments, including experiments to address questions about hyperparameter settings, and an entirely new use of the proposed model in a learning from demonstration setting. The authors also clarified the paper as requested by the reviewers. The reviewers have not responded to the rebuttal, but in the AC's assessment their concerns have been adequately addressed. The reviewers have updated their scores in response to the rebuttal, and the consensus is to accept the paper. + +The AC notes that the authors seem unaware of related work by Oh et al. ""Self Imitation Learning"" which was published at ICML 2018. The paper is based on a similar conceptual motivation but imitates high-value traces directly, instead of using a generative model. The authors should include a discussion of how their paper relates to this earlier work in their camera ready version.",ICLR2019,4: The area chair is confident but not absolutely certain +RY4ue5r6Q6V,1642700000000.0,1642700000000.0,1,4lLyoISm9M,4lLyoISm9M,Paper Decision,Reject,"The reviewers were fairly consistent in agreeing that this is a reasonable paper with an interesting idea. 
However, the use-case is fairly narrow, as the main benefit is less intermediate storage (and only significant for very rectangular matrices) but compared to alternatives it require many passes over the data (usually 5 or so). So it's a narrow use-case and many of the comparisons are not apples-to-apples since the accuracy, time, space-complexity and number of passes differ from algorithm to algorithm. + +So while acknowledging the potential benefits of the method, there are downsides too, and thus a clear presentation is very essential. The reviewers mention that presentation (listing the algorithm, clear experiments) could be improved. + +On my own reading, I noted that the choice of SketchySVD as the dominant baseline is misleading. SketchySVD is for streaming data (more restrictive than single-pass) so this is an unfair comparison. The appendix does a better job of including other baselines (block Lanczos), though it mischaracterizes them (it says ""BlockLanczos requires persistence presence of the data matrix X in memory"", but this is not true, the method could easily be implemented in a matrix-free fashion). Another method to compare with is the single-pass algorithm randSVD in Yu et al., who show how to implement one of the Halko et al. 2011 2-pass methods in just one-pass. Other reviewers mention baseline algorithm issues too. I do acknowledge the improved accuracy of your method over all these baselines for some matrices, in terms of the Frobenius norm (or tail error); however, I'm not sure the differences in spectral norm are are great, and see Remark 2.1 in Martinsson and Tropp '20 for arguments about why Frobenius norm guarantees are often not as desirable as spectral norm guarantees. + +Another issue is related to the left vs right singular vectors. A reviewer noted: ""It is not fair to compare RangeNet with SketchSVD, RangeNet just produces the right singular vectors while SketchSVD produces both left and right singular vectors."" The authors respond ""Range-Net computes both left and right singular vectors but does not consume main memory to store left singular vectors at run-time"". However, if we allow another pass over the matrix to find the left singular vectors, this post-processing can be applied to *any* technique that approximates the singular values and right singular vectors, hence PCA methods are applicable, including deterministic methods like the ""Frequent Directions"" method (Ghashami et al. '16). + +In summary, this method is high-accuracy and low-memory, yet it also has downsides compared to other methods, and the paper could use some improvement. I don't think the paper is ready at this time for acceptance, but given the advantages of the method, I encourage the authors to make changes and resubmit an improved version to ICLR next year or other similar venue. + + +References: + +Yu, Gu, Li, Liu, Li, ""Single-Pass PCA of Large High-Dimensional Data"". IJCAI '17, https://doi.org/10.24963/ijcai.2017/468 + +Ghashami, Liberty, Phillips, Woodruff, ""Frequent directions: Simple and deterministic matrix sketching"". SIAM Journal on Computing. 2016;45(5):1762-92. + +Martinsson, Tropp. ""Randomized numerical linear algebra: Foundations and algorithms"". Acta Numerica. 
2020 May;29:403-572.",ICLR2022, +S1xW__YVxE,1545010000000.0,1545350000000.0,1,SyeKf30cFQ,SyeKf30cFQ,Further iteration needed,Reject,"This paper studies the behavior of gradient descent on deep neural network architectures with spatial locality, under generic input data distributions, using a planted or ""teacher-student"" model. + +Whereas R1 was supportive of this work, R2 and R3 could not verify the main statements and the proofs due to a severe lack of clarity and mathematical rigor. The AC strongly aligns with the latter, and therefore recommends rejection at this time, encouraging the authors to address clarity and rigor issues and resubmit their work again. + +",ICLR2019,5: The area chair is absolutely certain +yHiPyn_D-9h,1610040000000.0,1610470000000.0,1,Xh5eMZVONGF,Xh5eMZVONGF,Final Decision,Accept (Poster),"The paper gives an extension of the transformer model that is suited to computing representations of source code. The main difference from transformers is that the model takes in a program's abstract syntax tree (AST) in addition to its sequence representation, and utilizes several pairwise distance measures between AST nodes in the self-attention operation. The model is evaluated on the task of code summarization for 5 different languages and shown to beat two state-of-the-art models. One interesting observation is that a model trained on data from all languages outperforms the monolingual version of the model. + +The reviewers generally liked the paper. The technical idea is simple, but the evaluation is substantial and makes a convincing case about setting a new state of the art. The observation about multilingual models is also interesting. While there were a few concerns, many of these were addressed in the authors' responses, and the ones that remain seem minor. Given this, I am recommending acceptance as a poster. Please incorporate the reviewers' comments in the final version. ",ICLR2021, +S1gBXyaSM,1517250000000.0,1517260000000.0,125,HyrCWeWCb,HyrCWeWCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper adapts (Nachum et al 2017) to continuous control via TRPO. The work is incremental (not in the dirty sense of the word popular amongst researchers, but rather in the sense of ""building atop a closely related work""), nontrivial, and shows empirical promise. The reviewers would like more exploration of the sensitivity of the hyper-parameters.",ICLR2018, +BJRQEJTSz,1517250000000.0,1517260000000.0,325,HyRnez-RW,HyRnez-RW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),The authors did a good job addressing reviewer concerns and analyzing and testing their model on interesting datasets with convincing results.,ICLR2018, +Efq6imNGVhS,1642700000000.0,1642700000000.0,1,CLpxpXqqBV,CLpxpXqqBV,Paper Decision,Accept (Poster),"This paper proposes augmenting standard forward prediction techniques used for representation learning with backward prediction as well, termed ""learning via retracing"". The paper implements this idea in a Cycle-Consistency World Model (CCWM) and demonstrates that CCWM improves performance of a Dreamer agent across a number of Control Suite tasks. The paper also proposes a way to detect ""irreversible"" transitions and exclude them from the backwards prediction step. + +This paper generated mixed opinions, and the reviewers did not come to a consensus on whether it should be accepted or rejected. In particular, Reviewer VSAG maintained it should be accepted, while Reviewer NEVM maintained it should be rejected. 
The other reviewers did not reply; I thought the authors' responses to their questions were reasonable so I assume their concerns were addressed (additionally, I don't believe a comparison to PlayVirtual is a justifiable request per the [reviewing guidelines](https://iclr.cc/Conferences/2022/ReviewerGuide), as it is concurrent work). + +The reviewers generally agreed that the cycle-consistency idea proposed by the paper is interesting, well-motivated, and borne out by the experimental results. I agree with these points. The main weakness of the paper, brought up by multiple reviewers, was the justification/motivation for the method for irreversibility detection. The authors clarified in the rebuttal that the motivation comes from the idea that temporally-adjacent states with very different values will tend to be far apart in representation space. While I believe that is true, it's not at all clear to me that this necessarily *entails* irreversible transitions between those states. That said, my feeling is that this is approach is (1) not the main contribution of the paper and (2) empirically seems to work, based on the experiments, even if the motivation is unclear/unjustified. Therefore, I do not think the concern about irreversibility detection is grounds on its own for rejection. + +Overall, I find this is a sensible approach to better representation learning in MBRL. I recommend acceptance as a poster.",ICLR2022, +aqpFu_20Il,1576800000000.0,1576800000000.0,1,B1em8TVtPr,B1em8TVtPr,Paper Decision,Reject,"This paper proposes a new benchmark to evaluate natural language processing models on discourse-related tasks based on existing datasets that are not available in other benchmarks (SentEval/GLUE/SuperGLUE). The authors also provide a set of baselines based on BERT, ELMo, and others; and estimates of human performance for some tasks. + +I think this has the potential to be a valuable resource to the research community, but I am not sure that it is the best fit for a conference such as ICLR. R3 also raises a valid concern regarding the performance of fine-tuned BERT that are comparable to human estimates on half of the tasks (3 out of 5), which slightly weakens the main motivation of having this new benchmark. + +My main suggestion to the authors is to have a very solid motivation for the new benchmark, including the reason of inclusion for each of the tasks. I believe that this is important to encourage the community to adopt it. For something like this, it would be nice (although not necessary) to have a clean website for submission as well. I believe that someone who proposes a new benchmark needs to do as best as they can to make it easy for other people to use it. + +Due to the above issues and space constraint, I recommend to reject the paper.",ICLR2020, +q0mm_QkSDIW,1610110000000.0,1610470000000.0,1,Ek7qrYhJMbn,Ek7qrYhJMbn,Final Decision,Reject,"The paper studies federated learning in what they call ```'single-sided trust' scenario, i.e. there is no dedicated server and the trust relationship is asymmetric. + +This paper was a trickier case to decide on, and more borderline, in our opinion, than the reviewers' scores suggest, primarily, because the reviewers' recommendations are based on more subjective notions of novelty and importance/appropriateness of the studied setting, rather than identifying specific flaws in theoretical analysis or experiments. 
Ultimately, it boils down to three reviewers being (rather) negative about the paper, and the only (very) positive reviewer not stepping in to champion it. The negative reviewers believed that the novelty is not substantial enough to meet the high ICLR acceptance bar (see details in reviews by R1 and R2 re similarity to ) and also have questioned the general motivation (R1) and/or the online learning setting (R3). While this assessment may be too harsh (esp. R1) - I think that the paper has merit - I share their feeling that in its current form it does not have a strong enough contribution. + + +",ICLR2021, +BZuJlWXyND,1576800000000.0,1576800000000.0,1,r1l1myStwr,r1l1myStwr,Paper Decision,Reject,"In this paper the authors view meta-learning under a general, less studied viewpoint, which does not make the typical assumption that task segmentation is provided. In this context, change-point analysis is used as a tool to complement meta-learning in this expanded domain. + +The expansion of meta-learning in this more general and often more practical context is significant and the paper is generally well written. However, considering this particular (non)segmentation setting is not an entirely novel idea; for example the reviewers have already pointed out [1] (which the authors agreed to discuss), but also [2] is another relevant work. The authors are highly encouraged to incorporate results, or at least a discussion, with respect to at least [2]. It seems likely that inferring boundaries could be more powerful, but it is important to better motivate this for a final paper. + +Moreover, the paper could be strengthened by significantly expanding the discussion about practical usefulness of the approach. R3 provides a suggestion towards this direction, that is, to explore the performance in a situation where task segmentation is truly unavailable. + +[1] Rahaf et el. ""Task-Free Continual Learning"". +[2] Riemer et al. ""Learning to learn without forgetting by maximizing transfer and minimizing interference"". + +",ICLR2020, +XUUfa-v9m_2,1610040000000.0,1610470000000.0,1,o20_NVA92tK,o20_NVA92tK,Final Decision,Reject,"The authors propose a new dataset to evaluate the robustness of image classifiers. The dataset consists of data from three sources: a crowdsourced dataset collected by the authors called ImageNet-Renditions, images from Google street view, and data sampled from DeepFashion2. This new dataset allows the authors to test robustness to different renditions of an object (e.g. artistic depictions of an object category) and robustness to changes in geography and camera type. In addition, they propose a new augmentation strategy called DeepAugment which consists of an encoder/decoder style network that transforms the appearance of the input image by simply applying different random perturbations of the weights of the augment network. Robustness results are presented on the previously described datasets where the proposed augmentation strategy in combination with an existing approach (AugMix) performs best in some cases. However, the results are not convincing and AugMix often outperforms the new method. + +In general, the authors did a good job addressing many of the comments (e.g. they provided more detail about how ImageNet-R was collected), but there were still several lingering concerns. R4 was the most positive about the paper, but unfortunately was one of the least vocal during the discussion. 
R1 was concerned that the paper did not do a great job of defining what was meant by robustness. This AC doesn't agree fully with their concerns, but does agree that more care could have been taken to position the paper better in light of the existing datasets that are already available (see R1’s comments). As the reviewers and authors note, collecting new datasets is a lot of work, so care should be taken to ensure that this is not duplicate effort. The authors addressed these concerns in their response to some extent, but more discussion is needed in the paper. + +There was a lot of discussion between the authors and reviewers about this paper. The new dataset has a lot of merit, but there is some concern that the paper does not do a great job of clearly presenting its findings and conclusions. In addition, the proposed augmentation technique is slightly underwhelming performance-wise and not very clearly described in the main paper. R2 sums up the opinion of this AC: “I think this work is interesting and is in principle asking the right questions. However, the analysis and conclusions currently do not provide robust and generalizable insights that advance the field.” There is clearly a lot of promise here, and the current recommendation is a weak reject. The authors are strongly encouraged to take the detailed feedback they have received on board and to revise the paper to further improve it for a future submission. +",ICLR2021, +_-wzDAie7v,1576800000000.0,1576800000000.0,1,SJgndT4KwB,SJgndT4KwB,Paper Decision,Accept (Spotlight),"This paper aims to study the mean and variance of the neural tangent kernel (NTK) in a randomly initialized ReLU network. The purpose is to understand the regime where the width and depth go to infinity together with a fixed ratio. The paper does not have a lot of numerical experiments to test the mathematical conclusions. In the discussion the reviewers concurred that the paper is interesting and has nice results but raised important points regarding the fact that only the diagonal elements are studied. This I think is the major limitation of this paper. Another issue raised was the lack of experimental work validating the theory. Despite the limitations discussed above, overall I think this is an interesting and important area as it sheds light on how to move beyond the NTK regime. I also think studying this limit is very important to a better understanding of neural network training. I recommend acceptance to ICLR.",ICLR2020, +CmfcsaL9el_,1642700000000.0,1642700000000.0,1,e6MWIbNeW1,e6MWIbNeW1,Paper Decision,Reject,"This paper considers an important problem, graph partitioning, from a transductive viewpoint: assuming that the graphs are generated by independent draws from an unknown distribution, learn some parameters in an 'offline' phase, and use these in the 'online' phase (much as in PAC learning). The authors have also answered many of the reviewer questions. In particular, the comparison with existing work is substantial. + +While I laud the positives of this work and the importance of the transductive approach, I see an issue: as a reviewer points out and as agreed by the authors, the paper does not provide a theoretical guarantee of the quality of the generalization to unseen graphs. 
It would have been useful, e.g., to consider this on Erdos-Renyi G(n,p) models, stochastic block models, etc.",ICLR2022, +B1bmHkpHz,1517250000000.0,1517260000000.0,528,rkhCSO4T-,rkhCSO4T-,ICLR 2018 Conference Acceptance Decision,Reject,Thank you for submitting your paper to ICLR. The consensus from the reviewers is that this is not quite ready for publication.,ICLR2018, +ifU_ej8I1wK,1610040000000.0,1610470000000.0,1,ZD7Ll4pAw7C,ZD7Ll4pAw7C,Final Decision,Reject,"The paper works towards an analysis to understand the difference -- and primarily the lack thereof -- between different pruning methods. The central observation is that the convolutional filters in a layer are not strongly correlated and -- if the weights of the layer are taken as a matrix -- the covariance matrix is block diagonal. + +Extending this to the regime of a large number of filters, the matrix is approximately diagonal and all weights are approximately Gaussian and i.i.d. The point of this analysis is that under this assumption, norm-based metrics, particularly $\ell_1$ and $\ell_2$, behave quite similarly. + +The pros of this paper are the extensive evaluation and -- after revisions -- relatively clear text. The core analysis is nice to have elaborated in detail in the community. + +The primary con of this paper is, as the reviewers point out, that there are limited conclusions to take away from this work. Specifically, a plausible default hypothesis is that different pruning criteria result in different pruning decisions. From the results in this paper, that still seems to hold, with the exception of the norm-based metrics. So, while this work does demonstrate that these norm-based metrics are relatively similar -- a nice clarification to see in the community -- the work offers limited comment on the broader space of pruning metrics. + +My recommendation is Reject. Despite the strong empirical evaluation, the ultimate results offer limited clarification on the similarity of pruning metrics. +",ICLR2021, +OOXxnJSnER,1576800000000.0,1576800000000.0,1,Byg-wJSYDS,Byg-wJSYDS,Paper Decision,Accept (Poster),"This paper tackles an interesting problem: ""How should we evaluate models when the test data contains noisy labels?"". This is a particularly relevant question in the medical imaging domain where expert annotators often disagree with each other. The paper proposes a new metric, the ""discrepancy ratio"", which computes the ratio of how often the model disagrees with humans to how often humans disagree with each other. The paper shows that under certain noise models for the human annotations the discrepancy ratio can exactly determine when a model is more accurate than humans, whereas commonly used baselines such as comparing with the majority vote do not have this property. Reviewers were satisfied with the author rebuttal, particularly with the clarification that the goal of the metric is to accurately determine when model performance exceeds that of human annotators, and not to better rank models. 
The metric should be quite useful, assuming users are cautious of the limitations described by the authors.",ICLR2020, +PjKsynVQ504,1610040000000.0,1610470000000.0,1,aDjoksTpXOP,aDjoksTpXOP,Final Decision,Accept (Poster),"This paper analyzes the expressive power of the NTK corresponding to deep neural networks. It is shown that the depth hardly affects the behavior of the spectrum of the corresponding integral operator, which indicates that depth separation does not occur as long as NTK is considered. +The analysis is novel and gives a significant insight into the NTK research literature. The theoretical framework considered in this paper is considerably broad and potentially can be applied to several types of activation functions (while only ReLU is analyzed as a concrete example in the paper). Moreover, some numerical experiments are conducted that support the validity of the theoretical analysis. +All reviewers are positive on this paper. I agree with their evaluations. For these reasons, I think this paper is worth acceptance.",ICLR2021, +V_LKT32Z3xV,1610040000000.0,1610470000000.0,1,04ArenGOz3,04ArenGOz3,Final Decision,Accept (Poster),"The paper proposes to predict sets using conditional density +estimates. The conditional densities of the response set given the +observed features are modeled through an energy based function. The +energy function can be specified using tailored neural nets like deep +sets and is trained through approximate negative log likelihoods using +sampling. + +The paper was nice to read and was liked by all the reviewers. The one +thing that stood out to me was the emphasis on multi-modality (""multi"" +appears 51 times). This could be toned down because little is said +about the quality relative to the true p(Y | x) and the focus is +mainly on the lack of this in existing work.",ICLR2021, +DqhyHkUcRLj,1642700000000.0,1642700000000.0,1,HL_qE4fz-JZ,HL_qE4fz-JZ,Paper Decision,Reject,"Three reviewers recommend Reject. Two reviewers recommend Accept although they do not champion the paper. I believe the paper develops an interesting idea to better estimate the location of the inducing inputs in sparse GP models. However, I still think the paper would benefit from another careful revision and therefore I do not recommend Acceptance at this stage. I agree with reviewers that 1) currently the method is unable to estimate the covariance between two data points. This is important in applications of GPs for uncertainty quantification such as Bayesian optimisation. For example, including a BayesOpt example would clearly strengthen the paper. 2) empirical evaluation lacks simple baselines, e.g. Titsias (2009). The authors claim that Titsias (2009) does not scale and that's why they don't care for it. Even if this is true, including an example that helps to better compare against this method at a different scale might strengthen the model proposed here.",ICLR2022, +SJglKRzrg4,1545050000000.0,1545350000000.0,1,HklnzhR9YQ,HklnzhR9YQ,"Interesting transformation of block-sparse fully connected net to ResNet Convolutional blocks, yet the ResNet architecture seems unrealistic and indirect. ",Reject,"The paper presents an interesting treatment of transforming block-sparse fully connected neural networks to a ResNet-type Convolutional Network. Equipped with recent developments on approximations of function classes (Barron, Holder) via block-sparse fully connected networks at the optimal rates, this enables us to show the equivalent power of ResNet Convolutional Nets.  
+ +The major weakness in this treatment lies in the fact that the ResNet architecture for realizing the block-sparse fully connected nets is unrealistic. It originates from recent developments in approximation theory that transform a fully connected net into a convolutional net via Toeplitz matrix (operator) factorizations. However, the convolutional nets or ResNets obtained in this way are different from those that have been used successfully in applications. Some special properties associated with convolutions, e.g. translation invariance and local deformation stability, are not natural in original fully connected nets and might be indirect after such a treatment. + +The presentation of the paper should be polished further. Based on the ratings of reviewers, the current version of the paper is borderline, leaning reject.",ICLR2019,4: The area chair is confident but not absolutely certain +Hk-ssfI_l,1486400000000.0,1486400000000.0,1,rJXTf9Bxg,rJXTf9Bxg,ICLR committee final decision,Reject,"Ratings summary: + 3: Clear rejection + 6: Marginally above acceptance threshold + 6: Marginally above acceptance threshold + + A clear, easy-to-read paper focusing on generating higher quality, higher resolution (128x128) pixel imagery with GANs. There were broad concerns however across reviewers that the work is lacking in clearly identifiable novelty. The authors point to a list of novel elements of the work in their rebuttal. However, the most negative reviewer also has issues with the evaluation metrics used. + + Thus, unfortunately, the PCs believe that this work isn't ready to appear at the conference.",ICLR2017, +MCjH9hMEbB,1576800000000.0,1576800000000.0,1,HJe7unNFDH,HJe7unNFDH,Paper Decision,Reject,"This paper presents a NAS method that avoids having to retrain models from scratch and targets a range of model sizes at once. The work builds on Yu & Huang (2019) and studies a combination of many different techniques. +Several baselines use a weaker training method, and no code is made available, raising doubts concerning reproducibility. + +The reviewers asked various questions, but for several of these questions (e.g., running experiments on MNIST and CIFAR) the authors did not answer satisfactorily. Therefore, the reviewer asking these questions also refuses to change his/her rating. + +Overall, as AnonReviewer #1 points out, the paper is very empirical. This is not necessarily a bad thing if the experiments yield a lot of insight, but this insight also appears limited. Therefore, I agree with the reviewers and recommend rejection.",ICLR2020, +AwBH-h5xE,1576800000000.0,1576800000000.0,1,BkxfaTVFwH,BkxfaTVFwH,Paper Decision,Accept (Poster),"This paper offers a new method for scene generation. While there is some debate on the semantics of ‘generative’ and ‘3d’, on balance the reviewers were positive and more so after rebuttal. I concur with their view that this paper deserves to be accepted.",ICLR2020, +njKuR8QCLyV,1642700000000.0,1642700000000.0,1,M2sNIiCC6C,M2sNIiCC6C,Paper Decision,Reject,"This paper focuses on unsupervised image denoising and proposes a method to do so. It shows that using a designed operator based on domain knowledge can help improve unsupervised image denoising. The authors also provide experimental results demonstrating that the proposed methods outperform existing unsupervised denoising methods and behave similarly in performance to supervised methods. 
The reviewers liked the improvements but noted (1) limited novelty (a simple extension of Noise2Self), (2) an unconvincing example, (3) a lack of clarity in Section 2.3, and (4) a variety of other technical concerns. The authors partially addressed these concerns. However, I concur with the reviewers that the paper still requires more work and is not ready for publication in its current form.",ICLR2022, +aWWGt6XPEfx,1642700000000.0,1642700000000.0,1,1NUsBU-7HAL,1NUsBU-7HAL,Paper Decision,Accept (Poster),"This paper presents a hierarchical Bayesian approach to exploration in grid worlds. The paper considers the hypothesis that humans maintain a hierarchical representation when exploring a space, where the distribution over unknown space can be modeled with a structured probabilistic program. The paper compares the behavior of people during exploration tasks to the behavior of a Bayesian model under different distributional approximations. The results indicate that people can behave similarly to a sophisticated Bayesian model on small grid world domains. + +The reviews highlighted several concerns about the paper. One initial concern was that the experimental domain is too simple and small compared to real world environments encountered by robots or humans. However, this work is similar in scope to other exploration work in reinforcement learning and psychology studies, where tiny grid worlds are still commonly used to gather insight. Thus, this concern does not reduce the potential contribution of the paper. The reviewers raised several other concerns about the work that were largely addressed by the author response. The remaining reviewer concerns centered on the limited strength of the evidence in the experiments, but the reviewers expect the paper will still be of interest to the broader research community. + +Four reviewers indicated acceptance of this paper for its contribution of a study into the use of probabilistic program induction to infer possible completions of maps in small environments. The paper is therefore accepted.",ICLR2022, +5p20fBDe_z4,1642700000000.0,1642700000000.0,1,WE4qe9xlnQw,WE4qe9xlnQw,Paper Decision,Accept (Poster),"The paper proposes an approach to constructing steerable equivariant CNNs over arbitrary subgroups of E(3), by generalizing the Wigner-Eckart theorem for steerable kernels in Lang & Weiler (2020). The intuitive idea is to use a steerable basis for a large group like O(3) to build a basis for a subgroup of interest like SO(3). Reviewers were generally happy with the author response, finding the paper makes a good contribution to steerable network design, with theoretically interesting ideas. However, there were still questions after the rebuttal about the practical utility of the approach, such as the relevance of subgroups of O(3). Reviewers also felt that much of the material was not written in an accessible way, such that it could only be appreciated by experts working on group equivariant CNNs. 
+ +In a final version, the authors should try to make a much clearer case for practical relevance, introduce the key concepts assuming less prior knowledge, and present the material in a way that makes the high-level story more clear, as detailed by reviewers.",ICLR2022, +xSAe703dTs,1610040000000.0,1610470000000.0,1,bgQek2O63w,bgQek2O63w,Final Decision,Accept (Poster),"The paper considers the use of adversarial self-supervised learning to render robust data representations for various tasks, in particular to integrate Bootstrap Your Own Latent (BYOL) with adversarial training, where a small amount of labeled data is available together with a sizable unlabeled dataset. The low-data regime is of particular interest. It extends a previous method with a new adversarial augmentation technique, it is compared against several methods, and the robust representations are shown to be useful more generally. Some confusing points in the presentation and open questions were resolved in a detailed discussion with the reviewers.",ICLR2021, +H1xk21fgeN,1544720000000.0,1545350000000.0,1,rylKB3A9Fm,rylKB3A9Fm,"Important topic, poorly motivated benchmark",Reject,"The manuscript proposes benchmarks for studying generalization in reinforcement learning, primarily through the alteration of the environment parameters of standard tasks such as Mountain Car and Half Cheetah. In contrast with methodological innovations where a numerical argument can often be made for the new method's performance on well-understood tasks, a paper introducing a new benchmark must be held to a high standard in terms of the usefulness of the benchmark in studying the phenomenon under consideration. + +Reviewers commended the quality of writing and considered the experiments given the set of tasks to be thorough, but there were serious concerns from several reviewers regarding how well-motivated this benchmark is and restrictions viewed as artificial (no training at test-time), concerns which the updated manuscript has failed to address. I therefore recommend rejection at this stage, and urge the authors to carefully consider the desiderata for a generalization benchmark and why their current proposed set of tasks satisfies (or doesn't satisfy) those desiderata.",ICLR2019,4: The area chair is confident but not absolutely certain +G0zPRYyrFUe,1642700000000.0,1642700000000.0,1,gEynpztqZug,gEynpztqZug,Paper Decision,Reject,"This submission proposes ""Mako"", which enables continual learning when only a limited amount of labeled data is available (along with a good deal of unlabeled data). Reviewers shared concerns about difficulty in understanding which components of the proposed system were novel, especially given that the most important components seemed to be proposed in past work. Reviewers also had difficulty getting insight into which parts of the system were most useful, and further requested additional experiments on harder benchmarks. The consensus was therefore to reject the paper.",ICLR2022, +SJesiGugx4,1544750000000.0,1545350000000.0,1,BkzeUiRcY7,BkzeUiRcY7,Deep reinforcement learning for principal-agent problems,Accept (Poster),"The paper addresses a variant of multi-agent reinforcement learning that aligns well with real-world applications - it considers the case where agents may have individual, diverging preferences. 
The proposed approach trains a ""manager"" agent which coordinates the self-interested worker agents by assigning them appropriate tasks and rewarding successful task completion (through contract generation). The approach is empirically validated on two grid-world domains: resource collection and crafting. The reviewers point out that this formulation is closely related to the principal-agent problem known in the economics literature, and see a key contribution of the paper in bringing this type of problem into the deep RL space. + +The reviewers noted several potential weaknesses: they asked to clarify the relation to prior work, especially the principal-agent work done in other areas, as well as connections to real world applications. In this context, they also noted that the significance of the contribution was unclear. Several modeling choices were questioned, including the choice of using rule-based agents for the empirical results presented in the main paper, and the need for using deep learning for contract generation. They asked the authors to provide additional details regarding scalability and sample complexity of the approach. + +The authors carefully addressed the reviewer concerns, and the reviewers have indicated that they are satisfied with the response and updates to the paper. The consensus is to accept the paper. + +The AC is particularly pleased to see that the authors plan to open source their code so that experiments can be replicated, and encourages them to do so in a timely manner. The AC also notes that the figures in the paper are very small, and often not readable in print - please increase figure and font sizes in the camera ready version to ensure the paper is legible when printed.",ICLR2019,4: The area chair is confident but not absolutely certain +zFQ1wdCwEQo,1642700000000.0,1642700000000.0,1,_BNiN4IjC5,_BNiN4IjC5,Paper Decision,Accept (Poster),"This paper suggests using a conditional prior in conditional diffusion-based generative models. Typically, only the score function estimator is provided with the conditioning signal, and the prior is an unconditional standard Gaussian distribution. It is shown that making the prior conditional improves results on speech generation tasks. + +Several reviewers initially recommended rejection, but after extensive discussion and interaction with the authors, all reviewers have given this work a ""borderline accept"" rating. + +Criticisms included that the idea is too simple or obvious to warrant an ICLR paper. I am inclined to disagree: simple ideas that work are often the ones that persist and see rapid adoption (dropout regularisation is my favourite example). Like the authors, I believe simplicity is an advantage in this respect, rather than a disadvantage. Of course, simple ideas do require extensive and convincing empirical validation to be worth publishing at ICLR. After the authors' updates, I believe the work meets this bar. + +Another issue raised by several reviewers is the limited theoretical justification for the approach. However, combined with the simplicity of the method, I believe the empirical results of the revised version sufficiently justify the approach on their own. Nevertheless, I would recommend that the authors consider further how they could address this issue in the final version of their manuscript, as they have already begun to do during the discussion phase. + +Another way to strengthen the paper further would be to demonstrate how the generic approach can be applied in a different domain (e.g. conditional image generation), but I do not consider this addition necessary for the work to warrant publication. + +In light of this, I am recommending acceptance.",ICLR2022, +SqV2TQO2rZ6,1642700000000.0,1642700000000.0,1,v-f7ifhKYps,v-f7ifhKYps,Paper Decision,Reject,"This work introduces/applies the mirror descent optimization technique to adversarial inverse reinforcement learning (AIRL). As a result, the proposed algorithm (MD-AIRL) incrementally learns a parameterized reward function in an associated reward space. The two issues of standard adversarial imitation learning algorithms are 1) current ""divergence""-based updates may not lead to updates that better match the expert (due to geometry), and 2) ""divergence""-based updates may suffer when only a small number of demonstrations is provided. Thus the goal of this work is (presumably) to ""robustify"" the learning of the reward function, especially by addressing these issues. The proposed algorithm is evaluated on a bandit problem, a multi-goal toy example, and standard MuJoCo benchmarks. 
+ +**Strengths** +This work attempts to address the important problem of understanding and improving the updates of IRL algorithms. +A theoretical analysis is provided. + +**Weaknesses** +The major concern is clarity of the manuscript. Even after updating, clarity remains a concern. +While a lot of experiments were performed, the evaluation is not entirely convincing. One reason for this is that it is hard to tie the results back to the original motivation/claims of this algorithm. As one reviewer notes, ""it's unclear how the new algorithm affects reward functions"". Furthermore, reviewers find the experimental results not entirely convincing. + +**Summary** +After rebuttal and revision, the clarity and experimental analysis remain a concern. My recommendation is that the authors are encouraged to take the reviewers' feedback and improve the manuscript. In its current form it's not quite ready yet for publication.",ICLR2022, +vg3fITfom58,1642700000000.0,1642700000000.0,1,tvwNdOKhuF5,tvwNdOKhuF5,Paper Decision,Reject,"The authors propose a method for training agents in FPS games, and achieve good results in a VizDoom setting. The method combines a number of different components and ideas, and it is not clear which of these are crucial to the success. In particular, ablations of the method are missing, as well as more runs to test variability and diversity. In addition, the paper is not all that easy to read. Reviewers had a number of partly overlapping concerns, of which I've tried to distil the main ones above. While the empirical results are promising, it is clear that much more work is needed to distil this method into generalizable knowledge.",ICLR2022, +1s-hFko5Ra,1642700000000.0,1642700000000.0,1,bq7smM1OJIX,bq7smM1OJIX,Paper Decision,Reject,"This paper proposes to use longstanding statistical learning techniques to identify the nationality of the author of a text. + +Reviewers agreed that this work is a poor fit for ICLR, as there is nothing here that advances our understanding of representation learning. Reviewers were further concerned about the soundness of the claims, raising issues about data contamination and comparison with prior work. + +Finally, reviewers pointed out (correctly in my view) that work that aims to infer protected identity characteristics of non-user human subjects should be held to an especially high ethical standard, and needs a highly persuasive cost-benefit analysis that defends why the problem is ethical to study at all. The available discussion of ethics is not up to this standard.",ICLR2022, +lY7XbwIVzS5,1642700000000.0,1642700000000.0,1,CSw5zgTjXyb,CSw5zgTjXyb,Paper Decision,Reject,"The paper proposes a model of agent collaboration to improve outcomes for any participating agent in a setting where every agent does not always benefit from collaborating with all other agents. The reviewers did find some of the theoretical results interesting; however, in its current (revised) form, they still argued during the discussion post-rebuttal that: (i) the game theoretic formulation of this problem is not entirely new and has been studied in various forms before, and (ii) the particular application of the results to federated learning comes after making various (questionable) assumptions. 
I would encourage the authors to take into account (i-ii) for preparing a revised version of their paper and resubmit to another conference.",ICLR2022, +gslGBY-rE5,1610040000000.0,1610470000000.0,1,6VhmvP7XZue,6VhmvP7XZue,Final Decision,Reject,"This work addresses a novel and important real-world setting for semi-supervised learning – the open-world problem where unlabeled data may contain novel classes that are not seen in labeled data. The paper provides an approach by combining three loss functions: a supervised cross-entropy loss, a pairwise cross-entropy loss with adaptive uncertainty margin, and a regularization towards uniform distribution. + +The authors were responsive to reviewers’ comments and have improved their paper accordingly by adding experiments, including an ablation study of each component of the objective function, a study of the effect of regularization on unbalanced class distributions, and reporting accuracy on pseudo-labels. While two reviewers have slightly increased their scores, some concerns still remain. + +This is a borderline paper, and after some discussion and calibration, we decided that the work in its current form does not quite meet the bar for acceptance.",ICLR2021, +SJlsE_RQe4,1544970000000.0,1545350000000.0,1,H1goBoR9F7,H1goBoR9F7,novel approach,Accept (Poster),"This paper proposes a novel approach for network pruning in both training and inference. This paper received a consensus of acceptance. Compared with previous work on model compression, this paper saves memory and accelerates both training and inference. It is activations, rather than weights, that dominate the training memory. Reviewer 1 posed a valid concern about the efficient implementation on GPUs, and the authors agreed that practical speedup on GPU is difficult. It'll be great if the authors can give practical insights on how to achieve real speedup in the final draft. ",ICLR2019,5: The area chair is absolutely certain +zzNZLRApya,1576800000000.0,1576800000000.0,1,SJlM0JSFDr,SJlM0JSFDr,Paper Decision,Reject,"The authors offer theoretical guarantees for a simplified version of the deep Q-learning algorithm. However, the majority of the reviewers agree that the simplifying assumptions are so many that the results do not capture major important aspects of deep Q-Learning (e.g. understanding good exploration strategies, understanding why deep nets are better approximators and not using neural net classes that are so large that they can capture all non-parametric functions). For justifying the paper to be called a theoretical analysis of deep Q-Learning some of these aspects need to be addressed, or the motivation/title of the paper needs to be re-defined.",ICLR2020, +Hklk74ZiyN,1544390000000.0,1545350000000.0,1,rJl8FoRcY7,rJl8FoRcY7,"Poorly motivated, implications unclear",Reject,"This paper suggests a problem with the standard ELBO for the multi-modal case, and proposes a new objective to address this problem. However, I (and some of the reviewers) disagree with the motivation. First of all, there's no reason one can't train a separate encoder for every combination of modalities available, at least when there are only 2 or 3. Failing that, one could simply optimize per-example approximate posteriors without using an encoder. + +Second, once you stop optimizing the ELBO, you've lost the motivating principle for training VAEs, and must justify your new objective empirically. Almost all of the results are (in my opinion) ambiguous plots of latent encodings. + +Finally, a point made throughout the paper and discussions was that different modalities should give the same encodings, which is plainly false. One of the reviewers made this point: ""The fact that z_a != z_b != z_{a,b} should be expected if a and b provide different information. I don't see the problem with this."", which you dismiss. Additionally, the encoder's job is to approximate the true posterior. The true posteriors will in general be different for different modalities. + +I would recommend focusing on ways to train the original ELBO in the presence of different modalities, instead of modifying it based on these intuitions. +",ICLR2019,4: The area chair is confident but not absolutely certain +iu_-nMw_VTd,1610040000000.0,1610470000000.0,1,To4Wy2NEM2,To4Wy2NEM2,Final Decision,Reject,"The paper presents a generic way to add group sparsity based regularizers to a family of adaptive optimizers leading to generalizations of many popular optimizers (ADAM, ADAGRAD, etc.) to their group versions. Overall the reviewers appreciated the algorithmic contribution and its genericness in terms of application to most known adaptive optimizers. While the paper's revision during the rebuttal phase satisfied some reviewer concerns regarding the experimental baselines and the precise experimental methodology, reviewers continued to have concerns regarding the experiments performed - the potential lack of fine tuning post pruning, the use of s_t tilde as opposed to s_t in the practical algorithms amongst others listed in the review. 
Overall, the reviewers deemed the theoretical contribution of the paper not significant enough in terms of novelty, and the decision hinged on the efficacy of the experimental evaluation - the lingering concerns about which led to the decision. +",ICLR2021, +dp2aT4dymTj,1642700000000.0,1642700000000.0,1,XOh5x-vxsrV,XOh5x-vxsrV,Paper Decision,Accept (Poster),"Meta Review of Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL + +This work investigates a zero-shot generalization method for RL based on online clustering, adapted to RL. The intuition of this approach (called Cross Trajectory Representation Learning, CTRL) is that the self-supervised objective used will encourage the encoder to map behaviorally similar observations to similar representations without the use of reward signals. The authors performed experiments on the 16 Procgen tasks, and compared it against several baselines (DBC, PSE, CURL, Proto-RL and DIAYN). The performance is generally better than the baselines, but what I like about it is that a new approach to achieving such performance is proposed. 
+ +The scores were generally good (6, 6, 6, 3), and the 6's are overall positive about the work (both the writing and the breadth of experiments). Reviewer Kekc, who gave a score of 3, maintained their score, despite acknowledging the authors' responses. The main outstanding issue from Kekc is that they believe the paper should stick with the original 25M step protocol (with a larger training budget), rather than 8M steps. If that's the main issue for this paper to not be accepted into ICLR 2022, I feel this can be adequately addressed for the camera ready version. (Please note that while I disagree with the final score of 3 that Kekc gave, I find their review to be highly informative and useful, and would like to acknowledge Kekc for their input and discussion.) + +Based on the discussion and the reviews, and with the context behind the score of 3, I would like to be on the side of recommending this paper for acceptance, and urge strongly for the authors to conduct the 25M experiments as reviewer Kekc suggested (as Kekc also noted, the training curves are still going up, so just train them for a longer time). Even if the final results are not as good as the 8M, that's fine, just include them in the final camera ready version, since I believe this work meets the bar, offers good insights into RL generalization, has a good breadth of experiments and baselines, and will be of great interest to the broader RL community.",ICLR2022, +B1hhofIdl,1486400000000.0,1486400000000.0,1,rJg_1L5gg,rJg_1L5gg,ICLR committee final decision,Reject,"This is an empirical paper which compares three different instantiations of a kind of incremental/curriculum learning for sequences. + + The reviews from R1 and R3 (which gave confidence scores of 4) were negative. The main concerns raised by the reviewers: + * Paper is too long -- 17 pages -- and length is due to experiments (e.g. transfer learning) which are tangential to the main message of the paper (R3, R1) + * Lack of novelty (R3) + * Tests only on a single, small, synthetic dataset, and questioning of the claim that this new synthetic dataset is helpful to the community (R3, R1) + + However, R3 and R1 both pointed out that they found the ablation studies interesting. R4 (who gave a confidence score of 3) gave a more positive score but also expressed concerns similar to those of R1 & R3 (page length, similarity to existing work, dataset too specific and not necessarily justified). + + The author argued for the novelty of the paper, agreed to reduce the paper length and also argued that the data was indeed helpful (giving a specific case of another researcher who was extending the data). The author also provided a ""twitter trail"" countering the argument that the dataset was created for the sole purpose of showing that the method works. + + After engaging the reviewers in discussion, R4 admitted they were originally too generous with their score and downgraded to 5. The AC has decided that, while the paper has merits as acknowledged by the reviewers, it's not strong enough for acceptance in its present form. 
The AC encourages the author to work on an improved version (perhaps with experiments on an additional real dataset) and organize it with the audience in mind.",ICLR2017, +B1m7TzLdg,1486400000000.0,1486400000000.0,1,HJrDIpiee,HJrDIpiee,ICLR committee final decision,Reject,"The reviewers agree that the paper is clear and well-written, but all reviewers raised significant concerns about the novelty of the work, since the proposed algorithm is a combination of well-known techniques in reinforcement learning. It is worth noting that the use of eligibility traces is not very heavily explored in the deep reinforcement learning literature, but since the contribution is primarily empirical rather than conceptual and algorithmic, there is a high bar for the rigorousness of the experiments. The reviewers generally did not find the evaluation to be compelling enough in this regard. Based on this evaluation, the paper is not ready for publication.",ICLR2017, +cJHZGnjZl,1576800000000.0,1576800000000.0,1,rJg76kStwH,rJg76kStwH,Paper Decision,Accept (Poster),"This paper is far more borderline than the review scores indicate. The authors certainly did themselves no favours by posting a response so close to the end of the discussion period, but there was sufficient time to consider the responses after this, and it is somewhat disappointing that the reviewers did not engage. + +Reviewer 2 states that their only reason for not recommending acceptance is the lack of experiments on more than one KG. The authors point out they have experiments on more than one KG in the paper. From my reading, this is the case. I will consider R2 in favour of the paper in the absence of a response. + +Reviewer 3 gives a fairly clear initial review which states the main reasons they do not recommend acceptance. While not an expert on the topic of GNNs, I have enough of a technical understanding to deem that the detailed response from the authors to each of the points does address these concerns. In the absence of a response from the reviewer, it is difficult to ascertain whether they would agree, but I will lean towards assuming they are satisfied. + +Reviewer 1 gives a positive sounding review, with the main criticism being ""Overall, the work of this paper seems technically sound but I don’t find the contributions particularly surprising or novel. Along with plogicnet, there have been many extensions and applications of Gnns, and I didn’t find that the paper expands this perspective in any surprising way."" This statement is simply re-asserted after the author response. I find this style of review entirely inappropriate and unfair: it is not the role of a good scientific publication to ""surprise"". If it is technically sound, and in an area that the reviewer admits generates interest from reviewers, vague weasel words do not a reason for rejection make. + +I recommend acceptance.",ICLR2020, +mtJtiJNB_u,1576800000000.0,1576800000000.0,1,rkg3kRNKvH,rkg3kRNKvH,Paper Decision,Reject,"This paper presents an analysis of the kind of knowledge captured by pre-trained word embeddings. The authors show various kinds of properties like relation between entities and their description, mapping high-level commands to discrete commands etc. The problem with the paper is that almost all of the properties shown in this work have already been established in existing literature. In fact, the methods presented here are the baseline algorithms for the identification of different properties presented in the paper. + +The term common sense, which is used often in the paper, is mischaracterized. In NLP literature, common sense is something that is implicitly understood by humans but which is not really captured by language. For example, that going to a movie means you need parking is something that is well-understood by humans but is not implied by the language of going to the movie. The phenomenon described by the authors is general language processing. + +Towards the end, the proposed evaluation criteria for embeddings are also well-established concepts; it's just that these metrics are not part of the training mechanism as yet. So if the contribution was on showing how those metrics can be integrated in training the embeddings, that would be a great contribution. 
+ +I agree with the reviewers' critiques and recommend a rejection as of now.",ICLR2020, +nHMDOdDZZm6,1642700000000.0,1642700000000.0,1,YVa8X_2I1b,YVa8X_2I1b,Paper Decision,Reject,"The paper proposes an approach for learning a decomposition of a scene into 3D objects using single images without pose annotations as training data. The model is based on Slot Attention and NeRF. Results are demonstrated on CLEVR and its variants. + +The reviewers point out that the method is reasonable and the paper is quite good, but even after considering the authors' feedback agree that the paper is not ready for acceptance. 
In particular, the key concern is around experimental evaluation - that it is performed on one dataset (and variants thereof) and that the evaluation of the 3D properties of the model is not sufficiently convincing: it does not outperform 2D object learning methods on segmentation and is not compared to those on ""snitch localization"". + +Overall, this is a reasonable paper, and the results are promising but somewhat inconclusive, so I recommend rejection at this point, but encourage the authors to improve the paper and resubmit to a different venue. + +(One remark. The paper makes a point of not using any annotation. It is technically true, but in practice on CLEVR unsupervised segmentation works so well that it's basically as if segmentation masks were provided. If the authors could demonstrate that their method - possibly with provided coarse segmentation masks - works on more complex datasets, it would be a nice additional experiment.)",ICLR2022, +fug5XVCtk398,1642700000000.0,1642700000000.0,1,C03Ajc-NS5W,C03Ajc-NS5W,Paper Decision,Accept (Poster),"This work introduces an autoregressive flow model that generates molecular geometries by placing one atom at a time. +In order to preserve the E(3) invariance of the density, successive atom locations are sampled relative to already placed atoms (in a coordinate system described by distance, angle and torsion). +The paper is overall well-written and experimental results are compelling.",ICLR2022, +BkwAozLdg,1486400000000.0,1486400000000.0,1,SygvTcYee,SygvTcYee,ICLR committee final decision,Reject,"The work proposes a parallel/distributed variant of the MAC decomposition method. It presents some theoretical and experimental results supporting the parallelization strategy. The reviews are mixed and indeed a common concern among the reviewers was the choice of test problem. To me it is ok to only concentrate on a single class of problems, but in this case it needs to be a problem that the ICLR community identifies as being of central importance. Otherwise, if a more esoteric problem is chosen then I (and the reviewers) would rather see that the method is useful on multiple problems. Without that, it's basically impossible to extrapolate the experiments to new settings and we are forced to re-implement the algorithm. I'm not saying that the authors necessarily need to consider deep networks and there are many alternative possible models (sparse coding, collaborative filtering, etc.). But it should be noted that, without further experimental comparisons, it is impossible to verify the author's claims that the method is effective for deeply-nested models. + + Other concerns brought up by the reviewers (beyond the clarity/presentation issues, which should also be addressed): the experimental comparison would be more convincing with a comparison to an existing approach like a parallel SGD method. I appreciate that the authors have done a lot of work already on this problem, but doing such obvious comparisons should be the job of the author instead of the reader (focusing purely on parallelization would be ok if the MAC model was extremely-widely-used already and parallelizing was an open problem, but my impression is that this is not the case). 
As a minor aside, the memory issue will be more serious for deeply-nested models, due to the use of the decomposition approach (we don't want to store the activations for all layers for all examples), and this doesn't arise in SGD.",ICLR2017, +_QI7ffqaiU,1576800000000.0,1576800000000.0,1,BkxXe0Etwr,BkxXe0Etwr,Paper Decision,Accept (Poster),"All three reviewers gave scores of Weak Accept. The AC has read the reviews and rebuttal and agrees that the paper makes a solid contribution and should be accepted.",ICLR2020, +l6hr25do6z,1576800000000.0,1576800000000.0,1,Hye1RJHKwB,Hye1RJHKwB,Paper Decision,Accept (Poster),"All three reviewers appreciate the new method (FactorGAN) for training generative networks from incomplete observations. At the same time, the quality of the experimental results can still be improved. 
On balance, the paper will make a good poster.",ICLR2020, +mHIcMKBvHb_,1642700000000.0,1642700000000.0,1,nf3A0WZsXS5,nf3A0WZsXS5,Paper Decision,Accept (Poster),"The authors present a GAN for learning a continuous representation of disease-related image patterns from regional volume information generated from structural MRI images. +The reviewers find the problem relevant and appreciate the proposed solution. They find the paper well-written and find the empirical results on Alzheimer brain MRIs relevant for the neuroscience community. + +The overall objective function includes several hyper-parameters. As pointed out as the main weak point by multiple reviewers, this may hint at overengineering/overfitting to a data set. However, the reviewers also mention that the regularizers are all sufficiently well-motivated in the paper and the author response. + +Reviewers highlight comparisons on the real data as a strong result demonstrating that Surreal GAN was able to isolate two major sources/locations of atrophy in Alzheimer’s disease. Overall, the majority of the reviews are positive.",ICLR2022, +preNA0A6cy,1576800000000.0,1576800000000.0,1,rkgTdkrtPH,rkgTdkrtPH,Paper Decision,Reject,"This paper proposes a noise-aware knowledge graph embedding (NoiGAN) by combining KG completion and noise detection through the GANs framework. The reviewers find that the idea is interesting, but the comparison to SOTA is largely missing. The paper can be improved by addressing the reviewer comments. ",ICLR2020, +Evp98S07ISE,1642700000000.0,1642700000000.0,1,JJxiD-kg-oK,JJxiD-kg-oK,Paper Decision,Accept (Poster),"The Authors propose a neural-network based approach for the phase retrieval problem. Solving the phase retrieval problem is key for important application areas such as crystallography or radioastronomy. + +After adding more baselines and other changes, 3 out of 4 reviewers recommended acceptance. Reviewer kQWk recommended rejection mostly based on the fact that the paper is quite narrow in scope. + +Reviewer kQWk is right that the topic might not appeal to most of the ICLR community. It is worth noting that the main contribution of the paper is not about neural networks but rather about connecting phase retrieval with Blaschke products. 
As it stands, it seems that after making this connection, any non-linear approximator could do well. + +Having said that, this is an important application area and the progress is welcome. Hopefully, it will inspire more research in this area. + +Currently, the key issue of the paper is that it is very challenging to understand for people without a background in the phase retrieval area or complex analysis. To make this paper more valuable for the ICLR community, I would strongly encourage the Authors to devote at least a page to an explanation of what the phase retrieval problem is, and the intuition behind the solution. Perhaps [1] could serve as an inspiration. + +[1] Phase retrieval in crystallography and optics, R. P. Millane",ICLR2022, +Q2oKa1q9Cp,1642700000000.0,1642700000000.0,1,eW5R4Cek6y6,eW5R4Cek6y6,Paper Decision,Accept (Spotlight),"The paper demonstrates that the test error of image classification models can be accurately estimated using samples generated by a GAN. Surprisingly, this relatively simple proposed method outperforms existing approaches including ones from recent competitions. All reviewers agree this is a very interesting finding, even though theoretical analysis is lacking. 
Given the importance of the problem of predicting generalization, I recommend acceptance.",ICLR2022, +DxVrvy_Za,1576800000000.0,1576800000000.0,1,SyxKrySYPr,SyxKrySYPr,Paper Decision,Reject,"This paper proposes architectural modifications to transformers, which are promising for sequential tasks requiring memory but can be unstable to optimize, and applies the resulting method to the RL setting, evaluated in the DMLab-30 benchmark. + +While I thought the approach was interesting and the results promising, the reviewers unanimously felt that the experimental evaluation could be more thorough, and were concerned with the motivation behind of some of the proposed changes. +",ICLR2020, +kixpfTbKAeeb,1642700000000.0,1642700000000.0,1,Ek7PSN7Y77z,Ek7PSN7Y77z,Paper Decision,Accept (Spotlight),"I thank the authors for their submission and active participation in the discussions. All reviewers are unanimously leaning towards acceptance of this paper. Reviewers in particular liked that the paper is well-written and easy to follow [186e,TAdH,Exgo], well motivated [TAdH], interesting [PsKh], novel [186e] and provides gains over baselines [186e,TAdH,PsKh] with interesting ablations [186e,Exgo]. I thus recommend accepting the paper and I encourage the authors to further improve their paper based on the reviewer feedback.",ICLR2022, +c0Bxya9na8,1576800000000.0,1576800000000.0,1,rkxZCJrtwS,rkxZCJrtwS,Paper Decision,Reject,"This paper proposes a hybrid RL algorithm that uses model based gradients from a differentiable simulator to accelerate learning of a model-free policy. While the method seems sound, the reviewers raised concerns about the experimental evaluation, particularly lack of comparisons to prior works, and that the experiments do not show a clear improvement over the base algorithms that do not make use of the differentiable dynamics. I recommend rejecting this paper, since it is not obvious from the results that the increased complexity of the method can be justified by a better performance, particularly since the method requires access to a simulator, which is not available for real world experiments where sample complexity matters more.",ICLR2020, +CGziKBnSAMT,1610040000000.0,1610470000000.0,1,c3MWGN_cTf,c3MWGN_cTf,Final Decision,Reject,"The paper shows that a form of Fictitious Self-Play converges to the Nash equilibria in Markov games. Understanding the theoretical properties of Fictitious Self-Play is important, however the paper in its current form is not ready for publication. The paper needs a more thorough discussion on related works, the assumptions made, and as pointed out by Reviewer3, the convergence argument needs to be expanded and explained in more detail. Further, I encourage authors to add experiments and compare their algorithm with other methods. ",ICLR2021, +7dDuAUn-TIk,1610040000000.0,1610470000000.0,1,0MjC3uMthAb,0MjC3uMthAb,Final Decision,Reject,"The paper proposed a shot-conditional form of episodic fine-tuning approach for few-shot image classification. There were a number of concerns raised, e.g., there lacks of sufficient comparison with SOTA baselines, the justification on the significance of shot-aware approach is not entirely convincing, and incremental contributions in both novelty and improvements. While some of these issues were improved in the rebuttal, the revision remains not satisfied by the reviewers. Overall, I think the paper has some interesting idea, but is still not ready for publication. 
",ICLR2021, +SJlsE_RQe4,1544970000000.0,1545350000000.0,1,H1goBoR9F7,H1goBoR9F7,novel approach,Accept (Poster),"This paper proposes a novel approach for network pruning in both training and inference. This paper received a consensus of acceptance. Compared with previous work that focus and model compression on training, this paper saves memory and accelerates both training and inference. It is activation, rather than weight that dominates the training memory. Reviewer1 posed a valid concern about the efficient implementation on GPUs, and authors agreed that practical speedup on GPU is difficult. It'll be great if the authors can give practical insights on how to achieve real speedup in the final draft. ",ICLR2019,5: The area chair is absolutely certain +zzNZLRApya,1576800000000.0,1576800000000.0,1,SJlM0JSFDr,SJlM0JSFDr,Paper Decision,Reject,"The authors offer theoretical guarantees for a simplified version of the deep Q-learning algorithm. However, the majority of the reviewers agree that the simplifying assumptions are so many that the results do not capture major important aspects of deep Q-Learning (e.g. understanding good exploration strategies, understanding why deep nets are better approximators and not using neural net classes that are so large that can capture all non-parametric functions). For justifying the paper to be called a theoretical analysis of deep Q-Learning some of these aspects need to be addressed, or the motivation/title of the paper needs to be re-defined. ",ICLR2020, +Hklk74ZiyN,1544390000000.0,1545350000000.0,1,rJl8FoRcY7,rJl8FoRcY7,"Poorly motivated, implications unclear",Reject,"This paper suggests a problem with the standard ELBO for the multi-modal case, and proposes a new objective to address this problem. However, I (and some of the reviewers) disagree with the motivation. First of all, there's no reason one can't train a separate encoder for every combination of modalities available, at least when there are only 2 or 3. Failing that, one could simple optimize per-example approximate posteriors without using an encoder. + +Second, once you stop optimizing the ELBO, you've lost the motivating principle for training VAEs, and must justify your new objective empirically. Almost all of the results are (in my opinion) ambiguous plots of latent encodings. + +Finally, a point made throughout the paper and discussions was that different modalities should give the same encodings, which is plainly false. One of the reviewers made this point: ""The fact that z_a != z_b != z_{a,b} should be expected if a and b provide different information. I don't see the problem with this."", which you dismiss. Additionally, the encoder's job is to approximate the true posterior. The true posteriors will in general be different for different modalities. + +I would recommend focusing on ways to train the original ELBO in the presence of different modalities, instead of modifying it based on these intuitions. +",ICLR2019,4: The area chair is confident but not absolutely certain +jeOhaL3xXG,1610280000000.0,1610470000000.0,1,ZVqZIA1GA_,ZVqZIA1GA_,Final Decision,Reject,"This work proposes capsule networks with deformable capsules for tackling object detection. All reviewers agreed that object detection is an important problem that is interesting to the ICLR community. Reviewers also agree that the proposed approach is novel and interesting, and in particular they mention that proposing an efficient capsule network for detection is non-trivial. 
On the less positive side, during the discussion phase all reviewers had concerns about the weak experimental validation, particularly missing ablation studies to analyse the effectiveness of their contributions. At the end, all reviewers felt that, while this is a promising direction of research for object detection, the experimentation should be improved.",ICLR2021, +jL8-h1it-dY,1610040000000.0,1610470000000.0,1,qHXkE-8c1sQ,qHXkE-8c1sQ,Final Decision,Reject,"All the reviewers agree that the paper studies an important and interesting problem. However the reviewers felt the paper is still in preliminary stages, with incorrect derivations, missing comparisons/references, and writing. While the authors updated the paper during the discussion stage addressing some of the concerns, the paper still needs more work in adding appropriate comparisons and in presenting the concepts more clearly. Hence I believe the paper is not yet ready for publication and encourage authors to continue their work.",ICLR2021, +H1xu-7GggV,1544720000000.0,1545350000000.0,1,SkGpW3C5KX,SkGpW3C5KX,promising quantitative results but limited contribution over previous work,Reject,"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +- The method and justification are clear +- The quantitative results are promising. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- The contribution is minor +- Analysis of the properties of the method is lacking. +The first point was the major factor in the final decision. + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +Reviewer opinion was quite divergent but both AR1 and AR2 had concerns about the 2 weaknesses mentioned in the previous section (which remained after the author rebuttal). + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +No consensus was reached. The source of disagreement was on how to weigh the pros vs the cons. The final decision was aligned with the lower ratings. The AC agrees that the contribution is minor. +",ICLR2019,4: The area chair is confident but not absolutely certain +i6myWJjXQG,1610040000000.0,1610470000000.0,1,_bF8aOMNIdu,_bF8aOMNIdu,Final Decision,Reject,"Reading the paper and the reviews themselves, I found myself conflicted about this work: + +- Multiple reviewers commented that this is a rather incremental piece of work, given that it's a rather straightforward combination of existing losses/models. +- On the other hand, there is admittedly value in (1) realizing that this combination is meaningful (2) understanding the meaningful ways in which these work or do not work with ablation studies. +- I am not quite satisfied that the datasets and experiments in this work represent in any meaningful way real world noise. 
However, it does appear that the authors ran experiments on common benchmarks using common protocols so there's only so much that they themselves can be blamed for. +- Tangentially, I am somewhat surprised about the relatively good ImageNet performance of this method. I suspect the combination of this being done with uniform noise rather than structured noise is helping quite a bit. + +All in all, this work is certainly interesting enough, but the results are just not quite compelling enough to pass the bar.",ICLR2021, +9lyHx3L63R,1610040000000.0,1610470000000.0,1,73WTGs96kho,73WTGs96kho,Final Decision,Accept (Poster),"The paper proposes an end-to-end architecture, Net-DNF, for handling tabular data. This is a novel approach in the relatively under-explored domain of application of neural networks; the paper also presents justification of the design choices via ablation studies. The paper is clearly written, and empirical results are convincing. +",ICLR2021, +KAdsiztoDDr,1610040000000.0,1610470000000.0,1,ZzwDy_wiWv,ZzwDy_wiWv,Final Decision,Accept (Poster),"This paper proposes a new idea for performing knowledge distillation by leveraging teacher’s classifier to train student’s penultimate layer feature via proposing suitable loss functions. Reviewers appreciate the simultaneous simplicity and effectiveness of the method. A comprehensive set of studies are performed to empirically show the effectiveness of the method. Specifically, the proposed distillation method is shown to outperform state-of-the-art across various network architectures, teacher-student capacities, datasets, and domains. The paper is well-written and is easy to follow. All reviewers rate the paper on the accept side (after the rebuttal) and believe the new perspective this work provides on distillation and its simplicity to implement can lead it to gain high impact. I concur with the reviewers and find this submission a convincing empirical work, and thus recommend for accept. +",ICLR2021, +9-OmkaOVen,1576800000000.0,1576800000000.0,1,H1gNOeHKPS,H1gNOeHKPS,Paper Decision,Accept (Spotlight),"This paper extends work on NALUs, providing a pair of units which, in tandem, outperform NALUs. The reviewers were broadly in favour of the paper given the presentation and results. The one dissenting reviewer appears to not have had time to reconsider their score despite the main points of clarification being addressed in the revision. I am happy to err on the side of optimism here and assume they would be satisfied with the changes that came as an outcome of the discussion, and recommend acceptance.",ICLR2020, +r9iJ2a2HUV,1610040000000.0,1610470000000.0,1,xgGS6PmzNq6,xgGS6PmzNq6,Final Decision,Accept (Poster),"This paper is devoted to ""dyadic fairness"" in representation learning. All the reviewers agreed that the contribution is novel, original and technically sound. However, all the reviewers agreed that the paper should be improved in terms of presentation -- for two reviewers, presentation/clarity issues were at the core of their weak rejects. The most positive reviewers highlighted that the problem is still understudied despite the flurry of work on fair machine learning in the last years and therefore the contribution deserves to be accepted. If there is room, this paper can be accepted as a poster.",ICLR2021, +BkeNfekT27,1541370000000.0,1545350000000.0,1,H1xsSjC9Ym,H1xsSjC9Ym,Good work and helpful revisions,Accept (Poster),"Pros: +- The paper is well-written and clear and presented with helpful illustrations and videos. 
+- The training methodology seems sound (multiple random seeds etc.) +- The results are encouraging. + +Cons: +- There was some concern generally about how this work is positioned relative to related work and the completeness of the related work. However, the authors have made this clearer in their rebuttal. + +There was a considerable amount of discussion between the authors and all reviewers to pin down some unclear aspects of the paper. I believe in the end there was good convergence and I thank both the authors and reviewers for their persistence and dilligence in working through this. The final paper is much better I think and I recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +KoJ5dScvHj,1576800000000.0,1576800000000.0,1,S1lBVgHYvr,S1lBVgHYvr,Paper Decision,Reject,"This paper proposes a certified defense under the more general threat model beyond additive perturbation. The proposed defense method is based on adding noise to the classifier's outputs to limit the attacker's knowledge about the parameters, which is similar to differential privacy mechanism. The authors proved the query complexity for any attacker to generate a successful adversarial attack. The main objection of this work is (1) the assumption of the attacker and the definition of the query complexity (to recover the optimal classifier rather than generating an adversarial example successfully) is uncommon, (2) the claim is misleading, and (3) the experimental evaluation is not sufficient (only two attacks are evaluated). The authors only provided a brief response to address the reviewers’ comments/questions without submitting a revision. Unfortunately none of the reviewer is in support of this paper even after author response. +",ICLR2020, +NCH4vn4fi3r,1642700000000.0,1642700000000.0,1,dtYnHcmQKeM,dtYnHcmQKeM,Paper Decision,Reject,"The paper proposes the physics-informed neural operator. It combines the operating-learning and function-optimization frameworks, which improves convergence rates and accuracy over traditional methods. While the paper was well written, several reviewers raised their concerns on the novelty of the paper, especially regarding the difference from PINN-DeepONet (Wang et. al.). Following this, there have been a long discussion between the authors and the reviewers, as well as among the reviewers. As a consequence, we think the authors somehow overclaimed their contributions on combining PINN and operator learning, and there are some important references missing and baselines not compared empirically. With this, the conclusion is that we cannot accept this paper in its current form, and we hope that authors can take all the review feedback into consideration and better position the novelty and impact of their work in the future submissions.",ICLR2022, +zQRu4Kf9co8,1610040000000.0,1610470000000.0,1,QpT9Q_NNfQL,QpT9Q_NNfQL,Final Decision,Reject,"This paper approximates the Whittle index in restless bandits using a neural network. Finding the Whittle index is a difficult problem and all reviewers agreed on this. Nevertheless, the scores of this paper are split between 2x 4 and 2x 7, essentially along the line of whether this paper is too preliminary to be accepted. Therefore, I read the paper and propose a rejection. + +The reason is that the paper lacks rigor, which was brought up by the two reviewers who suggested rejections. For instance, in the last line of Algorithm 1, it is not clear what kind of a gradient is computed. 
The reason is that \bar{G}_b is not a proper baseline, as it depends on the future actions of the bandit policy in any given round. I suggest that the authors look at recent papers on meta-learning of bandit policies by policy gradients, + +https://papers.nips.cc/paper/2020/hash/171ae1bbb81475eb96287dd78565b38b-Abstract.html + +https://arxiv.org/abs/2006.16507 + +This is the level of rigor that I would expect from this paper, to make sure that the gradients are correct.",ICLR2021, +1452v-_FCJ2,1642700000000.0,1642700000000.0,1,Oh1r2wApbPv,Oh1r2wApbPv,Paper Decision,Accept (Poster),"Strengths: +* Strong results across two benchmarks +* Ablation study demonstrates importance of components +* Provides improvements especially in low resource settings +* Well-written paper + +Weaknesses: +* Novelty of the method may be limited as previous works have explored structured outputs as intermediate plans +* Not clear method will extend to other domains as decent AMR parses are required to train the imagination module, which might work well on the datasets used (e.g., RocStories), but wouldn't work in settings with more complex language",ICLR2022, +H1t02MU_e,1486400000000.0,1486400000000.0,1,Hk95PK9le,Hk95PK9le,ICLR committee final decision,Accept (Poster),"The area chair disagrees with the reviewers and actually thinks this is a very important contribution. This paper has the potential to be have huge impact as it sets a new state of the art on a 20+ year old benchmark with a simple model. The simplicity of the model is what is so impressive, because it resets how people think about syntactic parsing as a task. The contributions in this paper have the potential to unleash a series of model simplifications in a number of areas and the area chair therefore strongly suggests accepting the paper. + + It is true that the techniques used in this paper are not new inventions but rather a careful combination of ideas proposed in other places. However, in the area chair's opinion this is a substantial contribution. There are lots of ideas out there and knowing which ideas to pick and how to combine them is very valuable. Showing that a simple model can beat more complicated models advances our understanding much more than a new technique that adds unnecessary additional complexity. The focus on novel models in academia is too big, leading to a proliferation of models that nobody needs.",ICLR2017, +fWHRGn5eec_,1642700000000.0,1642700000000.0,1,1OHZX4YDqhT,1OHZX4YDqhT,Paper Decision,Reject,"This paper proposes a personalized federated learning framework based on neural architecture search, in which the local clients perform NAS to search for a better architecture for the private local data. Specifically, the authors extend MiLeNAS, which is an existing NAS algorithm, to be run in the federated learning setting, and use FedAvg for model aggregation. The proposed FedNAS framework is validated against personalized federated learning methods with predefined architectures, such as perFedAvg, Ditto, and local fine-tuning, and is shown to largely outperform them on non-IID settings with label skew and LDA distribution. FedNAS’s collaborative search for the optimal architecture also yields a better performing global model than FedAvg. + +The paper received borderline ratings. Three out of four reviewers are learning negative, while one is leaning negative. 
The below is the summary of pros and cons of the paper mentioned by the reviewers: + +Pros +- The idea of using NAS for personalized federated learning seems novel and interesting. +- The proposed FedNAS framework is shown to be effective in tackling the data heterogeneity problem, which is a fundamental problem with federated learning. +- The authors have released the code for reproducibility. + +Cons +- The technical contribution of the work seems limited, since the proposed FedNAS straightforwardly combines an existing NAS method (MiLeNAS) with federated averaging, and there is no challenge mentioned for this new problem of federated NAS. +- The choice of a specific NAS method (MiLeNAS) is not well justified, and other NAS methods should be also considered. +- The motivation is unclear: It is not clear whether the authors aim to perform collaborative automotive design or solve personalized federated learning. +- There is no convergence analysis. + +While some of the concerns have been addressed away in the authors’ responses during the rebuttal period, the reviewers did not change their ratings, and the final consensus was to reject the paper. + +I agree with the authors that combining federated learning with NAS, and applying it for personalized federated learning is a novel idea that intuitively makes sense. However, I agree with the reviewers that the current method is a straightforward combination of an existing NAS method and an existing FL algorithm, the authors should identify new challenges posed by the combination of the two methods, and identify them. + +Further, performing NAS on edge devices may be possible, but not the best solution, since it could result in large computational overhead. While the authors mention that MiLeNAS is computational suitable in such settings, there should be a proper investigation of the accuracy-efficiency tradeoff, showing how well FedNAS performs against others with the same computational budget (or training time / energy consumption). + +Overall, this is a paper that proposes a novel and interesting idea that seems to work, but the paper does not sufficiently examine challenges posed by the new problem. I suggest the authors identify the new challenges and examine the efficiency issue mentioned, and further develop their method, if necessary.",ICLR2022, +SklNuUfzlE,1544850000000.0,1545350000000.0,1,HkElFj0qYQ,HkElFj0qYQ,A novel approach but has significant issues ,Reject,"This paper presents a new defense against adversarial examples using random permutations and a Fourier transform. The technique is clearly novel, and the paper is clearly written. + +However, as the reviewers and commenters pointed out, there is a significant degradation in natural accuracy, which does not seem to be easily recoverable. This degradation is due to the random permutation of the images, which effectively disallows the use of convolutions. + +Furthermore, Reviewer 1 points out that the baselines are insufficient, as the authors do not explore (a) learning the transformation, or (b) using expectation over transformation to attack the model. + +This concern is further validated by the fact that Black-box attacks are often the best-performing, which is a sign of gradient masking. The authors try to address this by performing an attack against an ensemble of models, and against a substitute model attack. However, attacking an ensemble is not equivalent to optimizing the expectation, which would require sampling a new permutation at each step. 
+ +The paper thus requires significantly stronger baselines and attacks.",ICLR2019,5: The area chair is absolutely certain +S1cSS1prf,1517250000000.0,1517260000000.0,562,r1CE9GWR-,r1CE9GWR-,ICLR 2018 Conference Acceptance Decision,Reject,"While the reviewers agree that this is an important topic, there are numerous concerns novelty, correctness and limitations. ",ICLR2018, +Lk4m9JvHck,1576800000000.0,1576800000000.0,1,B1x1ma4tDr,B1x1ma4tDr,Paper Decision,Accept (Spotlight),This paper proposes a novel differentiable digital signal processing in audio synthesis. The application is novel and interesting. All the reivewers agree to accept it. The authors are encouraged to consider the reviewer's suggestions to revise the paper.,ICLR2020, +FbV1U9mBU,1576800000000.0,1576800000000.0,1,H1l_0JBYwS,H1l_0JBYwS,Paper Decision,Accept (Spotlight),"The paper proposes a nice and easy way to regularize spectral graph embeddings, and explains the effect through a nice set of experiments. Therefore, I recommend acceptance.",ICLR2020, +IxjZtFYHIYG,1610040000000.0,1610470000000.0,1,R6tNszN_QfA,R6tNszN_QfA,Final Decision,Reject,"This paper proposed a new family of losses for GANs and showed that this family is quite general and encompasses a number of existing losses as well as some new loss functions. The paper compared experimentally the existing losses and the new proposed losses. But the benefit of this family is not clear theoretically, and this work did not also provide the very helpful insights for the practical application of GANs. +",ICLR2021, +OQWTQAlejBZ,1610040000000.0,1610470000000.0,1,H5B3lmpO1g,H5B3lmpO1g,Final Decision,Reject,"The paper got mixed reviews ranging from 5 to 7. The main concerns of the reviewers were the missing novelty as the paper combines different well known methods for a given problem, so there is no big algorithmic contribution. The presented pipeline for closed-loop grasping using imitation learning from a planner, Dagger and subsequent deep RL with TD3 is a straightforward, but sound and intuitive combination of algorithms to address the problem of closed loop grasping. The presented results and ablation studies also motivate these algorithmic choices. In the rebuttal the authors addressed most concerns regarding the experiments (missing comparisons to open-loop grasping and real world experiments), but more real world experiments would be necessary to evaluate the effectiveness of the approach. + +This is a borderline paper were I unfortunately have to recommend rejection due to the missing algorithmic contribution, a major requirement for ICLR. The paper would however fit very well to a robotics conference and the authors are encouraged to resubmit the paper the venues such as RSS or CoRL. ",ICLR2021, +BCgo9YeOe,1576800000000.0,1576800000000.0,1,B1lC62EKwr,B1lC62EKwr,Paper Decision,Reject,"The authors propose a new perspective on active learning by borrowing concepts from subjective logic. In particular, they model uncertainty as a combination of dissonance and vacuity; two orthogonal forms of uncertainty that may invite additional labels for different reasons. The concepts introduced are not specific to deep learning but are generally applicable. Experiments on 2d data and a couple standard datasets are provided. + +The derivation of the model is intuitive but it's not clear that it is ""better"" than any other intuitively derived model for active learning. 
With the field of active learning having such a long history, the field has moved towards a standard of expecting theoretical guarantees to distinguish a new method from the rest; this paper provides none. Instead anecdotal examples and small experiments are performed. Like other reviews, I am extremely skeptical about the use of KDE which is known to have essentially no inferential ability in high dimensions (such as in deep learning situations where presumably images are involved). It is hard not to feel as though deep learning is somewhat of a red herring in this paper. + +I recommend the authors lean into understanding the method from a perspective beyond anecdotes and experiments if they wish for this method to gain traction. ",ICLR2020, +mlzOkObmMb,1576800000000.0,1576800000000.0,1,BklSv34KvB,BklSv34KvB,Paper Decision,Reject,"The authors propose a new mini-batch selection method for training deep NNs. Rather than random sampling, selection is based on a sliding window of past model predictions for each sample and uncertainty about those samples. Results are presented on MNIST and CIFAR. + +The reviewers agreed that this is an interesting idea which was clearly presented, but had concerns about the strength of the experimental results, which showed only a modest benefit on relatively simple datasets. In the rebuttal period, the authors added an ablation study and additional results on Tiny-ImageNet. However, the results on the new dataset seem very marginal, and R1 did not feel that all of their concerns were addressed. I’m inclined to agree that more work is required to prove the generalizability of this approach before it’s suitable for acceptance. +",ICLR2020, +-c3pkEZl6m3,1610040000000.0,1610470000000.0,1,pTZ6EgZtzDU,pTZ6EgZtzDU,Final Decision,Reject,"This paper is borderline, as evidenced by all of the reviewer's scores. + +The pros are: +- important and relevant topic +- IMPORT is a reasonable, technically sound approach +- paper is relatively clear + +The cons all lie in the experimental evaluation, and whether the experiments sufficiently back the claim that IMPORT can learn sophisticated exploration strategies and validate IMPORT's merits compared to prior algorithms. In particular: +- The choice of benchmarks does not sufficiently test the ability to explore in a sophisticated manner +- Lack of comparisons to PEARL and MANGA, which can readily be applied to the online setting +- The empirical improvements are relatively modest. + +Overall, the cons slightly outweigh the pros of the paper. Indeed, no reviewer was willing to champion the paper's acceptance.",ICLR2021, +HkKfEyprG,1517250000000.0,1517260000000.0,309,BkrsAzWAb,BkrsAzWAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All reviewers agreed that, despite the lack of novelty, the proposed method is sound and correctly linked to existing work. As the topic of automatically learning the stepsize is of great practical interest, I am glad to have this paper presented as a poster at ICLR.",ICLR2018, +B1LNhGL_l,1486400000000.0,1486400000000.0,1,rJe-Pr9le,rJe-Pr9le,ICLR committee final decision,Reject,"The authors have proposed a new method for deep RL that uses model-based evaluation of states and actions and reward/life loss predictions. The evaluation, on just 3 ATari games with no comparisons to state of the art methods, is insufficient, and the method seems ad-hoc and unclear. Design choices are not clearly described or justified. 
The paper gives no insight as to how the different aspects of the approach relate or contribute to the overall results.",ICLR2017, +Hkg2L5t4xN,1545010000000.0,1545350000000.0,1,HJeuOiRqKQ,HJeuOiRqKQ,Extra iteration needed,Reject,"This paper studies the role of pooling in the success underpinning CNNs. Through several experiments, the authors conclude that pooling is neither necessary nor sufficient to achieve deformation stability, and that its inductive bias can be mostly recovered after training. + +All reviewers agreed that this is a paper asking an important question, and that it is well-written and reproducible. On the other hand, they also agreed that, in its current form, this paper lacks a 'punchline' that can drive further research. In words of R6, ""the paper does not discuss the links between pooling and aliasing"", or in words of R4, ""it seems to very readily jump to unwarranted conclusions"". In summary, the AC recommends rejection at this time, and encourages the authors to pursue the line of attack by exploring the suggestions of the reviewers and resubmit. ",ICLR2019,5: The area chair is absolutely certain +T8O4Vdqhx1,1576800000000.0,1576800000000.0,1,HkgH0TEYwH,HkgH0TEYwH,Paper Decision,Accept (Poster),"Issues raised by the reviewers have been addressed by the authors, and thus I suggest the acceptance of this paper.",ICLR2020, +HybmIyprG,1517250000000.0,1517260000000.0,744,r1tJKuyRZ,r1tJKuyRZ,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes an autoencoder for sets, an interesting and timely problem. The encoder here is based on prior related work (Vinyals et al. 2016) while the decoder uses a loss based on finding a matching between the input and output set elements. Experiments on multiple data sets are given, but none are realistic. The reviewers have also pointed out a number of experimental comparisons that would improve the contribution of the paper, such as considering multiple matching algorithms and more baselines. In the end the idea is reasonable and results are encouraging, but too preliminary at this point.",ICLR2018, +r1liliKglV,1544750000000.0,1545350000000.0,1,BJe0Gn0cY7,BJe0Gn0cY7,A new and not too hacky VAE training trick,Accept (Poster),"Strengths: The proposed method is relatively principled. The paper also demonstrates a new ability: training VAEs with autoregressive decoders that have meaningful latents. The paper is clear and easy to read. + +Weaknesses: I wasn't entirely convinced by the causal/anticausal formulation, and it's a bit unfortunate that the decoder couldn't have been copied without modification from another paper. + +Points of contention: +It's not clear how general the proposed approach is, or how important the causal/anti-causal idea was, although the authors added an ablation study to check this last question. + +Consensus: All reviewers rated the paper above the bar, and the objections of the two 6's seem to have been satisfactorily addressed by the rebuttal and paper update.",ICLR2019,3: The area chair is somewhat confident +v5Uwg_gUR,1576800000000.0,1576800000000.0,1,rklTmyBKPH,rklTmyBKPH,Paper Decision,Accept (Poster),"Main content: Paper proposes a fast network adaptation (FNA) method, which takes a pre-trained image classification network, and produces a network for the task of object detection/semantic segmentation + +Summary of discussion: +reviewer1: interesting paper with good results, specifically without the need to do pre-training on Imagenet. 
Cons are better comparisons to existing methods and run on more datasets. +reviewer2: interesting idea on adapting source network network via parameter re-mapping that offers good results in both performance and training time. +reviewer3: novel method overall, though some concerns on the concrete parameter remapping scheme. Results are impressive +Recommendation: Interesting idea and good results. Paper could be improved with better comparison to existing techniques. Overall recommend weak accept.",ICLR2020, +Rg_ywFfsTLH,1610040000000.0,1610470000000.0,1,HgLO8yalfwc,HgLO8yalfwc,Final Decision,Accept (Spotlight),"This paper studies inverse reinforcement learning through the prism of regularized Markov decision processes, by generalizing MaxEntIRL from the negative entropy to any strongly convex regularizer (as a side note, strict convexity might be enough for many results). +The reviewers appreciated the clarity, the mathematical rigor and the empirical evaluation of this paper. They asked some questions and raised some concerns, that were mostly addressed in the rebuttal and the revision provided by the authors. +This is a strong paper, for which the AC recommends acceptance. +",ICLR2021, +xz98kScBG,1576800000000.0,1576800000000.0,1,rklz16Vtvr,rklz16Vtvr,Paper Decision,Reject,"This paper proposes a method for finding neural architecture which, through the use of selective branching, can avoid processing portions of the network on a per-data-point basis. + +While the reviewers felt that the idea proposed was technically interesting and well-presented, they had substantial concerns about the evaluation that persisted post-rebuttal, and lead to a consensus rejection recommendation.",ICLR2020, +B4aZqKeQsmy,1642700000000.0,1642700000000.0,1,hEiwVblq4P,hEiwVblq4P,Paper Decision,Reject,"I agree with the reviewers that this work is not well-presented, and it seriously lacks rigor and experimental support. The writing of this work also needs significant improvement. The authors made many claims without offering rigorous proofs, and hand-waved their argument throughout without strong empirical support. In the end, the authors' response did not address the reviewers' concerns satisfactorily and no one is excited enough to defend the current draft. Please consider revising your draft according to the reviewers' comments.",ICLR2022, +SyT7EJ6BG,1517250000000.0,1517260000000.0,324,S1Dh8Tg0-,S1Dh8Tg0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes an interesting new idea which creates an interesting discussion. +",ICLR2018, +NJtPq9Wr-yG,1642700000000.0,1642700000000.0,1,FlwzVjfMryn,FlwzVjfMryn,Paper Decision,Accept (Poster),"Multi-objective learning is an increasingly important topic. This paper presents a method for better finding parts of the Pareto frontier through a new method to estimate the distance to the frontier and use this proxy to refine the state space partition. The reviewers found this paper interesting and compelling and generally well written. The reviewers also thought the work could be further improved by better clarifying in the text where the proposed approach might fail, and what properties of the domain are needed, and also to better situate this paper within the related work, potentially including additional experimental comparisons. 
The authors provided detailed responses to the proposed questions and the authors are encouraged to ensure that these suggestions and discussions are well represented in the revised version.",ICLR2022, +S1jJwypSG,1517250000000.0,1517260000000.0,915,HktXuGb0-,HktXuGb0-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a method for learning from expert state trajectories using a similarity metric in a learned feature space. The approach uses only the states, not the actions of the expert. The reviewers were variously dissatisfied with the novelty, the theoretical presentation, and the robustness of the approach. Though it empirically works better than the baselines (without expert demos) this is not surprising, especially since thousands of expert demonstrations were used. This would have been more impressive with fewer demonstrations, or more novelty in the method, or more evidence of robustness when the agent's state is far from the demonstrations.",ICLR2018, +UUC3HYOJd2h,1642700000000.0,1642700000000.0,1,PQQp7AJwz3,PQQp7AJwz3,Paper Decision,Accept (Poster),"This is a solid paper and considers the problem of training a wide neural network with a single hidden layer. This can be framed as an optimization problem in the space of probability distributions with a suitable entropy regularization, where each atom in the distribution corresponds to a hidden neuron. The dual of this problem (for finite data) is a finite-dimensional optimization problem and the paper proposes a particle based coordinate ascent scheme. +The paper provides some convergence rate results. After the rebuttal, the authors have also included more experimental/numerical results. + +The authors have answered the concerns raised by the reviewers and overall, the paper can be accepted: +The presented approach appears to be sufficiently novel and might be useful in other settings. +The presentation is clear and easy to follow for such a technical paper; the paper is well organized. +The limitations of the approach are clearly stated (dependence on the regularization parameter for entropy term that may be hard to select)",ICLR2022, +HyeKKRLrlN,1545070000000.0,1545350000000.0,1,BJxh2j0qYm,BJxh2j0qYm,borderline,Accept (Poster),"The authors propose a dynamic inference technique for accelerating neural network prediction with minimal accuracy loss. The method are simple and effective. The paper is clear and easy to follow. However, the real speedup on CPU/GPU is not demonstrated beyond the theoretical FLOPs reduction. Reviewers are also concerned that the idea of dynamic channel pruning is not novel. The evaluation is on fairly old networks.",ICLR2019,3: The area chair is somewhat confident +rJ48nz8ug,1486400000000.0,1486400000000.0,1,ByIAPUcee,ByIAPUcee,ICLR committee final decision,Accept (Poster),"Reviewers found this paper to be a rigorous and ""thorough experimental analysis"" of context-length in language modelingv through the lens of an ""interesting extension to standard attention mechanism"". The paper reopens and makes more problematic widely accepted but rarely verified claims of the importance of long-term dependency. + + Pros: + - ""Well-explained"" and clear presentation + - Use of an ""inventive baseline"" in the form a ngram rnn + - Use a impactful corpus for long-term language modeling + + Cons: + - Several of the ideas have been explored previously. 
+ - Some open questions about the soundness of parameters (rev 1) + - Requests for deeper analysis on data sets released with the paper.",ICLR2017, +BygDo0-ggE,1544720000000.0,1545350000000.0,1,HJx9EhC9tQ,HJx9EhC9tQ,novel approach with good performance on interesting and challenging problem; clarity could be improved,Accept (Poster),"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +- The problem is interesting and challenging +- The proposed approach is novel and performs well. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- The clarity could be improved + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +Many concerns were clarified during the discussion period. One major concern had been the experimental evaluation. In particular, some reviewers felt that experiments on real images (rather than in simulation) was needed. +To strengthen this aspect, the authors added new qualitative and quantitative results on a real-world experiment with a robot arm, under 10 different scenarios, showing good performance on this challenging task. Still, one reviewer was left unconvinced that the experimental evaluation was sufficient. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +Consensus was not reached. The final decision is aligned with the positive reviews as the AC believes that the evaluation was adequate. +",ICLR2019,4: The area chair is confident but not absolutely certain +41GqQnr58Ws,1642700000000.0,1642700000000.0,1,nL2lDlsrZU,nL2lDlsrZU,Paper Decision,Reject,"The paper presents a deep-learning network architecture for (semi)-supervised tabular data classification and regression problems based on a new attention mechanism between samples (rows) and features (columns). +The model is compared to 10 sota methods, studied on 30 diverse datasets (10 for binary classification, 10 for multiclass classification and 10 for regression). +contrastive learning approach for pre-training on unlabeled data and fine-tuning on a small number of labels +Explainability capabilities are not presented in a very convincing way. +While the reviewers find the problem relevant, they criticise novelty and, in particular, the experimental comparison. +Concerns about hyperparameter tuning of own vs. comparison methods voiced by the reviewers. +While these concerns have partially been addressed in the author response, the reviewers still doubt the fairness of comparison.",ICLR2022, +cwCe9nn7k1Y,1610040000000.0,1610470000000.0,1,MbM_gvIB3Y4,MbM_gvIB3Y4,Final Decision,Reject,"**Overview**: +The paper tries to answer which mutual information (MI) objective is sufficient for representation learning (repL) in reinforcement learning (RL). Three common objectives are considered: forward, state, and inverse. 
The paper shows that only the forward objective is sufficient for learning, i.e., sufficient for learning of optimal policy/value function. The authors also demonstrate this phenomena using empirical experiments. + +**Quality, Clarity, Originality and Significance**: +All the reviewers believe this paper is novel in terms of methodology, i.e., evaluate the sufficiency of the repL in terms of down stream tasks. However, there is a lack of clarity in the experiment sections. The authors have provided more details in the rebuttal phase. The reviewers also have concerns that this paper may be too far from typical experimental settings to have a real impact on the field. An unofficial review pointed out there is a mistake in the proof of the paper. The authors later also confirmed the flaw and claimed it is fixed. + +**Recommendation**: +The paper is indeed interesting and novel. However, the impact to the practice community might not be significant. That being said, the paper should warrant publication eventually. However, the authors changed large amount of text about the proofs before and after rebuttal, which also introduced some additional typos, confusions, and also technique sloppiness or flaws. The reviewers are concerned about this. Overall I believe that the paper is not in a state to be published yet. +",ICLR2021, +z38QVU-HRGI,1642700000000.0,1642700000000.0,1,UQBEkRO0_-M,UQBEkRO0_-M,Paper Decision,Reject,"This paper introduces Softmax Gradient Tampering, a technique for modifying the gradient of the softmax loss to make the loss more smooth. On standard benchmarks the authors demonstrate improved training and test accuracy. + +The reviewers are unanimous in their recommendation to not accept the paper. They identify the following problems: +* a lack of theoretical understanding and rigor, and a lack of support for the claims that are made +* insufficient experimental results to convince the reviewers of the merit of the proposed technique in the absence of theoretical understanding + +The authors did not provide a rebuttal, and I see no special reasons to question the assessment made by the reviewers. I therefore recommend to not accept this paper.",ICLR2022, +B1FbHkaBG,1517250000000.0,1517260000000.0,510,Syjha0gAZ,Syjha0gAZ,ICLR 2018 Conference Acceptance Decision,Reject,"The submission addresses the problem of multiset prediction, which combines predicting which labels are present, and counting the number of each object. Experiments are shown on a somewhat artificial MNIST setting, and a more realistic problem of the COCO dataset. + +There were several concerns raised by the reviewers, both in terms of the clarity of presentation (Reviewer 1), and that the proposed solution is somewhat heuristic (Reviewer 3). On the balance, two of three reviewers did not recommend acceptance.",ICLR2018, +63-nCYfDapI,1610040000000.0,1610470000000.0,1,IeuEO1TccZn,IeuEO1TccZn,Final Decision,Reject,"This paper aims to present a new representation learning framework for supervised learning based on finding a representation such that the input is conditionally independent given the representation, the components of the representations are independent and the representation is rotation-invariant. While there were both positive and negative assessments of this paper by the reviewers, there are 3 major concerns that lead me to recommend rejecting this paper: +1. Most importantly, experiments do not seem to be conclusive as they do not properly ablate the specific aspects of this method. 
More specifically, the authors compare their deep learning based approach with non-deep learning approaches but do not compare against deep learning baselines. This makes it impossible to assess the merit of the proposed approach (which also appears to be complicated) over much simpler baselines. +2. The required properties of the representations do not seem to be properly motivated. +3. The paper refers to their produced representations as disentangled representations. As pointed out by AnonReviewer4, this appears not to be consistent with prior uses of that word in the community. +",ICLR2021, +HkguQCg9yN,1544320000000.0,1545350000000.0,1,r1gEqiC9FX,r1gEqiC9FX,"neat normalization method, well-executed",Accept (Poster),"The proposed ENorm procedure is a normalization scheme for neural nets whereby the weights are rescaled in a way that minimizes the sum of L_p norms while maintaining functional equivalence. An algorithm is given which provably converges to the globally optimal solution. Experiments show it is complementary to, and perhaps slightly better than, other normalization schemes. + +Normalization issues are important for DNN training, and normalization schemes like batch norm, weight norm, etc. have the unsatisfying property that they entangle multiple issues such as normalization, stochastic regularization, and effective learning rates. ENorm is a conceptually cleaner (if more algorithmically complicated) approach. It's a nice addition to the set of normalization schemes, and possibly complementary to the existing ones. + +After a revision which included various new experiments, the reviewers are generally happy with the paper. While there's still some controversy over whether it's really better than things like batch norm, I think the paper would be worth publishing even if the results came out negative, since it is a very natural idea which took some algorithmic insight in order to actually execute. + +",ICLR2019,5: The area chair is absolutely certain +nWWEpYeNSP,1610040000000.0,1610470000000.0,1,de11dbHzAMF,de11dbHzAMF,Final Decision,Accept (Poster),"I thank the authors for their constructive engagement in the review process - this paper clearly benefitted from your prompt attention to initial weaknesses - and I thank the reviewers for their excellent, detailed, careful reviews. + +This paper presents methods that allow a multi-task learning (MTL) system to performs competitively against single-task (ST) fine-tuning despite using fewer parameters and less data per task. This is a great goal, which disrupts the currently most pursued approach of ST fine-tuning and beats aiming for fully single model like BAM! + +Pros + +* The paper presents a number of useful methods, some novel like conditional attention mechanisms, for improving MTL. There is a lot in this paper. +* Clean, motivated, intuitive model modifications +* Comprehensive experiments +* Generally well-written paper, fairly comprehensive discussion of related work + +Cons + +The initial paper had a number of weaknesses, the most consistently mentioned being that the results presented tend to overclaim and to confuse (flipping between BERT and RoBERTa, etc.). The authors have done a good job revising the paper to address these concerns but should certainly remember these points in producing the final version. + +The paper is a bit of a grab bag of small contributions that together help for MTL, but which isn't as strong a story as one big novel idea carefully presented and evaluated. 
While this paper has a ton of work behind it, and a lot of good stuff in it, I think this does somewhat weigh against the paper being selected for spotlight presentation. + +Requests for the final version: + +Further small clean up: e.g., still one ""STITLS"" to become ""STILTS"", some cleaning up could be done in the references where ""stilts"" is lowercase, there isn't capitalization after punctuation in the title of either Clark et al. paper, and the de Marneffe et al. paper, Fisch et al. paper and others lack information on where the work appeared). + +Despite the improvements, I think more can and should be done to make the paper clearer and better laid out. Here are a few suggestions: + +* More consistency in labeling the contributions might be possible. I'm imagining that they might be able to be consistently labeled 1 to 5, rather than being 1 to 5 in subsections of section 2, but 1 to 3 with bulleted sub items in the introduction, and several of them (a) through (c) in Fig 1. +* Although the reviewers generally said the paper was clear and well-written, I still feel that section 2.1 is harder to get than it should be – and indeed you resorted to pasting PyTorch code in this discussion to help us! While people often say that a figure should be standalone, maybe that doesn't really make sense when it's an inline figure like this, and you'd be better off explaining things well once (mainly in the text) rather than badly twice?!? I.e., just make the Fig 2 caption: ""Figure 2: Conditional attention"". I still think it's harder to get what the equations are doing than it should be, because there is inadequate text explaining the equations and the ideas motivating them. Although you now sort of sneak in to the text that ⨁ means diag, you still don't explicitly say so. I think having even 2 more lines of text at your disposal could really help if well deployed. +* As R5 observes, the compactification in pages 7-9 just gets kind of ridiculous. I get that you're trying to deal with page breaks and to fit a lot in etc., but it just makes no sense when the tables embedded into paragraphs on p9 don't even belong with the corresponding paragraph! How might this be fixed? It's difficult, since you do just have a lot of material. The best idea I can come up with would be to shorten the main paper related work section to only 1 paragraph that discusses things at a very high level, and to put your detailed related work (even adding a few more of the things reviewers suggest like MT, etc.) to the Appendix. This might give you another half page in the main paper to make things more sensible. + +",ICLR2021, +OiVNONrEI6n,1642700000000.0,1642700000000.0,1,ci7LBzDn2Q,ci7LBzDn2Q,Paper Decision,Accept (Poster),"The paper studies the length distortion in a random (deep) ReLU network — namely, it bounds the expectation and higher moments of the length of the curve in feature space produced by applying a random ReLU work to a smooth curve. Because the product of layer norms grows exponentially in the depth, it might be natural to conjecture that the length grows exponentially in depth. Indeed, this has been claimed in previous theoretical work. The submission argues through rigorous mathematical analysis and corroborating experiments that this claim is incorrect. In fact, the length exhibits a slow (1/depth) contraction as the network depth increases. The paper also works out higher order moments and extensions to higher dimensional volumes. 
These results are obtained using nice (and natural) independence arguments and calculations. + +Initial reviews were mostly positive, with the reservation that the initial submission may have slightly over claimed (the reviewer correctly notes that it is impossible to prove interesting results about the NTK of deep networks with the incorrect exponential growth hypothesis, and that related, and correct, arguments are embedded in the proofs of a number of NTK adjacent papers). After responses and revisions from the authors, the reviewers uniformly recommend acceptance. This is a solid paper, with an important conceptual point — length/volume contraction is critical to reasoning correctly about feature evolution in deep networks. In addition, it corrects existing errors in the literature, and provides relatively transparent justifications of its main claims. + +The AC concurs with the reviewers’ evaluation of the paper, and recommends acceptance.",ICLR2022, +HkeMV80egV,1544770000000.0,1545350000000.0,1,rJfUCoR5KX,rJfUCoR5KX,Accept,Accept (Poster),"The paper summarizes existing work on binary neural network optimization and performs an empirical study across a few datasets and neural network architectures. I agree with the reviewers that this is a valuable study and it can establish a benchmark to help practitioners develop better binary neural network optimization techniques. + +PS: How about ""An empirical study of binary neural network optimization"" as the title? +",ICLR2019,4: The area chair is confident but not absolutely certain +D68Zl-oCLQv,1610040000000.0,1610470000000.0,1,RqCC_00Bg7V,RqCC_00Bg7V,Final Decision,Accept (Poster),"The authors put a lot of effort in replying to questions and improving the paper (to a point that the reviewers felt overwhelmed). + +Pros: +- An interesting way of dealing with model bias in MPC +- They successfully managed to address the most important concerns of the reviewers, with lots of additional experiments and insights +- R3's concerns have also been successfully addressed by the authors, the review & score were unfortunately not updated + +Cons: +- The only remaining point is that the simulations seem to be everything but physically realistic (update at end of R1's review), which is probably a problem of the benchmarks and not the authors faults.",ICLR2021, +SJgWpEnyl4,1544700000000.0,1545350000000.0,1,SkeQniAqK7,SkeQniAqK7,Lack of clarity and justification for the final task,Reject,"Dear authors, + +Thank you for submitting your work to ICLR. The original goal of using smaller models to train a bigger one is definitely interesting and has been the topic of a lot of works. + +However, the reviewers had two major complaints: the first one is about the clarity of the paper and the second one is about the significance of the tasks on which the algorith is tested. For the latter point, your rebuttal uses arguments which are little known in the ML community and so should be expanded in a future submission.",ICLR2019,5: The area chair is absolutely certain +HklzDghbgV,1544830000000.0,1545350000000.0,1,r1g5b2RcKm,r1g5b2RcKm,Work could be strengthened by analysis of runtime performance,Reject,"The authors propose a technique for pruning networks by using second-order information through the Hessian. The Hessian is approximated using the Fisher Information Matrix, which is itself approximated using KFAC. 
The paper is clearly written and easy to follow, and is evaluated on a number of systems where the authors find that the proposed method achieves good compression ratios without requiring extensive hyperparameter tuning. + +The reviewers raised concerns about 1) the novelty of the work (which builds on the KFAC work of Martens and Grosse), 2) whether zeroing out individual connections as opposed to neurons will have practical runtime benefits, 3) the lack of comparisons against baselines on overall training time/complexity, 4) comparisons to work which directly prunes as part of training (instead of the train-prune-finetune scheme adopted by the authors). +In the view of the AC, 4) would be an interesting comparison but was not critical to the decision. Ultimately, the decision came down to the concern of lack of novelty and whether the proposed techniques would have an impact on runtime in practice. +",ICLR2019,4: The area chair is confident but not absolutely certain +r1xyARtglE,1544750000000.0,1545350000000.0,1,SylU3jC5Y7,SylU3jC5Y7,Meta-Review,Reject,"The paper proposes Variational Beta-Bernoulli Dropout, a Bayesian method for sparsifying neural networks. The method adopts a spike-and-slab prior over the parameters of the network. The paper proposes Beta hyperpriors over the network, motivated by the Indian Buffet Process, and proposes a method for input-conditional priors. + +The paper is well-written and the material is communicated clearly. The topic is also of interest to the community and might have important implications down the road. + +The authors, however, failed to convince the reviewers that the paper is ready for publication at ICLR. The proposed method is very similar to earlier work. 
The reviewers think that the paper is not ready for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +FzBfpIGEnK,1576800000000.0,1576800000000.0,1,S1gEIerYwH,S1gEIerYwH,Paper Decision,Accept (Poster),"This paper presents a theoretically motivated method based on homotopy continuation for transfer learning and demonstrates encouraging results on FashionMNIST and CIFAR-10. The authors draw a connection between this approach and the widely used fine-tuning heuristic. Reviewers find principled approaches to transfer learning in deep neural networks an important direction, and find the contributions of this paper an encouraging step in that direction. Along with the reviewers, I think homotopy continuation is a great numerical tool with a lot of untapped potential for ML applications, and I am happy to see an instantiation of this approach for transfer learning. Reviewers had some concerns about experimental evaluations (reporting test performance in addition to training), and the writing of the draft. The authors addressed these in the revised version by including test performance in the appendix and rewriting the first parts of the paper. Two out of three reviewers recommend accept. I also find the homotopy analysis interesting and, along with the majority of reviewers, recommend accept. However, please try to iterate at least once more over the writing: simplify long sentences and make sure the writing and flow are clear, for the camera-ready version.",ICLR2020, +cyuuQfZ2kn,1576800000000.0,1576800000000.0,1,SJgzLkBKPB,SJgzLkBKPB,Paper Decision,Accept (Poster),"A new method of calculating saliency maps for deep networks trained through RL (for example to play games) is presented. The method is aimed at explaining why moves were taken by showing which salient features influenced the move, and seems to work well based on experiments with Chess, Go, and several Atari games. + +Reviewer 2 had a number of questions related to the performance of the method under various conditions, and these were answered satisfactorily by the authors. 
+ +This is a solid paper with good reasoning and results, though perhaps not super novel, as the basic idea of explaining policies with saliency is not new. It should be accepted for poster presentation. +",ICLR2020, +0kwIrWtO8rI,1642700000000.0,1642700000000.0,1,gggnCQBT_iE,gggnCQBT_iE,Paper Decision,Reject,"This paper proposes a meta-structural causal model framework, to increase the representation capability of structural equation models. It also considers how to connect data to mechanisms. The paper is conceptually interesting. However, on the technical side, reviewers feel that without supporting proofs or empirical experiments, it is hard to justify the correctness of the proposal and judge its applicability to real-world problems. + +As the authors claimed in their response, ""it is our future work of interest to code our proposed framework into a working system and validate it in a proper setting given its early stage status on research in modeling causal cycles."" I think some future version of the paper might be a great contribution to the field if a working system were included.",ICLR2022, +7IcLkvgoJZf,1610040000000.0,1610470000000.0,1,7UyqgFhPqAd,7UyqgFhPqAd,Final Decision,Reject,"The paper considers the problem of using sparse coding to create better generalization in neural networks. The new generalization bound of the neural network only depends on the \ell_1 norm of the weights, instead of the original \ell_2 version as in previous papers. + + +While this direction is promising, the major concern about this work is how it compares with existing generalization bounds empirically. There are definitely some hand-crafted instances where this bound excels, but the authors did not provide enough evidence that this bound would actually be better than others for neural networks trained in practice: For example, would adding a relatively large \ell_1 regularizer result in a drastic decrement in test accuracy? 
How does the bound compare with compression-based approaches such as VC dimension + weight pruning (since the weights are somewhat sparse, so the VC dimension is lower) -- One might argue that those pruning techniques do not have theoretical guarantees that they can work -- Well, this technique does not have theoretical guarantees either (whether this objective can be minimized efficiently): The theorem, at least in the current form, seems to only apply to networks that are the ""global optima"" of some non-convex training objective (MSE-loss involving a non-linear neural network + \ell_1 regularizer on its weights). It is also unclear whether such global optima can be found efficiently in practice. At the very least, the authors should devote some effort demonstrating the superiority of their bound empirically. + + +",ICLR2021, +qusY1aoZ3X9,1642700000000.0,1642700000000.0,1,qyTBxTztIpQ,qyTBxTztIpQ,Paper Decision,Accept (Poster),"This paper studies the problem of how to collect demonstrations via crowd sourcing for imitation and offline learning. The paper received mixed reviews initially. The reviewers had difficulty understanding empirical results, asked for some more ablations, and were a little unconvinced by the proposed usefulness of the collected data. The authors provided a strong thoughtful rebuttal that addressed many of those concerns. The paper was discussed extensively with one of the reviewers, who increased their score from 3 to 5. Reviewers generally agree that the paper is good but not all reviewers are on-board with acceptance. AC recommends accept but agrees with the reviewers, and the authors are urged to look at reviewers' feedback and incorporate their comments in the camera-ready.",ICLR2022, +Ky-E3ZFN3MG,1642700000000.0,1642700000000.0,1,z1-I6rOKv1S,z1-I6rOKv1S,Paper Decision,Accept (Spotlight),"The paper proposes a framework for training autoregressive flows based on proper scoring rules. The proposed framework is shown to be a computationally appealing alternative to maximum-likelihood training, and is empirically validated in a wide variety of applications. + +All three reviewers are positive about the paper and recommend acceptance (one weak, two strong). The reviewers describe the paper as well written and well motivated, and recognize the paper's contribution as significant. 
+ +Overall, this is a nice and promising methodological exploration of flow-model training that is worth communicating to the ICLR community.",ICLR2022, +iJ2_ZGdq1,1576800000000.0,1576800000000.0,1,rJg46kHYwH,rJg46kHYwH,Paper Decision,Reject,"This paper presents an interesting method for creating adversarial examples using a GAN. Reviewers are concerned that ImageNet results, while successfully evading a classifier, do not appear to be natural images. Furthermore, the attacks are demonstrated on fairly weak baseline classifiers that are known to be easily broken. They attack ResNet50 (without adv training), for which Lp-bounded attacks empirically seem to produce more convincing images. For MNIST, they attack Wong and Kolter’s ""certifiable"" defense, which is empirically much weaker than an adversarially trained network, and also weaker than more recent certifiable baselines. +",ICLR2020, +LjddGUgmnWC,1642700000000.0,1642700000000.0,1,T_p1vd88T87,T_p1vd88T87,Paper Decision,Reject,"The paper proposes a framework for learning the physical parameters of a physical system’s dynamics from a video. The model combines a differentiable neural ODE solver (NODE) with neural implicit representations through a local coordinate-based network which reconstructs the frames based on the ODE solution. Both the static background and the moving objects are modeled via implicit representations. The system being differentiable, it can be trained to recover the physical parameters and the initial conditions of the ODE. Experiments are performed on two toy problems (pendulum and masses that are connected by a spring). + +The reviewers agree on the originality of the approach. 
They however all consider that the paper falls short of demonstrating the potential of the proposed approach because of limited experiments, limited ablation analyses, and limited comparisons with baselines. The authors added a new experiment during the rebuttal, but this was not found sufficient to change the reviewers’ opinion.",ICLR2022, +mAmTzFxB4x1,1642700000000.0,1642700000000.0,1,TVHS5Y4dNvM,TVHS5Y4dNvM,Paper Decision,Reject,"This paper observes that a fully-convolutional model in the style of recent MLP-Mixer and ViT variants can have surprisingly good initial performance. As this paper has attracted a certain amount of attention, three expert reviewers have provided very detailed and serious comments, and two actively engaged with author discussions. AC also carefully read the paper as well as all discussion threads. + +AC agrees the authors should not be penalized for not achieving the best performance, nor for not comparing with very recent work. The main legitimate critiques, however, focus on three aspects: (1) over-claimed contribution; (2) experiment solidness/competitiveness; and (3) writing completeness/clarity. 
+ +Third, while NOT being the main reason of rejection, AC personally suggests the authors to responsibly enrich their main text, and to remove the “A note on paper length” paragraph. The authors intentionally kept the paper length unusually short. Reviewers generally dislike this idea. Being an innovative writer is good, but very relevant details and discussions were left in the supplemental as a result. Especially, AC agrees the whole section A and part of section B of the supplemental should have been in the main paper at very least. + +In summary, the authors strive to tell an interesting story, but it is not yet a well settled story. The experiments are not solid enough to support their bold claims. The authors are suggested to improve their work further by taking into account reviewer comments.",ICLR2022, +SkSfpGIue,1486400000000.0,1486400000000.0,1,rJxdQ3jeg,rJxdQ3jeg,ICLR committee final decision,Accept (Oral),This is one of the two top papers in my stack and I recommend it for oral presentation. The reviewers were particularly careful and knowledgable of the topic.,ICLR2017, +R3PLq42KfqV,1610040000000.0,1610470000000.0,1,jEYKjPE1xYN,jEYKjPE1xYN,Final Decision,Accept (Poster),"The paper provides a new covariant approach to 3D molecular generation motivated by the desire handle compounds with symmetries. To this end, the method uses equivariant state representations for autoregressive generation, built largely from recently proposed covariant molecular networks (comorant), and integrating such representations within an existing actor-critic RL generation framework (Simm et al). The selection of focal atom, element to add, and the distance are realized in an equivariant manner while the compound valuation remains invariant to rotation. The approach is clean and well-executed. The authors added additional experiments (e.g., RMSD demonstrating stability of generated compounds) to further reinforce the case for the method. +",ICLR2021, +rkL_hMLdl,1486400000000.0,1486400000000.0,1,rye9LT8cee,rye9LT8cee,ICLR committee final decision,Reject,"This paper studies a sparsification method for pre-trained CNNs based on the alternating direction method of multipliers. + The reviewers agreed that the paper is well-written and easy to follow. The authors were responsive during the rebuttal phase and addressed most of the reviewers questions. + + The reviewers, however, disagree with the significance of this work. Whereas R4 is happy to see an established method (ADMM) applied to a nowadays very popular architecture, R1-R3 were skeptical about the scope of the experiments and the usefulness of the sparse approximation. + + Pros: + - Simple algorithmic description using well-known ADMM method. + - Consistent performance gains in small and mid-scale object classification problems. + + Cons: + - Lack of significance in light of current literature on the topic. + - Lack of numerical experiments on large-scale classification problems and/or other tasks. + - Lack of clarity when reporting speedup gains. + + Based on these assessments, the AC recommends rejection. Since this decision is not 100% aligned + with the reviewers, let me expand on the reasons why I recommend rejection. + + This paper presents a sound algorithm that sparsifies the weights of a trained-CNN, and it shows that + the resulting pruned network works well, better than the original one. A priori, this is a solid result. 
+ My main problem is that this contribution has to be taken in the context of the already large body of + literature addressing this same question, starting from the seminal 'Predicting Parameters in Deep Learning', by Misha Denil et al., NIPS 2013, which is surprisingly ignored in the bibliography here; and onwards with 'Exploiting linear structure within convolutional networks for efficient evaluation', Denton et al., NIPS'14 (also ignored) and 'Speeding up convolutional neural networks with low rank expansions', Jaderberg et al., '14. These and many other works culminated in two recent papers, 'Deep Compression', by Han et al, ICLR'16 and 'Learning Structured Sparsity in Deep Neural Networks' by Wen et al, NIPS'16. + Several reviewers pointed out that comparisons with Wen et al were needed; the authors of the present submission now do cite this work, but do not compare their results quantitatively and do not present the reader with compelling reasons to choose their acceleration instead of Wen et al.'s. Moreover, they mention in the rebuttal that the paper was published after the submission, but the work was available on arXiv well before the ICLR deadline (August 2016). Wen's paper presents (i) source code, (ii) experiments on ImageNet, (iii) 3x to 5x speedups on both CPU and GPU, and (iv) accuracy improvements on Cifar of ~1.5%. As far as I can see, the present submission presents none of these. Further compression factors were obtained in 'Deep Compression', also in large-scale models. + + In light of these results, the authors should present more compelling evidence (in this case, since this is an experimental paper, empirical evidence in the form of higher accuracy and/or larger speedups/gains) as to why should a practitioner use their model, rather than simply presenting the differences between the approaches.",ICLR2017, +MmP56fCX4qA,1642700000000.0,1642700000000.0,1,HTp-6yLGGX,HTp-6yLGGX,Paper Decision,Accept (Poster),"This paper proposes to deeply investigate the hot-refresh model upgrades of image retrieval systems. The hot-refresh model is very useful since the model can be quickly updated after the gallery is backfilled. To address the model regression with negative flips, this paper introduces a Regression-Alleviating Compatible Training (RACT) method by reducing negative flips. The proposed method has been verified on large-scale image retrieval benchmarks such as Google Landmark. The key contribution of this paper is the new setting targeting an important application in real-world image retrieval systems. However, some of the technical details are not fully explained. Despite these minor concerns, the AC will rate this paper as a poster acceptance based on the overall contributions.",ICLR2022, +HygHB7NlgN,1544730000000.0,1545350000000.0,1,ByeZ5jC5YQ,ByeZ5jC5YQ,Metareview,Accept (Oral),"The paper presents a novel strategy for statistically motivated feature selection, i.e., aimed at controlling the false discovery rate. This is achieved by extending knockoffs to complex predictive models and complex distributions via (multiple) generative adversarial networks. + +The reviewers and ACs noted weaknesses in the original submission which seem to have been fixed after the rebuttal period -- primarily related to missing experimental details. There was also some concern (as is common with inferential papers) that the claims are difficult to evaluate on real data, as the ground truth is unknown. 
To this end, the authors provide empirical results with simulated data that address this issue. There is also some concern that more complex predictive models are not evaluated. + +Overall the reviewers and AC have a positive opinion of this paper and recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +HkgJmxwlg4,1544740000000.0,1545350000000.0,1,ryxaSsActQ,ryxaSsActQ,"Interesting, but heuristic idea. Experiments somewhat unconvincing.",Reject,"This paper proposes a new loss function that can be used in place of the standard maximum likelihood objective in training NMT models. This leads to a small improvement in training MT systems. + +There were some concerns about the paper though: one was that the method itself seemed somewhat heuristic without a clear mathematical explanation. The second was that the baselines seemed relatively dated, although one reviewer noted that this seemed like a bit of a lesser concern. Finally, the improvements afforded were relatively small. + +Given the high number of good papers submitted to ICLR this year, it seems that this one falls short of the acceptance threshold.",ICLR2019,4: The area chair is confident but not absolutely certain +s1Nu0ipVu43q,1642700000000.0,1642700000000.0,1,xFOyMwWPkz,xFOyMwWPkz,Paper Decision,Accept (Poster),"This paper proposes a new method for understanding the role and importance of individual units in convolutional neural networks. 
The reviewers were in agreement that the technique is novel and provides potentially valuable insights into neural network behavior. The reviewers were less certain about the utility or significance of this idea; however, the authors partially addressed this concern by adding studies of using this technique as a pruning heuristic, and future researchers will be the best judge of the paper's eventual significance. With that in mind, I recommend acceptance so that this intriguing idea can become part of the research literature and future researchers will have this opportunity.",ICLR2022, +Byewd0_mlE,1544950000000.0,1545350000000.0,1,H1gMCsAqY7,H1gMCsAqY7,train a single neural network at different widths,Accept (Poster),"This paper proposes a method that creates neural networks that can run under different resource constraints. The reviewers have a consensus on acceptance. The pro is that the paper is novel and provides a practical approach to adjust the model for different computation resources, and achieves performance improvements on object detection. One concern from Reviewer 2 and another public reviewer is the inconsistent performance impact on classification/detection (performance improvement on detection, but performance degradation on classification). 
Besides, the numbers reported in Table 1 should be confirmed: MobileNet v1 on Google Pixel 1 should have less than 120 ms latency [1], not 296 ms. + + +[1] Table 4 of https://arxiv.org/pdf/1801.04381.pdf",ICLR2019,5: The area chair is absolutely certain +B68k5T4Wg4T,1642700000000.0,1642700000000.0,1,Ti2i204vZON,Ti2i204vZON,Paper Decision,Reject,"Meta Review for Learning Representations for Pixel-based Control: What Matters and Why? + +In this work, the authors presented large-scale empirical evaluations and ablation studies to analyze various components (e.g. contrastive objectives, model-based approaches, data augmentation) for pixel-based control with distractors. Reviewer 7euW wrote a great summary for this paper: + +This paper presents an approach for learning representations from pixel data that are amenable for control tasks. The proposed approach is a simple baseline that does not require data augmentation, world models, contrastive losses etc. but only contains two simple sub-tasks that are supposed to contribute heavily towards an effective representation: reward prediction and state transition prediction. Along with evaluating this proposed baseline, the paper also compares it to several prior works on representation learning: i.e., several approaches such as data augmentation, distance metric losses, contrastive losses, relevant reconstruction etc. 
It is shown that the proposed simple baseline either outperforms several of these methods or at least is very close in performance. Finally, the paper presents an interesting discussion about how evaluating an algorithm is not just about the dataset and the chosen benchmark task, but requires a more nuanced point of view of several factors such as reward sparsity, action continuity/discreteness, relevance and irrelevance of features to the task, and so on. The findings of the paper are not just about the effectiveness of the proposed method, but a more overarching view of which types of representation learning methods work in what conditions. + +Along with myself, most reviewers (including the critical 61FY) agree that there is great value in the large-scale studies presented in this paper. Furthermore, I personally like how it links a large body of recent work in this topic together in one study. The reviews were mixed (6, 6, 3, 3), and the negative reviews (the 3's) generally had issues not with the study or experiments, but with the conclusions the authors drew from them. In the words of 61FY (who managed to have a good discussion with the authors): + +*I'm not convinced by conclusions as the authors try to generalize behavior of specific implementation to a family of methods. If I were to implement a new agent, I don't feel like I can believe these conclusions so that makes me question what knowledge this paper can add to the community. Furthermore, many details are either missing or not made clear, and the main story isn't very strong. Therefore, I don't think this paper is ready for publication in the current status.* + +Although I really appreciate the effort and detail that went into this nice work, based on the current assessments from the 4 reviewers, I can't recommend it for acceptance in its current state. I feel, though, that with a change of narrative, or even with a re-examination of the experimental results, the authors can turn the paper around into a highly impactful paper. The description of all of the methods explored, and experiments performed alone makes a wonderful survey of the field with sufficient impact, so I think the authors are *almost* there in publishing a highly impactful work that can make the community look deeper into pixel-based control methods (with distractors). I hope to read an updated version of this work in the future published at a journal or presented at a future conference. Good luck!",ICLR2022, +wY4TeWlaE-x,1642700000000.0,1642700000000.0,1,M-9bPO0M2K5,M-9bPO0M2K5,Paper Decision,Reject,"This paper proposes a method for class-imbalanced data based on meta-learning. The technical contribution of the proposed method is limited as it is a reasonable but straightforward extension of the existing method. In addition, as commented by the reviewers, +the comparison with existing methods is not sufficient, it is unclear why it is meta-learned with balanced test data, and hyperparameter tuning details are not given.",ICLR2022, +uDKs1huNQJtR,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"The paper proposes a design of interpretable neural networks where each neuron is hand-designed to serve a task-specific role, and the network weights can be optimized via a few interactions with the environment. The reviewers acknowledged that the interpretability of neural networks is an important research direction. 
However, the reviewers pointed out several weaknesses in the paper, and there was a clear consensus that the work is not ready for publication. The reviewers have provided detailed and constructive feedback to the authors. We hope that the authors can incorporate this feedback when preparing future revisions of the paper.",ICLR2022, +OiVNONrEI6n2,1642700000000.0,1642700000000.0,1,CES-KyrKcTM,CES-KyrKcTM,Paper Decision,Reject,"The paper investigates weighted empirical risk minimization where the weight on an example in the training set is given by a polynomial function evaluated on the loss on the given example. The authors show that the choice of the weighting function induces a data-dependent variance penalization in the training objective. The authors present an algorithm for weighted ERM and empirical results to support their claims. While the problem setting is broadly relevant and the approach the authors take in this paper is interesting, several questions remain unanswered. First, the authors argue that variance penalization helps but do not compare with other regularized ERM approaches. Second, it is not clear if the proposed algorithm is indeed gradient descent on the weighted ERM objective as pointed out by one of the reviewers. Finally, the writing can be improved with more emphasis on the novelty and significance of the contributions. I believe the initial comments from the reviewers have already helped improve the quality of the paper. I encourage the authors to further incorporate the feedback and work towards a stronger submission.",ICLR2022, +QScSo_rdNA2,1544480000000.0,1545350000000.0,1,rklaWn0qK7,rklaWn0qK7,A nice example of allowing learning without losing guarantees,Accept (Poster),"Quality: The overall quality of the work is high. The main idea and technical choices are well-motivated, and the method is about as simple as it could be while achieving its stated objectives. + +Clarity: The writing is clear, with the exception of using alternative scripts for some letters in definitions. + +Originality: The biggest weakness of this work is originality, in that there is a lot of closely related work, and similar ideas without convergence guarantees have begun to be explored. For example, the (very natural) U-net architecture was explored in previous work. + +Significance: This seems like an example of work that will be of interest both to the machine learning community, and also the numerics community, because it also achieves the properties that the numerics community has historically cared about. 
It is significant on its own as an improved method, but also as a demonstration that using deep learning doesn't require scrapping existing frameworks but can instead augment them.",ICLR2019,3: The area chair is somewhat confident +EHmfrSLVu,1576800000000.0,1576800000000.0,1,BJlrF24twB,BJlrF24twB,Paper Decision,Accept (Talk),"The paper efficiently computes quantities, such as variance estimates of the gradient or various Hessian approximations, jointly with the gradient, and the paper also provides a software package for this. All reviewers agree that this is a very good paper and should be accepted.",ICLR2020, +j4oFzv7WK-,1576800000000.0,1576800000000.0,1,SJgBQaVKwH,SJgBQaVKwH,Paper Decision,Reject,"This paper investigates variational models of speech for synthesis, and in particular ways of making them more controllable for a variety of synthesis tasks (e.g. prosody transfer, style transfer). They propose to do this via a modified VAE objective that imposes a learnable weight on the KL term, as well as using a hierarchical decomposition of latent variables. The paper shows promising results and includes a good amount of analysis, and should be very interesting for speech synthesis researchers. However, there is not much novelty from a machine learning perspective. Therefore, I think the paper is not a great fit for ICLR and is better suited for a speech conference/journal.",ICLR2020, +JBpm0NKjwIW,1610040000000.0,1610470000000.0,1,F-mvpFpn_0q,F-mvpFpn_0q,Final Decision,Accept (Poster),"The paper proposes the challenge of rapid task-solving in unfamiliar environments and presents an approach to achieve this called Episodic Planning Networks -- a non-parametric memory based on the transformer architecture to learn tasks that require planning from previously experienced tasks, following a form of meta-RL. The problem and approach are compelling, with strong empirical results. The paper is well-written and is an exciting contribution. This is a clear accept. + +In response to the initial reviews, the authors updated their paper to improve the formalization and address other concerns in the reviews, which were viewed favorably by the reviewers as a good improvement. Based on the reviewer discussions, the work could still be placed better in context with respect to other literature.",ICLR2021, +OWNayOES6va,1610040000000.0,1610470000000.0,1,jEYKjPE1xYN,jEYKjPE1xYN,Final Decision,Accept (Poster),"The paper provides a new covariant approach to 3D molecular generation motivated by the desire to handle compounds with symmetries. To this end, the method uses equivariant state representations for autoregressive generation, built largely from recently proposed covariant molecular networks (Cormorant), and integrating such representations within an existing actor-critic RL generation framework (Simm et al.). The selection of the focal atom, element to add, and the distance are realized in an equivariant manner while the compound valuation remains invariant to rotation. The approach is clean and well-executed. The authors added additional experiments (e.g., RMSD demonstrating stability of generated compounds) to further reinforce the case for the method. +",ICLR2021, +pgtydKXhX14,1610040000000.0,1610470000000.0,1,q8qLAbQBupm,q8qLAbQBupm,Final Decision,Accept (Poster)," +The paper offers a more systematic treatment of various symmetry-related results in the current literature. 
Concretely, the invariance properties exhibited by loss functions associated with neural networks give rise to various dynamical invariants of gradient flows. The authors address these dynamical invariants in a unified manner and study them w.r.t. different variants of gradient flows aimed at reflecting different algorithmic aspects of real training processes. + +The simplicity and the generality of dynamical invariants are both the strength and the weakness of the approach. On one hand, they provide a simple way of obtaining non-trivial generalities for the dynamics of learning processes. On the other hand, they abstract away the very structure of neural networks from which they are derived, and hence only allow relatively generic statements. Perhaps the approach should be positioned more as a conceptual method for studying invariant loss functions. + +Overall, although the technical contributions in the paper are rather incremental, the conceptual contribution of using dynamical invariants to unify and somewhat simplify existing analyses in a clear and clean symmetry-based approach is appreciated by the reviewers and warrants a recommendation for borderline acceptance. +",ICLR2021, +QgWQgD09mUK,1610040000000.0,1610470000000.0,1,nkap3LV7t7O,nkap3LV7t7O,Final Decision,Reject,"This paper proposes to (re-)examine VAEs with calibrated uncertainties for the likelihood, which is to say VAEs in which the variance is learned rather than chosen as a fixed hyperparameter. The authors argue that doing so provides a reasonable means of automatically navigating the tradeoff between minimizing the distortion (the reconstruction loss) and the rate (the KL loss) in the variational objective. In particular, the authors propose to use a diagonal covariance Σ = σ^2 Ι that is shared across pixels, and note that it is trivial to define σ(z) = MSE(x, μ(z)) on a per-image basis to minimize the reconstruction loss. + +This is very much a borderline paper. Reviewers appreciate that the writing is clear, and acknowledge that revisiting the idea of learning calibrated decoders is of interest to the community. At the same time, the reviewers note that the proposed approach has very limited technical novelty, and note problems with the experimental evaluation. + +The metareviewer has read the paper, and is critical of the framing of this work. The manuscript in its current form does not do a sufficiently good job of discussing the large and detailed literature that exists on this topic. Learning calibrated decoders is by no means new, which this submission could and should acknowledge much more clearly. The two seminal papers on VAEs both considered learning calibrated decoders. Moreover there is a lack of thoughtful discussion of the reasons why learning a pixel-wise σ(z) is not common practice. The authors note that this can lead to problems with training stability, but fail to note that this problem is mathematically ill-posed; A well-known property of VAEs is that high-capacity models will memorize the training data, in the sense that the optimal learned marginal likelihood is equal to the empirical distribution over the training set (i.e. a mixture over delta peaks). + +The metareviewer would expect to see a more thoughtful discussion of the long line of work on navigating the trade-off between rate and distortion, as well as the role of model capacity. A good place to start would be a more careful discussion of the autoencoding and autodecoding limits (Alemi et al. 2018) and the GECO paper (Rezende et al. 2018). 
More broadly, the metareviewer would expect some discussion of approaches that improve the quality of generation such as [1], and work that considers the effect of model capacity on generalization, such as [2]. + +In terms of experimental evaluation, this paper also somewhat falls short. As R4 notes, some of the results look worryingly bad, which may be due to the fact that the authors train for only 10 epochs (as indicated in Appendix B). Moreover, what is once again lacking in experiments is a systematic consideration of the role of model capacity. Some comparison to more recent baselines than the β-VAE (e.g. GECO) would also be helpful here. + +The metareviewer is sympathetic to the basic premise of this paper, which is the claim that learning a σ that is shared across pixels is a pretty good best practice in terms of finding a reasonable balance between rate and distortion. There is certainly room for a paper that communicates this idea. However, such a paper should (a) more explicitly position itself as revisiting this idea rather than introducing this idea, (b) include a more thoughtful discussion of related work, and (c) include a more robust empirical evaluation. + +[1] Engel, J., Hoffman, M. & Roberts, A. Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models. arXiv:1711.05772 [cs, stat] (2017). + +[2] Shu, R., Bui, H. H., Zhao, S., Kochenderfer, M. J. & Ermon, S. Amortized inference regularization. in Proceedings of the 32nd International Conference on Neural Information Processing Systems 4398–4407 (Curran Associates Inc., 2018).",ICLR2021, +qusY1aoZ3X92,1642700000000.0,1642700000000.0,1,ETiaOyNwJW,ETiaOyNwJW,Paper Decision,Reject,"This paper considers GNNs for link-prediction (predicting which links are likely to appear next). An idea that has been used before is to add virtual nodes to improve the ""under-reaching"" problem in shallow GNNs; this paper considers this systematically in the context of link prediction. Specifically, one approach developed is to cluster the graph into clusters C(i), i = 1, 2, …, k for some k and to add a virtual node u(i) for each index i, which is made adjacent to each node in C(i). 
+ +- the visualisation techniques are a small variation over previous works ++ extensive experiments provide nice visualisations and yield a clearer visual picture of some properties of the optimization landscape of various architectural variants + +A promising research direction, which could be further improved by providing more extensive and convincing support for the significance of its contribution in comparison to prior techniques, and to its empirically derived observations, findings and claims. +",ICLR2018, +5ptoVuMv7kFk,1642700000000.0,1642700000000.0,1,F1Z3QH-VjZE,F1Z3QH-VjZE,Paper Decision,Reject,"This paper is concerned with fairness in the generative setting, specifically the setting in which various groups have very different sizes, and are therefore treated disproportionately by the model, with the group memberships further being unknown. + +The reviewers generally agreed that the setting was interesting and important. However, they were critical of the writing quality, significance, and quality of the theoretical contribution. + +The authors made significant improvements in the review period, and while these were not quite enough to satisfy enough reviewers, opinions clearly changed in a positive direction during the discussion period. Future changes motivated by the existing reviewer concerns should significantly improve the paper.",ICLR2022, +H1ebTfI_x,1486400000000.0,1486400000000.0,1,ByEPMj5el,ByEPMj5el,ICLR committee final decision,Reject,"This paper aims to present an experimental framework for selecting machine learning models that can generate novel objects. As the work is devoted to a relatively subjective area of study, it is not surprising that opinions of the work are mixed. + + A large section of the paper is devoted to review, and more detail could be given to the experimental framework. It is not clear whether the framework can actually be useful outside the synthetic setup described. Moreover, I worry it encourages unhealthy directions for the field. Over 1000 models were trained and evaluated. There is no form of separate held-out comparison: the framework encourages people to keep trying random stuff until the chosen measure reports success.",ICLR2017, +Sk118yTrG,1517250000000.0,1517260000000.0,690,BybQ7zWCb,BybQ7zWCb,ICLR 2018 Conference Acceptance Decision,Reject,"The paper extends an existing work with three different frequency representations of audios and necessary network structure modifications for music style transfer. +It is an interesting study but does not provide ""sufficiently novel or justified contributions compared to the baseline approach of Ulyanov and Lebedev"". Also the revisions can not fully address reviewer 2's concerns. ",ICLR2018, +VRWzAXKCba,1576800000000.0,1576800000000.0,1,BkgMbCVFvr,BkgMbCVFvr,Paper Decision,Reject,"The paper presents a new dataset, containing around 8k pictures of 30 horses in different poses. This is used to study the benefits of pretraining for in- and out-of-domain images. + +The paper is somewhat lacking in novelty. Others have studied the same type of pre-training in the past using other datasets, which makes the dataset the main novelty. But reviewers raised many questions about the dataset, in particular about how many of the frames of the same horse might be similar, and of how few horses there are; few enough to potentially not make the results statistically meaningful. The authors replied to these questions more by appealing to standards in other fields than by explaining why this is a good choice. 
Apart from these crucial weaknesses, however, the research appears good. + +This is a pretty clear reject based on lack of novelty and oddities with the dataset.",ICLR2020,
mTFv8-Rs2Na8,1642700000000.0,1642700000000.0,1,ZeE81SFTsl,ZeE81SFTsl,Paper Decision,Reject,"Dear authors, + +I apologize to the authors for insufficient discussion in the discussion period. Thanks for carefully responding to reviewers. Nevertheless, I have read the paper as well, and the situation is clear to me (even without further discussion). I will not summarize what the paper is about, but will instead mention some of the key issues. + +1) The proposed idea is simple, and in fact, it has been known to me for a number of years. I did not think it was worth publishing. This on its own is not a reason for rejection, but I wanted to mention this anyway to convey the idea that I consider this work very incremental. +2) The idea is not supported by any convergence theory. Hence, it remains a heuristic, which the authors admit. In such a case, the paper should be judged by its practical performance, novelty and efficacy of ideas, and the strength of the empirical results, rather than on the theory. However, these parts of the paper remain lacking compared to the standard one would expect from an ICLR paper. +3) Several elements of the ideas behind this work existed in the literature already (e.g., adaptive quantization, time-varying quantization, ...). Reviewers have noticed this. +4) The authors compare to fixed / non-adaptive quantization strategies which have already been surpassed in subsequent work. Indeed, QSGD was developed 4 years ago. The quantizers of Horvath et al in the natural compression/natural dithering family have exponentially better variance for any given number of levels. This baseline, which does not use any adaptivity, should, I believe, be better than what the authors propose. If not, a comparison is needed. +5) FedAvg is neither the theoretical nor the practical SOTA method for the problem the authors are solving. Faster and more communication-efficient methods exist. For example, methods based on error feedback (e.g., the works of Stich, Koloskova and others), the MARINA method (Gorbunov et al), SCAFFOLD (Karimireddy et al) and so on. All can be combined with quantization. +6) The reviewer who assigned this paper score 8 was least confident. I did not find any comments in the review of this reviewer that would sufficiently justify the high score. The review was brief and not very informative to me as the AC. All other reviewers were inclined to reject the paper. +7) There are issues in the mathematics - although the mathematics is simple and not the key of the paper. This needs to be thoroughly revised. Some answers were given in the author response. +8) Why should expected variance be a good measure? Did you try to break this measure? That is, did you try to construct problems for which this measure would work worse than the worst-case variance? + +Because of the above, and additional reasons mentioned by the reviewers, I have no other option but to reject the paper. + +Area Chair",ICLR2022,
UAfCAtXDUk,1576800000000.0,1576800000000.0,1,HkeMYJHYvS,HkeMYJHYvS,Paper Decision,Reject,"This paper received all negative reviews, and the scores were kept after the rebuttal. The authors are encouraged to submit their work to a computer vision conference where this kind of work may be more appreciated.
Furthermore, including stronger baselines such as Acuna et al is recommended.",ICLR2020,
lqmh0Ir-dV,1576800000000.0,1576800000000.0,1,ByglLlHFDS,ByglLlHFDS,Paper Decision,Accept (Poster),"The paper proposes a new algorithm called Expected Information Maximization (EIM) for learning latent variable models while computing the I-projection solely based on samples. The reviewers had several questions, which the authors sufficiently answered. The reviewers agree that the paper should be accepted. The authors should carefully read the reviewer questions and comments and use them to improve their final manuscript. ",ICLR2020,
NaqDzxxFPRE,1576800000000.0,1576800000000.0,1,rylMgCNYvS,rylMgCNYvS,Paper Decision,Reject,"This paper presents an analysis of the languages that can be accepted by a counter machine, motivated by recent work that suggests that counter machines might be a good formal model from which to approach the analysis of LSTM representations. + +This is one of the trickiest papers in my batch. Reviewers agree that it represents an interesting and provocative direction, and I suspect that it could yield valuable discussion at the conference. However, reviewers were not convinced that the claims made (or implied) _about LSTMs_ are motivated, given the imperfect analogy between them and counter machines. The authors promise some empirical evidence that might mitigate these concerns to some extent, but the paper has not yet been updated, so I cannot take that into account. + +As a very secondary point, which is only relevant because this paper is borderline, LSTMs are no longer widely used for language tasks, so discussion about the capacity of LSTMs _for language_ seems like an imperfect fit for a machine learning conference with a fairly applied bent.",ICLR2020,
BJ86hMIOl,1486400000000.0,1486400000000.0,1,BJVEEF9lx,BJVEEF9lx,ICLR committee final decision,Reject,"The consensus of the reviewers, although their reviews were somewhat succinct, was that the paper proposes an interesting research direction by training neural networks to approximate datastructures by constraining them to (attempt to) respect the axioms of the structure, but is thin on the ground in terms of evaluation and comparison to existing work in the domain (both in terms of models and "standard" experiments").
The authors have not sought to defend their paper against the reviewers' critique, and thus I am happy to accept the consensus and reject the paper.",ICLR2017,
QScSo_rdNA,1576800000000.0,1576800000000.0,1,Hkl4EANFDH,Hkl4EANFDH,Paper Decision,Reject,"The submission proposes a 'co-natural' gradient update rule to precondition the optimization trajectory using a Fisher information estimate acquired from previous experience. This results in reduced sensitivity and forgetting when new tasks are learned. + +The reviews were mixed on this paper, and unfortunately not all reviewers had enough expertise in the field. After reading the paper carefully, I believe that the paper has significance and relevance to the field of continual learning; however, it will benefit from more careful positioning with respect to other work as well as more empirical support. The application to the low-data regime is interesting and could be expanded and refined in a future submission. + +The recommendation is for rejection.",ICLR2020,
56-zdW0ZRT,1610040000000.0,1610470000000.0,1,LvJ8hLSusrv,LvJ8hLSusrv,Final Decision,Reject,"This paper proposes a tuning strategy for Hamiltonian Monte Carlo (HMC). The proposed algorithm optimizes a modified variational objective over the T-step distribution of an HMC chain. The proposed scheme is evaluated experimentally. + +All of the reviewers agreed that this is an important problem and that the proposed method is promising. Unfortunately, reviewers had reservations about the empirical evaluation and the theoretical properties of the scheme.
Because the evaluation of the scheme is primarily empirical, I cannot recommend acceptance of the paper in its current form. + +I agree with the following specific reviewer concerns. The proposed method does not come with any particular guarantees, and particularly no guarantees regarding the effect of dropping the entropy term and using an SKSD training scheme to compensate. While guarantees are not necessary for publication, the paper should make up for this with comprehensive and convincing experiments. I agree with R1 that more careful ablation studies on toy models are needed, if nothing else to reveal the strengths and weaknesses of the proposed approach. I would also recommend a more careful discussion about the computational cost of this method and how it can be fairly compared to baselines. I don't agree that "deliberately wasteful" experiments reveal much, especially if running more realistic experiments reduces the relative impact of the proposed method.",ICLR2021,
VF_pFYuiMj,1576800000000.0,1576800000000.0,1,SJeNz04tDS,SJeNz04tDS,Paper Decision,Accept (Poster),"This paper introduces the problem of overlearning, which can be thought of as unintended transfer learning from a (victim) source model to a target task that the source model's creator had not intended its model to be used for. The paper raises good points about privacy legislation limitations due to the fact that overlearning makes it impossible to foresee future uses of a given dataset. + +Please incorporate the revisions suggested in the reviews to add clarity to the overlearning versus censoring confusion addressed by the reviewers.",ICLR2020,
SkcmSJ6Bf,1517250000000.0,1517260000000.0,536,HJYoqzbC-,HJYoqzbC-,ICLR 2018 Conference Acceptance Decision,Reject,"This paper investigates the performance of various second-order optimization methods for training neural networks. Comparing different optimizers is worthwhile, but as this is an empirical paper which doesn't present novel techniques, the bar is very high for the experimental methodology. Unfortunately, I don't think this paper clears the bar: as pointed out by the reviewers, the comparisons miss several important methods, and the experiments miss out on important aspects of the comparison (e.g. wall clock time, generalization). I don't think there is enough of a contribution here to merit publication at ICLR, though it could become a strong submission if the reviewers' points were adequately addressed. +",ICLR2018,
BkarmkaHG,1517250000000.0,1517260000000.0,136,Sy0GnUxCb,Sy0GnUxCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper received divergent reviews (7, 3, 9). The main contributions of the paper -- that multi-agent competition serves as a natural curriculum, opponent sampling strategies, and the characterization of emergent complex strategies -- are certainly of broad interest (although the first is essentially the same observation as AlphaZero, but the different environment makes this of broader interest). + +In the discussion between R2 and the authors, I am sympathetic to (a subset of) both viewpoints. + +To be fair to the authors, discovery (in this case, characterization of emergent behavior) can often be difficult to quantify. R2's initial review was unnecessarily harsh and combative. The points presented by R2 as evidence of poor evaluation have clear answers by the authors.
It would have been better to provide suggestions for what the authors could try, rather than raise philosophical objections that the authors cannot experimentally rebut. + +On the other hand, I am disappointed that the authors were given a reasonable, specific, quantifiable request by R2 -- +"By the end of Section 5.2, you allude to transfer learning phenomena. It would be nice to study these transfer effects in your results with a quantitative methodology." +-- and they chose to respond with informal and qualitative assessments. It doesn't matter if the results are obvious visually; why not provide quantitative evaluation when it is specifically asked for? + +Overall, we recommend this paper for acceptance, and ask the authors to incorporate feedback from R2. ",ICLR2018,
nfUx-Z5K7v2,1610040000000.0,1610470000000.0,1,0migj5lyUZl,0migj5lyUZl,Final Decision,Reject,"While the paper contains some interesting ideas, the reviewers felt that overall the paper is not theoretically well supported, and likewise the experiments are not fully convincing. Even after the rebuttal, these concerns still persist.",ICLR2021,
WnQr_I9M1T,1576800000000.0,1576800000000.0,1,r1glDpNYwS,r1glDpNYwS,Paper Decision,Reject,"Thanks for the discussion with reviewers, which improved our understanding of your paper significantly. +However, we concluded that this paper is still premature to be accepted to ICLR2020. We hope that the detailed comments by the reviewers help improve your paper for potential future submission.",ICLR2020,
8AJOxAYA6F1,1610040000000.0,1610470000000.0,1,xTV-wQ-pMrU,xTV-wQ-pMrU,Final Decision,Reject,"There is clear consensus on this submission. Reviewers cite a lack of comparison +with recent state-of-the-art methods and experiments on more realistic datasets. +Though the reviewers find aspects of the approach interesting, the decision is +to reject. +",ICLR2021,
vP7xavOlE4N,1642700000000.0,1642700000000.0,1,MIX3fJkl_1,MIX3fJkl_1,Paper Decision,Accept (Poster),"The authors propose a new framework of population learning that optimizes a single conditional model to learn and represent multiple diverse policies in real-world games. All reviewers agree the ideas are interesting and the empirical results are strong. The meta reviewer agrees and recommends acceptance.",ICLR2022,
NgABgojUX,1576800000000.0,1576800000000.0,1,ryl0cAVtPH,ryl0cAVtPH,Paper Decision,Reject,"The paper addresses the question of why warm starting could result in worse generalization ability than training from scratch. The reviewers agree that increasing the circumstances in which warm starting could be applied is of interest, in particular to reduce training time and computational resources. However, the reviewers were unanimous in their opinion that the paper is not suitable for publication at ICLR in its current form. Concerns included that the analysis was not sufficiently focused and the experiments too small scale. As the analysis component of the paper was considered to be limited, the experimental results were insufficient on the balance to push the paper to an acceptable state.",ICLR2020,
rJ--IC4fNe,1576800000000.0,1576800000000.0,1,rkgKW64FPH,rkgKW64FPH,Paper Decision,Reject,"There was some interest in the ideas presented, but this paper was on the borderline and ultimately not able to be accepted for publication at ICLR. + +The primary reviewer concern was about the level of novelty and significance of the contribution.
This was not sufficiently demonstrated.",ICLR2020,
H1ll1oYQl4,1544950000000.0,1545350000000.0,1,BJxvEh0cFQ,BJxvEh0cFQ,Simple and effective parameter efficient method for finetuning,Accept (Poster),"Reviewers largely agree that the proposed method for finetuning deep neural networks is interesting and the empirical results clearly show the benefits over finetuning only the last layer. I recommend acceptance. ",ICLR2019,
2MTnG4OKuk,1576800000000.0,1576800000000.0,1,rJleKgrKwS,rJleKgrKwS,Paper Decision,Accept (Poster),"This paper presents a number of improvements on existing approaches to neural logic programming. The reviews are generally positive: two weak accepts, one weak reject. Reviewer 2 seems wholly in favour of acceptance at the end of discussion, and did not clarify why they were sticking to their score of weak accept. The main reason Reviewer + 1 sticks to 6 rather than 8 is that the work extends existing work rather than offering a "fundamental contribution", but otherwise is very positive. I personally feel that +a) most work extends existing work +b) there is room in our conferences for such well executed extensions (standing on the shoulders of giants etc). + +Reviewer 3 is somewhat unconvinced by the nature of the evaluation. While I understand their reservations, they state that they would not be offended by the paper being accepted in spite of their reservations. + +Overall, I find that the review group leans more in favour of acceptance, and am happy to recommend acceptance for the paper as it makes progress in an interesting area at the intersection of differentiable programming and logic-based programming.",ICLR2020,
XOf51BWGktE,1610040000000.0,1610470000000.0,1,Fblk4_Fd7ao,Fblk4_Fd7ao,Final Decision,Reject,"This paper received borderline scores: R1, R3, and R4 gave a score of 6 and recommended a borderline acceptance. R2 provided by far the most detailed review and recommended a score of 5 (i.e., borderline reject). After the rebuttal, R2 comments, "I believe that the paper is still below the acceptance threshold, although only marginally". Overall, I concur with R2. The reasons are detailed below: + +The paper proposes a method for communication between two agents, wherein one agent actuates its joints to communicate intent. Intuitively, this resembles making a gesture. The paper considers the setting of a discrete number of intents. The sender agent is modeled as a neural network that takes as input the intent and outputs a trajectory of joints. The receiver observes a noisy version of the trajectory and outputs the intent. The parameters of the sender policy and receiver discrimination network are optimized to maximize classification accuracy. It is shown that if the intents are sampled from a Zipf distribution and trajectories are penalized based on their energy, then a receiver agent initialized from scratch is better at inferring the intent from a pre-trained speaker agent, as opposed to when the distribution is uniform or when the energy regularization is not used (Figure 2). + +Further, section 5.2 shows that when the listener is provided with the energy of the trajectory then it is better at recognizing the intent as opposed to being provided with the entire trajectory, when the number of intents is small. With a larger number of intents (N=10), the performance is at chance accuracy. + +The biggest challenge with the paper is that it is very poorly written.
Large parts of the method and experimental setup are in the appendix (A.2 / A.3), which makes it hard to understand the paper. Section 4.2 is rather confusing because the ideas introduced are not used for training, but simply for evaluation. Further, the authors point out in the rebuttal that the torque curriculum is not required, but it is still there in the paper and makes it more confusing. I recommend that the authors substantially rewrite the paper and focus on relevant parts instead of philosophical arguments. Lastly, I am confused by the results in Table 2 -- the authors mention in the text that with 10 intents, intent identification is at chance (i.e. 34% accuracy), but the table shows 56% accuracy. A clarification would be helpful here. + +The problem of communicating intents via gestures, when the agents are unaware of the mapping from intents to gestures, is an exciting area of research. From the perspective of emergent gestures, this paper has a novel contribution. However, the settings are toy and even in such a setup, the results are underwhelming. The assumptions that make the setup toy are: the listener agent knows about all joint locations of the sender (with some noise) and also has access to the energy exerted by the agent. Without access to energy, the performance is poor. In real-world scenarios, these are big assumptions. Furthermore, even when the energy is known, the performance is bad even when the number of intents is small (i.e., N=10). The authors argue that this is due to local minima in the optimization -- but that's exactly where the contribution could have been. + +I will reiterate that the authors claim their contribution is in using energy minimization + Zipf intent distribution as a mechanism for communicating intent -- with which I agree. However, as pointed out earlier, the paper is not well executed or written and therefore I recommend rejection. + +",ICLR2021,
By_7nML_e,1486400000000.0,1486400000000.0,1,BkjLkSqxg,BkjLkSqxg,ICLR committee final decision,Reject,"Let me start by saying that your area chair does not read Twitter, Reddit/ML, etc. The metareview below is, therefore, based purely on the manuscript and the reviews and rebuttal on OpenReview. + + The goal of the ICLR review process is to establish a constructive discussion between the authors of a paper on one side and reviewers and the broader machine-learning community on the other side. The goal of this discussion is to help the authors leverage the community for improving their manuscript. + + Whilst one may argue that some of the initial reviews could have provided a more detailed motivation for their rating, there is no evidence that the reviewers were influenced (or even aware of) discussions about this paper on social or other media --- in fact, none of the reviews refers to claims made in those media. Suggestions by the authors that the reviewers are biased by (social) media are, therefore, unfounded: there can be many valid reasons for the differences in opinion between reviewers and authors on the novelty, originality, or importance of this work. The authors are free to debate the opinion of the reviewers, but referring to the reviews as "absolute nonsense", "unreasonable", "condescending", and "disrespectful" is not helping the constructive scientific discussion that ICLR envisions and, frankly, is very offensive to reviewers who voluntarily spend their time in order to improve the quality of scientific research in our field. + + Two area chairs have read the paper.
They independently reached the conclusion that (1) the reviewers raise valid concerns with respect to the novelty and importance of this work and (2) that the paper is, indeed, borderline for ICLR. The paper is an application paper, in which the authors propose the first end-to-end sentence-level lip reading using deep learning. Positive aspects of the paper include: + + - A comprehensive and organized review about previous work. + - Clear description of the model and experimental methods. + - Careful reporting of the results, with attention to detail. + - Proposed method appears to perform better than the prior state-of-the-art, and generalizes across speakers. + + However, the paper has several prominent negative aspects as well: + + - The GRID corpus that is used for experimentation has very substantial (known) limitations. In particular, it is constructed in a way that leads to a very limited (non-natural) set of sentences. (For every word, there is an average of just 8.5 possible options the model has to choose from.) + - The paper overstates some of its claims. In particular, the claim that the model is "outperforming experienced human lipreaders" is questionable: it is not unlikely that the model achieves its performance by exploiting unrealistic statistical biases in the corpus that humans cannot / do not exploit. Similarly, the claims about the "sentence-level" nature of the model are not substantiated: it remains unclear what aspects of the model make this a sentence-level model, nor is there much empirical evidence that the sentence-level treatment of video data is helping much (the NoLM baseline is almost as good as LipNet, despite the strong biases in the GRID corpus). + - The paper makes several other statements that are not well-founded. As one of the reviewers correctly remarks, the McGurk effect does not show that lipreading plays a crucial role in human communication (it merely shows that vision can influence speech recognition). Similarly, the claim that "Bi-GRUs are crucial for efficient further aggregation" is not supported by empirical evidence. + + A high-level downside of this paper is that, while studying a relevant application of deep learning, it presents no technical contributions or novel insights that have impact beyond the application studied in the paper.",ICLR2017,
N49tMPeeBF,1576800000000.0,1576800000000.0,1,B1l6nnEtwr,B1l6nnEtwr,Paper Decision,Reject,"The work proposes to learn neural networks using a homotopy-based continuation method. Reviewers found the idea interesting, but the manuscript poorly written, and lacking in experimental results. With no response from the authors, I recommend rejecting the paper.",ICLR2020,
twAV1SpA4M2,1642700000000.0,1642700000000.0,1,alaQzRbCY9w,alaQzRbCY9w,Paper Decision,Reject,"There was a consensus among the reviewers to reject the paper. While they noted that the paper proposed a new and interesting stochastic algorithm for deep learning, they think the paper needs to be substantially improved in both theory and empirical study.
The paper was judged quite incremental in comparison to the work of Öztoprak et al 2018 (where most of the theory was developed), while not showing improved empirical performance on the benchmarks.",ICLR2022, +2qoIFs-0i,1576800000000.0,1576800000000.0,1,rylMgCNYvS,rylMgCNYvS,Paper Decision,Reject,"This paper presents an analysis of the languages that can be accepted by a counter machine, motivated by recent work that suggests that counter machines might be a good formal model from which to approach the analysis of LSTM representations. + +This is one of the trickiest papers in my batch. Reviewers agree that it represents an interesting and provocative direction, and I suspect that it could yield valuable discussion at the conference. However, reviewers were not convinced that the claims made (or implied) _about LSTMs_ are motivated, given the imperfect analogy between them and counter machines. The authors promise some empirical evidence that might mitigate these concerns to some extent, but the paper has not yet been updated, so I cannot take that into account. + +As a very secondary point, which is only relevant because this paper is borderline, LSTMs are no longer widely used for language tasks, so discussion about the capacity of LSTMs _for language_ seems like an imperfect fit for an machine learning conference with a fairly applied bent.",ICLR2020, +ByCum1arM,1517250000000.0,1517260000000.0,178,ryH20GbRW,ryH20GbRW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),All three reviewers recommend acceptance. The authors did a good job at the rebuttal which swayed the first reviewer to increase the final rating. This is a clear accept.,ICLR2018, +BkejfJ71l4,1544660000000.0,1545350000000.0,1,Bkg3g2R9FX,Bkg3g2R9FX,Summary review,Accept (Poster),"The paper was found to be well-written and conveys interesting idea. However the AC notices a large body of clarifications that were provided to the reviewers (regarding the theory, experiments, and setting in general) that need to be well addressed in the paper. ",ICLR2019,5: The area chair is absolutely certain +2zv1GGPUg1f,1642700000000.0,1642700000000.0,1,FEBFJ98FKx,FEBFJ98FKx,Paper Decision,Accept (Poster),"The paper proposes a GAN framework for dynamic point cloud superresolution. It does not need scene flow supervision for training and has an interesting adaptive upsampling mechanism. Results are shown on several datasets and are reasonably convincing. Overall, all the reviewers are slightly positive about the work. After the rebuttal, all the five reviewers converged to a marginally-above-the-threshold recommendation. The meta-reviewer agreed with their assessment and would like to recommend accepting the paper.",ICLR2022, +Skleryyz1V,1543790000000.0,1545350000000.0,1,S1GcHsAqtm,S1GcHsAqtm,Reject,Reject,"The area chair agrees with the authors and the reviewers that the topic of this work is relevant and important. The area chair however shares the concerns of the reviewers about the setup and the empirical evaluation: +- Having one model that can be pruned to varying sizes at run-time is convenient, but in practice it is likely to be OK to do the pruning at training time. In light of this, the empirical results are not so impressive. +- Without quantization, distillation and fused ops, the value of the empirical results seems questionable as these are important and well-known techniques that are often used in practice. 
A more thorough evaluation that includes these techniques would make the paper much stronger.",ICLR2019,4: The area chair is confident but not absolutely certain
j_tEqELhuY-,1610040000000.0,1610470000000.0,1,90JprVrJBO,90JprVrJBO,Final Decision,Accept (Poster),"This paper proposes a learning-based approach for solving combinatorial optimization problems such as routing using continuous optimizers. The key idea is to learn a continuous latent space via a conditional VAE to represent solutions and perform search in this latent space for new problems at test time. The approach is novel and experiments showed good results including ablation analysis. + +Reviewer comments were adequately addressed during the response phase and I find the changes satisfactory. Overall, this is a good paper and I recommend accepting it. + +One last comment: It would be a great addition if the paper could add discussion about the applicability of this approach to arbitrary combinatorial optimization problems and what design choices are critical to come up with an effective instantiation.",ICLR2021,
9PjfTh7pUC,1576800000000.0,1576800000000.0,1,rklHqRVKvH,rklHqRVKvH,Paper Decision,Accept (Talk),"The paper shows empirical evidence that the optimal action-value function Q* often has a low-rank structure. It uses ideas from the matrix estimation/completion literature to provide a modification of value iteration that benefits from such a low-rank structure. +The reviewers are all positive about this paper. They find the idea novel and the writing clear. +There have been some questions about the relation of this concept of rank to other definitions and usage of rank in the RL literature. +The authors' rebuttal seems to be satisfactory to the reviewers.
Given these, I recommend acceptance of this paper.",ICLR2020, +8BHtMj-Jywa,1610040000000.0,1610470000000.0,1,PcBVjfeLODY,PcBVjfeLODY,Final Decision,Reject,"All reviews are somewhat below the acceptance threshold. The main concerns are in terms of lack of novelty, and that some of the paper's main claims are unsupported. Many of the criticisms are quite focused on specific details, but these seem significant enough to have been deal-breakers for this submission.",ICLR2021, +TI0uX_gI1I,1610040000000.0,1610470000000.0,1,MjvduJCsE4,MjvduJCsE4,Final Decision,Accept (Poster),"This paper presents an empirical study focusing on Bayesian inference on NNGP - a Gaussian process where the kernel is defined by taking the width of a Bayesian neural network (BNN) to the infinity limit. The baselines include a finite width BNN with the same architecture, and a proposed GP-BNN hybrid (NNGP-LL) which is similar to GPDNN and deep kernel learning except that the last-layer GP has its kernel defined by the width-limit kernel. Experiments are performed on both regression and classification tasks, with a focus on OOD data. Results show that NNGP can obtain competitive results comparing to their BNN counterpart, and results on the proposed NNGP-LL approach provides promising supports on the hybrid design as to combine the best from both GP and deep learning fields. + +Although the proposed approach is a natural extension of the recent line of work on GP-BNN correspondence, reviewers agreed that the paper presented a good set of empirical studies, and the NNGP-LL approach, evaluated in section 5 with SOTA deep learning architectures, provides a promising direction of future for scalable uncertainty estimation. This is the main reason that leads to my decision on acceptance. + +Concerns on section 3's results on under-performing CNN & NNGP results on CIFAR-10 has been raised, which hinders the significance of the results there (since they are way too far from expected CNN accuracy). The compromise for model architecture in order to enable NNGP posterior sampling is understandable, although this does raise questions about the robustness of posterior inference for NNGP in large architectures.",ICLR2021, +SZx6xDCUut,1576800000000.0,1576800000000.0,1,B1xw9n4Kwr,B1xw9n4Kwr,Paper Decision,Reject,"This paper focuses on understanding the role of model architecture on convergence behavior and in particular on the speed of training. The authors study the gradient flow of training via studying an ODE's coefficient matrix H. They study the effect of H in terms of possible paths in the network. The reviewers all agreed that characterizing the behavior in terms of path is nice. However, they had concerns about novelty with respect to existing work on NTK. Other comments by reviewers include (1) poor literature review (2) subpar exposition and (3) hand-wavy and rack of rigor in some results. While some of these concerns were alleviated during the discussion. Reviewers were not fully satisfied. I general agree with the overall assessment of the reviewers. The paper has some interesting ideas but suffers from lack of clarity and rigor. Therefore, I can not recommend acceptance in the current form.",ICLR2020, +TcV0EEEMDgX,1642700000000.0,1642700000000.0,1,1XdUvpaTNlM,1XdUvpaTNlM,Paper Decision,Reject,"The reviewers consider the authors' approach to pruning of convolutional networks reasonable; but neither sufficiently novel nor sufficiently well explored for inclusion in the conference. 
In particular, the reviewers would like to see a more explicit discussion of the effect on training time of the authors' method, and more discussion and comparison against previous probabilistic pruning methods.",ICLR2022, +EK1-jNiy4d5,1610040000000.0,1610470000000.0,1,q_kZm9eHIeD,q_kZm9eHIeD,Final Decision,Reject," +The paper considers the risk sensitive RL by exploiting entropic risk. The major contribution of this paper is providing the theoretical guarantees for the proposed risk-senstive value iteration with function approximation. + +The major concern of this paper is the similarity to the existing work in (Fei et al., 2020). I encourage the authors to reorganize the paper and emphasize the differences to highlight the major contribution. ",ICLR2021, +rJxKFRTaJV,1544570000000.0,1545350000000.0,1,HJflg30qKX,HJflg30qKX,ICLR 2019 decision,Accept (Poster),"This paper studies the behavior of weight parameters for linear networks when trained on separable data with strictly decreasing loss functions. For this setting the paper shows that the gradient descent solution converges to max margin solution and each layer converges to a rank 1 matrix with consequent layers aligned. All reviewers agree that the paper provides novel results for understanding implicit regularization effects of gradient descent for linear networks. Despite the limitations of this paper such as studying networks with linear activation, studying gradient descent not with practical step sizes, assuming data is linearly separable, reviewers find the results useful and a good addition to existing literature.",ICLR2019,4: The area chair is confident but not absolutely certain +o30pC3H2Uu,1576800000000.0,1576800000000.0,1,SJgdpxHFvH,SJgdpxHFvH,Paper Decision,Reject,"The reviewers reached a consensus that the paper was not ready to be accepted in its current form. The main concerns were in regard to clarity, relatively limited novelty, and a relatively unsatisfying experimental evaluation. Although some of the clarity concerns were addressed during the response period, the other issues still remained, and the reviewers generally agreed that the paper should be rejected.",ICLR2020, +6hYu9QQx4qZ,1610040000000.0,1610470000000.0,1,l0mSUROpwY,l0mSUROpwY,Final Decision,Accept (Poster),"Protein molecule structure analysis is an important problem in biology that has recently become of increasing interest in the ML field. The paper proposes a new architecture using a new type of convolution and pooling both on Euclidean as well as intrinsic representations of the proteins, and applies it to several standard tasks in the field. + +Overall the reviews were strong, with the reviewers commending the authors for an important result on the intersection of biology and ML. The reviewers raised the points of: +- weak baselines (The authors responded with adding suggested comparison, which were not completely satisfactory) +- focus mostly on recent protein literature +- the reliance of the method on the 3D structure. The AC however does not find this as a weakness, as there are multiple problems that rely on 3D structure, which with recent methods can be predicted computationally rather than experimentally. + +We believe this to be an important paper and thus our recommendation is Accept. 
As the AC happens to have expertise in both 3D geometric ML and structural biology, he/she would strongly encourage the authors to better do their homework as there have been multiple recent works on convolutional operators on point clouds, as well as intrinsic representation-based ML methods for proteins. ",ICLR2021, +kQujRaJ0z,1576800000000.0,1576800000000.0,1,HyebplHYwB,HyebplHYwB,Paper Decision,Accept (Poster),"This paper introduces a way to measure dataset similarities. Reviewers all agree that this method is novel and interesting. A few questions initially raised by reviewers regarding models with and without likelihood, geometric exposition, and guarantees around GW, are promptly answered by authors, which raised the score to all weak accept. +",ICLR2020, +05EPoF_Vuer,1642700000000.0,1642700000000.0,1,gNp54NxHUPJ,gNp54NxHUPJ,Paper Decision,Accept (Poster),"Dear Authors, + +The paper was received nicely and discussed during the rebuttal period. There is consensus among the reviewers that the paper should be accepted: + +- The new result about query complexity of regression problem that the authors have added. Along with the result on + for (noisy) Vandemonde matrix, these make the paper lie above the accept bar. +- The authors have providing satisfying clarifications during the rebuttal that convinced reviewers to increase further their scores. + +The current consensus is that the paper deserves publication. + +Best AC",ICLR2022, +QJEcfi_vQn,1576800000000.0,1576800000000.0,1,H1e5GJBtDr,H1e5GJBtDr,Paper Decision,Reject,"This paper proposes a self-attention-based autoregressive model called Axial Transformers for images and other data organized as high dimensional tensors. The Axial Attention is applied within each axis of the data to accelerate the processing. + +Most of the authors claim that main idea behind Axial Attention is widely applicable, which can be used in many core vision tasks, such as detection and classification. However, the revision fails to provide more application for Axial attention. + +Overall, the idea behind this paper is interesting but more convincing experimental results are needed. +",ICLR2020, +Z2es9fQ2-,1576800000000.0,1576800000000.0,1,HygSq3VFvH,HygSq3VFvH,Paper Decision,Reject,"The paper considers a setting where the state of a (robotics) environment can be divided roughly into ""context states"" (such as variables under the robot's direct control) and ""states of interest"" (such as the state variables of an object to be manipulated), and learn skills by maximizing a lower bound on the mutual information between these two components of the state. Experimental results compare to DDPG/SAC, and show that the learned discriminator is somewhat transferable between environments. + +Reviewers found the assumptions necessary on the degree of domain knowledge to be quite strong and domain-specific, and that even after revision, the authors were understating the degree to which this was necessary. The paper did improve based on reviewer feedback, and while R3 was more convinced by the follow-up experiments (though remarked that requiring environment variations to obtain new skills was a ""significant step backward from things like [Diversity is All You Need]""), the other reviewers remained unconvinced regarding domain knowledge and in particular how it interacts with the scalability of the proposed method to complex environments/robots. + +Given the reviewers' concerns regarding applicability and scalability, I recommend rejection in its present form. 
A future revision may be able to more convincingly demonstrate that limitations based on domain knowledge are less significant than they appear.",ICLR2020, +S1gISyLxeE,1544740000000.0,1545350000000.0,1,HylJtiRqYQ,HylJtiRqYQ,Too preliminary for ICLR-2018,Reject,The reviewers are unanonymous in their assessment that the paper is not ICLR quality in its current form.,ICLR2019,5: The area chair is absolutely certain +PHUUTSFtxj,1576800000000.0,1576800000000.0,1,ByeNra4FDB,ByeNra4FDB,Paper Decision,Accept (Poster),"The paper proposes a new method for out-of-distribution detection by combining random network distillation (RND) and blurring (via SVD). The proposed idea is very simple but achieves strong empirical performance, outperforming baseline methods in several OOD detection benchmarks. There were many detailed questions raised by the reviewers but they got mostly resolved, and all reviewers recommend acceptance, and this AC agrees that it is an interesting and effective method worth presenting at ICLR. ",ICLR2020, +SyOirJarM,1517250000000.0,1517260000000.0,643,BJgPCveAW,BJgPCveAW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper received weak scores: 4,4,5. R2 complained about clarity. R3's point about the lack of fully connected layers in current SOA deepnets is very valid and the authors response far from convincing. Unfortunately the major revision provided by the authors was not commented on by the reviewers, but many of the major shortcomings of the work still remain. +Generally, the paper is below the acceptance threshold, so cannot be accepted.",ICLR2018, +rJR8EyaHf,1517250000000.0,1517260000000.0,364,SyUkxxZ0b,SyUkxxZ0b,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper studies the interplay between adversarial examples and generalization in the uniform setting (not specific assumptions on the architecture) in a toy high-dimensional setting. In particular, the authors show a fundamental tradeoff between generalization error and the average distance of adversarial examples. + +Reviewers were skeptical about the possible significance of this work, but the paper underwent a major revision that greatly improved the quality of presentation. That said, the results are still preliminary since they only consider a toy dataset (concentric spheres). The AC recommends re-submitting this work to the workshop track.",ICLR2018, +KI159IAuVf,1642700000000.0,1642700000000.0,1,Z0XiFAb_WDr,Z0XiFAb_WDr,Paper Decision,Reject,"The paper presents the Language-complete Abstraction and Reasoning Corpus (LARC): a collection of natural language descriptions by a group of human participants who instruct each other on how to solve tasks in the Abstraction and Reasoning Corpus (ARC). + +Overall, the reviewers found the LARC benchmarks to be well-motivated. However, there were concerns about whether the value of the dataset to downstream tasks. Results from additional program synthesis systems (like Codex and GPT-Neo) would also make the paper stronger. I agree with these objections and am recommending rejection this time around. However, I encourage the authors to continue pursuing this line of work and resubmit after incorporating the feedback from this round.",ICLR2022, +II4lbZQObA,1576800000000.0,1576800000000.0,1,BJxbOlSKPr,BJxbOlSKPr,Paper Decision,Reject,"The presented paper gives a differentiable product quantization framework to compress embedding and support the claim by experiments (the supporting materials are as large as the paper itself). 
Reviewers agreed that the idea is simple is interesting, and also nice and positive discussion appeared. However, the main limiting factor is the small novelty over Chen 2018b, and I agree with that. Also, the comparison with low rank is rather formal: of course it would be of full rank , as the authors claim in the answer, but looking at singular values is needed to make this claim. Also, one can use low-rank tensor factorization to compress embeddings, and this can be compared. +To summarize, I think the contribution is not enough to be accepted.",ICLR2020, +7zzTp7G9Lr,1576800000000.0,1576800000000.0,1,BJg8_xHtPr,BJg8_xHtPr,Paper Decision,Reject,"The author proposes a object-oriented probabilistic generative model of 3D scenes. The model is based on the GQN with the key innovation being that there is a separate 3D representation per object (vs a single one for the entire scene). A scene-volume map is used to prevent two objects from occupying the same space. The authors show that using this model, it's possible to learn the scene representation in an unsupervised manner (without the 3D ground truth). + +The submission has received relatively low scores with one weak accept and 3 weak rejects. All reviewers found the initial submission to be unclear and poorly written (with 1 reject and 3 weak rejects initially). The initial submission also failed to acknowledge prior work on object based representations in the 3D vision community. Based on the reviewer feedback, the authors greatly improved the paper by reworking the notation and the description of the model, and included a discussion of related work from 3D vision. Overall, the exposition of the paper was substantially improved. Some of the reviewers recognize the improvement, and lifted their scores. + +However, the work still have some issues: +1. The experimental section is still weak +The reviewers (especially those from an computer vision background) questioned the lack of baseline comparisons and ablation studies, which the authors (in their rebuttal) felt to be unnecessary. It is this AC's opinion that comparisons against alternatives and ablations is critical for scientific rigor, and high quality work aims not to just propose new models, but also to demonstrate via experimental analysis how the model compares to previous models, and what parts of the model is necessary, coming up with new metrics, baselines, and evaluation when needed. + +It is the AC's opinion that the authors should attempt to compare against other methods/baselines when appropriate. For instance, perhaps it would make sense to compare the proposed model against IODINE and MONet. Upon closer examination of the experimental results, the AC also finds that the description of the object detection quality to be not very precise. Is the evaluation in 2D or 3D? The filtering of predictions that are too far away from any ground truth also seems unscientific. + +2. The objects and arrangements considered in this paper is very simplistic. + +3. The writing is still poor and need improvement. +The paper needs an editing pass as the paper was substantially rewritten. There are still grammar/typos, and unresolved references to Table ?? (page 8,9). + + +After considering the author responses and the reviewer feedback, the AC believe this work shows great promise but still need improvement. The authors have tackled a challenging and exciting problem, and have provided a very interesting model. 
The work can be strengthened by improving the experiments, analysis, and the writing. The AC recommend the authors further iterate on the paper and resubmit. As the revised paper was significantly different from the initial submission, an additional review cycle will also help ensure that the revised paper is properly fully evaluated. The current reviewers are to be commended for taking the time and effort to look over the revision. ",ICLR2020, +28dJj9iHUFv,1642700000000.0,1642700000000.0,1,HObMhrCeAAF,HObMhrCeAAF,Paper Decision,Accept (Poster),"The paper is focussed on proposing a new evaluation metric for evaluating untrained, randomly initialized neural network architectures towards predicting their accuracy/performance after training. The metric they propose is based on evaluating the gradient sign. The method shown to outperform existing approaches on NAS benchmarks. + +The reviewers found the paper's idea simple but effective. The experimental evaluation and efficacy of the proposed method were the main strong points of the paper. The paper was also significantly improved during the discussion period both in terms of presentation and the scope of experiments/comparisons was enlarged. + +While the metric is theoretically motivated, I personally found some of the theoretical statements weak in terms of assumptions/clarity. I would request the authors to consider taking this and other suggestions made by reviewers into account + +Overall I recommend acceptance based on the strong and thorough experimental results shown by the paper on a problem of clear interest to the community.",ICLR2022, +9z66ZRXMet,1576800000000.0,1576800000000.0,1,H1eH9hNtwr,H1eH9hNtwr,Paper Decision,Reject,"The paper proposed U-net for segmentation of stagnant zones in computed tomography. Technical contribution of the paper is severely limited, and is not of the quality expected of publications in this venue. The paper is not anonymized and violates the double blind review rule. I'm thus recommending rejection.",ICLR2020, +SmgnwHjeE_h,1610040000000.0,1610470000000.0,1,8QAXsAOSBjE,8QAXsAOSBjE,Final Decision,Reject,"The initial reviews for this paper were very borderline. The authors provided detailed responses as well as a few additional results and observations. The authors' responses answered the reviewers' questions and addressed their main comments (including in the discussion of related works as well as with more in-depth analysis in a new Section 5.1). Unfortunately, the reviewers did not come to a consensus. + +Overall, this paper extends some current methodology for emotional classification, is well-executed, and provides a reasonably thorough study. The results are somewhat in line with previous results from other fields (and notably NLP), but the authors demonstrate the efficacy of using primary multi-task learning for multimodal conversational analysis. + +Unfortunately, this paper also has some flaws as highlighted by the initial reviews. As stated above, the authors did provide a strong rebuttal, but given the different comments raised by the reviewers that spanned many aspects of the paper including motivation, possibly limited contribution and novelty, missing related work, somewhat shallow analysis of the results, I find that another full round of reviewing would be useful to assess the paper. + +As a result, this remains a very borderline paper, and given the strong competition at this year's conference, I cannot recommend acceptance at this stage. 
+ +I suggest that the authors incorporate some of the discussions from this forum (and especially with respect to related work, new findings, and clearly defining the motivation and contribution of this work) into the next version of their paper.",ICLR2021,
tN9wQNjIxz4,1642700000000.0,1642700000000.0,1,0U0C2pXfTZl,0U0C2pXfTZl,Paper Decision,Reject,"The most positive reviewers have not decided to step forward to champion the paper. Others have a negative impression which has not sufficiently changed after the answers from the authors. Actually, it is acknowledged that there have been many modifications, but they are not happy enough with this situation: modifications (some significant ones) cannot always be fully checked again and even with the efforts that were made by reviewers, strong concerns remained. It has been pointed out that the direction has potential. My recommendation is based on the data that I have available.",ICLR2022,
OcFogYsPpd,1610040000000.0,1610470000000.0,1,GJkTaYTmzVS,GJkTaYTmzVS,Final Decision,Reject,"The paper studies a novel problem setting of automatically grading interactive programming exercises. Grading such interactive programs is challenging because they require dynamic user inputs. The paper's main strengths lie in formally introducing this problem, proposing an initial solution using reinforcement learning, and curating a large dataset from code.org. All reviewers generally appreciated the importance of the research problem studied and the potential of the work. Even though the reviewers found the work interesting, there was a clear consensus that the work is still immature and not yet ready for publication. I appreciate the authors' engagement with the reviewers during the discussion phase. Overall, the reviewers have provided very detailed and constructive feedback to the authors. We hope that the authors can incorporate this feedback when preparing future revisions of the paper. +",ICLR2021,
xoAM5uSgkW_,1610040000000.0,1610470000000.0,1,iTeUSEw5rl2,iTeUSEw5rl2,Final Decision,Reject,"This paper presented an online continual learning method where there may be a shift in data distribution at test time. The paper proposes a Conditional Invariant Experience Replay (CIER) approach to correct the shift, which matches the distribution of inputs conditioned on the outputs. This is based on an adversarial training scheme. + +The reviewers found the problem setting interesting but found the approach to be lacking in novelty and the problem formulation somewhat restrictive (e.g., requiring a domain id during training). The author feedback was taken into account but the reviewers stayed with their original assessment and, even after the rebuttal phase, none of the reviewers is in favor of accepting the paper. + +The authors are advised to consider the feedback from the reviewers which will hopefully help to improve the paper for a future submission to another venue.",ICLR2021,
aa6gblT3Nl,1576800000000.0,1576800000000.0,1,BkgZSCEtvr,BkgZSCEtvr,Paper Decision,Reject,Novelty of the proposed model is low. Experimental results are weak.,ICLR2020,
VbKOJXEoeR,1576800000000.0,1576800000000.0,1,HJxhWa4KDr,HJxhWa4KDr,Paper Decision,Reject,"Reviewers raise the serious issue that the proof of Theorem 2 is plagiarized from Theorem 1 of "Demystifying MMD GANs" (https://arxiv.org/abs/1801.01401). With no response from the authors, this is a clear reject.
+
",ICLR2020,
+o3LJGTcP5So,1642700000000.0,1642700000000.0,1,3pugbNqOh5m,3pugbNqOh5m,Paper Decision,Accept (Poster),"This paper proposes a class of neural processes that lifts the limitations of conditional neural processes (CNPs) and produces dependent/correlated outputs but that, like CNPs, is inherently scalable and easy to train via maximum likelihood. The proposed model is extended to multi-output regression and to capture non-Gaussian output distributions. Results are presented on synthetic data, an electroencephalogram dataset, and a climate modeling problem. The paper parameterizes the prediction map as a Gaussian, where the mean and covariance are determined using neural networks. Non-Gaussian prediction maps are obtained using copulas.

Technically speaking, the reviewers found the approach to be incremental and only marginally significant, and I agree with them. Issues such as estimates of computational cost, the use of fixed lengthscales for the covariances, and the relationship to normalizing flows have been addressed by the authors satisfactorily. Empirically, the contribution of the paper is somewhat significant, as it provides flexibility similar to other, more computationally expensive processes under more general assumptions than conditional neural processes.",ICLR2022,
+rJgRcajgl4,1544760000000.0,1545350000000.0,1,SJz6MnC5YQ,SJz6MnC5YQ,No reviewer was willing to champion this work,Reject,"Although one reviewer recommended accepting this paper, they were not willing to champion it during the discussion phase and did not seem to truly believe it is currently ready for publication. Thus I am recommending rejecting this submission.",ICLR2019,4: The area chair is confident but not absolutely certain
+QEu4VxDy0I6,1642700000000.0,1642700000000.0,1,6IYp-35L-xJ,6IYp-35L-xJ,Paper Decision,Accept (Poster),"This paper is close to the borderline, but I think it is good enough that I recommend its acceptance. Although there were some problems raised by the reviewers, the authors managed to successfully address a majority of them. Having said that, I still recommend that the authors carefully analyze the reviews again and make sure that they incorporate the reviewers' comments in the final version of the paper. Many of them were constructive and might improve the quality of the paper.",ICLR2022,
+SJlJ3qJOlN,1545240000000.0,1545350000000.0,1,HJgeEh09KQ,HJgeEh09KQ,A novel and scalable approach to robustness analysis of neural nets,Accept (Poster),"The paper addresses an important problem of neural net robustness verification and presents a novel approach outperforming the state of the art; the authors provided detailed rebuttals which clarified their contributions over the state of the art and highlighted scalability; this work appears to be a solid and useful contribution to the field.
 ",ICLR2019,4: The area chair is confident but not absolutely certain
+4kseDFG3Ub,1576800000000.0,1576800000000.0,1,B1xMEerYvB,B1xMEerYvB,Paper Decision,Accept (Poster),"The paper discusses smooth market games and demonstrates the merit of the approach. The reviewers agree on the quality of the paper, and the comments have been addressed well by the authors.
",ICLR2020,
+rkk0IkTSM,1517250000000.0,1517260000000.0,890,B1mSWUxR-,B1mSWUxR-,ICLR 2018 Conference Acceptance Decision,Reject,"There are some interesting ideas discussed in the paper, but the reviewers expressed difficulty understanding the motivation and the theoretical results. The experiments do not seem convincing in showing that SQDML achieves significant gains. 
Overall, the paper needs either stronger and clearer theoretical results or more convincing experiments for publication at ICLR.",ICLR2018,
+Sk6Tf1pHz,1517250000000.0,1517260000000.0,37,SJzRZ-WCZ,SJzRZ-WCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper characterizes the induced geometry of the latent space of deep generative models. The motivation is established well, such that the paper convincingly discusses the usefulness derived from these insights. For example, the results uncover issues with the currently used methods for variance estimation in deep generative models. The technique invoked to mitigate this issue does feel somewhat ad hoc, but at least it is well motivated.

One of the reviewers correctly pointed out that there is limited novelty in the theoretical/methodological aspect. However, I agree with the authors' rebuttal in that characterizing geometries on stochastic manifolds is much less studied and demonstrated, especially in the deep learning community. Therefore, I believe that this paper will be found useful by readers of the ICLR community, and will stimulate future research. ",ICLR2018,
+8l3nr14lO,1576800000000.0,1576800000000.0,1,rJx2slSKDS,rJx2slSKDS,Paper Decision,Reject,"This paper proposes to improve VAE/GAN by performing variational inference with a constraint that the latent variables lie on a sphere. The reviewers find some technical issues with the paper (see R3's comment regarding Theorem 3). They also found that the method is not motivated well and that the paper is not convincing. Based on this feedback, I recommend rejecting the paper.",ICLR2020,
+iXujitj7wWc,1642700000000.0,1642700000000.0,1,6w2zSI9RAnf,6w2zSI9RAnf,Paper Decision,Reject,"After going over the reviews and the rebuttal, and skimming the paper, I feel that, unfortunately, this paper is not ready to be accepted.

My reasoning is as follows. I feel the comparison with A2C and PPO is not, and should not be, the main target of the work. Of course they are good to have as reference points, and they should be in the paper. But the work is not trying to claim that the distilled symbolic policy is more data efficient (or outperforms these methods). If that were the point, one would have questions similar to reviewer cXsw's about these baselines possibly underperforming (compared to other published work). Maybe this is due to a change in setup, as argued in the rebuttal; nevertheless, this makes comparing and understanding the results difficult. The other argument is that A2C/PPO are not the most efficient DRL methods for Atari. Lastly, there is the question of the distilled symbolic policy having access to an expert, making this not an apples-to-apples comparison.
But as I said, and I think this is the point of the authors as well, this is not the point of the paper. Yet then the results are not sufficiently contextualized, either by comparison to other methods in this space or by ablation studies that motivate the choices taken by the authors. Similar points were raised by other reviewers (wezQ, cXsw). Some of these ablations have been brought forward in the rebuttal, but I think they should be a more central part of the work, and this implies considerable edits to the paper.
I think the stance that the object identification is decoupled from the symbolic policy is also a bit dangerous; i.e., a learned object identifier (particularly in a visually more complex setting) will have different failure modes, which will affect the policy. 
I think having a paragraph discussing the issues raised by reviewer AL2N, and being open about open questions/weaknesses, would actually strengthen the paper. Alternatively, additional ablations or experiments, either in other kinds of environments (e.g., 3D or environments with occlusion) or simply assuming some form of failure in segmenting the visual stream into objects to show robustness, would be of interest.

Overall, I urge the authors to resubmit their work after properly integrating some of the feedback. In particular, focus on ablation studies or on baselines that are more similar in spirit, or at least be more explicit about how the method compares with existing work and what aspect of that existing work it is trying to fix. For example, part of the approach is that it relies on distillation rather than dealing with the RL objective (as other methods might try to do). Now, if you take those methods but phrase them as a distillation process, how would they do? I don't know if all of this needs to be done, but the work currently feels less grounded, and sufficiently far from other existing methods that the relationship cannot be trivially understood, while directly comparing only to non-symbolic methods in a way that is, in some sense, not to the advantage of the non-symbolic methods.
 Additionally, be more explicit about the potential weaknesses of the method, perhaps empirically showing what happens with imperfect segmentation. The work is interesting, and I agree that this is a young field and the goal is *not* to produce state-of-the-art results or outperform DRL methods, and is *not* to solve all the problems with symbolic methods at once, but to improve our understanding of this space. But I think the framing is not the right one in the current manuscript.",ICLR2022,
+mI5CcisFKOKn,1642700000000.0,1642700000000.0,1,LM17I_oVVPB,LM17I_oVVPB,Paper Decision,Reject,"The paper presents a general solution method for constrained RL problems using reward-free exploration. While the reviewers found this reduction interesting in general, they had concerns about the price of the reduction (such as the increased regret or the suboptimal dependence of the bounds on some problem parameters), which is to be paid in exchange for the simplicity and flexibility of the proposed approach. This, coupled with the limited technical novelty of the derivations, made all reviewers think that this is a borderline paper, and I also agree with this assessment. The paper could benefit a lot from presenting more evidence of the benefits of its approach (either theoretical or empirical). Based on the above, unfortunately, I am not able to recommend acceptance at this point.",ICLR2022,
+S0s8xOtxicB,1610040000000.0,1610470000000.0,1,Pbj8H_jEHYv,Pbj8H_jEHYv,Final Decision,Accept (Spotlight),"Very good paper: it proposes a novel parameterization of orthogonal convolutions that uses the Cayley transform in the Fourier domain. The paper discusses several aspects of the proposed parameterization, including limitations and computational considerations, and showcases it in the important application of adversarial robustness, achieving good results. The reviews are all very positive, so I'm happy to recommend acceptance.

Also, a big shout-out to the reviewers and to the authors for being *outstanding* during the discussion period. The reviewers engaged with the paper to a great depth, and the authors improved the paper considerably in response. 
Well done to all of you.",ICLR2021,
+H1xyINe1gV,1544650000000.0,1545350000000.0,1,S1grRoR9tQ,S1grRoR9tQ,Good Bayesian approach to deep networks with spike-and-slab prior but with limited originality and lack of experiment support,Reject,"This paper proposes a Bayesian alternative to dropout for deep networks by extending the EM-based variable selection method with SG-MCMC for sampling weights and stochastic approximation for tuning hyper-parameters. The method is well presented with a clear motivation. The combination of EMVS, SG-MCMC, and SA as a mixed optimization-sampling approach is technically sound.

The main concern raised by the reviewers is the limited originality. SG-MCMC has been studied extensively for Bayesian deep networks, and applying the spike-and-slab prior as an alternative to dropout is a straightforward idea. The main contribution of the paper appears to be extending EMVS to deep networks with commonly used sampling techniques for Bayesian networks.

Another concern is the lack of experimental justification for the advantage of the proposed method. While the authors promise to include more experimental results in the camera-ready version, that requires a considerable amount of effort, and the decision unfortunately has to be made based on the current revision.",ICLR2019,3: The area chair is somewhat confident
+pI6KzZlJgn,1642700000000.0,1642700000000.0,1,4Stc6i97dVN,4Stc6i97dVN,Paper Decision,Reject,"The paper gives high-probability bounds on excess risk for differentially private learning algorithms, in the setting where the loss is assumed to be Lipschitz, smooth, and to satisfy the Polyak-Łojasiewicz (PL) condition. The key idea in the paper is to leverage the curvature in the loss (the PL condition) and the generalized Bernstein condition.

The authors show that they get sharper bounds of the order \sqrt{p}/(n\epsilon) when the loss is assumed to satisfy the PL condition besides being convex Lipschitz/smooth. Without using some curvature information about the loss function, the best upper bounds we can get are of the order \sqrt{p}/(n\epsilon) + 1/\sqrt{n} — and this is tight at least in terms of the dependence on n given the nearly matching lower bounds — in fact, the dependence on n is tight as it matches the non-private setting.

So, I find it a bit misleading when the authors say that they improve over the existing results. That statement is not true in its generality — it is true that we can leverage the PL condition to give faster rates, but that is not the setting of prior work. Again, the bounds that the authors compare against are for smooth/Lipschitz convex loss functions without any assumption on the curvature of the loss.

If we do look at the literature for when and/or how curvature can help, we can compare against the existing bounds for strongly convex losses. The best-known result in the most closely related setting is that of Feldman et al. (STOC 2020): https://dl.acm.org/doi/pdf/10.1145/3357713.3384335. As we can check from Theorem 4.9 in that paper, the bounds we get are of the order 1/n + d/n^2, which is actually better — not surprising, since the PL condition is a weaker condition. There is merit to the results in this paper, but the current narrative is quite misleading, and a more careful comparison with the existing literature is needed. The bounds are hard to parse — for example, what is the dependence on the strong convexity parameter (\mu)? 
It would also help to instantiate specific loss functions so that we can fix some of the parameters in the bound to have a clear comparison with the existing bounds.",ICLR2022, +yQmvz5RzSiH9,1642700000000.0,1642700000000.0,1,tV3N0DWMxCg,tV3N0DWMxCg,Paper Decision,Accept (Spotlight),"This paper presents a method for producing higher quality uncertainty estimates by mapping the predictions from an arbitrary (e.g. deep learning) model to an exponential family distribution. This is achieved by using the model to map from the inputs to a low-dimensional latent space and then using a normalizing flow to map to the parameters of the distribution. The authors show empirically that this improves over a variety of baselines on a number of OOD and uncertainty quantification tasks. This paper received 5 reviews who all agreed that the paper should be accepted (6, 6, 8, 8, 8). The reviewers in general found the method novel compared to existing literature, compelling and the results strong. Multiple reviewers asked for experiments with higher dimensional output distributions (e.g. CIFAR 100) and had concerns regarding the ""entropy regularization"" term (akin to the beta term in a beta VAE, this is a constant applied to the entropy term). The reviewers seemed satisfied with the author response, however, and the concensus decision is to accept.",ICLR2022, +L2RMQ1mGbqU,1642700000000.0,1642700000000.0,1,rwEv1SklKFt,rwEv1SklKFt,Paper Decision,Reject,"The authors claim that backdoored classifiers are ""fundamentally broken"" by demonstrating that other backdoors can be generated for such classifiers without the knowledge of the original backdoors. The proposed method, however, requires manual intervention and is not justified by theoretical arguments. Numerous questions asked by the reviewers were not addressed in the rebuttal period.",ICLR2022, +GFORc7FXoY,1610040000000.0,1610470000000.0,1,q3KSThy2GwB,q3KSThy2GwB,Final Decision,Accept (Spotlight),"This paper introduces a method for approximating real-time recurrent learning (RTRL) in a more computationally efficient manner. Using a sparse approximation of the Jacobian, the authors show how they can reduce the computational costs of RTRL applied to sparse recurrent networks in a manner that introduces some bias, but which manages to preserve good performance on a variety of tasks. + +The reviewers all agreed that the paper was interesting, and all four reviewers provided very thorough reviews with constructive criticisms. The authors made a very strong effort to attend to all of the reviewers' comments, and as a result, some scores were adjusted upward. By the end, all reviewers had provided scores above the acceptance threshold. + +In the AC's opinion, this paper is of real interest to the community and may help to develop new approaches to training RNNs at large-scale. As such, the AC believes that it should be accepted and considered for a spotlight.",ICLR2021, +AbIEcVLZCMc,1642700000000.0,1642700000000.0,1,ufGMqIM0a4b,ufGMqIM0a4b,Paper Decision,Accept (Poster),"3 reviewers recommend accept, 1 rates the paper marginally above acceptance. The authors provided satisfactory answers to criticism -- all in all this is a paper worth accepting at ICLR. Please make sure that criticism in the reviews is adequately addressed in the final version, e.g. include various experimental results in the rebuttal, add the symbols in sec 3.2 & 3.3 to fig. 
2, add a related discussion on ablations when the model is fully trained, etc.",ICLR2022,
+yo21e2TDLOI,1642700000000.0,1642700000000.0,1,_4D8IVs7yO8,_4D8IVs7yO8,Paper Decision,Reject,"This paper proposes a simple approach to improve the robustness of training a sparsely gated mixture-of-experts model, which at a high level simply consists in training initially as a dense gated model, to better warm start a final phase of sparse training. Results are presented to highlight the potential benefits of this approach.

The authors have provided a detailed response and updated results, in response to the reviews. Each reviewer has also responded at least once to the author response. Despite that engagement, all reviewers are leaning towards rejection (though there is one reviewer with a rating of 6, they regardless state that ""I'm confident this will make a great resubmission at a future venue"", indicating they actually support rejection).

The reviewers point out that the proposed method is not really novel, pointing to an existing recent paper. Even without that prior work, I would also argue that the proposed approach is conceptually straightforward and has benefits that were fairly predictable and not particularly surprising. Given the generally lukewarm reception from the reviewers, I think there is a legitimate concern to be had here about this work's potential for impact.

Though the review process has definitely improved the paper's manuscript since its submission, I unfortunately could not find a reason to dissent from the reviewers' consensus that this submission is not ready to be published. I therefore recommend it be rejected at this time.",ICLR2022,
+xXCeknlK9kkT,1642700000000.0,1642700000000.0,1,EskfH0bwNVn,EskfH0bwNVn,Paper Decision,Accept (Oral),"All reviewers are very positive about this paper. The reviewer with the lowest score did independent experiments that show that the authors' method works well, and has had an extensive discussion with the authors that justifies a higher score. The paper is potentially very valuable to practitioners, since it shows how to compensate for a training set that is not representative of the test data.

Suggestion from the area chair to the authors: Briefly discuss the relationship between influence scores and propensity scores, which are standard in the literature on causal modeling and on sample selection bias, as in https://jmlr.csail.mit.edu/papers/volume10/bickel09a/bickel09a.pdf for example.",ICLR2022,
+oW4FaW6ip4,1610040000000.0,1610470000000.0,1,3JI45wPuReY,3JI45wPuReY,Final Decision,Reject,"This work proposes a framework to search for the topology of an artificial neural network jointly with the network training, via a genetic algorithm that can decide structural actions, such as the addition or removal of neurons and layers. An extra heuristic based on the Bayesian information criterion helps the optimization process make its decisions about the topology. They demonstrate improvements over baseline fully-connected networks on SVHN and (augmented) CIFAR-10.

The reviewers and I agree that this is an interesting idea, and that the paper is easy to follow. While I may not agree that we need to achieve SOTA on these datasets, or see large-scale ImageNet-type experiments for novel ideas, I agree with the reviewers, especially R1's point that the current experiments are not satisfactory to meet the bar for acceptance at ICLR. 
+
CIFAR-10 and SVHN are well-established tasks, and showing baseline accuracies of 75%/48% on them, respectively, doesn't seem to do them justice, especially when most methods (even with low compute requirements) have been able to get > 95% on both for the past few years. For this work to be of interest to the broader community, it needs to be improved to incorporate at least respectable baselines on these small datasets, and perhaps be extended to work beyond fully connected networks.

At this stage, we need to see a revision of the method and see improvements before an acceptance decision can be made.",ICLR2021,
+yC3llk6YoY6,1642700000000.0,1642700000000.0,1,qrdbsZEZPZ,qrdbsZEZPZ,Paper Decision,Reject,"The premise is an exciting observation: differential privacy in federated learning might imply being certified against poisoning attacks. While this may be considered unsurprising by some, the connection between differential privacy and robustness is interesting to many. The relationship was characterized both theoretically and empirically.

The reviewers discussed the paper extensively with the authors, and while many issues were clarified, issues of correctness still remained: it is unclear if the proposed DP mechanism actually is DP, and subsampling amplification also had issues. Clarity needs to be added to the writing, and the extensive comments by the reviewers will hopefully help the authors in that.",ICLR2022,
+hQyHTkVPCxw,1610040000000.0,1610470000000.0,1,dYeAHXnpWJ4,dYeAHXnpWJ4,Final Decision,Accept (Oral),"This paper studies why input gradients can give meaningful feature attributions even though they can be changed arbitrarily without affecting the prediction. The claim in this paper is that ""the learned logits in fact represent class conditional probabilities and hence input gradients give meaningful feature attributions"". The main concern is that this claim is verified very indirectly, by adding a regularization term that promotes logits learning class conditional probabilities and observing that input gradient quality also improves. Nevertheless, there are interesting insights in the paper and the questions it asks are very timely and important; overall, it could have a significant impact on further research in this area.",ICLR2021,
+BJ2zBJ6Hf,1517250000000.0,1517260000000.0,524,rJLTTe-0W,rJLTTe-0W,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting your paper to ICLR. The consensus from the reviewers is that this is not quite ready for publication. There is also concern about whether ICLR, with its focus on representation learning, is the right venue for this work.

One of the reviewers initially submitted an incorrect review, but this mistake has now been rectified. Apologies that this was not done sooner in order to allow you to address their concerns.",ICLR2018,
+AGDDrFGb5Dq,1642700000000.0,1642700000000.0,1,BK-4qbGgIE3,BK-4qbGgIE3,Paper Decision,Accept (Poster),This well-written and well-motivated paper has been independently reviewed by four expert reviewers. They all voted for acceptance with three straight accepts and one marginal. The feedback provided to the authors was constructive and the authors responded comprehensively. 
I recommend acceptance of this work for ICLR.,ICLR2022, +LRRwH-F3Juo,1642700000000.0,1642700000000.0,1,VimqQq-i_Q,VimqQq-i_Q,Paper Decision,Accept (Poster),"This paper presents some insightful suggestions for researchers studying generalization in federated learning by separating two types of performance gaps between training and test performance, the participation gap (due to partial client participation) and the performance gap (due to data heterogeneity). They suggest that federated learning researchers use a three-way split between participating clients' training data, participants clients' validation data, and non-participating clients' data to measure the generalization performance of an FL model. The paper presents thorough experiments to support their conclusions. A common concern about the paper is that the authors' suggestions, although relevant and reasonable, are somewhat unsurprising and have been noted in different forms in other works in federated learning. Another concern is that the conclusions are purely based on experiments and are not supported by theoretical justification. Despite these concerns, the reviewers commended the overall insights presented in the paper. + +There was a healthy post-rebuttal discussion and some reviewers reevaluated the paper and raised their initial scores. Therefore, I recommend acceptance of the paper. I encourage the authors to take the reviewer's constructive suggestions into account when preparing the final version of the paper.",ICLR2022, +7Cc6dSGla,1576800000000.0,1576800000000.0,1,SJxrKgStDH,SJxrKgStDH,Paper Decision,Accept (Poster),"After the author response and paper revision, the reviewers all came to appreciate this paper and unanimously recommended it be accepted. The paper makes a nice contribution to generative modelling of object-oriented representations with large numbers of objects. The authors adequately addressed the main reviewer concerns with their detailed rebuttal and revision.",ICLR2020, +BjcWbtoRNLd,1610040000000.0,1610470000000.0,1,QTgP9nKmMPM,QTgP9nKmMPM,Final Decision,Reject,"In this paper, the authors propose a new layer-by-layer training approach for GNN in particular for a large graph. The proposed approach can be easily parallelizable and scale well to a large graph. Reviewers are concerned about the novelty of the approach and the lack of theoretical analysis, and it is not well addressed by the rebuttal. Therefore, this paper is below the acceptance threshold of ICLR. I encourage the authors to revise the paper based on the reviewer's comments and resubmit it to a future venue.",ICLR2021, +sRa22R6KFo2,1610040000000.0,1610470000000.0,1,zWvMjL6o60V,zWvMjL6o60V,Final Decision,Reject,"During the discussion among reviewers, we have shared the concern that this work has a significant overlap with [Liu et al. 2018] and [Liu & Motani 2020]. Although the authors tried to address this concern by the author response, I also think that the difference is not enough. In particular, the reviewers pointed out that Figure 1, Table 1, and Figure 3 are exactly the same with those in [Liu, 2020], and Proposition 2 in [Liu & Motani 2020] is Proposition 1 in this paper. Since these overlaps are not acceptable, I will reject the paper.",ICLR2021, +BkhNhMIue,1486400000000.0,1486400000000.0,1,rJq_YBqxx,rJq_YBqxx,ICLR committee final decision,Reject,"This paper is concurrently one of the first successful attempts to do machine translation using a character-character MT modeling, and generally the authors liked the approach. 
However, the reviewers raised several issues with the novelty and experimental setup of the work. + + Pros: + - The analysis of the work was strong. This illustrated the underlying property, and all reviewers praised these figures. + + Mixed: + - Some found the paper clear, praising it as a ""well-written paper"", however other found that important details were lacking and the notation was improperly overloaded. As the reviewers were generally experts in the area, this should be improved + - Reviewers were also split on results. Some found the results quite ""compelling"" and comprehensive, but others thought there should be more comparison to BPE and other morphogically based work + - Modeling novelty was also questionable. Reviewers like the partial novelty of the character based approach, but felt like the ML contributions were too shallow for ICLR + + Cons: + - Reviewers generally found the model itself to be overly complicated and the paper to focus too much on engineering. + - There were questions about experimental setup. In particular, a call for more speed numbers and broader comparison.",ICLR2017, +el6ByUFMNt,1642700000000.0,1642700000000.0,1,xKZ4K0lTj_,xKZ4K0lTj_,Paper Decision,Accept (Poster),"The reviewers agree that addressing long-horizon tasks with off-line learning and fine tuning afterwards from demonstrations is an interesting and relevant topic. The technical ideas about learning a relevance metric to select relevant off-line data, and to learn an inverse skill dynamics models. The experimental results are convincing, even if success rates are sometimes lower than expected. All reviewers recommend acceptance of the paper.",ICLR2022, +Byg_im0lxN,1544770000000.0,1545350000000.0,1,SyGjQ30qFX,SyGjQ30qFX,technical details require clarification and experiments lack sufficient comparisons ,Reject,"This paper proposes TopicGAN, a generative adversarial approach to topic modeling and text generation. TopicGAN operates in two steps: it first generates latent topics and produces bag-of-words corresponding to those latent topics. In the second step, the model generates text conditioning on those topic words. + +Pros: +It combines the strength of topic models (interpretable topics that are learned unsupervised) with GAN for text generation. + +Cons: +There are three major concerns raised by reviewers: (1) clarity, (2) relatively thin experimental results, and (3) novelty. Of these, the first two were the main concerns. In particular, R1 and R2 raised concerns about insufficient component-wise evaluation (e.g., text classification from topic models) and insufficient GAN-based baselines. Also, the topic model part of TopicGAN seems somewhat underdeveloped in that the model assumes a single topic per document, which is a relatively strong simplifying assumption compared to most other topic models (R1, R3). The technical novelty is not extremely strong in that the proposed model combines existing components together. But this alone would have not been a deal breaker if the empirical results were rigorous and strong. + +Verdict: +Reject. Many technical details require clarification and experiments lack sufficient comparisons against prior art.",ICLR2019,5: The area chair is absolutely certain +gP34QhvIYJ,1576800000000.0,1576800000000.0,1,B1l0wp4tvr,B1l0wp4tvr,Paper Decision,Reject,"This paper considers the information plane analysis of DNNs. Estimating mutual information is required in such analysis which is difficult task for high dimensional problems. 
This paper proposes a new ""matrix–based Renyi’s entropy coupled with ´tensor kernels over convolutional layers"" to solve this problem. The methods seems to be related to an existing approach but derived using a different ""starting point"". Overall, the method is able to show improvements in high-dimensional case. + +Both R1 and R3 have been critical of the approach. R3 is not convinced that the method would work for high-dimensional case and also that no simulation studies were provided. In the revised version the authors added a new experiment to show this. R3's another comment makes an interesting point regarding ""the estimated quantities evolve during training, and that may be interesting in itself, but calling the estimated quantities mutual information seems like a leap that's not justified in the paper."" I could not find an answer in the rebuttal regarding this. + +R1 has also commented that the contribution is incremental in light of existing work. The authors mostly agree with this, but insist that the method is derived differently. + +Overall, I think this is a reasonable paper with some minor issues. I think this can use another review cycle where the paper can be improved with additional results and to take care of some of the doubts that reviewers' had this time. + +For now, I recommend to reject this paper, but encourage the authors to resubmit at another venue after revision.",ICLR2020, +KJQQ_vTQ188,1642700000000.0,1642700000000.0,1,X6D9bAHhBQ1,X6D9bAHhBQ1,Paper Decision,Accept (Spotlight),"The paper extends MuZero to stochastic (but observable) MDPs. To represent stochastic dynamics, it splits transitions into two parts: a deterministic transition to an afterstate (incorporating all observations and actions up to the current time), followed by a stochastic outcome (accounting for new randomness that follows the last action). The transition to an afterstate is similar in spirit to ordinary MuZero's dynamics model; the stochastic outcome is learned by a VQ-VAE. At planning time, MuZero retains the MCTS lookahead from ordinary MuZero. Stochastic MuZero achieves impressive results: e.g., it maintains the original MuZero's strong performance on the deterministic game of Go, while improving on MuZero significantly (and achieving superhuman performance) on the stochastic game of backgammon. + +This is a strong paper overall: it presents a convincing and successful extension of the already-influential MuZero work, along with large-scale computational experiments confirming the utility of the approach. There are nonetheless a few weaknesses: first, compared to the original AlphaZero and MuZero work, it is perhaps less surprising that the given approach is successful, since it is more closely related to prior work. Second, due to the large-scale computational infrastructure needed, it is only possible to run some of the experiments once. This is not in itself a problem, but care needs to be taken in interpreting the results of such single-run experiments: e.g., any figures that show results of single-run experiments should have a clear warning label, and any statements such as ""stochastic MuZero performs better than original MuZero"" should be tempered with a caveat about how reliable these conclusions are likely to be. 
Section 5.4 (which runs shorter experiments using three random seeds each) makes a start at evaluating reliability, but (a) the headline results in previous sections do not contain any caveats or pointers to 5.4, and (b) 5.4 should explicitly acknowledge that it cannot hope to detect even quite-common failure cases with so few seeds.",ICLR2022, +4mgEyD8emX,1576800000000.0,1576800000000.0,1,BJxg_hVtwH,BJxg_hVtwH,Paper Decision,Accept (Poster),"The paper proposed an operation called StructPool for graph-pooling by treating it as node clustering problem (assigning a label from 1..k to each node) and then use a pairwise CRF structure to jointly infer these labels. The reviewers all think that this is a well-written paper, and the experimental results are adequate to back up the claim that StructPool offers advantage over other graph-pooling operations. Even though the idea of the presented method is simple and it does add more (albeit by a constant factor) to the computational burden of graph neural network, I think this would make a valuable addition to the literature.",ICLR2020, +n2tAR71-tIr,1610040000000.0,1610470000000.0,1,pOHW7EwFbo9,pOHW7EwFbo9,Final Decision,Reject,"Considering reviewers' comments and comparing with similar papers recently published or submitted, this is a good paper but hasn't reached the bar of ICLR. We believe that the paper is not ready for publication yet, and strongly encourage the authors to use the reviewers' feedback to improve the work and resubmit to one of the upcoming conferences. +",ICLR2021, +ITVu6fXnodG,1642700000000.0,1642700000000.0,1,085y6YPaYjP,085y6YPaYjP,Paper Decision,Accept (Poster),"The paper considers the problem of accelerated magnetic resonance imaging where the goal is to reconstruct an image from undersampled measurements. The paper proposes a zero-shot self-supervised learning approach for accelerated deep learning based magnetic resonance imaging. The approach partitions the measurements from a single scan into two disjoint sets, one set is used for self-supervised learning, and one set is used to perform validation, specifically to select a regularization parameter. The set that is used for self-supervised learning is then again split into two different sets, and a network is trained to predict the frequencies from one set based on the frequencies in the other set. This enables accelerated MRI without any training data. +The paper evaluates on the FastMRI dataset, a standard dataset for deep learning based MRI research, and the paper compares to a trained baseline and an un-trained baseline (DIP). The paper finds their self-supervised method to perform very well compared to both and shows images that indicate excellent performance. It would have been even better to compare the method on the test set of the FastMRI competition to have a proper benchmark comparison. + +Here is how the discussion went: +- Reviewer pt6r is supportive of acceptance, but notes a few potential irregularities, such as the method pre-trained on brain and tested on knees performing better than the method pre-trained on knees and tested on knees, and not providing a comparison of the computational cost. The authors added a table to the appendix revealing that the computational costs are very high, much higher than for DIP even. The reviewer was content with the response and raised the score. 
+
- Reviewer mBMk argues that the contribution is too incremental compared to prior work, in particular relative to the results of [Yaman et al., 2020], and also argues that the idea of partitioning the measurements is not new. The authors argue in response that their approach of partitioning the measurements is new. The reviewer was inclined to raise the score slightly, but still thinks that the novelty on the technical ML side remains limited, does not want to back the submission too strongly, and in the end did not raise the score in the system. 
+
- Reviewer 19v3 has the concern that all the elements used (transfer learning, plug-and-play, etc.) are well-known techniques and have been applied before to MRI, and therefore thinks that the paper does not clear the bar for acceptance. The paper points out that while those ideas might be applied for the first time to MRI, they have been used before in other image reconstruction problems, in particular denoising.

I've read the paper in detail too, and am somewhat on the fence: I think it's very valuable to see that a clever application of self-supervised learning works so well for MRI. I agree with the reviewers that the technical novelty is relatively small, but on the other hand this is the first time that I have seen self-supervised learning applied that successfully to MRI. I don't share the concern about novelty --- yes, the paper's approach builds on prior work, but it's not clear from the literature how well such a well-tuned self-supervised learning approach would work.
What I would have liked to see in addition to the experimental results is a proper evaluation on the FastMRI dataset: an advantage of the FastMRI dataset is that it provides a benchmark, and if researchers evaluate on that benchmark (on the test/validation set), we can compare different methods well. The paper under review doesn't do that; it only evaluates on 30 test slices, and thus it's hard to benchmark the method. Also, the paper would benefit from more ablation studies. 
As a result, Section 4 in its current form is not in service to the reader's understanding why the simple method works. +Finally, further improvements to the paper could be made with comparisons to additional baselines from prior work as suggested by reviewers.",ICLR2020, +mwk7JmOjT8s,1642700000000.0,1642700000000.0,1,twgEkDwFTP,twgEkDwFTP,Paper Decision,Reject,"This paper studies the effect of importance weighting in three model classes: linear models, linearized networks, and wide fully connected networks, and show that under certain assumptions, gradient descent for training an overparameterized model converges to the same ERM interpolator regardless of the reweighting scheme. The reviewers acknowledge that this paper had good exposition and writing in general, but they were in consensus that the initial version of this paper includes many inaccurate overclaiming statements. In summary, after discussions, the reviewers would like the authors to: + +- revise the abstract and the introduction, specifically, adding appropriate qualifiers on the neural networks, losses, full gradient descent training, etc (Reviewers hLzT, M2hT, fZgx) +- address the discrepancy between theory and experiments, e.g. the inconsistency of the loss chosen in theory and experiments, the requirement that the widths of the neural networks need to large (Reviewers wG4N, fZgx) +- add experimental results for early stopping (Reviewer hLzT) +- empirically verify that the final solutions of ERM, DRO, and IW initialized at the same \theta^0 are very similar (Reviewer M2hT) +- discuss the novelty compared to Sagawa et al in the paper (Reviewer wG4N) + +thus, this submission needs a major revision, and is not ready for acceptance in its current form. We encourage the authors to revise accordingly, and resubmit in the future.",ICLR2022, +D7LHRKEVk3,1576800000000.0,1576800000000.0,1,S1lEX04tPr,S1lEX04tPr,Paper Decision,Accept (Poster),"This paper was generally well received by reviewers and was rated as a weak accept by all. +The AC recommends acceptance. +",ICLR2020, +6CYya7dyCj,1576800000000.0,1576800000000.0,1,HkxCcJHtPr,HkxCcJHtPr,Paper Decision,Reject,"This work propose a compression-aware training (CAT) method to allows efficient compression of feature maps during inference. I read the paper myself. The proposed method is quite straightforward and looks incremental compared with existing approaches based on entropy regularization. + +",ICLR2020, +ywTg9HO2d,1576800000000.0,1576800000000.0,1,H1lDbaVYvH,H1lDbaVYvH,Paper Decision,Reject,"This paper proposes augmentation of the state exploration strategy that is interesting and has a potential to lead to improvement. However, the current presentation makes it difficult to properly assess that. In particular, the way the authors convey both the underlying intuition and its implementation is fairly vague and does not build confidence in the grounding of the underlying methodology.",ICLR2020, +Z9Ja6s6vb,1576800000000.0,1576800000000.0,1,Bkeb7lHtvH,Bkeb7lHtvH,Paper Decision,Accept (Spotlight),"The paper considers the problem of training neural networks asynchronously, and the gap in generalization due to different local minima being accessible with different delays. The authors derive a theoretical model for the delayed gradients, which provide prescriptions for setting the learning rate and momentum. + +All reviewers agreed that this a nice paper with valuable theoretical and empirical contributions. 
+ +",ICLR2020, +1yX16aYe6A,1576800000000.0,1576800000000.0,1,H1eLVxrKwS,H1eLVxrKwS,Paper Decision,Reject,"Perturbation-based methods often produce artefacts that make the perturbed samples less realistic. This paper proposes to corrects this through use of an inpainter. Authors claim that this results in more plausible perturbed samples and produces methods more robust to hyperparameter settings. +Reviewers found the work intuitive and well-motivated, well-written, and the experiments comprehensive. +However they also had concerns about minimal novelty and unfair experimental comparisons, as well as inconclusive results. Authors response have not sufficiently addressed these concerns. +Therefore, we recommend rejection.",ICLR2020, +HJbx71pBM,1517250000000.0,1517260000000.0,56,ByOExmWAb,ByOExmWAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper makes progress on the open problem of text generation with GANs, by a sensible combination of novel approaches. The method was described clearly, and is somewhat original. The only problem is the hand-engineering of the masking setup. +",ICLR2018, +_guXzMQ6d6,1576800000000.0,1576800000000.0,1,HJlxIJBFDr,HJlxIJBFDr,Paper Decision,Accept (Poster),"The paper introduces a policy gradient estimator that is based on stochastic recursive gradient estimator. It provides a sample complexity result of O(eps^{-3/2}) trajectories for estimating the gradient with the accuracy of eps. +This paper generated a lot of discussions among reviewers. The discussions were around the novelty of this work in relation to SARAH (Nguyen et al., ICML2017), SPIDER (Fang et al., NeurIPS2018) and the work of Papini et al. (ICML 2018). SARAH/SPIDER are stochastic variance reduced gradient estimators for convex/non-convex problems and have been studied in the optimization literature. +To bring it to the RL literature, some adjustments are needed, for example the use of importance sampling (IS) estimator. The work of Papini et al. uses IS, but does not use SARAH/SPIDEH, and it does not use step-wise IS. + +Overall, I believe that even though the key algorithmic components of this work have been around, it is still a valuable contribution to the RL literature. +",ICLR2020, +SJlCZ4AflN,1544900000000.0,1545350000000.0,1,H1eqjiCctX,H1eqjiCctX,Further development/continuation of Arora et al.,Accept (Poster),"AR1 is concerned about lack of downstream applications which show that higher-order interactions are useful and asks why not to model higher-order interactions for all (a,b) pairs. AR2 notes that this submission is a further development of Arora et al. and is satisfied with the paper. AR3 is the most critical regarding lack of explanations, e.g. why linear addition of two word embeddings is bad and why the corrective term proposed here is a good idea. The authors suggest that linear addition is insufficient when final meaning differs from the individual meanings and show tome quantitative results to back up their corrective term. + +On balance, all reviewers find the theoretical contributions sufficient which warrants an accept. The authors are asked to honestly reflect all uncertain aspects of their work in the final draft to reflect legitimate concerns of reviewers.",ICLR2019,4: The area chair is confident but not absolutely certain +ryxzgItblN,1544820000000.0,1545350000000.0,1,HklVMnR5tQ,HklVMnR5tQ,"Borderline, with no clear reviewer endorsement",Reject,"The reviewers appreciated the clarity of writing, and the importance of the problem being addressed. 
There was a moderate amount of discussion around the paper, but the two reviewers who responded to the author discussion were split in their opinion, with one slightly increasing their score to a 6, and the other remaining unconvinced. The scores overall are borderline for ICLR acceptance, and given that, no reviewer stepped forward to champion the paper.",ICLR2019,3: The area chair is somewhat confident +SJgsomgWl4,1544780000000.0,1545350000000.0,1,B1edvs05Y7,B1edvs05Y7,Lack of novelty and strong empirical results; no rebuttal,Reject,"This paper proposes a sparse binary compression method for distributed training of neural networks with minimal communication cost. Unfortunately, the proposed approach is not novel, nor supported by strong experiments. The authors did not provide a rebuttal for reviewers' concerns. +",ICLR2019,5: The area chair is absolutely certain +EW66VVXFfV,1610040000000.0,1610470000000.0,1,vnlqCDH1b6n,vnlqCDH1b6n,Final Decision,Reject,"There were both positive and negative assessments of this paper by the reviewers: It was deemed a well written paper that explores cleanly rederiving the TC-VAE in the Wasserstein Autoencoder Framework and that has experiments comparing to competing approaches. However, there are two strong concerns with this paper: First, novelty appears to be strongly limited as it appears a rederivation using known approaches. Second, two reviewers were not convinced by the experimental results and do not agree with the claim that the proposed approach is better than competing methods in providing disentangled representations. I agree with this concern, in particular as assessing unsupervised disentanglement models is known to be very hard and easily leads to non-informative results (see e.g. the paper cited by the authors from Locatello et al., 2019). Overall, I recommend rejecting this paper.",ICLR2021, +TzznS3kPo0Ch,1642700000000.0,1642700000000.0,1,dLTXoSIcrik,dLTXoSIcrik,Paper Decision,Reject,"In this paper, the authors proposed an offline policy optimization algorithm, motivated by an analysis of the upper bound error of importance sampling policy value estimator. Specifically, by the decomposition of the error in a particularly way, the authors identified some error which does not converge. Then, the authors introduce the contraints over feasible actions to avoid the overfitting induced by such errors. Finally, the authors tested the proposed algorithm empirically. + +The paper is well-motivated and the authors addressed some of the questions in their rebuttals. However, there are still several issues need to be addressed, + +- The alternative practical estimator with plug-in behavior distribution would perfectly avoid the over-fitting, which is, however, ignored. This is an important and easy-to-implemented competitor. + +- The pessimistic principle in the face of uncertainty (PFU) has been exploited extensively in offline policy optimization problem. How the proposed algorithm is connected to the PFU has not been discussed carefully, especially in terms of non-asymptotic sample complexity, which makes the paper is not well-positioned. + +- While the motivation is derived from the unbiased importance sampling estimator, the counterfactual risk minimization in Equation 7 is introduced suddently, without clear justification. + +- In my opinion, for a better clarification of the paper, the expressiveness of the policy family should not be discussed in this way. 
I understand the authors would like to avoid any possible degeneration, and explain the asymptotic lossless in terms of policy flexibility. However, the whole point of the paper is trying to introduce some mechanism to avoid the possible overfitting by regularizing the policy family. In other words, the restriction is on purpose and beneficial. I think the argument of policy family expressiveness should be re-considered and re-discussed. + + +Minor: + +- Markovian vs. non-Markovian baseline comparison is not fair, and more comparison on well-known benchmarks, e.g., OpenAI gym, should be conducted. +- The \sigma upper bound should be explicitly provided and verified in practice. + +In sum, the paper is well-motivated, however, need further improvement to be pulished.",ICLR2022, +TFUtkTNm1OJ,1642700000000.0,1642700000000.0,1,6tmjoym9LR6,6tmjoym9LR6,Paper Decision,Accept (Poster),"The paper introduces a method to train neural networks based on so-called stability regularisation. The method encourages the outputs of functions of Gaussian random variables to be close to discrete and does not require temperature annealing like the Gumbel Softmax. All reviewers agreed that the proposed method was novel and of interest. The authors conducted extensive experiments. They also adequately addressed the concerns raised by the reviewers (e.g., theoretical foundation and computational cost).",ICLR2022, +VhH7thgG2,1576800000000.0,1576800000000.0,1,S1lXnhVKPr,S1lXnhVKPr,Paper Decision,Reject,The paper presents a novel variance reduction algorithm for SGD. The presentation is clear. But the theory is not good enough. The reivewers worry about the converge results and the technical part is not sound.,ICLR2020, +k897WTvqj6,1576800000000.0,1576800000000.0,1,SyljQyBFDH,SyljQyBFDH,Paper Decision,Accept (Poster),Four knowledgable reviewers recommend accept. Good job!,ICLR2020, +tYBc5fZvri,1576800000000.0,1576800000000.0,1,SygcCnNKwr,SygcCnNKwr,Paper Decision,Accept (Poster),"Main content: + +Blind review #1 summarizes it well: + +This paper first introduces a method for quantifying to what extent a dataset split exhibits compound (or, alternatively, atom) divergence, where in particular atoms refer to basic structures used by examples in the datasets, and compounds result from compositional rule application to these atoms. The paper then proposes to evaluate learners on datasets with maximal compound divergence (but minimal atom divergence) between the train and test portions, as a way of testing whether a model exhibits compositional generalization, and suggests a greedy algorithm for forming datasets with this property. In particular, the authors introduce a large automatically generated semantic parsing dataset, which allows for the construction of datasets with these train/test split divergence properties. Finally, the authors evaluate three sequence-to-sequence style semantic parsers on the constructed datasets, and they find that they all generalize very poorly on datasets with maximal compound divergence, and that furthermore the compound divergence appears to be anticorrelated with accuracy. + +-- + +Discussion: + +Blind review #1 is the most knowledgeable in this area and wrote ""This is an interesting and ambitious paper tackling an important problem. 
It is worth noting that the claim that it is the compound divergence that controls the difficulty of generalization (rather than something else, like length) is a substantive one, and the authors do provide evidence of this.""

--

Recommendation and justification:

This paper deserves to be accepted because it tackles an important problem that is overlooked in current work, which is evaluated on datasets of questionable meaningfulness. It adds insight by focusing on the qualities of datasets that enable testing how well learning algorithms do on compositional generalization, which is crucial to intelligence.",ICLR2020,
ryeHXxhnJV,1544500000000.0,1545350000000.0,1,S14g5s09tm,S14g5s09tm,decision,Reject,"The paper received mixed reviews. The proposed ideas are reasonable, and it shows that unpaired data can improve the performance of unseen video (action) classification tasks and other related tasks. The authors rightfully argue that the main contribution is the use of unpaired, multimodal data for learning a joint embedding (that generalizes to unseen actions) with positive results, but not the use of the attentional pooling mechanism. Despite this, as Reviewer3 points out, the technical novelty seems minor, as there are quite a few papers on learning joint embeddings for multimodal data. Many of these works were evaluated in the fine-grained image classification setting, but there is no reason that such methods cannot be used here. The revision only compares against methods published in 2017 or before. So a more comprehensive evaluation would be needed to fully justify the proposed method. In addition, it seems that the proposed method has fairly marginal gains in the generalized zero-shot learning setting. Overall, the paper can be viewed as an application paper on unseen action recognition tasks, but the technical novelty and more rigorous comparisons against recent related work are somewhat lacking. I recommend rejection due to several concerns raised here and by the reviewers.
",ICLR2019,4: The area chair is confident but not absolutely certain
z-ZrW9WJld6,1642700000000.0,1642700000000.0,1,R79ZGjHhv6p,R79ZGjHhv6p,Paper Decision,Accept (Poster),"The authors propose a neural network model to preserve the sub-class similarity. The key of the model is to add a prototype layer to a multi-scale deep nearest neighbor network. The prototype layer stores the representative prototypes of some fine-grained sub-classes. The use of the prototype layer preserves interpretability and computational efficiency. Experimental results demonstrate that the proposed approach reaches state-of-the-art prototype learning performance.

The reviewers generally find the paper clear and its contributions sufficient. The empirical validation is sufficiently thorough to back the claims in the paper. The main concern prior to the rebuttal among some of the reviewers was about the novelty of the paper (e.g. with respect to DkNN), but the authors convinced most of the reviewers in the rebuttal about the key differences. The authors are encouraged to highlight the novelty aspect more clearly in the revision. Another suggestion was to add an ablation study to justify the importance of the r1/r2 parameters, and the authors have done a successful job addressing the suggestion. Several other comments, such as explanations of the hyperparameters, have been taken into account in the revision. 
The reviewers thus reach the consensus to recommend acceptance.",ICLR2022,
YEVqGDsLM0,1576800000000.0,1576800000000.0,1,SJxDKerKDS,SJxDKerKDS,Paper Decision,Reject,"The topic of macro-actions/hierarchical RL is an important one, and the perspective this paper takes on this topic by drawing parallels with action grammars is intriguing. However, some more work is needed to properly evaluate the significance. In particular, a better evaluation of the strengths and weaknesses of the method would improve this paper a lot.",ICLR2020,
cfJJ7QDY7aa,1610040000000.0,1610470000000.0,1,Qpik5XBv_1-,Qpik5XBv_1-,Final Decision,Reject,"The paper proposes to improve image segmentation from referring expressions by integrating visual and language features using a UNet architecture and experimenting with top-down, bottom-up, and combined (dual) modulation.

Review Summary: The submission received divergent reviews, with scores spanning from 2 (R2) to 5 (R3,R4) to 10 (R1). The author response failed to address the reviewer concerns, with some reviewer (R4) lowering their score to 4 after the rebuttal. It also became clear that some relevant work (Mei et al, 2018) was used for the baseline but not cited. The author response also did not recognize the importance of significance tests.

As there is considerable work in the area of image segmentation from referring expressions, and the proposed model is very similar to the LingUNet model of Misra 2018, the originality and significance of the work is fairly low. The main contribution appears to be experimental comparisons of the three types of modulation (top-down, bottom-up, dual).

Pros:
- Investigation of an important problem of grounding language to visual regions
- Experimental study of whether dual modulation improves image segmentation from referring expressions

Cons:
- Relatively minor novelty with limited analyses (R3,R2)
- Missing citations (see R3's comments). Relevant work (Mei et al, 2018), which was the basis for the top-down baseline model, was used but not cited or properly compared against
- Relatively weak experimental results (R2,R4). As R4 noted, while validation results are good, test results are weak compared to existing work, indicating potential overtuning.
- No qualitative comparisons against baselines.
- Cognitive claims not backed up and limited discussion/analysis (R2)

Recommendation:
The AC concurs with R2, R3, and R4 that the work is limited in novelty and not ready for publication at ICLR. Despite R1's high score, referring expression for image segmentation is a well-studied task, and it is unclear what the key innovations of the proposed model over LingUNet are. Due to the limited novelty, relatively weak test results, as well as other flaws pointed out by the reviewers, the AC recommends rejection.
",ICLR2021,
wbJ7vGUu6IL,1610040000000.0,1610470000000.0,1,p65lWYKpqKz,p65lWYKpqKz,Final Decision,Reject,"This paper was reviewed by 5 experts in the field. The reviewers raised concerns about the lack of novelty, unconvincing experiments, and the presentation of this paper. While the paper clearly has merit, the decision is not to recommend acceptance. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2021,
tFKqvND6zj,1576800000000.0,1576800000000.0,1,SJgzXaNFwS,SJgzXaNFwS,Paper Decision,Reject,"The reviewers were unanimous that this submission is not ready for publication at ICLR in its present form. 
Concerns raised included a lack of relevant baselines and a lack of sufficient justification of the novelty and impact of the approach.",ICLR2020,
Skc7EkTBG,1517250000000.0,1517260000000.0,322,B1e5ef-C-,B1e5ef-C-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"sadly, none of the reviewers seem to have been able to fully appreciate and check the proofs.

but in the words of even the least positive reviewer:
In general, I find many of the observations in this paper interesting. However, this paper is not strong enough as a theory paper; rather, the value lies perhaps in its fresh perspective.

i think we can all gain from fresh perspectives of LSTMs and DL for NLP :)
",ICLR2018,
UwTJ7ZsdAZ6,1642700000000.0,1642700000000.0,1,uVTp9Z-IUOC,uVTp9Z-IUOC,Paper Decision,Reject,"The paper considers test time adaptation to distribution shift, which is a very important and impactful problem. The authors propose an empirical method that has different pieces, the most important ones being input transformation, confidence maximization, and the use of a likelihood ratio loss.

There were various concerns that got addressed during the rebuttal period, such as the novelty of the proposed method, the ablation study of different parts of the model, the novelty and importance of the diversity regularizer, and the choice of optimization. However, there are still three remaining concerns; addressing them will improve the paper significantly. First, a clear motivation behind the method for the cases when the model is certain but we have data imbalance. Second, an analysis in the online setting of batch-by-batch prediction and adaptation. Third, establishing the claim regarding the data subset experiment, namely that it enables the model to adapt on a subset of data and later switch to complete execution mode without adaptation for efficient run time and improved throughput. How is the method to know the data distribution has changed, or that it has sufficiently adapted to it when the data distribution is not changing?",ICLR2022,
2DZE1yH4oBY,1642700000000.0,1642700000000.0,1,gmxgG6_BL_N,gmxgG6_BL_N,Paper Decision,Reject,"This work has generated a lot of discussion between authors and reviewers and among reviewers.
Overall it is reported that the results on EEG are not conclusive or directly relevant to this field.
Besides, the theoretical contribution is not reported as a strong point of the work, and the
comparison with alternative baseline methods is judged too limited.

For all these reasons the paper cannot be endorsed for publication at ICLR this year.",ICLR2022,
mfvaPzM-lL4,1642700000000.0,1642700000000.0,1,7IWGzQ6gZ1D,7IWGzQ6gZ1D,Paper Decision,Accept (Poster),"This work extends the successor feature framework by focusing on the question of which policies should be learned in order to get the best generalization performance. The reviewers all agree that the question being addressed is interesting and important. One concern raised by two of the reviewers is that the work is rather incremental, providing a relatively small extension of the work of Barreto et al. Nevertheless, the authors have provided a convincing rebuttal, resulting in an increase in the scores of two of the reviewers. Hence, I recommend acceptance. 
I do want to ask the authors to carefully read the post-rebuttal point mentioned by reviewer 3QcK about clarifying the unsupervised RL setting.",ICLR2022,
9MLzF2R7wm,1610040000000.0,1610470000000.0,1,Y3pk2JxYmO,Y3pk2JxYmO,Final Decision,Reject,"This paper introduces modifications that make the training of contrastive-learning-based models practical. The goal of the paper is very interesting, and the motivation clear. This paper tackles a very important issue with recent unsupervised feature learning methods.
However, while the goal is great, the present submission does not provide time improvements on par with the ambitions of this work. As noted by R2, many other hacks could be used in conjunction with the current work to scale this goal to the extreme, yielding time improvements of a more impressive magnitude. In its current form, this paper unfortunately doesn't meet the bar for acceptance.
Given the interesting scope of this work, I strongly encourage the authors to take the feedback from reviews and discussions into account and submit to another venue.",ICLR2021,
7fqwQJIoMxI,1642700000000.0,1642700000000.0,1,sPIFuucA3F,sPIFuucA3F,Paper Decision,Accept (Poster),"This paper studies off-policy learning of contextual bandits with neural network generalization. The proposed algorithm NeuraLCB acts based on pessimistic estimates of the rewards obtained through lower confidence bounds. NeuraLCB is both analyzed and empirically evaluated.

This paper received four borderline reviews, which improved during the rebuttal phase. The main strengths of this paper are that it is well executed and that the result is timely, considering the recent advances in pessimism for offline RL. The weakness is that the result is not very technically novel, essentially a direct combination of pessimism with neural networks. This paper was discussed and all reviewers agreed that the strengths of this paper outweigh its weaknesses. I agree and recommend this paper be accepted.",ICLR2022,
lBe-95xSYW,1576800000000.0,1576800000000.0,1,HJlF3h4FvB,HJlF3h4FvB,Paper Decision,Reject,"This paper tries to bridge early stopping and distillation.

1) In Section 2, the authors empirically show a stronger distillation effect under early stopping.
2) In Section 3, the authors propose a new provable algorithm for training with noisy labels.

In the discussion phase, all reviewers discussed a lot. In particular, a reviewer highlights the importance of Section 3. On the other hand, other reviewers pointed out ""what is the role of Section 2"", as the abstract/intro tends to emphasize the content of Section 2.

I mostly agree with all pros and cons pointed out by reviewers. I agree that the paper proposed an interesting idea for refining noisy labels with theoretical guarantees. However, the major reason for my reject decision is that the current write-up is a bit below the borderline to be accepted considering the high standard of ICLR, e.g., many typos (what is the172norm in page 4?) and a misleading intro/abstract/organization. Overall, it was also hard for me to read the paper. I do believe that the paper could be much improved if the authors make more significant editorial efforts considering a broader range of readers.

I have additional suggestions for improving the paper, which I hope are useful.

* Put Section 3 earlier (i.e., put Section 2 later) and revise the intro/abstract so that the reader can clearly understand what the main contribution is. 
* Section 2.1 is too weak to claim a stronger distillation effect under early stopping. More experimental or theoretical studies are necessary, e.g., you can control the temperature parameter T of knowledge distillation to provide the ""early stopping"" effect without actual ""early stopping"" (the choice of T is not mentioned in the draft, although it is an important hyper-parameter).
* More experimental support for your algorithm would be desirable, e.g., consider more datasets, state-of-the-art baselines, noise types, and neural architectures (e.g., NLP models).
* Soften some sentences to avoid potential over-claims for some readers.
",ICLR2020,
Tl4KhipJIvi,1610040000000.0,1610470000000.0,1,LuyryrCs6Ez,LuyryrCs6Ez,Final Decision,Reject,"This paper was reviewed by 3 experts in the field. The reviewers raised concerns about the lack of novelty, unconvincing experiments, and the presentation of this paper. While the paper clearly has merit, the decision is not to recommend acceptance. The authors are encouraged to consider the reviewers' comments when revising the paper for submission elsewhere.",ICLR2021,
OWy8iidNeV,1576800000000.0,1576800000000.0,1,SklnVAEFDB,SklnVAEFDB,Paper Decision,Reject,"This paper proposes a hybrid LSTM-Transformer method to use pretrained Transformers like BERT, which have a fixed maximum sequence length, on texts longer than that limit.

The consensus of the reviewers is that the results aren't sufficient to justify the primary claims of the paper, and that—in addition—the missing details and ablations cast doubt on the reliability of those results. This is an interesting research direction, but substantial further experimental work would be needed to turn this into something that's ready for publication at a top venue.",ICLR2020,
UUfXk4jP5vx,1642700000000.0,1642700000000.0,1,xxyTjJFzy3C,xxyTjJFzy3C,Paper Decision,Reject,"This submission received a diverging set of final ratings: 6, 3, 6, 5. On the positive side, reviewers appreciated the practicality of the approach and the supporting empirical results. At the same time, all of them expressed concerns with the presentation (typos, unfinished sentences, inconsistent notations). Additional requests for clarifications and ablation studies have been mainly addressed in the rebuttal. The most skeptical reviewer did not participate in the post-rebuttal discussion; the final decision took that into account. 

The AC has read the paper and verified that the minor technical issues pointed out by the reviewers have been fixed in the updated version (there are still a couple of typos remaining). This submission was further discussed between AC and SAC, as well as in the PC calibration meeting. Both AC and SAC agreed with the comment of Reviewer aAcK, who pointed out that generating adversarial samples for mining hard examples has been explored in more general but related contexts before, which limits the novelty of this work to an application of a known idea to a particular domain (3D). At the same time, performance gains on the ModelNet40 dataset are marginal compared to the point cloud based baselines, while the proposed method still uses point clouds for generating adversarial views. 
In combination with other minor issues pointed out by the reviewers, and given that none of the reviewers was championing the paper, AC and SAC believe that the weaknesses of this paper in the end outweigh its strengths and do not recommend acceptance at this stage.",ICLR2022,
k7TYQro_b1,1610040000000.0,1610470000000.0,1,jYVY_piet7m,jYVY_piet7m,Final Decision,Reject,"This paper proposes a new method to combine non-autoregressive (NAT) and autoregressive (AT) NMT. Compared with the original iterative refinement for non-autoregressive NMT, their method first generates a translation candidate using AT and then fills in the gaps using NAT.

All of the reviewers think the idea is interesting and this research topic is not well-studied. However, the empirical part did not convince all the reviewers. The revised version and response are good; however, they still do not resolve some major concerns of the reviewers.
",ICLR2021,
phwDoavBwS6,1642700000000.0,1642700000000.0,1,aJ9BXxg352,aJ9BXxg352,Paper Decision,Reject,"The paper considers input-dependent randomized smoothing to obtain certified robust classification. The main contribution is the derivation of necessary conditions on how the variance of the smoothing distributions (assumed to be spherically symmetric Gaussian distributions) has to change to achieve certified robustness. All reviewers like this result, as it provides guidance on designing input-dependent smoothing, which is an interesting result for the community and certainly helps future research.

On the negative side, the smoothing method derived from the theory provides little (if any) improvement in practice, it cannot be scaled to higher dimensions, it does not address the problems it claims to address (the ""waterfall"" effect, as also admitted by the authors in the discussion), and the presentation should be significantly improved.

The paper received mixed reviews. While I think that the presented theoretical results are useful and interesting, the problems mentioned above make me side with the negative reviewers and suggest rejection of the paper at this point (although this was not an easy decision).

While this is only lightly touched on in the reviews, I strongly recommend the authors make the presentation of the theoretical results more comprehensible. It is quite hard to follow the paper, as notation is introduced continuously in an ad-hoc and confusing way (e.g., in the proof of Theorem 2, $a$ denotes $\delta$ and $\|\delta\|$), and things are often not adequately defined (e.g., the certified robust radius is not defined formally; in Lemma 1, $x$ is undefined and used for $x_0$ as well as a free parameter, $\chi_N^2$ is only implicitly defined, etc.)",ICLR2022,
SJeg8MrQgN,1544930000000.0,1545350000000.0,1,Hyxtso0qtX,Hyxtso0qtX,meta review,Reject,"This paper proposes a method for incentivizing exploration in self-supervised learning using an inverse model, and then uses the learned inverse model for imitating an expert demonstration. The approach incentivizes the agent to visit transitions where a learned model performs poorly. This relates to prior work (e.g. [1]), but uses an inverse model instead of a forward model. The results are promising on challenging problem domains, and the method is simple. The authors have addressed several of the reviewer concerns throughout the discussion period.
However, three primary concerns remain:
(A) First and foremost: There has been confusion about the problem setting and the comparisons. 
I think these confusions have stemmed from the writing in the paper not being sufficiently clear. First, it should be made clear in the plots that the ""Demos"" comparison is akin to an oracle. Second, the difference between self-supervised imitation learning (IL) and traditional IL needs to be spelled out more clearly in the paper. Given that self-supervised imitation learning is not a previously established term, the problem statement needs to be clearly and formally described (and without relying heavily on prior papers). Further, self-supervised imitation learning does not seem to be an appropriate term, since imitation learning from an expert is, by definition, not self-supervised, as it involves supervisory information from an expert. Changing this term and clearly defining the problem would likely lead to less confusion about the method and the relevant comparisons.
(B) The ""Demos"" comparison is meant as an upper bound on the performance of this particular approach. However, it is also important to understand what the upper bound is on these problems in general, irrespective of whether or not an inverse model is used. Training a policy with behavior cloning on demonstrations with many (s,a) pairs would be able to better provide such a comparison.
(C) Inverse models inherently model the part of the environment that is directly controllable (e.g. the robot arm), and often do not effectively model other aspects of the environment that are only indirectly controllable (e.g. the objects). If the method overcomes this issue, then that should be discussed in the paper. Otherwise, the limitation should be outlined and discussed in more detail, including text that outlines which forms of problems and environments this approach is expected to be able to handle, and which of those it cannot handle.

Generally, this paper is quite borderline, as indicated by the reviewers' scores. After going through the reviews and parts of the paper in detail, I am inclined to recommend reject, as I think the pros do not outweigh the above concerns.

One more minor comment is that the paper should consider mentioning the related work by Torabi et al. [2], which considers a similar approach in a slightly different problem setting.

[1] Stadie et al. https://arxiv.org/abs/1507.00814
[2] Torabi et al. IJCAI '18 (https://arxiv.org/abs/1805.01954)",ICLR2019,4: The area chair is confident but not absolutely certain
3qGdtxqWOcX,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Reject,"This work presents an algorithm, graph-structured reinforcement learning (GSRL), to address the problem of exploration in sparse reward settings. The core elements of this work are 1) building a state-transition graph from experienced trajectories in the replay buffer; 2) learning an attention module that chooses a goal from a subset of nodes in the graph; and 3) policy learning via DDPG using ""related trajectories"", where trajectories that are related to the generated goal are sampled from the replay buffer.

Pros:
- all reviewers agree that the idea/work is interesting and valuable to the community
- reviewers appreciate the theoretical graph-based foundation/motivations

Cons:
- clarity: the manuscript still remains hard to follow. Many critical components for understanding are in the appendix.
-- One of the key steps in this work is the discretization of the state/action space for graph construction. 
However, this is not mentioned very clearly, which creates a lot of confusion given that you're considering continuous control domains.
-- Furthermore, the group selection part and the training of the attention module are expressed in an overly complex manner. Without the reviewers' inquiries it would have been impossible to decode the technical details of this key contribution, and unfortunately it remains hard to read/follow.
-- while the ablation experiments (impact of discretization, group selection ..) are appreciated, it is not clear on which environment they were generated (average across all? or only one of them?).
-- do you use DQN and DDPG? There are some conflicting statements in your paper, namely first you say ""We use off-policy algorithms named DQN (Mnih et al., 2013) for discrete action space and DDPG (Lillicrap et al., 2015) for continuous action space"", then in the experiments you say ""to demonstrate the real performance gain of our GSRL we set the policy network with DDPG for GSRL and all baselines"".
- I agree with the reviewers that it's not clear that the chosen baselines are very relevant - there seem to be other more relevant baselines.
- the significance of the attention module is not very clear, and it is not analysed properly. What does it really learn? Some form of deeper analysis would be useful here. How would a version that simply picks the most uncertain state in the graph perform? The ablation graph presented is not very convincing.


Overall, I believe that this work will make a valuable contribution in the future, with an iteration to improve clarity and better showcase the significance of the attention module.",ICLR2021,
40rwNMk0kVw,1642700000000.0,1642700000000.0,1,bCrdi4iVvv,bCrdi4iVvv,Paper Decision,Accept (Poster),"This paper analyzes the extent to which parameterized layers within a CNN can be replaced by parameter-free layers, with specific focus on utilizing max-pooling as a building block. After the author response and discussion, all reviewers favor accepting the paper. The AC agrees that its empirical results open a potentially interesting discussion on network design.",ICLR2022,
HklkZpA4lV,1545030000000.0,1545350000000.0,1,HklKui0ct7,HklKui0ct7,Interesting improvement to inverse propensity weighting based estimators for off-policy evaluation,Accept (Poster),"This is an interesting paper that shows how off-policy estimation (and optimization) can be improved by explicitly estimating the data logging policy. It is remarkable that the estimation variance can be reduced relative to using the original logging policy for IPW, although this result depends on the (somewhat impractical) assumption that the parametric form for the true logging policy is known. The reviewers unanimously recommended the paper be accepted. However, there remain criticisms of the theoretical analysis that the authors should take into account in preparing a final version (namely, motivating the assumptions needed to obtain the results, and providing stronger intuitions behind the reduced variance).",ICLR2019,5: The area chair is absolutely certain
VcEDVbgeNwi,1610280000000.0,1610470000000.0,1,3X64RLgzY6O,3X64RLgzY6O,Final Decision,Accept (Poster),"This work compares and contrasts the learning rate dynamics of GD and SGD and shows that under practical learning rate settings, SGD is biased to approach the minimum along the direction of steepest descent, leading to better performance. Reviewers agree that the theoretical results are significant. 
The authors satisfactorily responded to reviewers’ questions and improved the paper’s clarity during the discussion phase.",ICLR2021,
H1lVY4axxN,1544770000000.0,1545350000000.0,1,SJequsCqKQ,SJequsCqKQ,Metareview,Reject,"The paper presents a conformal prediction approach to supervised classification, with the goal of reducing the overconfidence of standard soft-max learning techniques. The proposal is based on previously published methods, which are extended for use with deep learning predictors. Empirical evaluation suggests the proposal results in competitive performance. This work seems to be timely, and the topic is of interest to the community.

The reviewers' and AC's opinions were mixed, with reviewers either being unconvinced about the novelty of the proposed work or expressing issues about the strength of the empirical evidence supporting the claims. Additional experiments would significantly strengthen this submission.",ICLR2019,3: The area chair is somewhat confident
5Kcks_JU1pY,1642700000000.0,1642700000000.0,1,OgCcfc1m0TO,OgCcfc1m0TO,Paper Decision,Reject,"Given the increasing scale of large models (e.g. CLIP), there's an argument that we need better automated techniques for properly utilizing (prompting) these models. Given the success of prompt learning within pure NLP models, the authors apply the same approach to the V+L domain and show that it is also applicable here. Generally, reviewers felt that the results were clear and thorough, yet technically limited. The approach is not novel and the result not surprising. There is a documentary benefit to having this work out in the community for others to reference and extend.",ICLR2022,
HJKb2GIOl,1486400000000.0,1486400000000.0,1,ByG4hz5le,ByG4hz5le,ICLR committee final decision,Invite to Workshop Track,"Reviewers feel the work is well executed and that the model makes sense, but two of the reviewers were not convinced that the proposed method contains enough novelty in light of prior work. The comparison of the soft vs hard attention model variations is perhaps one of the more novel aspects of the work; however, the degree of novelty within these formulations and the insights obtained from their comparison were not perceived as being enough to warrant higher ratings. We would like to invite the authors to submit this paper to the workshop track.",ICLR2017,
ajquv1AZoB3,1610040000000.0,1610470000000.0,1,5lhWG3Hj2By,5lhWG3Hj2By,Final Decision,Accept (Poster),"While the reviewers seem to like the main idea of the work, they had several concerns, particularly regarding the experiments (both their setup and description) and the overall language of the paper, which they found more suitable for the control community than the ML and representation learning community. The authors provided a very long response and tried to address the issues raised by the reviewers during the rebuttals. Fortunately, the response addressed some of the issues they raised, and now they all see the paper marginally above the line. However, reading the reviews and response shows that the paper could benefit greatly from better writing and a better description of the experiments. So, I would strongly recommend that the authors include all the information they provided for the reviewers during the rebuttal phase in the paper and improve its quality. 
",ICLR2021, +14KLzvQRtm,1576800000000.0,1576800000000.0,1,BkeJm6VtPH,BkeJm6VtPH,Paper Decision,Reject,There are several concerns with the brittleness and reproducibility of the proposed approach and experiments.,ICLR2020, +0_RwdLlFwot,1642700000000.0,1642700000000.0,1,WDBo7y8lcJm,WDBo7y8lcJm,Paper Decision,Reject,"This paper studies knowledge distillation and explores why distillation gains are not uniform. Reviewers consistently find this paper an interesting read, but had common concerns on generalizability and limited improvements/contributions. +In general, reviewers mostly gave a score that is below the acceptance threshold, or expressed concerns otherwise. Summing these up, we conclude this paper is of interest to the ICLR audience, but current form is not ready yet for acceptance. + +Summary Of Reasons To Publish: +interesting analysis of the causes of non-uniform gains in distillation + +Summary Of Suggested Revisions: + + (1) the improvements are marginal and (2) the contribution of AdaMargin is limited, (3) generalizability to other KDs",ICLR2022, +HyV8rk6Sf,1517250000000.0,1517260000000.0,570,SJ1fQYlCZ,SJ1fQYlCZ,ICLR 2018 Conference Acceptance Decision,Reject,"The authors give evidence that is certain cases, the ordering of sample inclusion in a curriculum is not important. However, the reviewers believe the experiments are inconclusive, both in the sense that as reported, they do not demonstrate the authors' hypothesis, and that they may leave out many relevant factors of variation (such as hyper-parameter tuning). ",ICLR2018, +gAHR7n3M_,1576800000000.0,1576800000000.0,1,HJe9cR4KvB,HJe9cR4KvB,Paper Decision,Reject,"While the revised paper was better and improved the reviewers assessment of the work, the paper is just below the threshold for acceptance. The authors are strongly encouraged to continue this work.",ICLR2020, +r10fQyaSz,1517250000000.0,1517260000000.0,96,B1ae1lZRb,B1ae1lZRb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Meta score: 7 + +The paper combined low precision computation with different approaches to teacher-student knowledge distillation. The experimentation is good, with good experimental analysis. Very clearly written. The main contribution is in the different forms of teacher-student training combined with low precision. + +Pros: + - good practical contribution + - good experiments + - good analysis + - well written +Cons: + - limited originality",ICLR2018, +hm9sM8lYCrg0,1642700000000.0,1642700000000.0,1,qPzR-M6HY8x,qPzR-M6HY8x,Paper Decision,Reject,"The paper proposes a new method for the problem of learning under instance-dependent noise (IDN). The idea is to construct a variational approximation to the ideal training objective, which involves learning a single scalar C(x) per instance. In turn, each such scalar is treated as an additional parameter to be learned by the network. + +Reviewers generally found the basic idea of the proposal to be interesting and novel, with the response clarifying some initial questions on the design of the network to learn C(x). The paper is also well-written, and presents experiments on image and text classification benchmarks. Some concerns were however raised: + +(1) _Limited theoretical justification_. There is limited formal analysis of when the proposed method can work well. + +(2) _Lack of comparison to IDN baselines_. The original submission did not include any IDN baselines as comparison. 
The revision included results of the method of (Zhang et al., '21a), which is on-par or better than the proposed method; it seems that this baseline really ought to have been included in the original submission, but it is appreciated that these have been added. A related concern was the marginal gains over the GCE method on the CIFAR datasets. + +(3) _Sufficiency of learning a single parameter_. The paper learns a single scalar per sample. Several reviewers were unsure on the sufficiency of this parameter to capture the underlying noise distribution. + +For (1), the authors acknowledge theoretical analysis as an important future direction. This is perfectly reasonable, but does then require weighting more any issues with the the conceptual and empirical contributions of the paper. + +For (2), the response clarified that most of these operate either in the binary setting, or require auxiliary information. This is a valid motivation for the present work; it would however be more compelling to include results in a binary setting, to better understand the strengths and weaknesses compared to existing proposals. The response also clarified the present method does not claim to improve upon state-of-the-art performance, but rather proposes a simple method which has additional applications (as shown in Appendix E). This is a reasonable claim; however, to my taste, there is insufficient discussion of the PLC method (Zhang et al., '21a), and what new conceptual information the present work offers. + +For (3), the response argued that the present results already demonstrate the efficacy of using a single parameter, and that using multiple parameters can be studied in future work. One reviewer was not convinced of the efficacy being shown in some of the results in Appendix E. It could strengthen the work if there is an empirical analysis of when the single parameter assumption starts to break down; e.g., perhaps under increasing levels of CCN noise? + +Overall, the paper has interesting ideas and some nice analyses. At the same time, there was clear scope for improvement in the original submission. This was partially addressed in the revision, but given that several domain experts retain reservations (particularly in regards to comparisons against prior IDN works), it is encouraged that the authors incorporate the above comments for a future version of the paper.",ICLR2022, +Pytj0daESK,1610040000000.0,1610470000000.0,1,DegtqJSbxo,DegtqJSbxo,Final Decision,Reject,"The reviewers indicated a number of concerns (which I agree with) which have not been addressed by the authors as they have not provided any response. Indeed, the paper would be significantly improved once these issues are addressed. ",ICLR2021, +r1gXqdEEg4,1544990000000.0,1545350000000.0,1,Byf5-30qFX,Byf5-30qFX,"A potentially influential approach despite its limitations, well delivered and improved following feedback.",Accept (Poster),"This work proposes a method for extending hindsight experience replay to the setting where the goal is not fixed, but dynamic or moving. It proceeds by amending failed episodes by searching replay memory for a compatible trajectories from which to construct a trajectory that can be productively learned from. + +Reviewers were generally positive on the novelty and importance of the contribution. While noting its limitations, it was still felt that the key ideas could be useful and influential. 
The tasks considered are modifications of OpenAI robotics environments, adapted to the dynamic goal setting, as well as a 2D planar ""snake"" game. There were concerns about the strength of the baselines employed but reviewers seemed happy with the state of these post-revision. There were also concerns regarding clarity of presentation, particularly from AnonReviewer2, but significant progress was made on this front following discussions and revision. + +Despite remaining concerns over clarity I am convinced that this is an interesting problem setting worth studying and that the proposed method makes significant progress. The method has limitations with respect to the sorts of environments where we can reasonably expect it to work (where other aspects of the environment are relatively stable both within and across episodes), but there is lots of work in the literature, particularly where robotics is concerned, that focuses on exactly these kinds of environments. This submission is therefore highly relevant to current practice and by reviewers' accounts, generally well-executed in its post-revision form. I therefore recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +nthmMFJen64,1610040000000.0,1610470000000.0,1,PrzjugOsDeE,PrzjugOsDeE,Final Decision,Accept (Poster),"The submission proposes a novel conditional GAN formulation where continuous scalars (named regression labels) are fed into the GAN as a conditioning variable. Since cGANs with discrete labels are trained to minimize the empirical loss, they fail for continuous conditions, because there might be few or even zero samples for many labels values and also the label cannot be embedded by one-hot encoding like discrete labels. As a solution, the authors propose new methods of encoding the label. + +The paper received a clear accept, two weak accepts and a weak reject. As agreed by all the reviewers, the paper proposes an interesting framework to eliminate some weaknesses of GANs. The rebuttal adequately addresses the reviewer comments and hence the meta reviewer recommends acceptance. ",ICLR2021, +u9SwzOjXYb,1576800000000.0,1576800000000.0,1,rkgMkCEtPB,rkgMkCEtPB,Paper Decision,Accept (Poster),"Paper received mixed reviews: WR (R1), A (R2 and R3). AC has read reviews/rebuttal and examined paper. AC agrees that R1's concerns are misplaced and feels the paper should be accepted. +",ICLR2020, +BkxJaVAZg4,1544840000000.0,1545350000000.0,1,rJxgknCcK7,rJxgknCcK7,Meta-Review,Accept (Oral),"This paper proposes the use of recently propose neural ODEs in a flow-based generative model. + +As the paper shows, a big advantage of a neural ODE in a generative flow is that an unbiased estimator of the log-determinant of the mapping is straightforward to construct. Another advantage, compared to earlier published flows, is that all variables can be updated in parallel, as the method does not require ""chopping up"" the variables into blocks. The paper shows significant improvements on several benchmarks, and seems to be a promising venue for further research. + +A disadvantage of the method is that the authors were unable to show that the method could produce results that were similar (of better than) the SOTA on the more challenging benchmark of CIFAR-10. Another downside is its computational cost. Since neural ODEs are relatively new, however, these problems might resolved with further refinements to the method. 
",ICLR2019,4: The area chair is confident but not absolutely certain +Yh9euqKrt,1576800000000.0,1576800000000.0,1,rkeJRhNYDH,rkeJRhNYDH,Paper Decision,Accept (Poster),"This paper presents a new dataset for fact verification in text from tables. The task is to identify whether a given claim is supported by the information presented in the table. The authors have also presented two baseline models, one based on BERT and based on symbolic reasoning which have an ok performance on the dataset but still very behind the human performance. The paper is well-written and the arguments and experiments presented in the paper are sound. + +After reviewer comments, the authors have incorporated major changes in the paper. I recommend an Accept for the paper in its current form.",ICLR2020, +T2kvFLucPj,1576800000000.0,1576800000000.0,1,HJlTpCEKvS,HJlTpCEKvS,Paper Decision,Reject,"An approach to make multi-task learning is presented, based on the idea of assigning tasks through the concepts of cooperation and competition. + +The main idea is well-motivated and explained well. The experiments demonstrate that the method is promising. However, there are a few concerns regarding fundamental aspects, such as: how are the decisions affected by the number of parameters? Could ad-hoc algorithms with human in the loop provide the same benefit, when the task-set is small? More importantly, identifying task groups for multi-task learning is an idea presented in prior work, e.g. [1,2,3]. This important body of prior work is not discussed at all in this paper. + +[1] Han and Zhang. ""Learning multi-level task groups in multi-task learning"" +[2] Bonilla et al. ""Multi-task Gaussian process prediction"" +[3] Zhang and Yang. ""A Survey on Multi-Task Learning"" +",ICLR2020, +ooJFbSgM6vK,1610040000000.0,1610470000000.0,1,9DQ0SdY4UIz,9DQ0SdY4UIz,Final Decision,Reject,"Thanks for your submission to ICLR. + +This paper proposes a subspace indexing model for low-dimensional embedding. The reviewers were all generally in agreement that the paper is not ready for publication. In particular, they felt that the paper had several key weaknesses: + +-Relevant literature is not discussed +-Relevant methods are not evaluated against in the experiments +-Experiments on the whole were limited and not sufficiently convincing +-The novelty of the paper is not very high + +Please consider the reviewer comments carefully when preparing a future version of your paper.",ICLR2021, +79sums7euySs,1642700000000.0,1642700000000.0,1,2d4riGOpmU8,2d4riGOpmU8,Paper Decision,Reject,"This paper proposes to repeatedly apply the classifier two-sample tests (proposed by Kim, Ramdas, Singh, Wasserman, in 2016, and developed further by Lopez-Paz, Oquab, in 2017) for the purpose of detecting covariate shift. The authors propose methods to extend the aforementioned tests to a sequential setting. Overall, the reviewers do not lean towards acceptance, and neither do I. Several constructive suggestions are provided by reviewers, some are summarized below. + +The authors claim that sequential tests are not desirable in such a setting, and thus choose to pay a multiple testing price by repeatedly applying a batch test. 
However, sequential tests are in fact applicable (they will control type-1 error) but may have a worse power if the alternative is not true at the very start --- but these were entirely dropped from the simulations; in fact, comparing the increased type-1 error of the authors' approach to the increased type-2 error of sequential approaches may be worth clarifying. + +Perhaps the ""right"" solution that the authors are looking for could be gotten by converting a sequential test into a sequential changepoint detection algorithm (via repeated application of a sequential test, each started at a new time). Also see ""Conformal test martingales for change-point detection"" and ""Inductive Conformal Martingales for Change-Point Detection"" by Vovk et al., which are currently not cited.",ICLR2022, +ry0AB1TBf,1517250000000.0,1517260000000.0,689,S1Ow_e-Rb,S1Ow_e-Rb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers rightly point out that presented analysis is limiting and that the experimental results are not extensive enough. Moreover, several existing work that use raw waveforms have interesting analysis of what the network is trying to learn. Given these comments, the AC recommends that the paper be rejected. +",ICLR2018, +pZn9H9Ucnv,1576800000000.0,1576800000000.0,1,SyxDXJStPS,SyxDXJStPS,Paper Decision,Reject,"The submission performs empirical analysis on f-VIM (Ke, 2019), a method for imitation learning by f-divergence minimization. The paper especially focues on a state-only formulation akin to GAILfO (Torabi et al., 2018b). The main contributions are: +1) The paper identifies numerical proplems with the output activations of f-VIM and suggest a scheme to choose them such that the resulting rewards are bounded. +2) A regularizer that was proposed by Mescheder et al. (2018) for GANs is tested in the adversarial imitation learning setting. +3) In order to handle state-only demonstrations, the technique of GAILfO is applied to f-VIM (then denoted f-VIMO) which inputs state-nextStates instead of state-actions to the discriminator. + +The reviewers found the submitted paper hard to follow, which suggests a revision might make more apparent the author's contributions in later submissions of this work. ",ICLR2020, +KN0yKKJfOM,1576800000000.0,1576800000000.0,1,Bye8hREtvB,Bye8hREtvB,Paper Decision,Reject,"The paper proposes learning a latent embedding for image manipulation for PixelCNN by using Fisher scores projected to a low-dimensional space. +The reviewers have several concerns about this paper: +* Novelty +* Random projection doesn’t learn useful representation +* Weak evaluations +Since two expert reviewers are negative about this paper, I cannot recommend acceptance at this stage. +",ICLR2020, +cBmerk8Ak1T,1642700000000.0,1642700000000.0,1,tBIQEvApZK5,tBIQEvApZK5,Paper Decision,Accept (Poster),"This is a borderline paper. +This paper proposed feature kernel distillation (FKD), a new distillation framework, by matching the kernels obtained from the networks of student and the teacher. Theoretical justification is provided by extending the results of Allen-Zhu and Li(2020)(ALi20 hereafter). Empirical results show superiority of FKD over vanilla KD on several datasets. +There is however concern that the technical novelty is limited and incremental, an opinion shared by DKJu, and 68WG, compared to ALi20. Reviewer DKJu suggests that the authors could highlight those results which are not straightforward extensions of ALi20. 
Another important point of concern is that the paper may have some Overstated claims. The authors clarified that the language of the claims be suitably edited. In this regard Reviewer h8ud have some specific suggestions which should be easy to incorporate. + +In view of additional experiments conducted and detailed discussion during rebuttal addressed some of the concerns of the reviewers. +If accepted, the final version, should include most of the discussion and additional experiments.",ICLR2022, +gQfCiCGhVpX,1610040000000.0,1610470000000.0,1,rLj5jTcCUpp,rLj5jTcCUpp,Final Decision,Reject,"This paper addresses a meta-learning method which works for cases where both the distribution and the number of features may vary across tasks. The method is referred to as 'distribution embedding network (DEN)' which consists of three building block. While the method seems to be interesting and contains some new ideas, all of reviewers agree that the description for each module in the model is not clear and the architecture design needs further analysis. In addition, experiments are not sufficient to justify the method. Without positive feedback from any of reviewers, I do not have choice but to suggest rejection. +",ICLR2021, +ryeVIZUNgV,1545000000000.0,1545350000000.0,1,ByGq7hRqKX,ByGq7hRqKX,meta-review,Reject,"The authors have proposed a language+vision 'dual' attention architecture, trained in a multitask setting across SGN and EQA in vizDoom, to allow for knowledge grounding. The paper is interesting to read. The complex architecture is very clearly described and motivated, and the knowledge grounding problem is ambitious and relevant. However, the actual proposed solution does not make a novel contribution and the reviewers were unconvinced that the approach would be at all scalable to natural language or more complex tasks. In addition, the question was raised as to whether the 'knowledge grounding' claims by the authors are actually much more shallow associations of color and shape that are beneficial in cluttered environments. +This is a borderline case, but the AC agrees that the paper falls a bit short of its goals.",ICLR2019,4: The area chair is confident but not absolutely certain +AkbtoRfchLw0,1642700000000.0,1642700000000.0,1,c8JDlJMBeyh,c8JDlJMBeyh,Paper Decision,Reject,"The paper provides a way for explaining the reasoning of a neural network to humans in the form of a class-specific structural concept graph (c-SCG). The c-SCG can be modified by humans. The modified c-SCG can be incorporated in training a new student model. Experiments show that the new model performs better on classes that their corresponding c-SCG have been modified. While all the reviewers agree that the paper puts forth an interesting idea, some concerns have been raised by reviewers about the scale of experiments and the lack of theoretical guarantee on the fidelity of the SCG. The authors have added two large scale experiments which confirm their previous results as part of their rebuttal. This paper is borderline and needs to be discussed further.",ICLR2022, +jCX0jNGOUv,1576800000000.0,1576800000000.0,1,r1lGO0EKDH,r1lGO0EKDH,Paper Decision,Accept (Talk),"The authors present an approach for learning graph embeddings by first fusing the graph to generate a new graph with encodes structural information as well as node attribution information. They then iteratively merge nodes based spectral similarities to obtain coarser graphs. 
They then use existing methods to learn embeddings from this coarse graph and progressively refine the embeddings to finer graphs. They demonstrate the performance of their method on standard graph datasets. + +This paper has received positive reviews from all reviewers. The authors did a good job of addressing the reviewers' concerns and managed to convince the reviewers about their contributions. I request the authors to take the reviewers suggestions into consideration while preparing the final draft of the paper and recommend that the paper be accepted.",ICLR2020, +rK7XF59XbMn,1610040000000.0,1610470000000.0,1,vT0NSQlTA,vT0NSQlTA,Final Decision,Reject,"The submission is acknowledged as having potential value in terms of proposing a new approach for exploration based on ensembles and value functions. However, there are lingering concerns about the discussion of what this paper brings to the table vis-a-vis prior work, together with a lack of clear demonstration of the explicit gains from the exploration mechanism and more experimental studies. The author(s) would do well to revise as per the feedback given and resubmit a version with a more compelling argument. ",ICLR2021, +8P27WjXmP4u,1642700000000.0,1642700000000.0,1,f4c4JtbHJ7B,f4c4JtbHJ7B,Paper Decision,Reject,"This work received borderline rates with slight preference to rejection. The main concerns range from writing, novelty to empirical evaluations. Given that no authors’ responses are submitted, we have decided to reject this work.",ICLR2022, +HkgYJ8R1eV,1544710000000.0,1545350000000.0,1,SyxMWh09KX,SyxMWh09KX,Interesting results but very unclear narrative,Reject,"This paper describes an incorporation of attention into model agnostic meta learning. The reviewers found that the paper was rather confusing in its presentation of both the method and the tasks. While the results seemed interesting, it was difficult to frame them due to lack of clarity as to what the task is, and the relation between attention and MAML. It sounds like this paper needs a bit more work, and thus is not suitable for publication at this time. + +It is disappointing that the reviews were so short, but as the authors did not challenge them, unfortunately the AC must decide on the basis of the first set of comments by reviewers.",ICLR2019,4: The area chair is confident but not absolutely certain +oK8EmQARYQw,1610040000000.0,1610470000000.0,1,cvNYovr16SB,cvNYovr16SB,Final Decision,Reject,"The authors propose a particle-based entropy estimate for intrinsic motivation for pre-training an RL agent to then perform in an environment with rewards. As the reviewers discussed, and also mentioned in their reviews, this paper bears stark similarity to work of 5 months ago, presented at the ICML 2020 Lifelong ML workshop, namely, ""A Policy Gradient Method for Task-Agnostic Exploration"", Mutti et al, 2020--MEPOL. What is novel here is the adaptation of this entropy estimate to form an intrinsic reward via a contrastive representation and the subsequent demonstration on standardized RL environments. The authors have added a comparison to MEPOL, and in these experiments, APT outperforms this method, sometimes by some margin. Unfortunately this work does not meet the bar for acceptance relative to other submissions. 
+",ICLR2021, +Hye3dRXleN,1544730000000.0,1545350000000.0,1,H1g2NhC5KQ,H1g2NhC5KQ,A simple and effective approach to style transfer based on recent developments in unsupervised NMT,Accept (Poster),"The paper shows how techniques introduced in the context of unsupervised machine translation can be used to build a style transfer methods. + +Pros: + +- The approach is simple and questions assumptions made by previous style transfer methods (specifically, they show that we do not need to specifically enforce disentanglement). + +- The evaluation is thorough and shows benefits of the proposed method + +- Multi-attribute style transfer is introduced and benchmarks are created + +- Given the success of unsupervised NMT, it makes a lot of sense to see if it can be applied to the style transfer problem + +Cons: + +- Technical novelty is limited + +- Some findings may be somewhat trivial (e.g., we already know that offline classifiers are stronger than the adversarials, e.g., see Elazar and Goldberg, EMNLP 2018). + + + + +",ICLR2019,4: The area chair is confident but not absolutely certain +HkghsjrZgV,1544800000000.0,1545350000000.0,1,HJeRm3Aqt7,HJeRm3Aqt7,Intersting benchmark suite that could be extended.,Reject,"The paper introduces a benchmark suite providing a series of synthetic distributions and metrics for the evaluation of generative models. While providing such a tool-kit is interesting and helpful and it extends existing approaches for evaluating generative models on simple distributions, it seems not to allow for very different additional conclusions or insights.This limits the paper's significance. Adding more problems and metrics to the benchmark suite would make it more convincing.",ICLR2019,3: The area chair is somewhat confident +BkiY4k6rM,1517250000000.0,1517260000000.0,402,rkEtzzWAb,rkEtzzWAb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Pros: + - The paper proposes interesting new ideas on evaluating generative models. + - Paper provides hints at interesting links between structural prediction and adversarial learning. + - Authors propose a new dataset called Thin-8 to demonstrate the new ideas and argue that it is useful in general to study generative models. + - The paper is well written and the authors have made a good attempt to update the paper after reviewer comments. + +Cons: +- The proposed ideas are high level and the paper lack deeper analysis. +- Apart from demonstrating that the parametric divergences perform better than non-parametric divergences are interesting, but the reviewers think that practical importance of the results are weak in comparison to previous works. +With this analysis, the committee recommends this paper for workshop.",ICLR2018, +VrziCp9e44R,1610040000000.0,1610470000000.0,1,4CqesJ7GO7Q,4CqesJ7GO7Q,Final Decision,Reject,"Analyzing class-wise adversarial vulnerability of models is an interesting direction to pursue. However, the authors should consult the references pointed out in the reviews to put their contributions in the right perspective. Overall, the lines of inquiry explored in this paper are of interest but, as some of the reviewers point out, there are improvement in the methodology that still need to be addressed before this paper is ready for publication. (I very much recommend that the authors do build on this feedback and revise the paper, as it will be a valuable contribution then.) 
",ICLR2021, +os3ZBfs_oE,1610040000000.0,1610470000000.0,1,sSjqmfsk95O,sSjqmfsk95O,Final Decision,Accept (Spotlight),"This paper received two clear accept, one accept, one borderline accept and one reject review. R4 identified that the paper falls short in discussing recent works from CVPR and ECCV 2020 on the image inpainting and completion tasks which also tackle challenging scenarios in these tasks. The authors improve their related work section with these more recent works while pointing out that the task still remains unsolved and they propose an effective technique towards the solution. The meta reviewer recommends acceptance based on the following observations. + +The submission proposes a GAN architecture for image inpainting using co-modulation, which is similar to the weight modulation in StyleGAN2 but is conditioned on both the input image and the stochastic variable instead of only the stochastic variable. The main novelty of co-modulation appears to be interesting as well as being generalisable to different tasks. The approach is shown to perform well in the image painting with large-scale missing pixels and some image-to-image translation tasks. Furthermore a new metric P-IDS/U-IDS is proposed to evaluate the perceptual fidelity of inpainted images. ",ICLR2021, +ENXIAQZN4N,1642700000000.0,1642700000000.0,1,dDo8druYppX,dDo8druYppX,Paper Decision,Accept (Poster),All reviewers recommended accept after author responses. AC doesn't find any reason to overturn this consensus.,ICLR2022, +B1Z4hMLOe,1486400000000.0,1486400000000.0,1,BkV4VS9ll,BkV4VS9ll,ICLR committee final decision,Reject,"The paper does not seem to have enough novelty, and the contribution is not clear enough due to presentation issues.",ICLR2017, +UQDWTxAPe10,1610040000000.0,1610470000000.0,1,A993YzEUKB7,A993YzEUKB7,Final Decision,Reject,"This paper is attempting to improve the OOD generalization performance of neural networks on relational reasoning tasks. This is an important failure point of general neural network architectures and important research topic. The results of the paper shows impressive improvements on a set of subject. + +* The paper is improved during the rebuttal, however, I do agree with the R5 and the paper is still lacking a lot in terms o clarity. The writing of this paper still requires some work. + +* As R2 also has written, the proposed idea is not so concrete to apply as practical solutions, and the presentation of the paper still requires some more work. + +* R3 pointed out some inaccuracies and it seems like authors have added some ablations in the direction that R3 has suggested. + +I am suggesting to reject this paper given that the majority of the reviewers are also leaning towards rejection as well. I would recommend the authors to improve the clarity of the paper, do more ablations for their models and resubmit to a different conference.",ICLR2021, +Fj_cS5T21kP,1642700000000.0,1642700000000.0,1,CCu6RcUMwK0,CCu6RcUMwK0,Paper Decision,Accept (Poster),"This paper proposes a new link prediction algorithm based on a pooling scheme called WalkPool. The main idea is to jointly encode node representations and graph topology information into node features and conduct the learning end-to-end. The paper shows the superiority of the method against the baselines. + +Strength +* The paper is generally clearly written. +* A new method is proposed, which is technically sound. +* Many experiments are conducted to verify the effectiveness of the proposed method. 
+ +Weaknesses +* The novelty of the work might not be so significant. There is a similarity with the SEAL algorithm. + +The authors have addressed most of the problems pointed out by the reviewers. They have also conducted additional experiments.",ICLR2022, +2iDo4Rm_C,1576800000000.0,1576800000000.0,1,Hkl4EANFDH,Hkl4EANFDH,Paper Decision,Reject,"The paper deals with a mutual-information-based dependency test. + +The reviewers have provided extensive and constructive feedback on the paper. The authors have in turn given a detailed response with some new experiments and plans for improvement. + +Overall, the reviewers are not convinced the paper is ready for publication. ",ICLR2020, +2oVWAln-H2,1576800000000.0,1576800000000.0,1,ByxxgCEYDS,ByxxgCEYDS,Paper Decision,Accept (Spotlight),"This paper proposes a novel technique for matrix completion, using graphical neighborhood structure to side-step the need for any side information. + +Post-rebuttal, the reviewers converged on a unanimous decision to accept. The authors are encouraged to revise the paper to address the reviewers' comments.",ICLR2020, +ZO8GdunhB,1576800000000.0,1576800000000.0,1,rJlcLaVFvB,rJlcLaVFvB,Paper Decision,Reject,"This paper introduces a new architecture for sparse coding. + +The reviewers gave long and constructive feedback, to which the authors in turn responded at length. There is consensus among the reviewers that, despite its contributions, this paper in its current form is not ready for acceptance. + +Rejection is therefore recommended, with encouragement to prepare an updated version for a future conference. ",ICLR2020, +SJxPZdvSx4,1545070000000.0,1545350000000.0,1,HJguLo0cKQ,HJguLo0cKQ,Reject,Reject,The work brings little novelty compared to the existing literature. ,ICLR2019,5: The area chair is absolutely certain +QezBNp1huV,1576800000000.0,1576800000000.0,1,rkxxA24FDr,rkxxA24FDr,Paper Decision,Accept (Poster),"This paper presents the neural stored-program memory, which is a key-value memory that is used to store weights for another neural network, analogous to having programs in computers. They provide an extensive set of experiments in various domains to show the benefit of the proposed method, including synthetic tasks and few-shot learning experiments. + +This is an interesting paper proposing a new idea. We discussed this submission extensively, and based on our discussion I recommend accepting this submission. + +A few final comments from the reviewers for the authors: +- Please try to make the paper a bit more self-contained so that it is more useful to a general audience. This can be done by either making more space in the main text (e.g., reducing the size of Figure 1, reducing space between sections, table captions, and text, etc.) or adding more details in the Appendix. Importantly, your formatting is a bit off. Please use the correct style file; it will give you more space. All reviewers agree that the paper is missing some important details that would improve it. +- Please cite the original fast weight paper by Malsburg (1981). +- Regarding fast weights using outer products, this was actually first done in the 1993 paper instead of the 2016 and 2017 papers.",ICLR2020, +B1lLQJprM,1517250000000.0,1517260000000.0,138,SJa9iHgAZ,SJa9iHgAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper presents an interesting view of ResNets and the findings should be of broad interest. R1 did not update their score/review, but I am satisfied with the author response, and recommend this paper for acceptance. 
",ICLR2018, +P9U7mub1CTa,1642700000000.0,1642700000000.0,1,_7YnfGdDVML,_7YnfGdDVML,Paper Decision,Reject,"The paper studies semantic type detection. + The problem is of practical significance to i tabular data. + However, in its current form, there are concerns about the scope of novelty and technical significance.",ICLR2022, +-9zBJ8kapV,1610040000000.0,1610730000000.0,1,_O9YLet0wvN,_O9YLet0wvN,Final Decision,Reject,"The reviewers have ranked this paper as borderline accept. On the negative side, the main claim of the paper (the more categories for training a one-shot detector, the better) has already been observed in several works and very intuitive. However, the paper has done significant experimental work to support this claim. The paper is very well written, it carefully explores the existing setups for one-shot detection and highlights their weaknesses. The paper also gives advice on how to construct better datasets for one-shot detection (the conclusion ""add more diverse categories"" is somewhat obvious but the paper demonstrates how important that is).",ICLR2021, +7OUgoxcnRG4,1610040000000.0,1610470000000.0,1,JoCR4h9O3Ew,JoCR4h9O3Ew,Final Decision,Accept (Poster),"This paper focuses on adversarial robustness with unlabeled data. The philosophy behind sounds quite interesting to me, namely, utilizing unlabeled data to enforce labeling consistency while reducing adversarial transferability among the networks via diversity +regularizers. This philosophy leads to a novel algorithm design I have never seen, i.e., ARMOURED, an adversarially robust training method based on semi-supervised learning. + +The clarity and novelty are clearly above the bar of ICLR. While the reviewers had some concerns on the significance, the authors did a particularly good job in their rebuttal. Thus, all of us have agreed to accept this paper for publication! Please carefully address all +comments in the final version. +",ICLR2021, +I4kiEL0aeMr,1642700000000.0,1642700000000.0,1,P7FLfMLTSEX,P7FLfMLTSEX,Paper Decision,Accept (Poster),"*Summary:* Investigate the NTK of PNNs and enhanced bias towards higher frequencies. + +*Strengths:* +- Spectral bias is a contemporary topic. +- Some reviewers found the paper well written. + +*Weaknesses:* +- Restricted setting (two-layers / no bias / infinite width), particularly in view of the objective to provide architecture design guidance. Restricted experiments (Introduction indicates learning spherical harmonics). +- Sparse discussion of related works, particularly on spectral bias. + +*Discussion:* + +During the discussion period authors made efforts to address some of the concerns of the reviewers. A late new experiment prompted KnZp to raise score. TQnp found the paper good but also expected a more profound theorem addressing broader PNN families given the existing work. They found that experiments and discussion of prior work could be improved. The authors added discussion of prior works and provided an explanation for their choices, but left extensions and further analysis for future work. nFMY expressed concerns about applicability of the analysis and evidence in experiments. Author responses addresses this in part. cEcf points out that the main theoretical contributions have straight forward proofs based on previous works and asks about extensions. Authors agree that the paper does not introduce novel techniques and that extending the analysis is an important direction, but leave this for future work. 
FuRi finds the paper provides an interesting viewpoint and raised their score from 3 to 5 following the discussion (improved presentation, rigor, clarity), but considers that the paper has several drawbacks (oversimplification, lack of technical novelty) that need to be addressed. + +*Conclusion:* + +One reviewer found this work marginally below the acceptance threshold, three marginally above, and one good. I find that the paper considers an interesting problem and makes some interesting observations and some valuable advances. I appreciate the authors' efforts during the reviewing period. Hence I am recommending accept. At the same time, I find that the clarity and the technical and experimental contributions can still be improved, and encourage the authors to carefully consider the reviewers' comments when preparing the final version of the paper.",ICLR2022, +2z0R9WieGE,1610040000000.0,1610470000000.0,1,HdX654Yn81,HdX654Yn81,Final Decision,Reject,"This paper proposes to use an ensemble of VAEs to learn better disentangled representations by aligning their representations through additional losses. This training method is based on recent work by Rolinek et al. (2019) and Duan et al. (2020), which suggests that VAEs tend to approximate PCA-like behaviour when they are trained to disentangle. The method is well justified from the theoretical perspective, and the quantitative results are good. That said, the reviewers raised concerns about the qualitative nature of the learnt representations, which do not look as disentangled as the quantitative measures might suggest. There was a large range of scores given to this paper by the reviewers, which generated a long discussion. I have also personally looked at the paper. Unfortunately I have to agree that the latent traversal plots do not look as disentangled as the metric scores would suggest, and as one might hope to see on such toy datasets as dSprites. The traversals are certainly subpar compared to even the most basic approaches to disentanglement, like beta-VAE. For this reason, and given the reviewer scores, I unfortunately have to recommend rejecting the paper this time around; however, I hope that the authors are able to address the reviewers' concerns and find the source of disagreement between their qualitative and quantitative results for future revisions of this work.",ICLR2021, +d65r8L-WzvY,1642700000000.0,1642700000000.0,1,xiXOrugVHs,xiXOrugVHs,Paper Decision,Reject,"This paper deals with the task of long text summarization. Inspired by earlier work on top-down and bottom-up architectures, this work focuses on improving the traditional bottom-up transformer encoder structure and the fine-resolution representations. + +Pros: +- Their model can model longer documents at coarse and fine granularity levels. +- The performance on benchmark datasets looks pretty good compared to strong baselines. +- Computationally efficient. + +Cons: The reviewers have raised several concerns, including: +- the experimental verification of the computational efficiency and memory usage of the model is not sufficient. +- the novelty of this design is somewhat limited since the bottom-up and top-down idea is not new. +- several details about the figures and especially the experiments were missing. + +The authors have addressed several of the suggestions and added new experimental results addressing the issues raised by the reviewers. 
During the rebuttal period, the authors further conducted empirical investigations showing that the top-down update for token representations, especially with good top-level representations, leads to good summarization because the top-down pass enriches the token-level representations. Despite the positive results, some reviewers raised concerns that, with only BART as a backbone, it is surprising to achieve such a great performance boost with the top-down/bottom-up models on long-document summarization when compared to state-of-the-art transformer models (BigBird, Longformer, and T5) that have been shown to encode longer sequences and beat several summarization models.",ICLR2022, +BJEiVypBM,1517250000000.0,1517260000000.0,423,SyfiiMZA-,SyfiiMZA-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The chief contribution of this paper is to show that a single set of policy parameters can be optimized in an alternating fashion while the design parameters of the body are also optimized with policy gradients and sampled. The fact that this simple approach seems to work is interesting and worthy of note. However, the paper is otherwise quite limited - other methods are not considered or compared, incomplete experimental results are given, and important limitations of the method are not addressed. As it is an interesting but preliminary work, the workshop track would be appropriate.",ICLR2018, +nEHIMkYS25z,1610040000000.0,1610470000000.0,1,LVotkZmYyDi,LVotkZmYyDi,Final Decision,Accept (Poster),"The paper studies nonconvex-strongly-concave min-max optimization using proximal gradient descent-ascent (GDA), assuming the Kurdyka-Łojasiewicz (KŁ) condition holds. The main contribution is a novel Lyapunov function, which leads to a clean analysis. The main downsides of the paper, as discussed by the reviewers, are the lack of experiments and the somewhat stringent assumptions needed in the analysis. Nevertheless, the paper was overall viewed favorably by the reviewers, who considered it a worthwhile contribution to the area of min-max optimization. ",ICLR2021, +BchrlohEKw9,1642700000000.0,1642700000000.0,1,SaKO6z6Hl0c,SaKO6z6Hl0c,Paper Decision,Accept (Poster),"The paper received two accept and two marginal accept recommendations. All reviewers find value in the proposed semantic segmentation methodology (moving self-supervised representation learning towards dense prediction tasks like segmentation or clustering without explicit manual supervision) and appreciate the experimental gains, but had (mostly practical) criticism that was reasonably well addressed in the rebuttal.",ICLR2022, +cc_c8IJoRbi,1642700000000.0,1642700000000.0,1,IsHQmuOqRAG,IsHQmuOqRAG,Paper Decision,Reject,"This paper tackles the difficult problem of learning to segment objects from an image using no supervision during training. The paper is clearly written and a new synthetic dataset is made available. Unfortunately, the reviewers raised a number of issues with the submission (missing citations and comparisons to relevant related work / additional baselines and ablation studies / missing empirical evaluation of the proposed method on standard datasets beyond the toy dataset proposed by the authors). The paper received 1 reject, 2 marginal rejects, and 1 accept, but even the positive reviewer agreed that these were limitations. The authors also conceded to these limitations and initiated experiments that are starting to address the reviewers' comments. 
At this time, the results of these experiments remain incomplete and hence most reviewers agree that the paper should go through another round of reviews before it is publishable. I thus recommend this paper be rejected, in the hope that a subsequent revision will make it a much stronger contribution.",ICLR2022, +SJuHUypBf,1517250000000.0,1517260000000.0,776,rJ8rHkWRb,rJ8rHkWRb,ICLR 2018 Conference Acceptance Decision,Reject,The paper presents yet another approach for modeling words based on their characters. Unfortunately the authors do not compare properly to previous approaches and the idea is very incremental.,ICLR2018, +H1Nim1aBz,1517250000000.0,1517260000000.0,105,rkYTTf-AZ,rkYTTf-AZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This work presents some of the first results on unsupervised neural machine translation. The group of reviewers is highly knowledgeable in machine translation, and they were generally very impressed by the results; they think it warrants a whole new area of research, noting ""the fact that this is possible at all is remarkable"". There were some concerns with the clarity of the details presented and how it might be reproduced, but it seems like much of this was cleared up in the discussion. The reviewers generally praise the thoroughness of the method, the experimental clarity, and the use of ablations. One reviewer was less impressed, and felt more comparisons should be done.",ICLR2018, +UGXTTJU-GxT_,1642700000000.0,1642700000000.0,1,pbduKpYzn9j,pbduKpYzn9j,Paper Decision,Reject,"The paper proposes a method for compressing unconditional generative models by leveraging a knowledge distillation framework. Two reviewers consider the paper slightly above the acceptance threshold, citing the interesting topic studied in the paper and the simplicity of the method. 
+Authors evaluated various factors (such as model sizes, datasets, learning rate, etc) and claim some major findings: 1) larger models have better OOD generalization, and combining both larger models and larger datasets is critical; 2) smaller learning rate during fine-tuning is critical; 3) strategies improving in-distribution accuracy may hurt OOD. Overall, this paper is a well-written empirical study with some useful insights, but the new findings from the empirical studies are generally not surprising and the overall contribution is not significant enough for acceptance.",ICLR2022, +zNosDd1_DPo,1610040000000.0,1610470000000.0,1,XEyElxd9zji,XEyElxd9zji,Final Decision,Reject,"This paper explores meta-learning of local plasticity rules for ANNs. The authors demonstrate that they can meta-learn purely local learning rules that can generalize from one dataset to another (though with fairly low performance, it should be noted), and they provide some data suggesting that these rules lead to more robustness to adversarial images. The reviews were mixed, but some of the reviewers were very positive about it. Specifically, there are the following nice aspects of this work: + +A) The meta-learning scheme has interesting potential for capturing/learning biological plasticity rules, since it operates on binary sequences, which appears to be a novel approach that could help to explain things like STDP rules. + +B) It is encouraging to see that the learning rules can generalise to new tasks, even if the performance isn't great. + +C) The authors provide some interesting analytical results on convergence of the rules for the output layer. + +However, the paper suffers from some significant issues: + +1) The authors do not adequately evaluate the learned rules. Specifically: + +- The comparison to GD in Fig. 2 is not providing an accurate reflection of GD learning capabilities, since a simple delta rule applied directly to pixels can achieve better than 90% accuracy on MNIST. Thus, the claim that the learned rules are ""competitive with GD"" is clearly false. + +- The authors do not compare to any unsupervised learning rules, despite the fact that the recurrent rules are not receiving information about the labels, and are thus really a form of unsupervised learning. + +- There are almost no results regarding the nature of the recurrent rules that are learned, either experimental or analytical. Given positive point (A) above, this is particularly unfortunate and misses a potential key insight for the paper. + +2) The authors do not situate their work adequately within the meta-learning for biologically plausible rules field. There are no experimental comparisons to any other meta-learning approaches herein. Moreover, they do not compare to any known biological rules, nor papers that attempt to meta-learn them. Specifically, several papers have come out in recent years that should be compared to here: + +https://proceedings.neurips.cc/paper/2020/file/f291e10ec3263bd7724556d62e70e25d-Paper.pdf https://www.biorxiv.org/content/10.1101/2019.12.30.891184v1.full.pdf https://proceedings.neurips.cc/paper/2020/file/bdbd5ebfde4934142c8a88e7a3796cd5-Paper.pdf https://openreview.net/pdf?id=HJlKNmFIUB https://proceedings.neurips.cc/paper/2020/file/ee23e7ad9b473ad072d57aaa9b2a5222-Paper.pdf + +And, the authors should consider examining the rules that are learned and how they compare to biological rules (e.g. forms of STDP), if indeed biological insights are the primary goal. 
+ +3) The paper needs to provide better motivation and analyses for the robustness results. Why explore robustness? What is the hypothesis about why these meta-learned rules may provide better robustness? There is little motivation provided. Also, the authors provide very little insight into why you achieved better robustness and insufficient experimental details for readers to even infer this. This section requires far more work to provide any kind of meaningful insight to a reader. What was the nature of the representations learned? How are they different from GD learned representations? Was it related to the ideas in Theorem 4? Note: Theorem 4 is interesting, but only applies to a specific form of output rule. + +4) In general, the motivations and clarity of the paper need a lot of work. What are the authors hoping to achieve? Biological insights? Then do some analyses and comparisons to biology. More robust and generalisable ML? Then do more rigorous evaluations of performance and comparisons to other ML techniques. Some combination of both? Then make the mixed target much clearer. + +5) The authors need to tidy up the paper substantially, and do better at connecting the theorems to the rest of the paper, particularly for the last 2 theorems in the appendix. Also, note, Theorems 2 & 4 appear to have no proofs. + +Given the above considerations, the AC does not feel that this paper is ready for publication. This decision was reached after some discussion with the reviewers. But, the AC and the reviewers want to encourage the authors to take these comments on board to improve their paper for future submissions, as the paper is not without merit.",ICLR2021, +d3nKPyBQdY,1576800000000.0,1576800000000.0,1,BklMDCVtvr,BklMDCVtvr,Paper Decision,Reject,"This work builds directly on McCoy et al. (2019a) and add a RNN that can replace what was human generated hypotheses to the role schemes. The final goal of ROLE is to analyze a network by identifying ‘symbolic structure’. The authors conduct sanity check by conducting experiments with ground truth, and extend the work further to apply it to a complex model. I wonder under what definition of ‘interpretable’ authors have in mind with the final output (figure 2) - the output is very complex. It remains questionable if this will give some ‘insight’ or how would humans parse this info such that it is ‘useful’ for them in some way. + +Overall, though this is a good paper, due to the number of strong papers this year, it cannot be accepted at this time. We hope the comments given by reviewers can help improve a future version. +",ICLR2020, +iGWcHKzEuzN,1610040000000.0,1610470000000.0,1,Gc4MQq-JIgj,Gc4MQq-JIgj,Final Decision,Reject,"This paper investigates safe reinforcement learning with distinct reward function and safety function. The authors present theoretical analysis and simulation results. The representation of safety is a critical step. The authors define the safety function values based on various events and use linear combination of them to construct safety score. Theoretical guarantees on safety and efficiency are presented. Simulation results also show safety and efficiency of the method. + +This was a tricky case as the paper is borderline. Based on reviewers comments, we decided that the paper is not ready for publication in its current form and would benefit from another round revisions. 
+",ICLR2021, +WqPKLc9clMm,1642700000000.0,1642700000000.0,1,dYUdt59fJ0e,dYUdt59fJ0e,Paper Decision,Reject,"This paper presents Yformer to perform long sequence time series forecasting based on a Y-shaped encoder-decoder architecture. Inspired by the U-Net architecture, the key idea of this paper is to improve the prediction resolution by employing skip connection and to stabilize the encoder and decoder by reconstructing the recent past. The experiment results on two datasets named ETT and ECL partially showed the effectiveness of the proposed method. + +Reviewers have common concerns about the overall technical novelty, presentation quality, and experiment details. The authors only provided a rebuttal to one reviewer and most concerns from the other three reviewers were not addressed in the rebuttal and discussion phase. The final scores were unanimously below the acceptance bar. + +AC read the paper and agreed that, while the paper has some merit such as an effective Yformer model for the particular problem setup, the reviewers' concerns are reasonable and need to be addressed in a more convincing way. The weaknesses are quite obvious and will be questioned again by the next set of reviewers, so the authors are required to substantially revise their work before resubmitting.",ICLR2022, +rJO8XyaHM,1517250000000.0,1517260000000.0,145,rJzIBfZAb,rJzIBfZAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents new results on adversarial training, using the framework of robust optimization. Its minimax nature allows for principled methods of both training and attacking neural networks. + +The reviewers were generally positive about its contributions, despite some concerns about 'overclaiming'. The AC recommends acceptance, and encourages the authors to also relate this work with the concurrent ICLR submission (https://openreview.net/forum?id=Hk6kPgZA-) which addresses the problem using a similar approach. ",ICLR2018, +ZbvR6yXQeg,1576800000000.0,1576800000000.0,1,BJxH22EKPS,BJxH22EKPS,Paper Decision,Accept (Poster),"The paper reports interesting NAS patterns, supported by empirical and theoretical evidence that the pattern arises due to smooth loss landscape. Reviewers generally agree the this paper would be of interest for the NAS researchers. Some questions raised by reviewers are answered by authors with a few more extra experiments. We highly recommend authors to carefully reflect on reviewers both pros and cons of the paper before camera ready. +",ICLR2020, +ZMezDBm7IK_,1642700000000.0,1642700000000.0,1,wmQCFqV9r8L,wmQCFqV9r8L,Paper Decision,Reject,"Reviewers overall found that the paper contains novel and intriguing ideas worth further investigation. There is, however, a consensus that the paper is not ready to be published yet, for several reasons detailed in the reviews pertaining to 1) the fact that several statements should be better supported theoretically or empirically, 2) the technical derivation of the method where several choices made by the authors are surprising and not justified, and 3) the experimental results that do not clearly support the claims of the manuscript. While the authors have improved the manuscript during the discussion phase, there is still too much work to be done in order to address issues remaining. 
We hope the reviews will be helpful for authors to consider a revision of the paper for a future submission.",ICLR2022, +#NAME?,1642700000000.0,1642700000000.0,1,TscS0R8QzfG,TscS0R8QzfG,Paper Decision,Reject,"The experimental part of the work has been reported by all reviewers as too limited and not convincing enough. +At this point this work cannot be endorsed for publication at ICLR.",ICLR2022, +ByI6UkaHf,1517250000000.0,1517260000000.0,883,HyDAQl-AW,HyDAQl-AW,ICLR 2018 Conference Acceptance Decision,Reject,The reviewers agree that this paper suffers from a lack of novelty and does not make sufficient contributions to warrant acceptance.,ICLR2018, +Sk2UnG8ux,1486400000000.0,1486400000000.0,1,Hk85q85ee,Hk85q85ee,ICLR committee final decision,Invite to Workshop Track,"The paper analyzes the dynamics of learning under Gaussian input using dynamical systems theory. As two of the reviewers have pointed out, the paper is hard to read, and not written in a way which is accessible to the wider ICLR community. Hence, I cannot recommend its acceptance to the main conference. However, I recommend acceptance to the workshop track, since it has nice technical contributions that can lead to interesting interactions. I encourage the authors to make it more accessible for a future conference.",ICLR2017, +rkx-LhumxE,1544940000000.0,1545350000000.0,1,HkxAisC9FQ,HkxAisC9FQ,The quality of the presentation makes it hard to properly assess the quality of the results,Reject,"This paper suggests augmenting adversarial training with a Lipschitz regularization of the loss, and suggests that this improves the adversarial robustness of deep neural networks. The idea of using such regularization seems novel. However, several reviewers were seriously concerned with the quality of the writing. In particular, the paper contains claims that not only are not needed but also are incorrect. Also, the Reviewer 2 in particular was also concerned with the presentation of prior work on Lipschitz regularization. + +Such poor quality of the presentation makes it impossible to properly evaluate the actual paper contribution. ",ICLR2019,3: The area chair is somewhat confident +HyOXm16Hz,1517250000000.0,1517260000000.0,105,rkYTTf-AZ,rkYTTf-AZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This work presents some of the first results on unsupervised neural machine translation. The group of reviewers is highly knowledgeable in machine translation, and they were generally very impressed by the results and the think it warrants a whole new area of research noting ""the fact that this is possible at all is remarkable."". There were some concerns with the clarity of the details presented and how it might be reproduced, but it seems like much of this was cleared up in the discussion. The reviewers generally praise the thoroughness of the method, the experimental clarity, and use of ablations. One reviewer was less impressed, and felt more comparison should be done.",ICLR2018, +UGXTTJU-GxT_,1642700000000.0,1642700000000.0,1,pbduKpYzn9j,pbduKpYzn9j,Paper Decision,Reject,"The paper proposes a method for compressing unconditional generative models by leveraging a knowledge distillation framework. Two reviewers consider the paper slightly above the acceptance threshold for the interesting topic studied in the paper and the simplicity of the method. 
However, the other three reviewers consider the paper below the acceptance threshold with two reviewers rating the paper slighting below the acceptance threshold and one reviewer rating the paper as not good enough. Several issues were raised, including that the paper only contains results from one unconditional model (StyleGAN2) and that the presented results are not convincing enough. Consolidating the reviews and the rebuttal, the meta-reviewer found the concern raised by the reviewers justified. It would be more ideal if the paper can present results on different unconditional models and more datasets. The authors are encouraged to incorporate the reviewers' feedback to make the paper stronger for a future venue.",ICLR2022, +HuzCBQRZej,1610040000000.0,1610470000000.0,1,uvEgLKYMBF9,uvEgLKYMBF9,Final Decision,Reject,"This paper develops a smoothing procedure to avoid the problem of posterior collapse in VAEs. The method is interesting and novel, the experiments are well executed, and the authors answered satisfactorily to most of the reviewers' concerns. However, there is one remaining issue that would require additional discussion. As identified by Reviewer 1, the analysis in Section 3 is only valid when the number of layers is 2. Above that value, ""it is possible to construct models where the ELBO has a reasonable value, but the smoothed objective behaves catastrophically"". Thus, the scope of the analysis in Section 3 deserves further discussion. Given the large number of ICLR submissions, this paper unfortunately does not meet the acceptance bar. That said, I encourage the authors to address this point and resubmit the paper to another (or the same) venue.",ICLR2021, +xCYHP1ncRGu,1610040000000.0,1610470000000.0,1,jNhWDHdjVi4,jNhWDHdjVi4,Final Decision,Reject,"This paper presents a semi-supervised model (named CPC-VAE) that trains a variational autoencoder (VAE) and a NN classifier simultaneously. The method maximizes an ELBO subject to a task-specific prediction constraint and a consistency constraint. The constraints are defined as some expectations of the variational posteriors. Such constraints are known as posterior regularization. Though the consistency constraint seems to be new, the prediction constraint has been well-examined under deep generative models (see e.g., max-margin deep generative models for (semi-)supervised learning, IEEE TPAMI, 2018). The paper needs more a thorough analysis and comparison. ",ICLR2021, +tpPHZFpmZyZ,1642700000000.0,1642700000000.0,1,SS8F6tFX3-,SS8F6tFX3-,Paper Decision,Accept (Poster),The paper examines the advantage of using models in RL. The authors' rebuttals convinced us of the value of the paper.,ICLR2022, +ByJ3Lkarf,1517250000000.0,1517260000000.0,864,r1vccClCb,r1vccClCb,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a form of autoencoder that learns to predict the neighbors of a given input vector rather than the input itself. The idea is nice but there are some reviewer concerns about insufficient evaluation and the effect of the curse of dimensionality. The revised paper does address some questions and includes additional helpful experiments with different types of autoencoders. However, the work is still a bit preliminary. The area of auto-encoder variants, and corresponding experiments on CIFAR-10 and the like, is crowded. In order to convince the reader that a new approach makes a real contribution, it should have very thorough experiments. 
Suggestions: try to improve the CIFAR-10 numbers (they need not be state-of-the-art but should be more credible), adding more data sets (especially high-dimensional ones), and analyzing the effects of factors that are likely to be important (e.g. dimensionality, choice of distance function for neighbor search).",ICLR2018, +By3IVkaSG,1517250000000.0,1517260000000.0,362,SylJ1D1C-,SylJ1D1C-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"This paper studies the approximation and integration of partial differential equations using convolutional neural networks. By constraining CNN filters to have prescribed vanishing moments, the authors interpret CNN-based temporal prediction in terms of 'pde discovery'. The method is demonstrated on simple convection-diffusion simulations. + +Reviewers were mixed in assessing the quality, novelty and significance of this work. While they all acknowledged the importance of future research in this area, they raised concerns about clarity of exposition (which has been improved during the rebuttal period), as well as the novelty / motivation. The AC shares these concerns; in particular, he misses a more thorough analysis of stability (under what conditions would one use this method to estimate an actual PDE and obtain some certificate of approximation?) and discussions about pitfalls (in real situations one may not know in advance the family of differential operators involved in the physical process nor the nature of the non-linearity; does the method produce a faithful approximation? why?). + +Overall, the AC thinks this is an interesting submission that is still in its preliminary stage, and therefore recommends resubmitting to the worshop track at this time.",ICLR2018, +qzQckASe_1l,1642700000000.0,1642700000000.0,1,sk63PSiUyci,sk63PSiUyci,Paper Decision,Reject,"This paper presents a variant of SARAH, which employs the stochastic recursive gradient and adjustable step-size based on local geometry. The main concerns about this paper include (1) the empirical comparison with other algorithms might not be fair (which is arguable); and (2) the theorem proved in the paper is for a simplified algorithm rather than the algorithm used in the experiments. Even after author response and reviewer discussion, this paper does not gather sufficient support from the reviewers. Thus I recommend rejection.",ICLR2022, +HJeOmrileV,1544760000000.0,1545350000000.0,1,HJxdAoCcYX,HJxdAoCcYX,Clear reviewer consensus to reject,Reject,"All reviewers recommended rejecting this submission so I will as well. However, I do not believe it is fundamentally misguided or anything of that nature. + +Unfortunately, reviewers did not participate as much in discussions with the authors as I believe they should. However, this paper concerns a relatively niche problem of modest interest to the ICLR community. I believe a stronger version of this work would be a more application-focused paper that delved into practical details about a specific case study where this work provides a clear benefit.",ICLR2019,4: The area chair is confident but not absolutely certain +b7Q-fMFPbwv,1642700000000.0,1642700000000.0,1,LI2bhrE_2A,LI2bhrE_2A,Paper Decision,Accept (Spotlight),"This paper proposes use of a novel generative modelling approach, over both sequences and structure of proteins, to co-design the CDR region of antibodies so achieve good binding/neutralization. 
The reviewers are in agreement that the problem is one of importance, and that the technical and empirical contributions are strong. There are concerns over the relevance of evaluating the method by using a predictive model as ground truth. Still, the overall contributions remain.",ICLR2022, +I0pESid2D6n,1642700000000.0,1642700000000.0,1,p-BhZSz59o4,p-BhZSz59o4,Paper Decision,Accept (Oral),"Inspired by BERT and the corresponding masked language modeling objective, this paper proposes masked image modeling as a pre-training technique for vision transformer. More precisely, the image is tokenized using a pre-trained tokenizer, and the goal is to predict the token indices corresponding to masked patches of the image. As noted by the reviewers, the proposed method is simple, works very well in practice and the paper is well written. Since this work potentially opens a whole new research direction, my recommendation is to accept with oral presentation.",ICLR2022, +HygoUp_ggE,1544750000000.0,1545350000000.0,1,HkgSEnA5KQ,HkgSEnA5KQ,Innovative interactive instruction setting based on language interaction,Accept (Poster),"The paper proposes a meta-learning approach to ""language guided policy learning"" where instructions are provided in the form of natural language instructions, rather than in the form of a reward function or through demonstration. A particularly interesting novel feature of the proposed approach is that it can seamlessly incorporate natural language corrections after an initial attempt to solve the task, opening up the direction towards natural instructions through interactive dialogue. The method is empirically shown to be able to learn to navigate environments and manipulate objects more sample efficiently (on test tasks) than approaches without instructions. + +The reviewers noted several potential weaknesses: while the problem setting was considered interesting, the empirical validation was seen to be limited. Reviewers noted that only one (simple) domain was studied, and it was unclear if results would hold up in more complex domains. They also note lack of comparison to baselines based on prior work (e.g., pre-training). + +The authors provided very detailed replies to the reviewer comments, and added very substantial new experiments, including an entire new domain and newly implemented baselines. Reviewers indicated that they are satisfied with the revisions. The AC reviewed the reviewer suggestions and revisions and notes that the additional experiments significantly improve the contribution of the paper. The resulting consensus is that the paper should be accepted. + +The AC would like to note that several figures are very small and unreadable when the paper is printed, e.g., figure 7, and suggests that the authors increase figure size (and font size within figures) to ensure legibility.",ICLR2019,5: The area chair is absolutely certain +Skekr57teV,1545320000000.0,1545350000000.0,1,rJ4vlh0qtm,rJ4vlh0qtm,metareview,Reject,The reviewers raised a number of major concerns including the incremental novelty of the proposed and a poor readability of the presented materials (lack of sufficient explanations and discussions). The authors decided to withdraw the paper.,ICLR2019,5: The area chair is absolutely certain +zt4p3lJx-,1576800000000.0,1576800000000.0,1,HkliveStvH,HkliveStvH,Paper Decision,Reject,The paper proposes two methods for interactive panoptic segmentation (a combination of semantic and instance segmentation) that leverages scribbles as supervision during inference. 
Reviewers had concerns about the novelty of the paper as it applies existing algorithms for this task and limited empirical comparison with other methods. Reviewers also suggested that ICLR may not be a good fit for the paper and I encourage the authors to consider submitting to a vision oriented conference. ,ICLR2020, +ThBtwPVv_,1576800000000.0,1576800000000.0,1,S1e5YC4KPS,S1e5YC4KPS,Paper Decision,Reject,"This paper provides an approach to improve the differentially private SGD method by leveraging a differentially private version of the lottery mechanism, which reduces the number of parameters in the gradient update (and the dimension of the noise vectors). While this combination appears to be interesting, there is a non-trivial technical issue raised by Reviewer 3 on the sensitivity analysis in the paper. (R3 brought up this issue even after the rebuttal.) This issue needs to be resolved or clarified for the paper to be published.",ICLR2020, +49pHkCCBaIK,1642700000000.0,1642700000000.0,1,RJkAHKp7kNZ,RJkAHKp7kNZ,Paper Decision,Accept (Oral),"All reviewers consistently agree on the high quality of the research presented in this paper, such that it the paper clearly is significantly above the acceptance threshold of ICLR.",ICLR2022, +HylJlCQklN,1544660000000.0,1545350000000.0,1,HJz6tiCqYm,HJz6tiCqYm,clear consensus to accept this paper,Accept (Poster),"The reviewers have all recommended accepting this paper thus I am as well. Based on the reviews and the selectivity of the single track for oral presentations, I am only recommending acceptance as a poster.",ICLR2019,5: The area chair is absolutely certain +7kJxOtZ2JW,1576800000000.0,1576800000000.0,1,BJx7N1SKvB,BJx7N1SKvB,Paper Decision,Reject,"In this work, the authors focus on the high-dimensional regime in which both the dataset size and the number of features tend to infinity. They analyze the performance of a simple regression model trained on the random features and revealed several interesting and important observations. + +Unfortunately, the reviewers could not reach a consensus as to whether this paper had sufficient novelty to merit acceptance at this time. Incorporating their feedback would move the paper closer towards the acceptance threshold.",ICLR2020, +B1XcXJTSz,1517250000000.0,1517260000000.0,197,ry8dvM-R-,ry8dvM-R-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),The proposed routing networks using RL to automatically learn the optimal network architecture is very interesting. Solid experimental justification and comparisons. The authors also addressed reviewers' concerns on presentation clarity in revisions.,ICLR2018, +8kCQnlS-FdP,1642700000000.0,1642700000000.0,1,mQDpmgFKu1P,mQDpmgFKu1P,Paper Decision,Reject,"The paper proposes a language modeling architecture based on the RNN cells leveraging Legendre memory units. The proposal is interesting, but as all the reviewers notice, the paper is not ready for the presentation in the top ML conference for several reasons: comparison with weak baselines, shallow or weak analysis of the presented results, insufficient discussion of the related work, etc. Looking forward for all the comments to be addressed by the authors. + +In the rebuttal the authors addressed some of the questions but all the reviewers think that the paper is not ready for acceptance and careful rewriting is needed. 
Recent research on the improved RNN mechanisms suggests that Legendre memory units and related mechanisms might be a gateway to solving several standard issues of training regular RNNs so the topic is definitely of great importance. Thus the authors are highly encouraged to resubmit the paper after making all suggested corrections.",ICLR2022, +p8tw84EYvI,1642700000000.0,1642700000000.0,1,0Tnl8uBHfQw,0Tnl8uBHfQw,Paper Decision,Reject,"This paper studies the combination between model uncertainty and data uncertainty based on the spectral-normalized Gaussian process. Empirical results show the effectiveness of the proposed method. Overall, the paper is well-motivated and well-written. However, there are several concerns about the paper. (1) The novelty is marginal. The contribution of combining SNGP and heteroscedastic models into a single model may not be enough. (2) More analyses and insights are needed on why the mentioned two types of uncertainty are complementary. (3) More recent state-of-the-art methods on classification with noisy labels are suggested to be included to interest the readers. There are diverse scores. However, no one wants to champion the paper. We believe that the paper will be a strong one by addressing the concerns.",ICLR2022, +BJeDbnSNx4,1545000000000.0,1545350000000.0,1,Hke20iA9Y7,Hke20iA9Y7,Good paper on fast stochastic learning of embedding models.,Accept (Poster),"This paper presents methods to scale learning of embedding models estimated using neural networks. The main idea is to work with Gram matrices whose sizes depend on the length of the embedding. Building upon existing works like SAG algorithm, the paper proposes two new stochastic methods for learning using stochastic estimates of Gram matrices. + +Reviewers find the paper interesting and useful, although have given many suggestions to improve the presentation and experiments. For this reason, I recommend to accept this paper. + +A small note: SAG algorithm was originally proposed in 2013. The paper only cites the 2017 version. Please include the 2013 version as well. +",ICLR2019,5: The area chair is absolutely certain +aegG7B5Kazq,1610040000000.0,1610470000000.0,1,n1HD8M6WGn,n1HD8M6WGn,Final Decision,Accept (Poster),"This paper proposes fine-grained layer attention to evaluate the contribution of individual encoder layers. This departs from the standard transformer architecture where the decoder uses only the final encoder layer. This paper investigates how encoder layer fusion works, where the decoder layers have access to information for various encoder layers. The main finding of the paper is that the encoder embedding layer is particularly important. They propose SurfaceFusion, which only connects the encoder embedding layer to the softmax layer of decoders, leading to accuracy gains. + +There was some disagreement among reviewers about this paper. Overall, I found this a simple but effective contribution with interesting findings that can help future research in seq2seq models. Some of the weaknesses (discussing other relevant works, discussing other variants of FGLA, adding new experimental results) have been addressed in the updated version of the paper. One of the reviewers suggested running additional experiments on GLUE-style tasks (with a masked language model) to be really sure if the technique is convincing, and particularly try it with larger models (T5 was suggested). 
While adding those experiments would be a plus, I disagree that this is crucial - this paper is focusing on seq2seq tasks and is already considering several tasks: summarization, MT, and grammar correction. The results found by this paper are interesting and can foster future research extending this beyond these 3 tasks. Even if larger models can make the improvements smaller, there are many inconveniences in just increasing scale (memory consumption, energy consumption, etc.) It is my opinion that the community should value research that tries to understand the weaknesses of smaller models, rather than relying on large scale models to solve all problems.",ICLR2021, +Tuf_RS5drLh,1642700000000.0,1642700000000.0,1,3jooF27-0Wy,3jooF27-0Wy,Paper Decision,Accept (Poster),"This submission proposes a method for learning convolutional filters with trainable size, that builds on top of multiplicative filter networks. Anti-aliasing is achieved by parametrization with anisotropic Gabor filters. The reviewers were unanimous in their opinion that the paper is suitable for acceptance to ICLR. The authors are encouraged to make use of the extensive reviewer discussion in improving the final version of the paper.",ICLR2022, +H1ZaVJarz,1517250000000.0,1517260000000.0,448,HyTrSegCb,HyTrSegCb,ICLR 2018 Conference Acceptance Decision,Reject,"The pros and cons of this paper cited by the reviewers can be summarized below: + +Pros: +* Empirical results demonstrate decent improvements over other reasonable models +* The method is well engineered to the task + +Cons: +* The paper is difficult to read due to grammar and formatting issues +* Experiments are also lacking detail and potentially difficult to reproduce +* Some of the experimental results are suspect in that the train/test accuracy are basically the same. Usually we would expect train to be much better in highly parameterized neural models +* The content is somewhat specialized to a particular task in NLP, and perhaps of less interest to the ICLR audience as a whole (although I realize that ICLR is attempting to cast a broad net so this alone is not a reason for rejection of the paper) + +In addition to the Cons cited by the reviewers above, I would also note that there is some relevant work on morphology in sequence-to-sequence models, e.g.: +* ""What do Neural Machine Translation Models Learn about Morphology?"" Belinkov et al. ACL 2017. + +and that it is common in sequence-to-sequence models to use sub-word units, which allows for better handling of morphological phenomena: +* ""Neural Machine Translation of Rare Words with Subword Units"" Sennrich et al. ACL 2016. + +While the paper is not without merit, given that the cons seem to significantly outweigh the pros, I don't think that it is worthy of publication at ICLR at this time, although submission to a future conference (perhaps NLP conference) seems warranted.",ICLR2018, +rAiKglaZgj3,1642700000000.0,1642700000000.0,1,KeI9E-gsoB,KeI9E-gsoB,Paper Decision,Accept (Poster),"The title of the paper nicely summarizes the main goal of the paper and the abstract does the same for the achieved results. For this reason I abstain from providing another summary. + +The initial reviews were somewhat mixed but during the discussion phase, a lot of questions have been resolved so that actually three reviewers updated (upgraded) their score. Remark 14 certainly needs to be updated according to the discussion in the final few days of the rebuttal phase. 
In addition, one reviewer pointed to a naive application of Mercer's theorem. This should be addressed as well, either by restricting to compact domains and continuous kernels as suggested by the reviewer, or by considering generalizations as done by e.g. the cited Fischer and Steinwart. Finally, the cited survey by Kanagawa et al also contains some information on learning curves and thus it should be cited more prominently, e.g. around Remark 14. + +In any case, this paper is above the acceptance threshold.",ICLR2022, +7bYGeZA4pa,1642700000000.0,1642700000000.0,1,CrCvGNHAIrz,CrCvGNHAIrz,Paper Decision,Accept (Poster),"This paper proposes monotonic graph neural networks (MGNNs) for the transformation of knowledge graphs. Specifically, MGNNs transform a knowledge graph into a colored graph where each node is represented by a numeric feature vector and each edge encodes the node relationship with different colors. The authors provide theoretical analysis showing that monotonic constraint can enable the model to derive logical inference rules in Datalog, and thus the trained model is explainable. + + +The authors addressed most of the concerns raised by the reviewers, such as motivation, runtime, and comparison with existing baselines. Three of the four reviewers are positive (with the scores of 6 or above) towards acceptance after rebuttal discussions, and the remaining reviewer gives a score of 5 (below acceptance threshold) thinks that this work still lacks novelty, but he/she is not against acceptance if other reviewers choose so. Considering this work makes a good exploration on explainable graph neural networks, which is an interesting and important research direction, we recommend for acceptance. We thank the reviewers and the authors for their active discussion.",ICLR2022, +yvcLpJBPkV,1610040000000.0,1610470000000.0,1,21aG-pxQWa,21aG-pxQWa,Final Decision,Reject,"The paper introduces an approach to counterfactual fairness based on data pre-processing, and compare it to other two counterfactual fairness approaches on the Adult and COMPAS datasets. + +The reviewers are in agreement that, in its current state, the paper should not be accepted for publication at the venue. Their main concerns are around the metric used to measure fairness, and these were not resolved during the discussion. The reviewers would have also appreciated more experiments on real-world datasets to get a more comprehensive comparison of the methods. Finally, discussion and comparison with other methods to achieve counterfactual fairness from the literature were limited. ",ICLR2021, +NbBLeBZLyZr,1642700000000.0,1642700000000.0,1,8wI4UUN5RxC,8wI4UUN5RxC,Paper Decision,Reject,"The paper proposes a variational inference based on singular learning theory (SLT), where the resolution of singularity is learned by normalizing flow so that the latent distribution is factorized. + +Pros: +- A unique idea to use SLT for variational inference. + +Cons (only serious concerns): +- Goal is unclear. The authors say that they propose variational inference based on SLT. But apparently, they propose it not as an alternative to the state-of-the-art variational inference for neural networks (if so the experiments shown are far from the acceptable level). The authors must clearly say for what purpose they propose a new method. I would guess the proposed method is for analyzing singular models to compute their RLCT. In that case, the authors should compare with existing methods for evaluating RLCT, e.g., MCMC based methods: + +K. 
K. Nagata and S. Watanabe, ""Exchange Monte Carlo Sampling From Bayesian Posterior for Singular Learning Machines,"" in IEEE Transactions on Neural Networks, vol. 19, no. 7, pp. 1253-1266, July 2008, doi: 10.1109/TNN.2008.2000202,

and discuss the pros and cons of the proposed method. For DNNs, you should use state-of-the-art MCMC sampling methods like

Wenzel, F., Roth, K., Veeling, B., Swiatkowski, J., Tran, L., Mandt, S., Snoek, J., Salimans, T., Jenatton, R. & Nowozin, S. (2020). How Good is the Bayes Posterior in Deep Neural Networks Really? Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10248-10259. Available from https://proceedings.mlr.press/v119/wenzel20a.html

as a baseline. Approximating the posterior with a normalizing flow can be another baseline.

- Large n issue. SLT can be seen as a generalization of the asymptotic learning theory for regular models, where the model complexity is represented by the parameter dimension d, and ""asymptotic"" means n >> d. Watanabe revealed that the model complexity cannot be represented by d in singular models, and therefore the definition of ""asymptotic"" is not as clear as in the regular case. But it is known that typical neural networks are overparameterized and can achieve zero training error. I have seen no work arguing that SLT holds in this regime. If the authors insist that their method is applicable to deep neural networks, they should cite references where it is proved that SLT holds in the overparameterized regime, or prove it by themselves.

There are many more concerns, including those pointed out by reviewers, and the paper is not ready for publication.",ICLR2022,
8IqBrOmVuzk,1642700000000.0,1642700000000.0,1,p3DKPQ7uaAi,p3DKPQ7uaAi,Paper Decision,Accept (Poster),"This paper has been evaluated by three reviewers, with 2 borderline scores leaning towards accept, and 1 accept. The reviewers have noted that the idea of alignment is not particularly novel per se. Nonetheless, they found some merit in the use of a network learning the alignment, and they liked the experiments.

AC has however some concerns about this work. Firstly, it is not clear why Lifted+SoftDTW and Binomial+SoftDTW completely fail in Table 1, and in Table 5, SoftDTW is worse by 30% than TAP. Is soft-DTW set up properly in these experiments (the softmax temperature, the base distance used, the maximum number of steps away from the main path, etc.)?

AC is also not convinced about the principled nature of the proposed alignment. Eq. 3 and the residual design above seem more like heuristics than the principled OT transport that Eq. 1 and 2 set out to suggest. With concatenation of distances between sequence features and positional encoding, the proposed alignment seems more similar to attention and transformers than OT.",ICLR2022,
HkLG6GI_x,1486400000000.0,1486400000000.0,1,r17RD2oxe,r17RD2oxe,ICLR committee final decision,Reject,"The reviewers agree that the paper provides a creative idea of using Computer Vision in Biology by building ""the tree of life"". However, they also agree that the paper in its current form is not ready for publication due to limited novelty and unclear impact/application. The authors did not post a rebuttal to address the concerns.
The AC agrees with the reviewers, and encourages the authors to improve their manuscript as per the reviewers' suggestions and submit to a future conference.",ICLR2017,
Wx-NAgkJtp,1610040000000.0,1610470000000.0,1,I-VfjSBzi36,I-VfjSBzi36,Final Decision,Reject,"It is important to develop efficient training methods for BERT-like models since they have been widely used in real-world natural language processing tasks. The proposed approach is interesting. It speeds up BERT training by identifying lottery tickets in the early stage of training. We agree with the authors' rebuttal that AutoML is not that related to the work here. Our main concern with this work is its worse-than-BERT performance shown in Table 2. The performance gap is significant. Sufficiently more training steps would fill the performance gap, but the proposed method may then have no advantage anymore over the normal training procedures. To make this work more convincing, we would suggest including experiments comparing the different methods under similar prediction performance.
In addition, since the main claim of this work is for training efficiency, it will be helpful to show the advantage of this method by directly presenting the training curves/ results of different methods. Overall this paper is pretty much on the boundary. We encourage the authors to resubmit this work once these issues are well resolved. ",ICLR2021, +qBjF2wIQAOF,1642700000000.0,1642700000000.0,1,NP9T_pViXU,NP9T_pViXU,Paper Decision,Reject,"None of the reviewers recommended this paper. There were concerns that it is hard to draw meaningful conclusions from the experimental work due to the comparisons provided. While the design of the block masking + contrastive learning proposed in this paper was rated as potentially being quite important, there remained some concern that subsequent tokenization steps could be problematic for ""spatial heavy"" datasets. + +The AC recommends rejection.",ICLR2022, +B1skBkTBG,1517250000000.0,1517260000000.0,483,ry4S90l0b,ry4S90l0b,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents self-training scheme for GANs. The proposed idea is simple but reasonable, and the experimental results show promise for MNIST and CIFAR10. However, the novelty of the proposed method seems relatively small and experimental results lack comparison against other stronger baselines (e.g., state-of-the-art semi-supervised methods). Presentation needs to be improved. More comprehensive experiments on other datasets would also strengthen the future version of the paper. ",ICLR2018, +HkOSVkaSz,1517250000000.0,1517260000000.0,345,rJWrK9lAb,rJWrK9lAb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The reviewers (all experts in this area) appreciated the novelty of the idea, though they felt that the experimental results (samples and Inception scores) did not provide convincing evidence value of this method over already established techniques. The authors responded to the concerns but were not able to address the issue of evaluation due to time constraints. The idea is likely sound but evaluation does not meet the bar, it may make a good contribution as a workshop paper.",ICLR2018, +saeTXLcdBsZ,1610040000000.0,1610470000000.0,1,PO0SuuafSX,PO0SuuafSX,Final Decision,Reject,"Description: +The paper presents a method for encoding a compressed version of an implicit 3D scene, from given images from arbitrary view points. This is achieved via a function, learning with a NeRF model, that maps spatial coordinates to a radiance vector field and is optimized for high compressibility and low reconstruction error. Results shows better compression, higher reconstruction quality and lower bitrates compared to other STOA. + +Strengths: +- Method for significantly compressing NerF models, which is very useful since such models are often trained for every new scene +- Retain reconstruction quality after compression by an order of magnitude + +Weaknesses: +- The need for decompressing the model before rendering can be done means reduced rendering speed. This also requires longer training times. +- Experiments against other scene compression + neural rendering technique will have further strengthened the papers’s claims +- The techniques used are well established, and thus there is not as much technical novelty. +",ICLR2021, +J1oy21VBbHt,1610040000000.0,1610470000000.0,1,T6RYeudzf1,T6RYeudzf1,Final Decision,Reject,"This paper proposes a new method for label-free text style transfer. 
The method employs the pre-trained language model T5 and makes an assumption that two adjacent sentences in a document have the same style. Experimental results show satisfying results compared with supervised methods. + +Pros. • The paper is generally clearly written. • The proposed method appears to be new. • Experiments have been conducted. + +Cons • The fundamental assumption of the method is not convincing enough. (Issue 1 of R3, Issue 4 of R4, Issue 1 of R2) • The proposed model is also not convincing enough. (Issues 2 and 3 of R3, Issue 3 of R2) • There are problems with the experiments. For example, it would be better to use more datasets in the experiments. (Issue 4 of R3, Issue 2 of R4) + +Discussions have been made among the reviewers. The reviewers appreciate the efforts made by the authors in the rebuttal, including the additional experiments. However, they are not fully convinced and still feel that the submission is not strong enough as an ICLR paper. + +",ICLR2021, +3s7erMNj-A8,1642700000000.0,1642700000000.0,1,R5sVzzXhW8n,R5sVzzXhW8n,Paper Decision,Reject,"This paper presents an analysis of the robustness of self-supervised learning (SSL) features to noisy labels in downstream supervised learning, and provides empirical verification of the results (mostly in the symmetric noise setup); a SSL regularization scheme is also analyzed (section 4). While the paper contains plausible insights, the reviews share similar concerns that the analysis is mainly based on the noise being symmetric, and that the SSL features already have good class separation and Gaussian clusters, which are strong assumptions. Given that the assumptions are not theoretically verified, and that there is not sufficient empirical results in heavy non-symmetric noise scenario on large benchmark datasets, the reviewers think the paper does not provide practical guidance for noise label learning in its current form.",ICLR2022, +rJgqyiqHl4,1545080000000.0,1545350000000.0,1,r1lpx3A9K7,r1lpx3A9K7,Reject,Reject,The reviewers agree the paper is not ready for publication. ,ICLR2019,5: The area chair is absolutely certain +SJunsM8ug,1486400000000.0,1486400000000.0,1,HJ6idTdgg,HJ6idTdgg,ICLR committee final decision,Reject,Four knowledgable reviewers recommend rejection due to too weak of a contribution. The authors did not post a rebuttal. The AC agrees with the reviewers' recommendation.,ICLR2017, +BJg3eiqSgE,1545080000000.0,1545350000000.0,1,H1g0piA9tQ,H1g0piA9tQ,Reject,Reject,The reviewers agree the paper is not ready for publication. ,ICLR2019,5: The area chair is absolutely certain +IBVseua3oI,1576800000000.0,1576800000000.0,1,HklFUlBKPB,HklFUlBKPB,Paper Decision,Reject,"This article studies the identifiability of architecture and weights of a ReLU network from the values of the computed functions, and presents an algorithm to do this. This is a very interesting problem with diverse implications. The reviewers raised concerns about the completeness of various parts of the proposed algorithm and the complexity analysis, some of which were addressed in the author's response. Another concern raised was that the experiments were limited to small networks, with a proof of concept on more realistic networks missing. The revision added experiments with MNIST. Other concerns (which in my opinion could be studied separately) include possible limitations of the approach to networks with no shared weights nor pooling. 
The reviewers agree that the article concerns an interesting topic that has not been studied in much detail yet. Still, the article would benefit from a more transparent presentation of the algorithm and theoretical analysis, as well as more extensive experiments. ",ICLR2020, +rktEnzLug,1486400000000.0,1486400000000.0,1,SJiFvr9el,SJiFvr9el,ICLR committee final decision,Reject,"This paper proposes a to use squared modulus nonlinearities within convolutional architectures. Because point-wise squaring can be written as a convolution in the Fourier domain, when doing all the operations in the Fourier this architecture becomes 'dual': convolutions become pointwise operations, and pointwise square-nonlinearities become convolutions. + The authors study this architecture in the context of scattering transforms and produce a complexity analysis that exploits the previous property, along with preliminary numerical experiments. + + All reviewers agreed that, while this is an interesting paper with potentially useful outcomes, its exposition and current experimental section are insufficient. The AC agrees with this assessment, and therefore recommends rejection. + I agree that the main unanswered question and a 'show-stopper' is the lack of comparisons with its most immediate baseline, scattering using complex modulus, both in terms of accuracy and computational complexity.",ICLR2017, +H1gy0VoR14,1544630000000.0,1545350000000.0,1,SJzqpj09YQ,SJzqpj09YQ,"Some presentation issues, but practical value for large-scale eigen computations",Accept (Poster),"The paper proposes a deep learning framework to solve large-scale spectral decomposition. + +The reviewers and AC note that the paper is quite weak from presentation. However, technically, the proposed ideas make sense, as Reviewer 1 and Reviewer 2 mentioned. In particular, as Reviewer 1 pointed out, the paper has high practical value as it aims for solving the problem at a scale larger than any existing method. Reviewer 3 pointed out no comparison with existing algorithms, but this is understandable due to the new goal. + +In overall, AC thinks this is quite a boarderline paper. But, AC tends to suggest acceptance since the paper can be interested for a broad range of readers if presentation is improved.",ICLR2019,4: The area chair is confident but not absolutely certain +yxG2CveYIu,1576800000000.0,1576800000000.0,1,Hyl7ygStwB,Hyl7ygStwB,Paper Decision,Accept (Poster),"The authors propose a novel way of incorporating a large pretrained language model (BERT) into neural machine translation using an extra attention model for both the NMT encoder and decoder. The paper presents thorough experimental design, with strong baselines and consistent positive results for supervised, semi-supervised and unsupervised experiments. The reviewers all mentioned lack of clarity in the writing and there was significant discussion with the authors. After improvements and clarifications, all reviewers agree that this paper would make a good contribution to ICLR and be of general use to the field. ",ICLR2020, +Syl53KTgg4,1544770000000.0,1545350000000.0,1,rJlJ-2CqtX,rJlJ-2CqtX,some novelty; muted endorsements; solid writing and results; revisit?,Reject,"Strengths: The paper introduces a novel constrained-optimization method for RL problems. +A lower-bound constraint can be imposed on the return (cumulative reward), +while optimizing one or more other costs, such as control effort. +The method learns multiple +The paper is clearly written. 
Results are shown on the cart-and-pole, a humanoid, and a realistic Minitaur +quadruped model. AC: Being able to learn conditional constraints is an interesting direction. + +Weaknesses: There are often simpler ways to solve the problem of high-amplitude, high-frequency +controls in the setting of robotics. +The paper removes one hyperparameter (lambda) but then introduces another (beta), although beta +is likely easier to tune. The ideas have some strong connections to existing work in +safe reinforcement learning. +AC: Video results for the humanoid and cart-and-pole examples would have been useful to see. + +Summary: The paper makes progress on ideas that are fairly involved to explore and use +(perhaps limiting their use in the short term), but that have potential, +i.e., learning state-dependent Lagrange multipliers for constrained RL. The paper is perfectly fine +technically, and does break some new ground in putting a particular set of pieces together. +As articulated by two of the reviewers, from a pragmatic perspective, the results are not +yet entirely compelling. I do believe that a better understanding of working with constrained RL, +in ways that are somewhat different than those used in Safe RL work. + +Given the remaining muted enthusiasm of two of the reviewers, and in the absence of further +calibration, the AC leans marginally towards a reject. Current scores: 5,6,7. +Again, the paper does have novelty, although it's a pretty intricate setup. +The AC would be happy to revisit upon global recalibration. +",ICLR2019,3: The area chair is somewhat confident +Va_IHM9Xhv,1576800000000.0,1576800000000.0,1,HJgExaVtwr,HJgExaVtwr,Paper Decision,Accept (Poster),"This paper proposes an algorithm for noisy labels by adopting an idea in the recent semi-supervised learning algorithm. + +As two problems of training noisy labels and semi-supervised ones are closely related, it is not surprising to expect such results as pointed out by reviewers. However, reported thorough experimental results are strong and I think this paper can be useful for practitioners and following works. + +Hence, I recommend acceptance.",ICLR2020, +xIu5RwzH6s,1576800000000.0,1576800000000.0,1,SJxE8erKDH,SJxE8erKDH,Paper Decision,Accept (Poster),"This paper addresses the problem of many-to-many cross-domain mapping tasks with a double variational auto-encoder architecture, making use of the normalizing flow-based priors. + +Reviewers and AC unanimously agree that it is a well written paper with a solid approach to a complicated real problem supported by good experimental results. There are still some concerns with confusing notations, and with human study to further validate their approach, which should be addressed in a future version. + +I recommend acceptance.",ICLR2020, +r11H3GL_x,1486400000000.0,1486400000000.0,1,BkSqjHqxg,BkSqjHqxg,ICLR committee final decision,Reject,"The idea of applying skip-graphs to this graph domain to learn embeddings is good. The results demonstrate that this approach is competitive, but do not show a clear advantage. This is difficult, as the variety of approaches in this area is rapidly increasing. But comparisons to other methods could be improved, notably deep graph kernels.",ICLR2017, +66z9o7F9bqs7,1642700000000.0,1642700000000.0,1,CZZ7KWOP0-M,CZZ7KWOP0-M,Paper Decision,Reject,"This paper develops a hybrid search space consisting of both multiplication-based and multiplication-free operators. 
It also presents a weight-sharing mechanism for searching in the introduced search space. + +Pros: +* A hybrid search space is developed. +* Strong empirical results are reported for both CV and NLP tasks. +* The paper is well written and is easy to follow. + +Cons: +* Incremental technical novelty. +* Missing baselines and competing methods. +* Missing information on the search cost. +* Lack of insights into the discovered architectures + +The rebuttal has provided most missing information and comparisons, and it has provided additional insights into the searched architectures. However, the reviewers still rate this paper at borderline primarily due to the limited technical novelties. Unfortunately, given these concerns, this submission does not meet the bar for acceptance at ICLR.",ICLR2022, +Byep-80gxE,1544770000000.0,1545350000000.0,1,H1MBuiAqtX,H1MBuiAqtX,Significant concerns with current presentation clarity,Reject,"The authors present an interesting approach but there were multiple significant concerns with the clarity of the presentation, and some concern with the significance of the experimental results.",ICLR2019,3: The area chair is somewhat confident +Skz1D1aBG,1517250000000.0,1517260000000.0,907,HyxjwgbRZ,HyxjwgbRZ,ICLR 2018 Conference Acceptance Decision,Reject,"Dear authors, + +After carefully reading the reviews, the rebuttal, and going through the paper, I regret to inform you that this paper does not meet the requirements for publication at ICLR. + +While the variance analysis is definitely of interest, the reality of the algorithm does not match the claims. The theoretical rate is worse than that of SG but this could be an artefact of the analysis. Sadly, the experimental setup lacks in several ways: +- It is not yet clear whether escaping the saddle points is really an issue in deep learning as the loss function is still poorly understood. +- This analysis is done in the noiseless setting despite your argument being based around the variance of the gradients. +- You report the test error on CIFAR-10. While interesting and required for an ML paper, you introduce an optimization algorithm and so the quantity that matters the most is the speed at which you achieve a given training accuracy. Also, your table lists the value of the test accuracy rather than the speed of increase. Thus, you test the generalization ability of your algorithm while making claims about the optimization performance.",ICLR2018, +BJlqBUqGeV,1544890000000.0,1545350000000.0,1,HJex0o05F7,HJex0o05F7,Lacks demonstrated research contribution beyond past work,Reject,"Following the unanimous vote of the reviewers, this paper is not ready for publication at ICLR. The most significant concern raised is that there does not seem to be an adequate research contribution. Moreover, unsubstantiated claims of novelty do not adequately discuss or compare to past work.",ICLR2019,5: The area chair is absolutely certain +hv0TZQNchU,1576800000000.0,1576800000000.0,1,B1lFa3EFwB,B1lFa3EFwB,Paper Decision,Reject,"The paper proposes a modification to improve adversarial invariance induction for learning representations under invariance constraints. The authors provide both a formal analysis and experimental evaluation of the method. The reviewers generally agree that the experimental evaluation is rigorous and above average, but the paper lacks clarity making it difficult to judge the significance of it. 
Therefore, I recommend rejection, but encourage the authors to improve the presentation and resubmit.",ICLR2020, +gKb7NLHqH1E,1642700000000.0,1642700000000.0,1,mdUYT5QV0O,mdUYT5QV0O,Paper Decision,Accept (Poster),"The paper develops optimization algorithms for fitting structured neural networks. It focuses on the manifold identification property, which guarantees after finitely many iterations, all iterates have the same sparsity structure as at convergence. The proposed method extends dual averaging to include momentum. The paper’s analysis shows that if the proposed method converges, it converges to a stationary point, and identifies the sparsity pattern of the limit in finitely many iterations. Experiments show improvements in sparsification compared to existing two-step sparsifiers, without a degradation in accuracy. + +The initial review raised concerns about clarity, as well as some of the claimed significance of the results: the paper does not prove that an optimal, or even good sparsity structure is obtained — rather, it proves that the sparsity structure at convergence is obtained after finitely many iterations. The reviewers also raised a number of detailed concerns about the paper’s mathematical exposition. + +After considering the authors response, and a revision which significantly clarified both the paper’s notation and its main claims, the reviewers converged to a recommendation to accept. The paper provides a principled approach to sparsification, with supporting theory (albeit about finite identification, rather than optimality). The proposed algorithm appears quite practical and is supported by experiments demonstrating improvements over existing sparsification methods.",ICLR2022, +WWSpvlD1C2,1610040000000.0,1610470000000.0,1,99M-4QlinPr,99M-4QlinPr,Final Decision,Reject,"This paper investigate an interesting problem of multi-agent RL with self-play. We agree with the reviewers that the paper requires more work before it can be presented at a top conference. We would encourage the authors to use the reviewers' feedback to improve the paper and resubmit to one of the upcoming conferences. +",ICLR2021, +T0ytmBXEG6,1576800000000.0,1576800000000.0,1,HJxV5yHYwB,HJxV5yHYwB,Paper Decision,Reject,"The paper considers planning through the lenses both of a single and multiple objectives. The paper then discusses the pareto frontiers of this optimization. While this is an interesting direction, the reviewers feel a more careful comparison to related work is needed.",ICLR2020, +JM28ygh4bZO,1642700000000.0,1642700000000.0,1,ciSap6Cw5mk,ciSap6Cw5mk,Paper Decision,Reject,"This manuscript proposes a ranking approach to identify Byzantine agents in federated learning. Distinct from existing methods, the mitigation is implemented by computing ranks for each gradient, then computing rank statistics across agents. The primary intuition is that adversarial agents can be identified by examining these rank statistics. + +There are three reviewers, all of whom agree that the method addresses an interesting and timely issue -- giving the growing interest in both Byzantine-robust learning and federated learning in the community. However, reviewers are mixed on the paper score -- with a strong accept a weak accept, and a strong reject. Common issues raised include the generality of the approach beyond the outlined attacks, +Other issues brought up, but addressed in the rebuttal include some weaknesses in the evaluation and comparison to additional baselines. 
There is also an interesting discussion of using higher-order statistics, which does not seem to help the methods when evaluated by the authors. Nevertheless, after reviews and discussion, the reviewers are mixed at the end of the discussion. + +The area chair finds, first, that the paper is much improved, and much more applicable in the updated form than in the original version. However, the area chair agrees with the reviewer who notes that the moniker ""Byzantine-robust"" implies the methods should be provably robust to worst-case adversaries, not only to a selected set of adversaries with pre-selected attacks. The specified setting may be too narrow for interest by the community. To this end, the area chair suspects that the method may be robust to a more general set of attacks than noted -- working to outline sufficient conditions for robustness would significantly strengthen this work. The asymptotic nature of the robustness guarantees is also of concern. + +An additional concern of the area chair is that the system setting investigated assumes gradient communication and IID data across devices. While this is not an issue on its own, the setting is closer to distributed learning than federated learning, where one generally communicates model updates, or model differences after multiple local updates, and not gradients. This difference can have a significant effect on robustness methods that depend on identifying benign vs. adversarial statistics of parameters. Non-IID data is also common in the federated setting, though this is less concerning, as robust methods for non-IID settings are only now emerging. A simple fix for this issue would be to rename the setting from ""Federated"" to ""Distributed."" + +Authors are encouraged to address the highlighted technical concerns in any future submission of this work. The primary concern may simply be a naming issue (i.e., removing ""Byzantine"" might fix this concern. Nevertheless, taken together, the opinion of the area chair is that the manuscript is not ready for publication. Again, the area chair believes that many of the issues noted can be fixed, the paper can be strengthened, and this paper may be publishable with limited additional work.",ICLR2022, +004KdyPaY3,1576800000000.0,1576800000000.0,1,HygHtpVtPH,HygHtpVtPH,Paper Decision,Reject,"The main idea proposed by the work is interesting. The reviewers had several concerns about applicability and the extent of the empirical work. The authors responded to all the comments, added more experiments, and as reviewer 2 noted, the method is interesting because of its ability to handle local noise. Despite the author's helpful responses, the ratings were not increased, and it is still hard to assess the exact extent of how the proposed approach improves over state of the art. Because some concerns remained, and due to a large number of stronger papers, this paper was not accepted at this time.",ICLR2020, +ndlQosI-4Qv,1642700000000.0,1642700000000.0,1,SsPCtEY6yCl,SsPCtEY6yCl,Paper Decision,Accept (Spotlight),"This is a deep theoretical paper with results that I consider very interesting. 
I have *not* had time to check them myself, but I have background in these theoretical matters and the results seem reasonable to me - the hardness of even checking the quality of a solution is well known for partition functions (as well as the hardness of any reasonable approximation), but the undecidability result seems new - I assume it comes naturally, and it is a very interesting result - I have seen similar decidability issues for #P: general probabilistic polynomial-time Turing machines (it is unclear if a connection was sought here). Reviewers are all positive about the content, and the authors have acknowledged some points for improvement.",ICLR2022,
3s7erMNj-A8,1642700000000.0,1642700000000.0,1,R5sVzzXhW8n,R5sVzzXhW8n,Paper Decision,Reject,"This paper presents an analysis of the robustness of self-supervised learning (SSL) features to noisy labels in downstream supervised learning, and provides empirical verification of the results (mostly in the symmetric noise setup); an SSL regularization scheme is also analyzed (Section 4). While the paper contains plausible insights, the reviews share similar concerns that the analysis is mainly based on the noise being symmetric, and that the SSL features are assumed to already have good class separation and Gaussian clusters, which are strong assumptions. Given that the assumptions are not theoretically verified, and that there are not sufficient empirical results in heavy non-symmetric noise scenarios on large benchmark datasets, the reviewers think the paper does not provide practical guidance for noisy-label learning in its current form.",ICLR2022,
BJgxwub2yV,1544460000000.0,1545350000000.0,1,ByxLl309Ym,ByxLl309Ym,Not well motivated and lack of novel contribution,Reject,"This paper proposes to approximate arbitrary conditional distributions of a pretrained VAE using variational inference. The paper is technically sound and clearly written. A few variants of the inference network are also compared and evaluated in experiments.

The main problems of the paper are as follows:
1. The motivation for training an inference network for a fixed decoder is not well explained.
2. The application of VI is standard, and offers limited novelty or significance of the proposed method.
3. The introduction of the new term cross-coding is not necessary and does not bring more insight than a standard VI method.

The authors argued in the feedback that the central contribution is using augmented VI to do conditional inference, similar to Rezende et al., but didn't address the reviewers' main concerns.
I encourage the authors to incorporate the reviewers' comments in a future revision, and explain why this proposed method bring significant contribution to either address a real problem or improve VI methodology.",ICLR2019,4: The area chair is confident but not absolutely certain +IVR8CuXNN_c,1610040000000.0,1610470000000.0,1,dKg5D1Z1Lm,dKg5D1Z1Lm,Final Decision,Accept (Poster),"The paper introduces new tighter non-asymptotic confidence intervals for off-policy evaluation, and all reviewers generally liked the results. I recommend acceptance of this paper. Some concerns of Reviewer2 and Reviewer3 are not fully addressed in your rebuttal. Please make sure to address all remaining issues.",ICLR2021, +tntFUOLy5BK,1642700000000.0,1642700000000.0,1,sWbXSWzHPa,sWbXSWzHPa,Paper Decision,Reject,"This paper introduces a novel method for learning distributional robust machine learning models when only partial group labels are available to improve performance of learning algorithms on minority groups. + +Pros: The paper is well motivated and written. The ideas are interesting. Most work on distributional robust optimization (DRO) are in unsupervised settings where group information is not available. They provide an approach for the semi-supervised setting through a constraint set. + +Cons: +The empirical results do not show better performance over unsupervised baselines as pointed out by reviewers. + +The authors claim one of the benefits of their proposed approach is a one-stage approach, in contrast to competing models that require a two-stage approach; hence, allowing their approach to reduce compute time. It’ll be helpful to strengthen this point by showing time comparisons. + +Missing labels in this case due to participants withholding sensitive information is not an MCAR case, but the proposed work makes an MCAR assumption. It’ll help to add a discussion and point out such limitations of the approach. + +Summary: This paper has novel and interesting ideas, but still has several issues as pointed out by the reviewers before it is ready for publication.",ICLR2022, +h687MN0A4Pxs,1642700000000.0,1642700000000.0,1,XJFGyJEBLuz,XJFGyJEBLuz,Paper Decision,Reject,"The authors are strongly encouraged to elaborate further about the novelty of their method, as well as to give detailed (either theoretical or experimental) justifications for the design choices they make within the paper. Finally, the paper could benefit from additional experiments, as outlined in the reviews.",ICLR2022, +ryg-jRIleN,1544740000000.0,1545350000000.0,1,ryx3_iAcY7,ryx3_iAcY7,"Some clarity issues, and improvements underwhelming",Reject,"This paper proposes to improve MT with a specialized encoder component that models roles. It shows some improvements in low-resource scenarios. + +Overall, reviewers felt there were two issues with the paper: clarity of description of the contribution, and also the fact that the method itself was not seeing large empirical gains. On top of this, the method adds some additional complexity on top of the original model. + +Given that no reviewer was strongly in favor of the paper, I am not going to recommend acceptance at this time.",ICLR2019,4: The area chair is confident but not absolutely certain +SkhZpMLux,1486400000000.0,1486400000000.0,1,HyY4Owjll,HyY4Owjll,ICLR committee final decision,Reject,"The idea of boosting has recently seen a revival, and the ideas presented here are stimulating. 
After discussion, the reviewers agreed that the latest updates and clarifications have improved the paper, but overall they still felt that the paper is not quite ready, especially in making the case for when taking this approach is desirable, which was the common thread of concern for everyone. For this reason, this paper is not yet ready for acceptance at this year's conference.",ICLR2017, +BkxHPk-Ig4,1545110000000.0,1545350000000.0,1,SJeUAj05tQ,SJeUAj05tQ,Borderline paper: distributed optimization algorithm with analysis,Reject,"The paper provides a distributed optimization method, applicable to decentralized computation while retaining provable guarantees. This was a borderline paper and a difficult decision. + +The proposed algorithm is straightforward (a compliment), showing how adaptive optimization algorithms can still be coordinated in a distributed fashion. The theoretical analysis is interesting, but additional assumptions about the mixing are needed to reach clear conclusions: for example, additional assumptions are required to demonstrate potential advantages over non-distributed adaptive optimization algorithms. + +The initial version of the paper was unfortunately sloppy, with numerous typographical errors. More importantly, some key relevant literature was not cited: +- Duchi, John C., Alekh Agarwal, and Martin J. Wainwright. ""Dual averaging for distributed optimization: Convergence analysis and network scaling."" IEEE Transactions on Automatic control 57.3 (2012): 592-606. +In addition to citing this work, this and the other related works need to be discussed in relation to the proposed approach earlier in the paper, as suggested by Reviewer 3. + +There was disagreement between the reviewers in the assessment of this paper. Generally the dissenting reviewer produced the highest quality assessment. This paper is on the borderline, however given the criticisms raised it would benefit from additional theoretical strengthening, improved experimental reporting, and better framing with respect to the existing literature.",ICLR2019,4: The area chair is confident but not absolutely certain +odZHvkfmx,1576800000000.0,1576800000000.0,1,Syejj0NYvr,Syejj0NYvr,Paper Decision,Reject,"Reviewers agree that the proposed method is interesting and achieves impressive results. Clarifications were needed in terms of motivating and situating the work. Thee rebuttal helped, but unfortunately not enough to push the paper above the threshold. We encourage the authors to further improve the presentation of their method and take into accounts the comments in future revisions.",ICLR2020, +m8rr6uKT4p5,1642700000000.0,1642700000000.0,1,7zFokR7k_86,7zFokR7k_86,Paper Decision,Reject,"The paper describes a system for learning rules in a quasi-NL format: roughly Horn clauses where a predicate p(X1,...,Xk) is replaced by a natural language pattern interleaving ground tokens and variables. The method is to propose ground sentences - using one of several task-specific approaches - use anti-unification of pairs to variabilize, and then find a minimal theory from these proposed pairs by reduction to maxsat. 
+ +Pros: + - QNL is a neat idea, and makes symbolic rule-learning possible to some NLP tasks + - The use of maxsat is novel in rule-learning AFAIK + +Cons: + - Unification is a highly simplified model of the NL task of cross-document co-reference + - It's unclear if maxsat process will work in the presence of noise, or how well it scales + - The datasets use clean text generated from templates or synthetic grammars + - Experimentally, the generality of the system is not well demonstrated, because there are differences in how it is applied: eg a subset of short examples for scan, input engineering ($TRUE, $FALSE) for RuleTaker, plus the ""heuristics for filtering invalid rules generated by anti-unification” + - It's not clear if this work really speaks to the main ""point"" of the SCAN and RuleTaker datasets. These are both the kind tasks that symbolic systems would be expected to do well, and are used as ANN benchmarks because ANNs perform in unexpected ways: worse than one would expect for SCAN, and better for RuleTaker. They are important for understanding ANNs but I'm not certain what the research benefit is of using them for symbolic methods as a benchmark.",ICLR2022, +DO0dqEX1E,1576800000000.0,1576800000000.0,1,BylDrRNKvH,BylDrRNKvH,Paper Decision,Reject,"This paper aims to theoretically understand the the benefit of attention mechanisms. The reviewers agreed that better understanding of attention mechanisms is an important direction. However, the paper studies a weaker form of attention which does not correspond well to the attention models using in the literature. The paper should better motivate why the theoretical results for this restrained model would carry over to more realistic mechanisms.",ICLR2020, +S1xSxgLEx4,1545000000000.0,1545350000000.0,1,BJgnmhA5KQ,BJgnmhA5KQ,writing needs to be improved / contribution limited,Reject,"+ a simple method ++ producing diverse translation is an important problem + +- technical contribution is limited / work is incremental +- R1 finds writing not precise and claims not supported, also discussion of related work is considered weak by R3 +- claims of modeling uncertainty are not well supported + + +There is no consensus among reviewers. R4 provides detailed arguments why (at the very least) certain aspects of presentations are misleading (e.g., claiming that a uniform prior promotes diversity). R1 is also negative, his main concerns are limited contribution and he also questions the task (from their perspective producing diverse translation is not a valid task; I would disagree with this). R2 likes the paper and believes it is interesting, simple to use and the paper should be accepted. R3 is more lukewarm. + +",ICLR2019,4: The area chair is confident but not absolutely certain +SJe4OiwmlV,1544940000000.0,1545350000000.0,1,B1l9qsA5KQ,B1l9qsA5KQ,Metareview,Reject,"The manuscript describes a novel technique predicting metal fatigue based on EEG measurements. The work is motivated by an application to driving safety. Reviewers and the AC agreed that the main motivation for the proposed work, and perhaps the results, are likely to be of interest to the applied BCI community. + +The reviewers and ACs noted weakness in the original submission related to the clarity of the presentation and breadth of empirical evaluation. In particular, only a few baselines were considered. As a result, for the non-expert, it is also unclear if the proposed methods are compared against the state of the art. 
There was also a particular concern that this work may not be a good fit for an ICLR audience. +",ICLR2019,4: The area chair is confident but not absolutely certain +__jo5mffx6,1576800000000.0,1576800000000.0,1,Syx4wnEtvH,Syx4wnEtvH,Paper Decision,Accept (Poster),"This paper presents a range of methods for over-coming the challenges of large-batch training with transformer models. While one reviewer still questions the utility of training with such large numbers of devices, there is certainly a segment of the community that focuses on large-batch training, and the ideas in this paper will hopefully find a range of uses. ",ICLR2020, +m49Ij4iVU,1576800000000.0,1576800000000.0,1,HyleclHKvS,HyleclHKvS,Paper Decision,Reject,"Two reviewers as well as the AC are confused by the paper—perhaps because the readability of it should be improved? It is clear that the page limitation of conferences are problematic, with 7 pages of appendix (not part of the review) the authors may consider another venue to publish. In its current form, the usefulness for the ICLR community seems limited.",ICLR2020, +S6m8XkKPO8H,1642700000000.0,1642700000000.0,1,Rx_nbGdtRQD,Rx_nbGdtRQD,Paper Decision,Reject,"After reading the reviews and the rebuttal I unfortunately feel the paper is not ready to be accepted. + +The reasoning for this decision is as follows: +* the empirical evaluation is somewhat weak in its current format, and even adding experiments going from BlockStacks to MNIST would have improved the results, or potentially other synthetically generated data. Or playing with which relation is used during the transfer phase. Something to give a bit more weight to the empirical section and help it connect better with the theoretical one + * But maybe more importantly (and to some extend this is true for the formalism introduced as well), I think there needs to be a bit more context. After reading the reviews, I went and read the paper, and for example in results provided, it is not clearly explained what is the relationship between the proposed method and some of the baselines. I noticed that the related work section ended up in the appendix, which is fine, to the extent the main text can connect to the literature a bit. But while I agree that the introduction of the method seems good and clear, and this is a hard and important problem that lacks a proper framework and the proposal in the paper is quite interesting. It is also important to understand its relation to other frameworks, and to explain clearly what it tries to fix in other proposal. And to interpret the result, maybe justifying or providing some intuition of why the proposed model performs better. I think this is very crucial particularly for a topic that is still in a growing phase, which makes it harder to judge. +I know in the appendix, the author mention domain adaptation which is also something that jumps in mind when looking at this architecture. However this point is not discussed or mentioned as much in the main paper. + +In current form, while the paper reads well, one is left to trying to understand whether these results are significant. I think the work is definitely very interesting and I hope the authors will resubmit it with modification. 
I just feel in the current format it will not have the impact it should, because of a preserved weak experimental section and not a clear grounding in the literature, making readers unsure of the significance of the work.",ICLR2022, +H7_03fz3S,1576800000000.0,1576800000000.0,1,S1gEFkrtvH,S1gEFkrtvH,Paper Decision,Reject,"The paper proposes a new way to learn a disentangled representation by embedding the latent representation z into an explicit learnt orthogonal basis M. While the paper proposes an interesting new approach to disentangling, the reviewers agreed that it would benefit from further work in order to be accepted. In particular, after an extensive discussion it was still not clear whether the assumptions of Theorem 1 applied to VAEs, and whether Theorem 1 was necessary at all. In terms of experimental results, the discussions revealed that the method used supervision during training, while the baselines in the paper are all unsupervised. The authors are encouraged to add supervised baselines in the next iteration of the manuscript. For these reasons I recommend rejection.",ICLR2020, +U31nA0NzGk,1610040000000.0,1610470000000.0,1,pW2Q2xLwIMD,pW2Q2xLwIMD,Final Decision,Accept (Poster),The paper considers the problem of learning a new task with few examples by using related tasks which can exploit shared representations for which more data is available. The paper proves a number of interesting (primarily theoretical) results.,ICLR2021, +B1B_8JTHf,1517250000000.0,1517260000000.0,815,SkmM6M_pW,SkmM6M_pW,ICLR 2018 Conference Acceptance Decision,Reject,"Authors do not respond to significant criticism - e.g. lack of a critical reference +Reviewers unanimously reject. ",ICLR2018, +Ts7ejbZzUcP,1610040000000.0,1610470000000.0,1,in2qzBZ-Vwr,in2qzBZ-Vwr,Final Decision,Reject,The reviewers have not supported the acceptance of this paper where the key weakness is that the study of the proposal neglect effect is not sufficient (see the reviews for the details). I agree with the assessment of the reviewers and recommend rejecting the paper in its current form.,ICLR2021, +J-GJlt1Pj6I,1610040000000.0,1610470000000.0,1,Mub9VkGZoZe,Mub9VkGZoZe,Final Decision,Reject,"The paper proposes a method to identify informative latent variables by thresholding based on the conditional generative model. While the exposition of the paper has substantially improved during the discussion period, some major concerns remain after the discussion among the reviewers. In particular, the problem considered in the paper has a very limited scope. Moreover, the evaluation of the methods needs to be improved. The paper could benefit from discussing how it situates in the broader context.",ICLR2021, +xf0aoJQQH,1576800000000.0,1576800000000.0,1,r1lh6C4FDr,r1lh6C4FDr,Paper Decision,Reject,"Main content: Proposes combining flexible activation functions + +Discussion: +reviewer 1: main issue is unfamiliar with stock dataset, and CIFAR dataset has a bad baseline. +reviewer 2: main issue is around baselines and writing. +reviewer 3: main issue is paper does not compare with NAS. + +Recommendation: All 3 reviewers vote reject. Paper can be improved with stronger baselines and experiments. 
I recommend Reject.",ICLR2020, +H1xW5Lt1pm,1541540000000.0,1545350000000.0,1,rJedV3R5tm,rJedV3R5tm,Good paper; a little incremental,Accept (Poster)," +pros: +- well-written and clear +- good evaluation with convincing ablations +- moderately novel + +cons: +- Reviewers 1 and 3 feel the paper is somewhat incremental over previous work, combining previously proposed ideas. + +(Reviewer 2 originally had concerns about the testing methodology but feels that the paper has improved in revision) +(Reviewer 3 suggests an additional comparison to related work which was addressed in revision) + +I appreciate the authors' revisions and engagement during the discussion period. Overall the paper is good and I'm recommending acceptance.",ICLR2019,3: The area chair is somewhat confident +SJ3_wUah7OH,1610040000000.0,1610470000000.0,1,IlJbTsygaI6,IlJbTsygaI6,Final Decision,Reject,"This paper proposes a method for hierarchical decision making where the intermediate representations between levels of the hierarchy are interpretable. I personally really like this general direction, as did most of the reviewers. Unfortunately, it was felt that, even after discussion, this paper is not ready for publication. To summarize the general spirit of the objection to this paper, all reviewers found that the experimental section was faulty and did not match the claims of the paper. Specifically, criticism here surrounded first, the choice of experimental setting, which was not considered to be the best for testing interpretable hierarchical decision making approaches; and second, the choice of comparison/baselines, which did not give sufficient security that the results produced by the proposed approach were sufficiently impressive. + +I am satisfied that the reviewers considered the paper fairly, gave constructive criticism, and took onboard the author feedback. As a result, I am recommending rejection. I nonetheless think that the high quality feedback provided here will enable the authors to prepare follow-up experiments that may show their method in a more robust positive light, and encourage them to submit to a future conference, once armed with such results.",ICLR2021, +rJxYmHj02Q,1541480000000.0,1545350000000.0,1,ByftGnR9KX,ByftGnR9KX,novel modeling of context for conversational QA,Accept (Poster),"Interesting and novel approach of modeling context (mainly external documents with information about the conversation content) for the conversational question answering task, demonstrating significant improvements on the newly released conversational QA datasets. +The first version of the paper was weaker on motivation and lacked a clearer presentation of the approach as mentioned by the reviewers, but the paper was updated as explained in the responses to the reviewers. +The ablation studies are useful in demonstration of the proposed FLOW approach. +A question still remains after the reviews (this was not raised by the reviewers): How does the approach perform in comparison to the state of the art for the single question and answer tasks? If each question was asked in isolation, would it still be the best? + + +",ICLR2019,4: The area chair is confident but not absolutely certain +#NAME?,1642700000000.0,1642700000000.0,1,fVu3o-YUGQK,fVu3o-YUGQK,Paper Decision,Accept (Poster),"This paper proposes two techniques for improving self-supervised learning with a vision transformer. 
The first improvement is using a multi-stage ViT, which is very similar to Swin transformer and authors recognized this is not a major contribution. The authors further found that using a multi-stage ViT does not produce discriminative patch representation, thus proposing the second improvement with a region level loss. While both improvements are not particularly novel by themselves, combining both leads to a strong empirical result. However, It does looks like the multi-scale vision transformer is the major improvement as removing the regional loss only leads to less than 1% decrease in performance in most cases. In general this is a good ""engineering"" paper with a practical approach for improving self-supervised learning with vision transformation and obtained strong results, thus it's worthy of publication.",ICLR2022, +J0PaZXHw-L,1610040000000.0,1610470000000.0,1,5Spjp0zDYt,5Spjp0zDYt,Final Decision,Reject,"This paper investigates various pathologies that occur when training VAE models. There was quite a bit of discussion (including ""private"" discussion between the reviewers) about the theory presented. Particular concerns included: For Theorem 1, while the required conditions formalise the setting in which the learned likelihoods are poor, it's unclear whether these particular conditions they are useful in practice or provided deep insight; for Theorem 2, its relevance and importance was not necessarily clear. In general the results in these two theorems are closely related to known challenges (e.g. that using the ELBO to optimise parameters may lead to bias), without necessarily providing as much new insight as one might hope. + +I would note that all the reviewers included positive feedback as to the quality of the experiments, showing the impact of these pathologies on downstream tasks. However, as written much of the paper focuses on the theory — too many of the (very interesting!) figures and experimental results are relegated to the appendix.",ICLR2021, +FhMlUmuMfI9,1642700000000.0,1642700000000.0,1,0ze7XgWcYNV,0ze7XgWcYNV,Paper Decision,Reject,"This paper is on the theme of active reinforcement learning with a human/assistant in the loop. Under partial observability, an agent acts as per an interaction policy that gathers state/goal information from the assistant, while an operational policy assumed pre-learnt in this paper executes low-level actions. The reviews acknowledge the relevance of this topic and that the paper is well structured and coherently presented overall. However, there are unanimous concerns around experimental evaluation being unconvincing, lack of strong baselines and lack of thorough coverage of related work precluding an accurate assessment of claimed contributions. As such, the paper is not in a form that can be accepted at ICLR -- the authors are encouraged to revise their submission as per review feedback.",ICLR2022, +HJlQWqQreN,1545050000000.0,1545350000000.0,1,rJl0r3R9KX,rJl0r3R9KX,meta-review,Accept (Poster),"The paper gives a novel algorithm for transfer learning with label distribution shift with provably guarantees. As the reviewers pointed out, the pros include: 1) a solid and motivated algorithm for a understudied problem 2) the algorithm is implemented empirically and gives good performance. The drawback includes incomplete/unclear comparison with previous work. The authors claimed that the code of the previous work cannot be completed within a reasonable amount of time. 
The AC decided that the paper could be accepted without such a comparison, but the authors are strongly urged to clarify this point or include the comparison for a smaller dataset in the final revision if possible. ",ICLR2019,5: The area chair is absolutely certain +dvuZs02vGwA,1642700000000.0,1642700000000.0,1,6Tk2noBdvxt,6Tk2noBdvxt,Paper Decision,Accept (Spotlight),"This paper presents an approach to synthesize programmatic policies, utilizing a continuous relaxation of program semantics and a parameterization of the full program derivation tree, to make it possible to learn both the program parameters and program structures jointly using policy gradient without the need to imitate an oracle. The parameterization of the full program derivation tree that can represent all programs up to a certain depth is interesting and novel. In its current form this won’t scale to large programs that require large tree depth, but is a promising first step in this direction. The learned programmatic policies are more structured and interpretable, and also demonstrated competitive performance against other commonly used RL algorithms. During the reviewing process the authors have actively engaged in the interaction with the reviewers and addressed all the concerns, and all reviewers unanimously recommend acceptance.",ICLR2022, +o0akPlh_cMP,1610040000000.0,1610470000000.0,1,YbDGyviJkrL,YbDGyviJkrL,Final Decision,Reject,"The paper introduces a framework for learning dynamical system models from observations consisting of discrete spatio-temporal series. It is composed of two components trained sequentially. A first one learns embedding from observations using a seq2seq approach, where the embeddings are constrained to follow linear dynamics. This is inspired by approximation schemes for Koopman operators. These embeddings are then used as the spatio-temporal series representions and are fed to a transformer trained as an autoregressive predictor. Experiments are performed on different problems using data generated from PDEs through numerical schemes. Comparisons are performed with different ML baselines. + +The paper is well written with experiments on problems with different complexities. The original contributions of the paper are 1) the combination of pretrained embeddings with a transformer auto-regressor, 2) a seq2seq architecture for learning time series representations constrained by linear dynamics. + +On the cons side, the paper original contribution and significance are over claimed. Closely related ideas for learning approximate Koopman operators and observables have already been developed and used in similar contexts. Besides there is no discussion here on the properties or physical interpretability (which is often an argument for Koopman) of the learned representations. Then the baselines are mainly composed of simple regressors (LSTM, conv-LSTM, etc.) and this is not a surprise that they cannot learn dynamics over long term horizons. There is no comparison with dynamical models incorporating numerical integration schemes that could model the temporal dynamics of the system. 
There is now a large literature on this topic exploiting discrete (ResNet-like) or continuous formulations (as started with Neural ODEs).",ICLR2021,
- The AC also finds that the evaluation of domain shift in Section 3.3.2 may mislead the community, as it falls outside the standard OOD scope. The notion of common corruption is closer to the robustness problem (i.e., how ML model predictions change w.r.t. small changes in the input space). The changes may not be substantial enough to be ""out-of-distribution"". ",ICLR2021,
Overall, I think this is an interesting contribution to the ICLR community. The reviewers were all positive about this paper. For the camera-ready version, I would recommend that the authors go over the reviewers' concerns again and make sure that those concerns are addressed in the paper itself, as they were in the rebuttal. Some captions are pretty short; for example, see the captions of Figures 6 and 7. I would recommend that the authors add more description to these captions in the camera-ready version of this paper.",ICLR2022,
There are some similarities with GAIL, and the authors argue that this is effectively a concrete implementation of GAN-GCL that actually works. The results look promising to me, and the portability aspect is neat and useful!
Specifically, the work can be seen as incremental to [1]: this work uses a score-based image denoiser whereas [1] uses a CNN-based image denoiser, and this work is more efficient in that it requires only one score network, while [1] trained a separate denoiser for each noise level.
Based on an empirical evaluation of RotNet, BiGAN, and DeepCluster, the authors argue that the early layers of CNNs can be effectively learned from a single image coupled with strong data augmentation. Second, the authors provide some empirical evidence that supervision might still be necessary to learn the deeper layers (even in the presence of millions of images for self-supervision).
Thus the algorithmic novelty is minor and amounts to combining two techniques to address a different problem statement. In addition, as mentioned by Reviewer 2, though meta-learning could be a solution for learning with few examples, the solution used in this work is not meta-learning, and so it should not be in the title, to avoid confusion.
+ +Given the limited novelty and the number of strong submissions to ICLR, this submission, while promising, does not meet the bar for acceptance.",ICLR2020, +S1q3VJ6BM,1517250000000.0,1517260000000.0,442,SyjjD1WRb,SyjjD1WRb,ICLR 2018 Conference Acceptance Decision,Reject,"This method makes a connection between evolutionary and variational methods in a particular model. This is a good contribution, but there has been little effort to position it in comparison to standard methods that do the same thing, showing relative strengths and weaknesses. + +Also, please shorten the abstract.",ICLR2018, +sN1jh6RuNF,1610040000000.0,1610470000000.0,1,xQnvyc6r3LL,xQnvyc6r3LL,Final Decision,Reject,"The paper introduces a GNN approach to solve the problem of source detection in an epidemics. While the paper contains some interesting new ideas, the reviewers raised some important concerns about the paper and so the paper should not be accepted in the current form. In particular, + +- the paper does not motivate the ML approach to the problem +- the experiments are limited for an empirical paper +- the method used in the paper is not very novel +- the proofs presented in the paper are not formal enough +",ICLR2021, +NJgZhWBVafv,1610040000000.0,1610470000000.0,1,Oecm1tBcguW,Oecm1tBcguW,Final Decision,Reject,"The paper addresses the problem of prior selection in Bayesian neural networks by proposing a meta-learning framework based on PAC-Bayesian theory. The authors optimize a PAC bound called PACOH in the space of possible posterior distributions of BNN weights. The method does not rely on nested optimization schemes, instead, they directly minimize PAC bound via a variational algorithm called PACOH-NN which is based on SVGD and the reparameterization trick. The method is evaluated on experiments with both synthetic and real-world data showing improvements in both predictive accuracy and uncertainty estimates. + +Initially many reviewers were positive about the paper. However, it was noticed by one reviewer that the submitted paper presents a very significant overlap with + +Jonas Rothfuss, Vincent Fortuin, and Andreas Krause. PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees. arXiv, 2020. + +Another reviewer mentioned that they were actually reviewing for AISTATS the above manuscript by Rothfuss et al. The ICLR program chairs were contacted for a possible violation of the dual submission policy for ICLR: + +""Submissions that are identical (or substantially similar) to versions that have been previously published, or accepted for publication, or that have been submitted in parallel to this or other conferences or journals, are not allowed and violate our dual submission policy."" + +The ICLR program chairs decided that the similarities between the two papers are not enough to issue a desk-rejection. However, in the discussion period, three reviewers out of 4 pointed out that, even though the authors did revise Sections 4 and 5 in the current version, these modifications do not seem to be strong enough to make up for the really strong overlaps between the two papers. The reviewers agreed on rejection and stated that this paper should either be merged with the Rothfuss et. al. one (assuming the authors are the same), or its content should be developed to the point of making both of them clearly distinct. ",ICLR2021, +vWeubcuJnD,1576800000000.0,1576800000000.0,1,S1eRya4KDB,S1eRya4KDB,Paper Decision,Reject,"This paper proposes a method to improve word embedding by incorporating sentiment probabilities. 
Reviewers appreciate the interesting and simple approach and acknowledge the improved results on low-frequency words.
The resulting method is straightforward, which can be considered a strength, but there is no serious technical justification beyond intuitive motivation. Rather than presenting a technical analysis, the paper focuses more on intuitively delivering the proposal and then evaluating it. This would be acceptable if the empirical outcomes were undeniably impressive, but the outcomes, though positive, are not overwhelming. The experimental evaluation is limited in scope, considering only the simplest MuJoCo environments and a benchmark racing simulator.
The paper's main technical contribution is to train a graph attention network to learn action-space relationships under a given action representation. The paper demonstrates the effectiveness of this strategy on a range of environment benchmarks.
+ +This paper is not ready for publication as the generality of the proposed method was not sufficiently clear to the reviewers after the author response.",ICLR2020, +5Uy4TE8fvNo,1610040000000.0,1610470000000.0,1,MJAqnaC2vO1,MJAqnaC2vO1,Final Decision,Accept (Poster),"Auto Seg-Loss uses a differentiable surrogate parameterized loss function that approximates using RL some of the non-differentiable metrics for segmentation. Auto Seg-Loss outperform cross-entropy and other loss functions through a great number of experiments. The main concerns rised by the reviewers (More clarity on the abstract and intro, extending the related work, and performance experiments) has been addressed. Accordingly I recommend the paper to be accepted at ICLR 2021.",ICLR2021, +r1l89Hh-eN,1544830000000.0,1545350000000.0,1,ryG8UsR5t7,ryG8UsR5t7,Paper decision,Reject,"Reviewers are in a consensus and recommended to reject after engaging with the authors. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit. +",ICLR2019,4: The area chair is confident but not absolutely certain +iku9Y90Dba,1642700000000.0,1642700000000.0,1,bglU8l_Pq8Q,bglU8l_Pq8Q,Paper Decision,Reject,"Strengths: +* Well-written paper +*Theoretical analysis demonstrates that dual encoder models have similar capacity as CA models +*New distillation algorithm for learning DE students from CA teachers + +Weaknesses: +* No reviewer seems particularly excited about this work +* Theoretical analysis doesn’t provide actionable insight -- it does not directly motivate the suggested distillation methods +* Empirical results are lacking -- reviewers asked for qualitative examples of improvements from their distillation method",ICLR2022, +r1B481aSM,1517250000000.0,1517260000000.0,759,ByJbJwxCW,ByJbJwxCW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a MIL method for medical time series data. General consensus among reviewers that work does not meet criteria for being accepted. + +Specifically: + +Pros: +- A variety of meta-learning parameters are evaluated for the task at hand. +- Minor novelty of the proposed method + +Cons: +- Minor novelty of the proposed method +- Rationale behind architectural design +- Thoroughness of experimentation +- Suboptimal choice of baseline methods +- Lack of broad evaluation across applications for new design +- Small dataset size +- Significance of improvement +",ICLR2018, +gNsdl7bMrOm,1610040000000.0,1610470000000.0,1,3tFAs5E-Pe,3tFAs5E-Pe,Final Decision,Accept (Poster),"The authors propose the 2-Wasserstein barycenter problem between measures. The authors propose a novel formulation that leverages a condition (congruence) that the optimal transport (Monge) maps, here parameterized as potentials, must obey at optimality. The introduce various regularizers to encourage that property. The idea is demonstrated on convincing synthetic experiments and on a simple color transfer problem. Although experiments are a bit limited, I do believe, and follow here the opinion of all reviewers, that there is novelty in this approach, and that this paper is a worthy addition to the recent line of work trying to leverage ICNNs/Brenier's theorem to solve OT problems.",ICLR2021, +4V7btI2hOzh,1642700000000.0,1642700000000.0,1,an_ndI09oVZ,an_ndI09oVZ,Paper Decision,Reject,"The paper develops kernel functions in Banach spaces. However the results seem to be preliminary and further development is needed before +the manuscript can be published. 
Reviewers point out several errors, and the authors have graciously agreed with the suggestion
The main criticisms of the reviewers are that Reviewer tkQp finds two clear limitations in the paper, Reviewer 3o7Z finds that the proposed idea is similar to parameter-space adversarial attacks, and Reviewer sCeW questions the generalisability of the method to other tasks. After the rebuttal, the reviewers reached the consensus that the paper may not be above the acceptance threshold (final scores: 6,6,5,5). Following the reviewers' recommendations, the meta-reviewer recommends rejection.",ICLR2022,
+Along the same lines of rephrasing claims, reviewer WDU4 also pointed out several statements and claims which were not entirely accurate, which the authors then proceeded to resolve, resulting in notable changes from the initial version of the paper. Specifically, there was mention of a ""non-parametric kernelized classifier"". This has been fixed, but it did seem to have initially confused other reviewers, who suggested related work that, it turns out, are not necessarily suitable contenders. The changes made definitely improved the paper, and resolved most of the reviewer's concerns. +Nevertheless, the appendix added comparing the method to non-parametric models could be improved. For instance the authors stated ""Wilson et al. use Gaussian RBF and spectral mixture kernels. Our method has the capability to automatically learn any positive definite radial kernel. Note that Gaussian RBF and spectral kernels are all radial kernels."" - is there any intuition, or proof, of a case when the method introduced here learns a network + classifier that the method by Wilson et al. cannot learn? Or for which deep kernel learning requires considerably more resources? (DKL has been optimized and made considerably faster since the initial paper in 2016). https://proceedings.neurips.cc/paper/2016/hash/bcc0d400288793e8bdcd7c19a8ac0c2b-Abstract.html +Also, while the present work is backed by 4.3, DKL also has a theoretical grounding. +https://www.jmlr.org/papers/volume20/17-621/17-621.pdf + +There was some discussion on the exhaustiveness of the experiments, and it was concluded that the datasets are sufficient, while the reviewers were not in agreement as to whether the authors considered sufficient contenders. A comparison against DKL, at least, appears to be warranted. + +Overall, the paper brings a contribution in terms of improving the performance of backbones with limited expressiveness through the use of a kernel-parametrized classifier, learned by optimizing an approximation of a formulation that spans the entire space of radial basis kernels. The paper was updated considerably during the reviewer process, to its betterment, however, an experimental comparison against deep learning with non-parametric kernelized classifiers is still missing.",ICLR2022, +efOcSuaWwQB,1642700000000.0,1642700000000.0,1,mk0HzdqY7i1,mk0HzdqY7i1,Paper Decision,Accept (Poster),"I would like to thank the authors for having managed a thorough discussion despite the complexity of the task at hand (e.g. BEvM). during discussion, the reviewers clearly converged to accepting the paper, praising the importance of the problem tackled and the setup put in place to effectively tackle the challenge at hand. + +All this makes the paper an important contribution and a clear accept (and an enjoyable read), for which I can only recommend a further polish before camera ready to follow the latest inclusions. + +AC.",ICLR2022, +cTgYVhD63K7,1610040000000.0,1610470000000.0,1,t4hNn7IvNZX,t4hNn7IvNZX,Final Decision,Reject,"The authors present a framework for deriving distributional robustness certificates for smoothed classifiers under perturbations of the input distribution bounded under the Wasserstein metric. + +Several authors raised concerns regarding the correctness of results presented in the initial version of the paper. 
While these were addressed during the rebuttal, the reviewers remain concerned about the novelty of the work relative to prior work, in particular the following papers: +https://arxiv.org/abs/1908.08729 +https://arxiv.org/pdf/2002.04197.pdf +https://doi.org/10.1287/moor.2018.0936 +and the author responses during the rebuttal did not sufficiently address these concerns. + +Hence, I recommend rejection. +",ICLR2021, +LKIFZTFC7,1576800000000.0,1576800000000.0,1,Hklo5RNtwS,Hklo5RNtwS,Paper Decision,Reject,"The authors introduce the idea of using Wasserstein distances over latent ""behavioral spaces"" to measure the similarity between two polices, for use in RL algorithms. Depending on the choice of behavioral embedding, this method produces different regularizers for policy optimization, in some cases recovering known algorithms such as TRPO. This approach generalizes ideas of similarity used in many common algorithms like TRPO, making these ideas widely applicable to many policy optimization approaches. The reviewers all agree that the core idea is interesting and would likely be useful to the community. However, a primary concern that was not sufficiently resolved during the rebuttal period was the experimental evaluation -- both the ability of the experiments to be replicated, as well as whether they provide sufficient insight into how/why the algorithm performs. Thus, I recommend rejection of this paper at this time.",ICLR2020, +Vh_QQ3HsUQs,1610040000000.0,1610470000000.0,1,ehJqJQk9cw,ehJqJQk9cw,Final Decision,Accept (Poster),"The paper proposes a personalized federated learning method, which personalizes by computing a weighted combination of neighboring compatible models. Reviewers uniformly liked the quality of writing and level of novelty, and agree on the relevance of the problem and solution. The solution was deemed creative and particularly impactful in the important case of heterogeneous data on each node, and experiments showed convincing improvements. The discussion between reviewers and authors was constructive and has lead to further improvements of the paper. Slight concerns remained on privacy with all models stored on the server, and breath of personalized FL benchmarks used, but reviewers agreed the contributions overall are still significant enough. Future work remains on the theory of the proposed model.",ICLR2021, +stwBeEViugK,1610040000000.0,1610470000000.0,1,y2I4gyAGlCB,y2I4gyAGlCB,Final Decision,Reject,"This paper proposes a method for tool synthesis by jointly training a generative model over meshes and a task success predictor. Gradient-based planning is then used to find a latent space tool representation which maximizes task success, given a starting tool and an input scene. The results indicate that this method can successfully generate simple tools, and that it performs better than either a random baseline or a version where the generative model and success predictor are trained independently. + +The reviewers unanimously felt that this paper was not quite ready for publication at ICLR. While I'm a strong believer that unique and creative papers which tackle understudied problems---such as this one---ought to be encouraged, and that the authors' rebuttal satisfactorily addressed most of the reviewers' concerns, there was one major point that was not. In particular, all reviewers noted that the paper lacks comparison to convincing baselines and/or sufficiently extensive experiments. 
+While I do not think baselines are necessary per se, especially in such an unconventional setting as this, I believe what the reviewers are getting at (and I agree) is that the results as presented don't really help the reader understand the contours of the method and/or problem space, and as a result, the contributions of the paper feel thin. For example, here are some questions that the reviewers raised, which I do not feel were adequately addressed:
+
+- R3: What are the failure cases of the model?
+- R2: How important is the particular representation of the task and tool (i.e., visual for the task, meshes for the tool)?
+- R4: How do the imagined tool trajectories compare between the task-aware and task-unaware cases?
+- R4: Is the success classifier trained to the same level of performance in both task-aware and task-unaware settings? (In general, it would be helpful to include learning curves in the appendix.)
+- R1: How important is the choice of the particular planning/optimization method (i.e., gradient descent)?
+- R1: What is the generalization performance of the model along affordance directions (e.g., needing to synthesize longer/shorter tools than seen during training)?
+
+Taken individually, such questions might not be an issue, but together they illustrate a larger concern that the paper has not done a thorough enough job of analyzing and evaluating the proposed method. Therefore, at this stage I recommend rejection. I think that by fleshing the paper out with some answers to the above questions, this could make an excellent submission to a future conference.",ICLR2021,
hcNaFbikf5z,1642700000000.0,1642700000000.0,1,qfLJBJf_DnH,qfLJBJf_DnH,Paper Decision,Reject,"This manuscript presents a method to allow RNNs to chain together sequences of behaviors. Reviewers had numerous concerns, but the most important is that the problem posed here is solved by a simple method: resetting the state of the RNN before processing a motif.

Overall, reviewers noted a few key topics, although this list is not exhaustive:
1. Experiments are in a very simple but confusing setting.
2. Even though alternatives exist to solving this problem, they are not considered.
3. The networks considered are very simple.
4. The manuscript is difficult to understand.
5. The task admits a trivial solution.

In more detail:

1. The setting of learning to memorize time series and outputting them on command is very simple compared to what most modern work considers. Moreover, there is much confusion in the manuscript about what the setting is precisely. For example, the setting is described as ""independently learn motor motifs in order to build a continuously expandable motif library"". But there is no continuously expandable motif library; the motif library is fixed at test time. The authors focus heavily on calling this setting ""motor motifs"", but these are RNNs that output an arbitrary time series. They are in no sense motor programs, and this work is not connected to the extensive literature on motor control in machine learning. More broadly, there is no clear mathematical definition of what the problem being solved is anywhere in the manuscript.

2. It is unusual for manuscripts to not present other baseline models. But more importantly, many other approaches exist to this problem. As one reviewer pointed out, the manuscript essentially sets out to solve a problem that is completely solved in machine learning today.
+It rejects the solutions that exist for arbitrary reasons, and then adopts its own new solution.
+
+3. The models used are very simple, but this is a consequence of point 1, the problem domain being very simple.
+
+4. Reviewers had difficulty understanding the details of the task. In particular, the task description section begins with minutiae about the implementation rather than a succinct definition of the task.
+
+5. Most critically, reviewers identified that the model could be hard reset and would have the same behavior as the model presented in the manuscript. The proposed solution is essentially hard resetting the state to zero as it stands. There is no reason why a hard reset cannot be followed by a smoothing operation -- this seems to be the main objection of the authors.
+
+Overall, the manuscript needs significant improvements. The task considered is too simple by modern ML standards, and the fact that it admits a simple solution cannot be overlooked. Demonstrating the idea of the preparatory module on an existing ML task and dataset, while comparing with existing baseline models, carrying out ablations, and producing an extensive quantitative evaluation is what will get the community excited about preparatory modules.",ICLR2022,
B1lUtwLVgE,1545000000000.0,1545350000000.0,1,HkzRQhR9YX,HkzRQhR9YX,Good paper on modeling nonlinear dynamical system.,Accept (Poster),"This paper presents a recurrent tree-structured linear dynamical system to model the dynamics of a complex nonlinear dynamical system.
+All reviewers agree that the paper is interesting and useful, and is likely to have an impact in the community. Some of the doubts that reviewers had were resolved after the rebuttal period.
+
+Overall, this is a good paper, and I recommend acceptance.",ICLR2019,5: The area chair is absolutely certain
S1S-BJTBG,1517250000000.0,1517260000000.0,506,ryH_bShhW,ryH_bShhW,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers all outlined concerns regarding novelty and the maturity of this work. It would be helpful to clarify the relation to doubly stochastic kernel machines as opposed to random kitchen sinks, and to provide more insight into how this stochasticity helps. Finally, the approach should be tried on more difficult image datasets.",ICLR2018,
B1g_6u5UeE,1545150000000.0,1545350000000.0,1,H1GLm2R9Km,H1GLm2R9Km,meta-review,Reject,"The reviewers mostly raised three concerns regarding the paper: a) why this algorithm is more interpretable than BP (which is just gradient descent); b) the exposition of the paper is somewhat confusing at various places; c) the lack of large-scale experimental results to show this is practically relevant. In the AC's opinion, a principled kernel-based approach can be counted as interpretable, and hence the AC would support the paper if a) is the only concern. However, c) seems to be a serious concern since the paper doesn't seem to have experiments beyond Fashion-MNIST (e.g., CIFAR is pretty easy to train these days) and doesn't have experiments with convolutional models.
Based on c), the AC decided that the paper is not quite ready for acceptance. ",ICLR2019,5: The area chair is absolutely certain +F5iH8TfVsy-,1642700000000.0,1642700000000.0,1,6MmiS0HUJHR,6MmiS0HUJHR,Paper Decision,Accept (Poster),"This paper proposes algorithms for learning (coarse) correlated equilibrium in multi-agent general-sum Markov games, with improved sample complexities that are polynomial in the maximum size of the action sets of different players. This is a very solid work along the line of multi-agent reinforcement learning and there is unanimous support to accept this paper. Thus, I recommend acceptance.",ICLR2022, +AA0VjohYEn-,1610110000000.0,1610470000000.0,1,B7v4QMR6Z9w,B7v4QMR6Z9w,Final Decision,Accept (Oral),"The paper introduces a new federated learning algorithm that ensures that the objective function optimized on each device is asymptotically consistent with the global loss function. Both theoretical analysis and empirical results, evaluating communication efficiency, demonstrate the advantages of the proposed FedDyn method over the baselines. + +All the reviewers recommend accepting the paper. To summarize the discussion: + +- R1 mentioned a very recent (NeurIPS 20) related paper and asks several questions. I believe that the authors nicely answered the questions and discussed the relation to the previous paper in detail. + +- R2 mentioned that the paper focuses solely on minimizing communication costs, ignoring costs of local computations. The authors argued that the local computation costs are comparable to those of the baselines, and, in general, communication costs are the main source of computation energy costs (pointing to previous work), and, thus, are a natural objective to optimize. I believe that this adequately addressed this (and other) reviewer's concerns and the reviewer kept their score unchanged. + +- R3 had several concerns, which according to the reviewer were addressed in the rebuttal (they increased the score). + +- R4 points out several limitations of the method and theoretical analysis and believes that the rebuttal did not quite address the concerns. Nevertheless, remains positive about the paper, and believes that the shortcomings can be addressed in follow-up work. + +We share the reviewers' sentiment: it is a very nice and interesting paper, and should be accepted. +",ICLR2021, +9kzwWVRAEQGx,1642700000000.0,1642700000000.0,1,7qaCQiuOVf,7qaCQiuOVf,Paper Decision,Reject,"This paper presents a new perspective for understanding reinforcement learning policies based on meta-states, as an effort to improve the explainability of RL control policies. After reviewing the revised paper and reading the comments from the reviewers, here are my comments: + +- The paper is well-written and very concise. +- The strategy is novel and deserves merit. +- The utility of the explanation is not well described. +- The main concerns of the proposal are the utility of the explanation (that is not well described) and its usage in large discrete state spaces or continuous state spaces domains. + +From the above, it is difficult to see the contribution and applicability of the paper in a clear manner.",ICLR2022, +S1xnLMIgeV,1544740000000.0,1545350000000.0,1,Hyg1G2AqtQ,Hyg1G2AqtQ,meta review,Accept (Poster),"This paper proposes an input-dependent baseline function to reduce variance in policy gradient estimation without adding bias. The approach is novel and theoretically validated, and the experimental results are convincing. 
The authors addressed nearly all of the reviewer's concerns. I recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +BJxbu1On1N,1544480000000.0,1545350000000.0,1,HyefgnCqFm,HyefgnCqFm,Interesting area but doesn't meet quality or clarity standards,Reject,"This paper introduces a few training methods to fit the dynamics of a PDE based on observations. + +Quality: Not great. The authors seem unaware of much related work both in the numerics and deep learning communities. The experiments aren't very illuminating, and the connections between the different methods are never clearly and explicitly laid out in one place. +Clarity: Poor. The intro is long and rambly, and the main contributions aren't clearly motivated. A lot of time is spent mentioning things that could be done, without saying when this would be important or useful to do. An algorithm box or two would be a big improvement over the many long english explanations of the methods, and the diagrams with cycles in them. +Originality: Not great. There has been a lot of work on fitting dynamics models using NNs, and also attempting to optimize PDE solvers, which is hardly engaged with. +Significance: This work fails to make its own significance clear, by not exploring or explaining the scope and limitations of their proposed approach, or comparing against more baselines from the large set of related literature.",ICLR2019,4: The area chair is confident but not absolutely certain +BBb20zlR-aI,1610040000000.0,1610470000000.0,1,EGVxmJKLC2L,EGVxmJKLC2L,Final Decision,Reject,"There was fairly detailed discussion among three of the four reviewers. The fundamental concern of the reviewers is regarding the contribution of the paper. During the rebuttal, the authors clarified the following: + +> while the effects of varying uncertainty / horizon lengths is well-understood for Bayes-optimal policies, it is not understood for existing meta-RL approaches, which is the topic of this paper + +That is, the contribution of the paper is to understand the effects of varying uncertainty/horizon lengths for meta-RL approaches. However, it is known in prior work that meta-RL algorithms such as RL^2 can implement Bayes-optimal policies in principle. As a result, it's not clear whether this contribution is significant relative to prior knowledge, and this paper does not seem to bring any new insights. + +An alternative framing of the paper would be to consider the question of how meta-RL solutions compare to Bayes-adaptive optimal policies. While this framing would be interesting and novel, the current version of the paper does not sufficiently answer this question, since the only experiments include RL^2 (and such a study would require experimenting with more sophisticated meta-RL algorithms beyond RL^2). + +As such, this paper isn't suitable for publication at ICLR in its current form.",ICLR2021, +54HozKDXiEF,1610040000000.0,1610470000000.0,1,ES9cpVTyLL,ES9cpVTyLL,Final Decision,Reject,"The reviews are concerned about the novelty/incremental nature of the paper and partially also about the +conclusions drawn from the experiments. The authors did not take the chance to write a response.",ICLR2021, +JKgWnUubmK,1576800000000.0,1576800000000.0,1,SkxLFaNKwB,SkxLFaNKwB,Paper Decision,Accept (Poster),"The submission applies architecture search to object detection architectures. The work is fairly incremental but the results are reasonable. After revision, the scores are 8, 6, 6, 3. 
The reviewer who gave ""3"" wrote after the authors' responses and revision that ""Authors' responses partly resolved my concerns on the experiments. I have no object to accept this paper. [sic]"". The AC recommends adopting the majority recommendation and accepting the paper.",ICLR2020, +HJedUZuex4,1544750000000.0,1545350000000.0,1,SJxTroR9F7,SJxTroR9F7,Accept,Accept (Poster),"The paper presents an interesting technique for constrained policy optimization, which is applicable to existing RL algorithms such as TRPO and PPO. All of the reviewers agree that the paper is above the bar and the authors have improved the exposition during the review process. I encourage the authors to address all of the comments in the final version.",ICLR2019,4: The area chair is confident but not absolutely certain +EI5nwSy7W8,1576800000000.0,1576800000000.0,1,SJlVn6NKPB,SJlVn6NKPB,Paper Decision,Reject,"The authors present a method for learning representations of remote sensing images from multiple views. The main ideas is to use the InfoNCE loss to learn from multiple views of the data. + +The reviewers had a few concerns about this work which were not adequately addressed by the authors. I have summarised these below and would strongly recommend that the authors address these in subsequent submissions: + +1) Experiments on a single dataset and a very specific task: Authors should present a more convincing argument about why the chosen dataset and task are challenging and important to demonstrate the main ideas presented in their work. Further, they should also report results on additional datasets suggested by the reviewers. +2) Comparisons with existing works: The reviewers suggested several existing works for comparison. The authors agreed that these were relevant and important but haven't done this comparison yet. Without such a comparison it is hard to evaluate the main contributions of this work. + +Based on the above objections raised by the reviewers, I recommend that the paper should not be accepted.",ICLR2020, +0jo0TYuBwA0,1642700000000.0,1642700000000.0,1,fWK3qhAtbbk,fWK3qhAtbbk,Paper Decision,Reject,"This paper addresses unique windowing schemes for the input of an LSTM model for time-series forecasting, in particular an exponential partitioning, where bin sizes increase as moving further from the current time point. Although the basic idea is interesting and motivating and experimental results are strong; as reviewers pointed out, technical significance and novelty are limited because of lack of theoretical or conceptual justification and motivation the proposed approach. The authors’ claim is primarily based on experiments results. Other critical issues include the lack of comparison with recent advances in specifically designed to attend to longer history length or the discussion of modern approaches. other issues include presentation (e.g., grammatical errors) and the use of acronyms before introducing them.",ICLR2022, +Ucbi2sC76,1576800000000.0,1576800000000.0,1,r1g6MCEtwr,r1g6MCEtwr,Paper Decision,Reject,"The paper proposes a new scoring function for OOD detection based on calculating the total deviation of the pairwise feature correlations. This is an important problem that is of general interest in our community. + +Reviewer 2 found the paper to be clear, provided a set of weaknesses relating to lack of explanations of performance and more careful ablations, along with a set of strategies to address them. 
Reviewer 1 recognized the importance of being useful for pretrained networks but also raised questions of explanation and theoretical motivations. Reviewer 3 was extremely supportive, used the authors' code to highlight the difference between far-from-distribution behaviour versus near-distribution OOD examples. The authors provided detailed responses to all points raised and provided additional eidence. There was no convergence of the review recommendations. + +The review added much more clarity to the paper and it is no a better paper. The paper demonstrates all the features of a good paper, but unfortunately didn't yet reach the level for acceptance for the next conference. ",ICLR2020, +ha1Nqk3zbg9,1610040000000.0,1610470000000.0,1,GXJPLbB5P-y,GXJPLbB5P-y,Final Decision,Reject,"This work is well written and easy to follow and proposes a novel framework to utilize unlabeled output data. The authors have also given a detailed proof that the denoiser reduces the required complexity of the predictor. However, ultimately the experimental results are somewhat weak and leave doubts as to how effective the approach is. More convincing experimental results such as significant improvements on a well understood task and acknowledging that the approach is mostly useful when combined with pre-training and back translation would improve the work. + +Pros +- Well written. +- Technically novel approach to the problem of utilizing unlabeled output data. +- Interesting proof on the reduced complexity requirement for the predictor. + +Cons: +- Experimental results are not convincing. Showing significant improvements on a well understood task would be more convincing. +- The approach is only really useful when combined with pre-training or back-translation.",ICLR2021, +HkSgHy6SG,1517250000000.0,1517260000000.0,491,SkJKHMW0Z,SkJKHMW0Z,ICLR 2018 Conference Acceptance Decision,Reject,"The proposed relational reasoning algorithm is basically a fairly standard graph neural network, with a few modifications (e.g., the prediction loss at each layer - also not a new idea per se). + + The claim that previously reasoning has not been considered in previous applications of graph neural networks (see discussion) is questionable. It is not even clear what is meant here by 'reasoning' as many applications of graph neural networks may be regarded as performing some kind of inference on graphs (e.g., matrix completion tasks by Berg, Kipf and Welling; statistical relational learning by Schlichtkrull et al). + +So the contribution seems a bit over-stated. Rather than introduces a new model, the work basically proposes an application of largely known model to two (not-so-hard) tasks which have not been studied in the context of GNNs. The claim that the approach is a general framework for dealing with complex reasoning problems is not well supported as both problems are (arguably) not complex reasoning problems (see R2). + +There is a general consensus between reviewers that the paper, in its current form, does not quite meet acceptance criteria. 
+
+Pros:
+-- an interesting direction
+-- clarity
+Cons:
+-- the claim of generality is not well supported
+-- the approach is not so novel
+-- the approach should be better grounded in previous work
+
+
+",ICLR2018,
SJzMLypSf,1517250000000.0,1517260000000.0,733,H1LAqMbRW,H1LAqMbRW,ICLR 2018 Conference Acceptance Decision,Reject,"There was certainly some interest in this paper which investigates learning latent models of the environment for model-based planning, particularly articulated by Reviewer3. However, the bulk of reviewer remarks focused on negatives, such as:
--The model-based approach is disappointing compared to the model-free approach.
--The idea of learning a model based on the features from a model-free agent seems novel but lacks significance in that the results are not very compelling.
--I feel the paper overstates the results in saying that the learned forward model is usable in MCTS.
--The paper in its current form is not written well and does not contain strong enough empirical results.

",ICLR2018,
bvkDQ3tJHA,1610040000000.0,1610470000000.0,1,UAAJMiVjTY_,UAAJMiVjTY_,Final Decision,Reject,"The paper addresses the difficult problem of combining ILP in a meta-interpretive framework with noisy inputs from a neural system. The essential idea is to use MIL to ""efficiently"" search for constraints on the neural outputs (e.g., z1 + z2 + z3 = 7, or z2 < z3) as well as logic programs, with a score related to program complexity as well as the probability of the best constraint-satisfying neural outputs. It is interesting work for the right audience, but it's clear from the reviews that the presentation was difficult for ICLR readers, even ones with appropriate background.
+Some potential weaknesses of the approach include:
+
+1 - it's unclear how scalable the MIL framework is - presumably the intrinsic difficulty of the search means that programs and constraint sets must be small.
+
+2 - it's unclear how general the approach is beyond the digits-as-separate-inputs setting of the two experimental studies, and it's unclear how accurate the perceptual layer needs to be - MNIST obviously being an example of a case where there is little noise with a modern classifier.
+
+3 - it's unclear how constraints can in general be used to backprop any information to the underlying neural system, and without this the joint training seems to be quite limited.
+
+Overall, the paper is judged as inappropriate for ICLR.",ICLR2021,
d0rnS5fOgH,1576800000000.0,1576800000000.0,1,SkxSv6VFvS,SkxSv6VFvS,Paper Decision,Accept (Poster),"In my opinion, this paper is borderline (but my expertise is not in this area) and the reviewers are too uncertain to be of help in making an informed decision.",ICLR2020,
iwUlSHLUzz,1576800000000.0,1576800000000.0,1,SJxWS64FwH,SJxWS64FwH,Paper Decision,Accept (Poster),"After the rebuttal period the ratings on this paper increased and it now has a strong assessment across reviewers. The AC recommends acceptance.",ICLR2020,
xJWyBk6ujM,1576800000000.0,1576800000000.0,1,S1eq9yrYvH,S1eq9yrYvH,Paper Decision,Reject,"The authors propose a learning framework to reframe non-stationary MDPs as smaller stationary MDPs, thus hopefully addressing problems with contradictory or continually changing environments. A policy is learned for each sub-MDP, and the authors present theoretical guarantees that the reframing does not inhibit agent performance.

The reviewers discussed the paper and the authors' rebuttal. They were mainly concerned that the submission offered no practical implementation or demonstration of feasibility, and secondarily concerned that the paper was unclearly written and motivated. The authors' rebuttal did not resolve these issues.

My recommendation is to reject the submission and encourage the authors to develop an empirical validation of their method before resubmitting.",ICLR2020,
HJlZ7uZiJV,1544390000000.0,1545350000000.0,1,BklKFo09YX,BklKFo09YX,"Fundamentally sensible idea, but too incremental and application-focused",Reject,"This paper introduces a variant of the CycleGAN designed to optimize molecular graphs to achieve a desired quality. The work is reasonably clear and sensible; however, it's of limited technical novelty, since it's mainly just combining two existing techniques. Overall, its specificity and incrementalness make it fall short of the bar.",ICLR2019,4: The area chair is confident but not absolutely certain
DDe7ECkKrZu,1642700000000.0,1642700000000.0,1,LdEhiMG9WLO,LdEhiMG9WLO,Paper Decision,Accept (Poster),"This paper studies structured pruning methods, called kernel pruning in the paper, which is also known as channel pruning for convolutional kernels. A simple method is proposed that primarily consists of three stages: (i) cluster the filters in a convolution layer into a predefined number of groups, (ii) prune the unimportant kernels from each group, and (iii) permute the remaining kernels to form a grouped convolution operation and then fine-tune the network. Although the novelty of the method is not high, it is simple and effective in experiments, after the supplementary SOTA results in the long rebuttal. The majority of reviewers increased their ratings after the rebuttal (though one reviewer promised this but forgot to act), while some reviewers have concerns about the fairness to other authors of adding many new results in an unlimited rebuttal, and refused to check further. In terms of the top end of performance, a reviewer thinks that ""the authors haven't quite exceeded the results from existing works ('Discrimination-aware channel pruning for deep neural network' and 'Learning-compression algorithms for neural net pruning') for CIFAR-10 and many others on ImageNet"". In all, this work indeed lies on the boundary.
The reviewer with the lowest score is basing the score on minor reservation regarding lack of detail in explaining the experiments. + +Based upon the average score rejection is recommended. The reviewers' comments can help improve the paper and it is definitely recommended to submit it to the next conference. + +",ICLR2020, +lOvXINdn66,1610040000000.0,1610470000000.0,1,8iW8HOidj1_,8iW8HOidj1_,Final Decision,Reject,"This paper proposes an extension to the Dreamer agent in which planning (either via MCTS or rollouts) is used to select actions, rather than sampling from the policy prior. The results show small improvements over the baseline Dreamer agent. + +Pros: +- Important study on incorporating decision-time planning into Dyna-based agents +- Evaluation on many control tasks rather than just a few + +Cons: +- Lack of ablations and detailed analysis +- Claims aren't backed up by quantitative results + +The reviewers generally felt that the approach taken in the paper lacked novelty. I agree that the approach is somewhat incremental (in fact I think it is also an instance of [1]). While both incremental changes and reimplementations of older methods with newer techniques can indeed be valuable, the current paper falls short in terms of the evaluation. As pointed out by several reviewers, there is no in-depth analysis explaining the design choices in which rollouts or MCTS are most likely to help (e.g. search budget, exploration parameters, etc.). As these parameters can play a large role in performance, I think it is important to characterize their effect on the agent---otherwise, I do not think there is a clear learning regarding how to translate these results to other domains and tasks. Additionally, and perhaps even more seriously, there are a number of claims made in the paper about the proposed method being more data efficient or higher performance. But, it is not clear visually that these improvements are statistically significant, and no quantitative tests have been run (and if the authors want to make a claim about data efficiency, I'd especially encourage them to report a metric like cumulative regret). Finally, while the incomplete runs are not a reason for rejection on their own, they do add to my overall sense that the paper is incomplete in its current form. + +Given the above reasons, I do not feel this paper is ready for publication at ICLR. I'd encourage the authors to perform more careful ablations of the effect of incorporating search into the agent, and to back up their claims with more rigorous quantitative results. + +One small point: the authors wrote in the rebuttal that ""we are not aware of any work which investigates look-ahead search-based planning for continuous control with learned dynamics"". Grill et al. [2] uses MCTS with learned dynamics in a modification of MuZero, though only applies it in one continuous control task (Cheetah Run). + +1. Silver, D., Sutton, R. S., & Müller, M. (2008). Sample-based learning and search with permanent and transient memories. ICML. +2. Grill, J. B., Altché, F., Tang, Y., Hubert, T., Valko, M., Antonoglou, I., & Munos, R. (2020). Monte-Carlo tree search as regularized policy optimization. ICML.",ICLR2021, +RvBz-f6Ueu,1610040000000.0,1610470000000.0,1,bMCfFepJXM,bMCfFepJXM,Final Decision,Reject,"The paper presents an interesting perspective on improving offline RL within BRAC framework. +Given the improvements over BRAC, the paper is well organized and easy to understand. 
+ +The overall results pique interest in comparison with more recent Offline/Batch RL papers: BRAC, BEAR, CQL. +The results in this paper bring BRAC-family of methods closer to CQL with a number of practical improvements, and could have impact in practice. + +However, the reviewers have slight split over the marginal value of additional machinery. There do remain some concerns: +- KL divergence is not the best metric to capture OOD issues between policies. +- The additional machinery in comparison to CQL may be unnecessary, at least in terms of results. +- The method requires many task-specific key hyper-parameters, which limits the generality of the approach. + +I would recommend rejection as it stands. The paper needs more careful empirical analysis that explains what methodical improvements are actually required and which ones only provide marginal bumps. With multiple task-specific hyper-params, it may be tricky for these ideas to realize their potential if not clearly understood. +Further release of sufficiently documented and easy to use implementation, will probably be required for acceptance since the main argument in the paper are number of technical improvements in BRAC.",ICLR2021, +NsqeXXPHRFX,1642700000000.0,1642700000000.0,1,eJyt4hJzOLk,eJyt4hJzOLk,Paper Decision,Reject,"The reviewers had a number of concerns which remain since the authors did not provide any response nor they have updated the paper. Hopefully, once the paper is improved in terms of clarification (significance and correctness of the theoretical results and the technical approach approach), it will be ready for publication in one of the ML venues.",ICLR2022, +ryODrJTBf,1517250000000.0,1517260000000.0,586,r1nzLmWAb,r1nzLmWAb,ICLR 2018 Conference Acceptance Decision,Reject,All reviewers believed that the novelty of the contribution was limited.,ICLR2018, +Syg3M8NwxV,1545190000000.0,1545350000000.0,1,HyMRaoAqKX,HyMRaoAqKX,Interesting idea whose presentation could be less confusing,Reject,"The paper proposes an original idea for training a generative model based on an objective inspired by a VAE-like evidence lower bound (ELBO), reformulated as two KL terms, which are then approximately optimized by two GANs. They thus use implicit distributions for both the posterior and the conditional likelihood. The idea is original and intriguing. But reviewers and AC found that the paper currently suffered from the following weaknesses: a) The presentation of the approach is unclear, due primarily to the fact that it doesn't throughout unambiguously enough separate the VAE-like ELBO *inspiration*, from what happens when replacing the two KL terms by GANs, i.e. the actual algorithm used. This is a big conceptual jump that would deserve being discussed and analyzed more carefully and thoroughly. b) Reviewers agreed that the paper does not sufficiently evaluate the approach in comparative experiments with alternatives, in particular its generative capabilities, in addition to the provided evaluations of the learned representation on downstream tasks. +Reviewers did not reach a clear consensus on this paper, although discussion led two of them to revise their assessment score slightly towards each other's. One reviewer judged the paper currently too confusing (point a) putting more weight on this aspect than the other reviewers. 
+Based on the paper and the review discussion thread, the AC judges that while it is an original, interesting and potentially promising approach, its presentation can and should be much clarified and improved. +",ICLR2019,3: The area chair is somewhat confident +mXF7S_db0s,1576800000000.0,1576800000000.0,1,BygZK2VYvB,BygZK2VYvB,Paper Decision,Reject,"This paper proposed an auxiliary loss based on mutual information for graph neural network. Such loss is to maximize the mutual information between edge representation and corresponding edge feature in GNN ‘message passing’ function. GNN with edge features have already been proposed in the literature. Furthermore, the reviewers think the paper needs to improve further in terms of explain more clearly the motivation and rationale behind the method. ",ICLR2020, +dtHc-vZlagu,1610040000000.0,1610470000000.0,1,Y-Wl1l0Va-,Y-Wl1l0Va-,Final Decision,Reject,"This work proposes a shortest path constraint for the reinforcement learning algorithm to improve efficiency in sparse-reward scenarios. The experiments are shown in navigation tasks in first-person maze and grid world. Reviewers found the idea interesting and the paper well-written but none of them championed the paper for clear acceptance. The authors provided a detailed thoughtful rebuttal. All the reviewers acknowledged the rebuttal followed by discussion. After considering rebuttal, review, and discussion, both AC and reviewers feel that experiments don't fully support and justify the algorithm. The main issue is that the results are shown only for the shortest pathfinding problems where the shortest path constraint is shown to be helpful. Hence, it is recommended to run it on diverse scenarios and standard benchmarks like the Atari games suite. Please refer to the reviews for final feedback and suggestions to strengthen the future submission.",ICLR2021, +rUOYR4ohcVQ,1642700000000.0,1642700000000.0,1,G7PfyLimZBp,G7PfyLimZBp,Paper Decision,Reject,"The paper considers the difference between GD and ADAM in terms of implicit bias. It considers a specific distribution and architecture where the two algorithms converge to different solutions while perfectly fitting the training data. The authors highlight the fact that this happens while adding regularization, which does not happen in the linear case. +The reviewers found some of the insights and analysis interesting. However, they also had reservations about the impact of the results given that it is known that GD and ADAM have different implicit biases, and that the distribution appears specifically crafted towards showing this effect for the architecture studied. +In future versions, the authors are encouraged to better motivate the chosen distribution, use more standard neural architecture (e.g., standard relu), and provide more explanation about the role of regularization in their result.",ICLR2022, +YFkg5JQ14qa,1610040000000.0,1610470000000.0,1,wta_8Hx2KD,wta_8Hx2KD,Final Decision,Accept (Poster),"Symmetries play an important role in physics, and more and more papers show that they also play an important role in statistical machine learning. In particular, employing symmetries might be the key to improve training and predictive performance of machine learning models. In this context, the present paper shows how previous physical knowledge can be leveraged to improve neural network performance, in particular within Deep dynamic models. To this end, they show how to incorporate equivariance into resnets and u-nets for dynamical systems. 
On a technical level, as pointed out by the reviews and also clearly mentioned by the authors, the basic building blocks are well known in the literature. However, dynamical systems also raises their own challenges resp. laws when it comes to modelling symmetries, as the authors argue in the paper and also clarified in the rebuttal. For instance, it pays off to adapt the techniques known from the literature deal better with scale, magnitude and uniform motion equivariance. This is a solid contributions and will help many other who want to apply DNNs to dynamic and physical models. ",ICLR2021, +rkg25eByxV,1544670000000.0,1545350000000.0,1,r1Nb5i05tX,r1Nb5i05tX,"Clever application of MINE, but unclear how strongly the results validate information bottleneck theory",Reject,"This paper does two things. First, it proposes an approach to estimating the mutual information between the input, X, or target label, Y, and an internal representation in a deep neural network, L, using MINE (for I(Y;L)) or a variation on MINE (for I(X;L)) and noise regularization (estimating I(X;L+ε), where ε is isotropic Gaussian white noise) to avoid the problem that I(X;L) is infinite for deterministic networks and continuous X. Second, it attempts to validate the information bottleneck theory of deep learning (Tishby and Zaslavsky, 2015) by exploring an approach to training DNNs that optimizes the information bottleneck Lagrangian, I(Y;L) − βI(X;L+ε), layerwise instead of using cross-entropy and backpropagation. Experiments on MNIST and CIFAR-10 show improvements for the layerwise training over cross-entropy training. The penalty on I(X;L+ε) is described as being analogous to weight decay. The reviewers raised a number of concerns about the paper, the most serious of which is that the claim that the layerwise training results validate the information bottleneck theory of deep learning is too strong. In the AC's opinion, R1's critique that ""[i]f the true mutual information is infinite and the noise regularized estimator is only meant for comparative purposes, why then are the results of the training trajectories interpreted so literally as estimates of the true mutual information?"" is critical, and the authors' reply that ""this quantity is in fact a more appropriate measure for “compactness” or “complexity” than the mutual information itself"" undermines their claim that they are validating the information bottleneck theory of deep nets because the information bottleneck theory claims to be using mutual information. The AC also suggests that if the authors wish to continue this work and submit it to another venue, they (1) discuss the fact that MINE estimates only a lower bound that may be quite loose in practice and (2) say in their experimental section whether or not the variance of the regularizing noise was tuned as a hyperparameter, and if so, how results varied with different amounts of noise. Finally, the AC regrets that only one reviewer participated in the discussion (in a very minimal way), despite the reviewers' receiving several reminders that the discussion is a defining feature of the ICLR review process.",ICLR2019,4: The area chair is confident but not absolutely certain +HJ6yPyaHG,1517250000000.0,1517260000000.0,917,rJFOptp6Z,rJFOptp6Z,ICLR 2018 Conference Acceptance Decision,Reject,The authors propose a distillation-based approach that is applied to transfer knowledge from a classification network to non-classification tasks (face alignment and verification). 
The writing is very imprecise - for instance repeatedly referring to a 'simple trick' rather than actually defining the procedure - and the method is described in very task-specific ways that make it hard to understand how or whether it would generalize to other problems.,ICLR2018, +Vb-xRUGnoP,1576800000000.0,1576800000000.0,1,SkgsACVKPH,SkgsACVKPH,Paper Decision,Accept (Poster),This paper proposes a method to improve the training of sparse network by ensuring the gradient is preserved at initialization. The reviewers found that the approach was well motivated and well explained. The experimental evaluation considers challenging benchmarks such as Imagenet and includes strong baselines. ,ICLR2020, +Fry-G71IDVX,1610040000000.0,1610470000000.0,1,_zx8Oka09eF,_zx8Oka09eF,Final Decision,Accept (Poster),"The paper investigates the interesting question whether increasing the width or the number of parameters is +responsible for improved test accuracy. The paper is very well written and the question is novel and innovative. +From a methodological point of view, the experiments are well conducted, too. +The theoretical part of the paper is somewhat detached from the experimental part and constitutes more of +a heuristic conjecture. In addition, more experiments on a variety of other data sets would have been great. +Ideally, the theoretical section would thus be replaced by such additional experiments, but this is of course not +an option for a conference reviewing system. +Given the innovative question and well-conducted experiments I think that the pros outweighs the cons, +and for this reason I recommend to accept the paper. Reviewer concerns have been well addressed by the authors in their rebuttal and updated version of the paper.",ICLR2021, +BITEqzHRXVa,1642700000000.0,1642700000000.0,1,36SHWj0Gp1,36SHWj0Gp1,Paper Decision,Reject,"This paper presents a transformer model for learning representations of assembly code blocks, trained using a variant of the masked language modeling objective that encodes the full code block token sequence into a single bottleneck vector and then uses that vector to decode all the masked out tokens. Overall reviewer assessment for this paper is on the rejection side, mostly due to the not so novel model architecture and training objective. Experiments show that this variant of the MLM performs significantly better than the standard MLM objective without the bottleneck, which surprisingly is even worse than the simple TF-IDF in many tasks. This raises questions and it is unclear from the paper why the variant with a sequence-level bottleneck should perform better than the standard MLM. Binary code similarity detection has many implications in security, so this is a good domain to explore more in, and I encourage the authors to continue to improve this work and send it to the next venue. + +One related work also published in the ML community comes to mind that the authors might not be aware of: Graph matching networks for learning the similarity of graph structured objects by Li et al., ICML 2019, which also looked at binary code similarity detection, but works at the function level. 
It would be good to also take a look for other potentially missing related work.",ICLR2022, +sH9vCYg_1,1576800000000.0,1576800000000.0,1,BkgzMCVtPB,BkgzMCVtPB,Paper Decision,Accept (Talk),"This paper concerns the problem of defending against generative ""attacks"": that is, falsification of data for malicious purposes through the use of synthesized data based on ""leaked"" samples of real data. The paper casts the problem formally and assesses the problem of authentication in terms of the sample complexity at test time and the sample budget of the attacker. The authors prove a Nash equillibrium exists, derive a closed form for the special case of multivariate Gaussian data, and propose an algorithm called GAN in the Middle leveraging the developed principles, showing an implementation to perform better than authentication baselines and suggesting other applications. + +Reviewers were overall very positive, in agreement that the problem addressed is important and the contribution made is significant. Most criticisms were superficial. This is a dense piece of work, and presentation could still be improved. However this is clearly a significant piece of work addressing a problem of increasing importance, and is worthy of acceptance.",ICLR2020, +BJxEw0-zeE,1544850000000.0,1545350000000.0,1,HJlt7209Km,HJlt7209Km,Interesting proposal but requires more comprehensive evaluation and comparison,Reject,"The paper proposes a feature smoothing technique as a new and ""cheaper"" technique for training adversarially robust models. + +Pros: + +* the paper is generally well written and the claimed results seem quite promising + +* the theory contribution are interesting + +Cons: + +* the main technique is fairly incremental + +* there were concerns regarding the comprehensiveness of evaluations and baselines used",ICLR2019,5: The area chair is absolutely certain +Lb5mxg6G1sI,1610040000000.0,1610470000000.0,1,tADlrawCrVU,tADlrawCrVU,Final Decision,Reject,"This work is well written and accurately covers the context and recent related work. It's a good example of how to apply self-supervised training to the event sequence domain. However, the combination of a lack of technical originality (composing a set of previously explored ideas) and significant improvements in results (results with CoLES overlap in error bars with RTD results) limits the impact of this paper. + +Pros: +- Well written. +- Extensive evaluation. +- Well formulated problem. + +Cons: +- Lack of technical novelty. The method appears to be general to all sequences rather than specialized for event sequences so the motivation for this design is not crystal clear. +- Minor improvement in results from using the method despite written claims that the method 'significantly outperforms'. +- Limited analysis that shows the periodicity and repeatability in the data.",ICLR2021, +4Ng5rWHhRjX,1642700000000.0,1642700000000.0,1,VKtGrkUvCR,VKtGrkUvCR,Paper Decision,Reject,"This paper studies the average convergence rate for first order methods on random quadratic optimization problems. Specifically it is a follow-up to work of Pedregosa and Scieur. They study the expected spectral distribution (e.s.d.) of the objective's Hessian and show asymptotic guarantees that work under some assumptions. In comparison to Pedregosa and Scieur, the main takeaway is that you only need to know the distribution at the edges as opposed to the entire spectrum in order to get the same improved convergence. 
However some reviewers felt that the contributions were oversold, and for example that Assumption 1 is quite restrictive.",ICLR2022, +ZahrBJpef,1576800000000.0,1576800000000.0,1,BJxGan4FPB,BJxGan4FPB,Paper Decision,Reject,"This paper tackles the problem of how to adapt a model from a source to a target domain when both data is not available simultaneously (even unlabeled) to a single learner. This is of relevance for certain privacy preserving applications where one setting would like to benefit from information learned in a related setting but due to various factors may not be willing to directly share data. The proposed solution is a transfer alignment network (TAN) which consists of two autoencoders (each trained independently on the source and the target) and an aligner which has the task of mapping the latent codes of one domain to the other. + +All three reviewers expressed concerns for this submission. Of greatest concern was the experimental setting. The datasets chosen were non-standard and there was no prior work to compare against directly so the results presented are difficult to contextualize. The authors have responded to this concern by specifying the existing domain adaptation benchmarks are more challenging and require more complex architectures to handle the “more complex data manifolds”. The fact that existing benchmark datasets may be more complex the the dataset explored in this work is a concern. The authors should take care to clarify whether their proposed solution may only be applicable to specific types of data. In addition, the authors claim to address a new problem setting and therefore cannot compare directly to existing work. One suggestion is if using new data, report performance of existing work under the standard setting to give readers some grounding for the privacy preserving setting. Another option would be to provide scaffold results in the standard UDA setting but with frozen feature spaces. Another option would be to ablate the choice of L2 loss for learning the transformer and instead train using an adversarial loss, L1 loss etc. There are many ways the authors could both explore a new problem statement and provide convincing experimental evidence for their solution. The AC encourages the authors to revise their manuscript, paying special attention to clarity and experimental details in order to further justify their proposed work. ",ICLR2020, +rmUBhMHTTB,1610040000000.0,1610470000000.0,1,Du7s5ukNKz,Du7s5ukNKz,Final Decision,Reject,"The reviewers had some initial concerns about this submission. While the authors' rebuttal does a good job to address these concerns, the reviewers still have some doubts about the contribution of this paper and potential impact. In particular, it is not clear whether the performance improvements observed with the proposed algorithms is due to the ability to correct for noisy rewards or whether there are multiple other explanations for the improvement in performance. 
This makes it hard to predict whether the proposed algorithms will actually be useful in settings where noisy rewards or demonstration data are present.",ICLR2021, +d0rnS5fOgH,1576800000000.0,1576800000000.0,1,SkxSv6VFvS,SkxSv6VFvS,Paper Decision,Accept (Poster),"In my opinion, this paper is borderline (but my expertise is not in this area) and the reviewers are too uncertain to be of help in making an informed decision.",ICLR2020, +iwUlSHLUzz,1576800000000.0,1576800000000.0,1,SJxWS64FwH,SJxWS64FwH,Paper Decision,Accept (Poster),After the rebuttal period the ratings on this paper increased and it now has a strong assessment across reviewers. The AC recommends acceptance.,ICLR2020, +xJWyBk6ujM,1576800000000.0,1576800000000.0,1,S1eq9yrYvH,S1eq9yrYvH,Paper Decision,Reject,"The authors propose a learning framework to reframe non-stationary MDPs as smaller stationary MDPs, thus hopefully addressing problems with contradictory or continually changing environments. A policy is learned for each sub-MDP, and the authors present theoretical guarantees that the reframing does not inhibit agent performance. + +The reviewers discussed the paper and the authors' rebuttal. They were mainly concerned that the submission offered no practical implementation or demonstration of feasibility, and secondarily concerned that the paper was unclearly written and motivated. The authors' rebuttal did not resolve these issues. + +My recommendation is to reject the submission and encourage the authors to develop an empirical validation of their method before resubmitting.",ICLR2020, +HJlZ7uZiJV,1544390000000.0,1545350000000.0,1,BklKFo09YX,BklKFo09YX,"Fundamentally sensible idea, but too incremental and application-focused",Reject,"This paper introduces a variant of the CycleGAN designed to optimize molecular graphs to achieve a desired quality. The work is reasonably clear and sensible, however it's of limited technical novelty, since it's mainly just combining two existing techniques. Overall its specificity and incrementalness make it not meet the bar.",ICLR2019,4: The area chair is confident but not absolutely certain +DDe7ECkKrZu,1642700000000.0,1642700000000.0,1,LdEhiMG9WLO,LdEhiMG9WLO,Paper Decision,Accept (Poster),"This paper studies structured pruning methods, called kernel-pruning in the paper which is also known as channel pruning for convolutional kernels. A simple method is proposed that primarily consists of three stages: (i) clusters the filters in a convolution layer into predefined number of groups, (ii) prune the unimportant kernels from each group, and (iii) permute remaining kernels to form a grouped convolution operation and then fine-tune the network. Although the novelty of the method is not high, it is simple and effective in experiments after the supplementary sota results in the long rebuttal. Majority of reviewers increase their ratings after the rebuttal (though one reviewer promised this but forgot to act), while some reviewers have concerns on the fairness to other authors by adding lots of new results in unlimited rebuttal and refuse to check more. In terms of the top end of performance, a reviewer thinks that ""the authors haven't quite exceeded the results from existing works (""Discrimination-aware channel pruning for deep neural network"" and ""Learning-compression” algorithms for neural net pruning"" for CIFAR-10 and many others on ImageNet)"". In all, this work indeed lies on the boundary. 
After a discussion with other committee members, we recommend acceptance of this work, provided the authors incorporate all the new results from the rebuttal and release reproducible code in the final version.",ICLR2022,
The authors are encouraged to make the necessary changes and include the missing references.",ICLR2021, +wdj9g4Nvs,1576800000000.0,1576800000000.0,1,BJgr4kSFDS,BJgr4kSFDS,Paper Decision,Accept (Poster),This paper proposes a new method to answering queries using incomplete knowledge bases. The approach relies on learning embeddings of the vertices of the knowledge graph. The reviewers unanimously found that the method was well motivated and found the method convincingly outperforms previous work.,ICLR2020, +Y0QD2cUcrR_,1610040000000.0,1610470000000.0,1,ULQdiUTHe3y,ULQdiUTHe3y,Final Decision,Accept (Poster),"This paper considers a new setting of robustness, where multiple predictions are simultaneously made based on a single input. Different from existing robustness certificates which independently consider perturbation of each prediction, the authors propose collective robustness certificate that computes the number of predictions which are simultaneously guaranteed to remain stable under perturbation. This yields more optimistic results. Most reviewers think this is a very interesting work and the authors present an effective method to combine individual certificate. The experimental results are convincing. I recommend accept.",ICLR2021, +WDVZrHt1DUi,1610040000000.0,1610470000000.0,1,y4-e1K23GLC,y4-e1K23GLC,Final Decision,Reject,"The paper studies the Lipschitz properties of neural networks — in particular, two layer neural networks that interpolate generic datasets. It conjectures a “size robustness tradeoff”: in this setting, the number of neurons required to interpolate with an O(1)-Lipschitz function is proportional to the number of data points n, while the number of neurons required for interpolation alone is proportional to n/d, where d is the data dimension. More precisely, the conjecture is that the best achievable Lipschitz constant is proportional to $\sqrt{n/k}$, where $k$ is the number of neurons. The paper proves weaker versions of both sides of this conjecture: it proves that a spectral upper bound on the Lipschitz constant is lower bounded by $\sqrt{n/k}$ and that there exist networks achieving this Lipschitz constant when $k ~ n/d$ and $k ~n$. The paper also provides experiments supporting its claims, with the caveat that the actual Lipschitz constant is a worst-case quantity that cannot be directly observed. + +Pros and cons: + +[+] The paper identifies a novel (conjectured) phenomenon involving the dependence of the Lipschitz constant of an interpolating network on the degree of overparameterization. In words, Lipschitz interpolation requires significantly more neurons than mere interpolation. This observation seems likely to stimulate future work. + +[+] The paper provides relatively simple and rigorous proofs of simplified versions of its conjectures (both upper and lower bounds on the achievable Lipschitz constant). + +[+] The exposition is technically clean, and the paper is clear on the limitations of its analyses. + +[-] The setting of the paper’s analysis seems somewhat mismatched with the practice of deep learning. The data are assumed to be generic, where neural networks excel in fitting structured data. Several reviewers noted this mismatch and raised concerns about whether this conjectured/proved tradeoff on generic data carries over to structured datasets. 
+ +[-] A technical limitation is the shallowness of the network: controlling the Lipschitz properties of deep networks is much more challenging at a technical level, because one needs to argue that for the worst input, features propagate in a “generic” fashion. It is technically challenging to avoid exponential dependence on depth. + +[-] The paper obtains only partial progress towards proving its conjectures — for example, it shows that it is possible to interpolate with a Lipschitz constant of $n \log n / k$, where the conjectured bound is $\sqrt{n/k}$. + +[-] Comparing to kernel methods would help to better contextualize the results, since in a similar setting, kernel methods could also potentially be analyzed via localization arguments. + +Overall, the paper conjectures a novel phenomenon around size/robustness tradeoffs in interpolating neural networks. While the paper's conjectures have the potential to stimulate further empirical and theoretical work, the reviewers (in particular R1 and R2) note a number of significant limitations of the paper's analysis. In light of these issues, the paper falls below the bar for acceptance. ",ICLR2021,
The 3 is the least thorough and has the lowest confidence (2), so that review is being weighted accordingly. + +This appears to be a timely and interesting paper that is relevant to the community and warrants publication at ICLR. + +Pros: +- Well written and clear +- An interesting approach +- Neat technical innovations +- Generative deep models are of great interest to the community (e.g. Variational Autoencoders) + +Cons: +- Could include a better treatment of recent related literature +- Leaves a variety of open questions about specific details (i.e. from the reviews)",ICLR2018,
+ +Note: Another submission to this conference also explores an idea quite similar to the task reduction proposed in this paper -- see ""Divide-and-Conquer Monte Carlo Tree Search"". The breaking down of the problem into sub-problems using the value function is similar, but the details of how the papers proceed from there are quite distinct. + +This is a difficult meta-review decision due to the fairly mixed reviews, coupled with limited engagement in the discussion phase. Two reviewers felt the paper was solid and could be accepted (R1 and R2 with scores 7 and 6 respectively). R3 gave a borderline review that leaned towards reject (score 5). R3 replied to the initial author response, which provided helpful feedback to the authors. Ultimately, in my assessment, the authors did a fairly thorough job of addressing some of the points raised by R3, including by adding an additional comparison even where they didn't agree with the reviewer. R4 assigned the paper the lowest score of 3. The authors provided a lengthy reply to this review asserting that the review may have reflected a misunderstanding of paper details, but the reviewer did not respond to the authors. + +Two core issues raised about this paper relate to the definition of the space for subgoals and the limited difficulty of the tasks. However, this method does not claim to be entirely ignorant of the task space, so I don't see the fact that they do include some domain knowledge in designing the goal space as totally undermining the method. They focus on the complementary issue of how to break down difficult problems into sub-problems. While it would be considerably more impressive if the goal space were learned, I think this harder version of the problem remains a fundamental and deep problem within AI, so it seems to me too much to ask of the present paper (especially given that it was not the stated focus of the paper). And while the tasks explored in the paper are a little contrived (some repetitive motifs and designed with a relatively small search space over subtasks), these problems do have some complex structure. Compared to many works in this field, I applaud the authors for engaging with problems with both long-horizon task structure as well as a complex high-DoF continuous control component. + +While I agree with some of the concerns raised, my overall assessment is that I find the contributions sufficiently innovative and substantial to justify acceptance. The authors proposed a specific innovation and evaluated that innovation. Insofar as their innovation is somewhat general, I don't think this paper can be the last word on how well it compares with the diverse approaches it could be set against. And while the experiments are not definitive, I do think they constitute a fairly ambitious initial validation of the core idea. +",ICLR2021,
The paper could also be improved with the addition of a baseline (though not necessarily something like DeepStack, which is not publicly available and potentially onerous to reimplement). ",ICLR2019,4: The area chair is confident but not absolutely certain +ewQScuPBCY,1576800000000.0,1576800000000.0,1,BkeYSlrYwH,BkeYSlrYwH,Paper Decision,Reject,"The paper introduces an ensemble of RL agents that share knowledge amongst themselves. Because there are no theoretical results, the experiments have to carry the paper. The reviewers had rather different views on the significance of these experiments and whether they are sufficient to convincingly validate the learning framework introduced. Overall, because of the high bar for ICLR acceptance, this paper falls just below the threshold. +",ICLR2020, +B1xa2zLul,1486400000000.0,1486400000000.0,1,SkuqA_cgx,SkuqA_cgx,ICLR committee final decision,Invite to Workshop Track,"The reviewers (and I) welcome new approaches to evaluation, and this work appears to be a sound contribution to that space. I suggest that, because the paper's incremental findings are of somewhat narrower interest, it is a good fit for a workshop, so that the ideas can be discussed at ICLR and further developed into a more mature publication.",ICLR2017, +ryW82MUOe,1486400000000.0,1486400000000.0,1,HJ9rLLcxg,HJ9rLLcxg,ICLR committee final decision,Invite to Workshop Track,"This paper proposes to regularize neural networks by adding synthetic data created by interpolating or extrapolating in an abstract feature space, learning by an autoencoder. + + The main idea is sensible, and clearly presented and motivated. Overall this paper is a good contribution. However, the idea seems unlikely to have much impact for two reasons: + - It's unclear when we should expect this method to help vs hurt + - Relatedly, the method has a number of hyperparameters that it's unclear how to set except by cross-validation. + + We also want to remark that other regularization methods effectively already do closely related things. Dropout, gradient noise, and Bayesian methods, for instance, effectively produce 'synthetic data' in a similar way when the high-level weights of the network are perturbed.",ICLR2017, +dghGW6UNJav,1610040000000.0,1610470000000.0,1,_CrmWaJ2uvP,_CrmWaJ2uvP,Final Decision,Reject,"This paper proposes a new RNN architecture called Dynamic RNN which is based on dynamic system identification. + +Reviewers questioned the expressivity of the proposed model, practical application/impact of the proposed model, and interpretability of the proposed model. Even though the authors attempted to convince the reviewers, 3 out of 4 reviewers think that this work is not ready for publication. + +Specifically, R4 recommends 5 ways to strengthen the paper. I recommend the authors to incorporate this feedback and make a stronger resubmission in the future.",ICLR2021, +Byi5Vk6Hz,1517250000000.0,1517260000000.0,415,SySaJ0xCZ,SySaJ0xCZ,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The paper proposes a method for architecture search using network morphisms, which allows for faster search without retraining candidate models. The results on CIFAR are worse than the state of the art, but reasonably competitive, and achieved using limited computation resources. It would have been interesting to see how the method would perform on large datasets (ImageNet) and/or other tasks and search spaces. 
I would encourage the authors to extend the paper with further experimental evaluation.",ICLR2018, +3cAD2ZgivY,1610040000000.0,1610470000000.0,1,JzG0n48hRf,JzG0n48hRf,Final Decision,Reject,"This paper presents a method to improve the calibration of neural networks on out-of-distribution (OOD) data. + +The authors show that their method can be applied post-hoc to existing methods and that it improves calibration under distribution shift using the benchmark in Ovadia et al. 2019. + +However, reviewers felt that the theoretical justification for why this works is unclear (see detailed comments by R1 and R4), and some of the choices are not well-justified. Revising the paper to address these concerns with additional theoretical and/or empirical justifications should improve the clarity and strengthen the paper. + +I encourage the authors to revise and resubmit to a different venue. +",ICLR2021, +6I3VGjs2EtG,1610040000000.0,1610470000000.0,1,Oc-Aedbjq0,Oc-Aedbjq0,Final Decision,Reject,"This paper proposes a channel pruning method to compress and accelerate pre-trained CNNs. +The reviewers suggest further analysis of the experimental results to help explain the gains in performance, as well as point out some errors in the formulation. The paper is also found similar to meta-pruning method. The authors are encouraged to re-submit the paper after adding the analysis and improving the related work section. ",ICLR2021, +rkns7yTrG,1517250000000.0,1517260000000.0,213,r1iuQjxCZ,r1iuQjxCZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper contributes to a body of empirical work towards understanding generalization in deep learning. They do this through a battery of experiments studying ""single directions"" or selectivity of small groups of neurons. The reviewers that have actively participated agree that the revision is of high quality, impact, originality, and significance. The issue of a lack of prescriptiveness was raised by one reviewer. I agree with the majority that this is not necessary, but nevertheless, the revision makes some suggestions. I urge the authors to express the appropriate amount of uncertainty regarding any prescriptions that have not been as thoroughly vetted! + +",ICLR2018, +aHszb9ik8BW,1642700000000.0,1642700000000.0,1,cggphp7nPuI,cggphp7nPuI,Paper Decision,Reject,"The paper proposes Reasoning-Modulated Representations (RMR). That is, it incorporates how to incorporate (structure) prior knowledge (such as a law in physics) into a pre-trained reasoning modules, and investigates how doing so shapes the discovered representations in a number of self-supervised learning settings from pixels. The reviews and (short) discussion have presented salient arguments about the suitability of the paper for publication at this stage. One review argues that the ""methodological contribution is minimal,"" another one is asking for ""systematic evaluation"" of the main claims made. Moreover, while we all agree that the direction is interesting, the RMR approach presented is not shown to ""scale well"" (yet), as pointed out by one review. This, however, is important since the general idea that prior knowledge shapes the representation learned is common wisdom in the literature. Indeed, one may now argue that the paper is much more about ""how best to combine pixel-based deep learning and neural algorithmic reasoning algorithms"" as one reviewer puts it. 
From this perspective, the ATARI experiments are more interesting, but here the benefit compared to C-SWM seems to be marginal, and one should compare to other deep baselines conditioned on the RAM; the significance is assessed not by looking at the difference in scores and the degrees of freedom, but just by the number of wins. Additionally, there should be other baselines that directly make use of more structured models (structure = prior knowledge, e.g., HMMs or some other way to have a bit of memory), other datasets (where no access to RAM exists), as well as a discussion of other approaches that combine (combinatorial) reasoning with pixel-based deep learning. That is, while pushing for more high-level contributions is fine, this also requires some more illustrations and discussion of the broader context. Therefore, my overall recommendation is to reject the paper at this stage.",ICLR2022,
""Knowledge-grounded dialogue generation with pre-trained language models."" arXiv preprint arXiv:2010.08824 (2020). ++ There are several related work concerning with generation of a response given a relatively small set of evidence text such as the following ones: [5]Lian, Rongzhong, et al. ""Learning to select knowledge for response generation in dialog systems."" arXiv preprint arXiv:1902.04911 (2019). [6]Kim, Byeongchang, Jaewoo Ahn, and Gunhee Kim. ""Sequential latent knowledge selection for knowledge-grounded dialogue."" arXiv preprint arXiv:2002.07510 (2020). Although these work do not include a retrieval part, the authors should cite and discuss similarities and differences to [5] & [6] in their paper.",ICLR2022, +l0n-FmsnoHo,1642700000000.0,1642700000000.0,1,1Z5P--ntu8,1Z5P--ntu8,Paper Decision,Reject,"This paper proposes an improved mean-field analysis for multi-player residual networks. Compared with prior works, the proposed analysis removes a full support assumption needed in prior works. The authors have addressed some of the reviewers’ concerns by adding comparisons with the existing analysis of ResNet in the NTK regime, and a more detailed comparison with Ding et al. 2021. While this paper gathers some support from a reviewer, there is still concern that the novelty of this paper is not significant, especially given that the analysis is heavily built upon prior works. I think this paper can benefit from providing a proof sketch to highlight the key difference between the new analysis and existing analyses, or explicitly demonstrating the key proof technique/technical lemmas that enable the removal of the full support assumption. This paper might be a strong work after careful revision.",ICLR2022, +M5wvDoC7S1T,1610040000000.0,1610470000000.0,1,v-9E8egy_i,v-9E8egy_i,Final Decision,Reject,"This paper proposes a GNN architecture for multi-relational data to better address long-range dependencies in graphs. The proposed GR-GAT model is a variant of graph attention networks (GAT) with, among other modifications, vector-based edge type embeddings and GRU-type updates. Results are presented on AIFB, AM, and on synthetic benchmarks. + +The reviewers agreed that this is an interesting contribution and that the results on the chosen synthetic benchmarks are insightful, but that experimental evaluation on real data and overall motivation of the architecture is lacking. In the rebuttal period, the authors have improved the writing and strengthened the motivation of the paper. However, given the limited amount of time, the authors were not able to sufficiently address the lack of experimental validation on real data (beyond AIFB & AM). I am inclined to agree with the reviewers that this paper needs significantly more work on the experimental evaluation, the overall presentation needs to be refined and it needs to more carefully analyse the effect of each individual architectural modification to meet the bar for acceptance. +",ICLR2021, +Wis-BzcTr,1576800000000.0,1576800000000.0,1,rkg98yBFDr,rkg98yBFDr,Paper Decision,Reject,"This paper combines a well-known, recently proposed unsupervised representation learning technique technique with a class-conditional negative log likelihood and a squared hinge loss on the class-wise conditional likelihoods, and proposes to use the resulting conditional density model for generative classification. 
The empirical work appears to validate the claim that their method leads to good out-of-distribution detection, and better performance when using a rejection option. The adversarial defense results are less clear. Reporting raw logits is a strange choice, and difficult to interpret; the table is also difficult to read, and this method of reporting makes it difficult to compare against existing methods. + +The reviewers generally remarked on presentation issues. R1 asked about the contribution of various loss terms, a matter I feel is underexplored in this work, and the authors mainly replied with a qualitative description of loss behaviour in the joint system, which I don't believe was the question. R1 also asked about the choice of thresholds and the issues of fairness of comparison regarding model capacity, neither of which seemed adequately addressed. R3 remarked on the clarity being lacking, and also that ""Generative modeling of representations is novel, afaik."" (It is not; see, for example, the VQ-VAE line of work where PixelCNN priors are fit on top of representations, and layer-wise pre-training works of the mid 2000s, where generative models were frequently fit on greedily trained feature representations, sometimes in conjunction with a joint generative model of class labels). R2's review was very brief, with a self-reported low confidence, but their concerns were addressed in a subsequent update. + +There are three weaknesses that are my grounds for recommending rejection. First, this paper does a poor job of situating itself in the wider body of literature on classification with rejection, which dates to at least the 1970s (see Bartlett & Wegkamp, 2006 and the references therein). Second, the empirical work makes little comparison to other methods in the literature; baselines on clean data are self-generated, and the paper compares to no other adversarial defense proposals. As a minor drawback, ImageNet results are also missing; given that one of the purported advantages of the method is scalability, a large-scale benchmark would have strengthened this claim. Third, no ablation study is undertaken that might give us insight into the role of each term of the loss. Given that this is a straightforward combination of well-understood techniques, a fully empirical paper ought to deliver more insight into the combination than this manuscript has.",ICLR2020,
The authors have done an admirable job of addressing this through more experiments, including providing error bars; however, it seems as though the reviewers still require more. I would recommend creating tables of architecture x dropout technique, where dropout technique includes information dropout, adaptive dropout, curriculum dropout, and standard dropout, across several standard datasets. Alternatively, the authors could try to be more ambitious and classify ImageNet. Essentially, it seems as though the current small-scale datasets have become somewhat saturated, and therefore the bar for gauging a new method on them is higher in terms of experimental rigor. This means the best strategy is to either try more difficult benchmarks, or be extremely thorough and complete in your experiments. + +Regarding the wide ResNet result, while I can appreciate that the original version was published with higher errors, the later draft should still be taken into account as it has a) been out for a while now and b) can be reproduced in open-source implementations (e.g., https://github.com/szagoruyko/wide-residual-networks).",ICLR2019,
I recommend that this paper be selected as an oral presentation.",ICLR2021, +rkSyLJ6SM,1517250000000.0,1517260000000.0,695,H1O0KGC6b,H1O0KGC6b,ICLR 2018 Conference Acceptance Decision,Reject,"* the proposed fine-tuning of only the last layer is not novel enough +* experiments are not sufficient to isolate the differences to support the benefit of post-training +",ICLR2018, +H1l0xiKlxE,1544750000000.0,1545350000000.0,1,Hye64hA9tm,Hye64hA9tm,"Interesting direction, but no compelling new method yet",Reject,"This paper addresses important general questions about how linear classifiers use features, and about the transferability of those features across tasks. The paper presents a specific new analysis method, and demonstrates it on a family of NLP tasks. + +All four reviewers (counting the emergency fourth review) found the general direction of research to be interesting and worthwhile, but all four shared several serious concerns about the impact and soundness of the proposed method. + +The impact concerns mostly dealt with the observation that the method is specific to linear classifiers, and that it's only applicable to tasks for which a substantial amount of training data is available. + +As the AC, I'm willing to accept that it should still be possible to conduct an informative analysis under these conditions, but I'm more concerned about the soundness issues: The reviewers were not convinced that a method based on the counting of specific features was appropriate for the proposed setting (due to rotation sensitivity, among other issues), and did not find that the experiments were sufficiently extensive to overcome these doubts.",ICLR2019,4: The area chair is confident but not absolutely certain +ymucTkaBIHL,1610040000000.0,1610470000000.0,1,qpsl2dR9twy,qpsl2dR9twy,Final Decision,Accept (Poster),"The authors study co-ordination in multi-agent systems. Specifically they propose a scheme where agents model future trajectories through the environment dynamics and other agents' actions, they then use this to form a plan which forms the agents' intention which is then communicated to the other agents. + +The major concerns raised by the reviewers were around novelty, lack of ablations and significance of results as improvements were modest. During the rebuttal, the authors have extended their work with ablations and have conducted a statistical test. While it is true the current results present a small improvement, i think this is an interesting contribution in the field of emergent communication",ICLR2021, +k_3OI_Cf8EV,1642700000000.0,1642700000000.0,1,cKTBRHIVjy9,cKTBRHIVjy9,Paper Decision,Reject,"This paper develops a new mechanism SubMix that provides next-token prediction under a variation of the differential privacy constraint. There is disagreement among the reviewers when assessing the quality of this work. Even though the study of private predictions in large language models is new, the reviewers raised several issues in the proposed approach. First, the formulation of partition-level DP created confusion about the privacy guarantees provided by the mechanism. Given the similarity to PATE, it might be useful to articulate if there is any difference between the privacy guarantee in this paper and the one of PATE. Second, the authors might want to further clarify the reason for having two sub-parts, which has also created some confusion. Even after reading the author's response and the updated revision, the AC still could not understand the relevant privacy argument. 
In summary, the paper may require further clarification and revision before it is ready for publication.",ICLR2022, +SkgzCLNiyV,1544400000000.0,1545350000000.0,1,HkzyX3CcFQ,HkzyX3CcFQ,"metareview: limited novelty, unconvincing experiments",Reject,"This paper explores the addition of feedback connections to popular CNN architectures. All three reviewers suggest rejecting the paper, pointing to limited novelty with respect to other recent publications, and unconvincing experiments. The AC agrees with the reviewers. + +",ICLR2019,5: The area chair is absolutely certain +g881K34zhI,1576800000000.0,1576800000000.0,1,HygQ7TNtPr,HygQ7TNtPr,Paper Decision,Reject,"The submission proposes methodology for quantizing neural networks. The reviewers were unanimous in their opinion that the paper is not suitable for publication at ICLR. Concerns included novelty over previous works, comparatively weak baseline comparisons, and overly restrictive assumptions.",ICLR2020, +ByebsVQVlV,1544990000000.0,1545350000000.0,1,rygZJ2RcF7,rygZJ2RcF7,reasonable clarity and quality but unclear significance,Reject,"This was a borderline paper, as reviewers generally agreed that the method was a new method that was appropriately explained and motivated and had reasonable experimental results. The main drawbacks were that the significance of the method was unclear. In particular, the method might be too inflexible due to being based on a hard-coded rule, and it is not clear why this is the right approach relative to e.g. GANs with a modified training objective). Reviewers also had difficulty assessing the significance of the results on biological datasets. While such results certainly add to the paper, the paper would be stronger if the argument for significance could be assessed from more standard datasets. + +A note on the review process: the reviewers initially scored the paper 6/6/6, but the review text for some of the reviews was more negative than a typical 6 score. To confirm this, I asked if any reviewers wanted to push for acceptance. None of the reviewers did (generally due to feeling the significance of the results was limited) and two of the reviewers decided to lower their scores to account for this.",ICLR2019,2: The area chair is not sure +BJe1MUrlg4,1544730000000.0,1545350000000.0,1,SkghBoR5FX,SkghBoR5FX,Aggregate rating of the paper just too far away from acceptance threshold.,Reject,"With scores of 5, 4 and 3 the paper is just too far away from the threshold for acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +zxppON5aoXu,1642700000000.0,1642700000000.0,1,FNSR8Okx8a,FNSR8Okx8a,Paper Decision,Reject,"This work provides a theoretical analysis of Prioritized Experience Replay (PER ) in a supervised learning setting, points out limitations of PER and proposes a model-based approach to address these shortcomings for continuous control problems. + +Strengths: +----------- +The overall problem was motivated well +Reviewers agree that this proposed algorithm has promise +Overall the paper is well written +a diverse set of experiments is provided + +Weaknesses: +--------------- +reviewers point out some clarity issues +The theoretical analysis is performed in a supervised learning setting, and it is unclear how the resulting analysis transfers to the RL setting +There are some concerns (theoretical/technical) wrt to the proposed algorithm. +The analysis of the experiments is lacking in depth. For instance, no analysis of why the proposed algorithm outperforms very related baselines. 
Furthermore, it's unclear why, for the autonomous driving experiment, the algorithms achieve the same return but the proposed method leads to fewer crashes. + +Rebuttal: +---------- +The authors have addressed many of the clarity issues. However, I agree with the reviewers that the theoretical concerns and requests for deeper analysis were not addressed in a significant manner. + +Summary: +------------ +Overall, this manuscript investigates an important problem and provides a promising algorithm. However, some theoretical/technical concerns remain and a deeper analysis of results is required. Hence my recommendation is that in its current form the manuscript is not quite ready yet for publication.",ICLR2022,
Unfortunately, it does not seem there is sufficient enthusiasm for this paper amongst the reviewers to justify its inclusion in the conference.",ICLR2018, +Iy_3LXxrrT,1610040000000.0,1610470000000.0,1,D62nJAdpijt,D62nJAdpijt,Final Decision,Reject,"The paper presents a new attack combining trojans (backdoor attacks) with adversarial examples. The new attack is triggered only if both a trojan and the respective adversarial perturbation are present. Experimental evaluation demonstrates that neither adversarial training (as a defense against adversarial examples) nor defenses against backdoors are effective against the new attack. + +The proposed method is original albeit somewhat incremental (combination of two well-known attack techniques). The main weakness of the paper, however, is its threat model. It is not clearly explained why the proposed attack would make sense for an attacker. Backdoor attacks are typically executed by model creators in order to force certain decisions on certain data. On the other hand, adversarial examples are generated by model users (or abusers) who have an interest in wrong model predictions (e.g., decisions made in their favor). The paper does not provide a convincing use-case in which such combined attacks would be feasible. + +Furthermore, paper's clarity can be improved. The introduction does not present a clear picture of poisoning attacks. It essentially treats poisoning attacks as equivalent to backdoor/trojan attacks. This is not true and a substantial body of research (starting from the seminal paper by Barreno et al. in 2006) has addressed indiscriminate poisoning attacks aimed at general deterioration of classifier performance. A distinction between a clean-label and a poisoned-label attacks is also not clearly presented. The notation of Section 3 is rather complex and confusing. +",ICLR2021, +tl6hE5oLCD,1576800000000.0,1576800000000.0,1,HJepXaVYDr,HJepXaVYDr,Paper Decision,Accept (Poster),The paper proposed using stochastic AUC for dealing with imbalanced data. This paper provides useful insights and experiments on this important problem. I recommend acceptance.,ICLR2020, +S1eOyvVPxE,1545190000000.0,1545350000000.0,1,HyxOIoRqFQ,HyxOIoRqFQ,Interesting original idea but unclearly presented and with a too limited experimental validation.,Reject,"The paper presents an original approach to replace inefficient discrete autoregressive posterior sampling by a parallel sampling procedure based on fixed-point iterations reminiscent of normalizing flow, but for discrete variables. +All reviewers liked the idea, and found that it was an original and promising approach. But all agreed the paper was poorly written and very unclear. +All also found the experimental section lacking, in clarity and scope. + +Authors did not provide a rebuttal. + +Overall a potentially really promising idea, but the paper is not yet ripe. +",ICLR2019,4: The area chair is confident but not absolutely certain +s4YNvCtyPfS,1610040000000.0,1610470000000.0,1,N5Zacze7uru,N5Zacze7uru,Final Decision,Reject,"The authors propose an MPC based approach for learning to control systems with continuous state and actions - the dynamics, control policy and a Lyapunov function are parameterized as neural networks and the authors claim to derive stability certificates based on the Lyapunov function. + +The reviewers raised several serious technical issues with the paper as well as the lack of clarity in the presentation of the main technique in the initial version of the paper. 
While the clarity concerns were partially addressed during the rebuttal, the technical concerns (in particular those raised by reviewer 1) remain unaddressed: the stability certificate derived is questionable because sampling-based approaches to certifying that a function is a valid Lyapunov function are insufficient to derive any stability guarantee. Further, the experimental results are only demonstrated on relatively simple dynamical systems. Hence I recommend rejection. + +However, all reviewers agree that the ideas presented in the paper are potentially interesting; I would suggest that the authors consider revising the paper to address the feedback on technical issues and submit to a future venue. + +",ICLR2021,
They find that the level of correct agreement among models is the dominant factor that improves the performance of MI attacks in deep ensembles. They support their claim by visualizing the distribution shifts of correct agreement in train/test examples. They further implement a variety of existing defenses, such as differential privacy and regularizations, to investigate the effects of existing defense mechanisms. Overall, the paper is well-written and the experiments are well conducted. +While these findings are interesting, they do not reveal something useful or surprising about deep ensemble learning. It is not clear what the contribution is to the membership inference attack literature and the private machine learning literature. They do not propose anything new to make the attacks or the defenses stronger.",ICLR2022,
First, the fact that the approach requires Monte Carlo sampling in very high dimensions automatically limits its scope. Second, the experiments seemed somewhat limited to simple (by ICLR standards) datasets. Third and most crucially, the approach lacks a compelling-enough use case. It is not entirely clear what local isometry enables, beyond nice qualitative visualizations (and moreover, what the isometric autoencoder provides over other isometry-preserving manifold learning procedures such as ISOMAP). Some rudimentary results are shown on k-NN classification and linear SVMs, but the gains appear marginal. + +The authors are encouraged to consider the above concerns (and in particular, identifying a unique use case for isometric autoencoders) while preparing a future revision.",ICLR2021, +Sklfa33Ak4,1544630000000.0,1545350000000.0,1,Bkxdqj0cFQ,Bkxdqj0cFQ,Limited contribution and no effort for rebuttal,Reject,"The reviewers and AC note the potential weaknesses of the paper in various aspects, and decided that the paper needs more work before it can be published. ",ICLR2019,5: The area chair is absolutely certain +pPwBQLRkYSCS,1642700000000.0,1642700000000.0,1,HuaYQfggn5u,HuaYQfggn5u,Paper Decision,Accept (Poster),The paper makes some novel and interesting observations pertaining to the relationship between data heterogeneity and personalization. Reviewers liked the paper and its ideas in general but raised several concerns. The rebuttal rectified several confusions and provided more clarification which convinced the reviewers that the paper is above the bar for publication.,ICLR2022, +#NAME?,1642700000000.0,1642700000000.0,1,K1m0oSiGasn,K1m0oSiGasn,Paper Decision,Reject,"Four reviewers have reviewed this submission. Three of them recommended rejecting the paper and one was marginally above the acceptance threshold. The authors have not responded to the criticisms or questions of the reviewers. Among the many concerns were issues with the use of 'clean and single target object' images, the lack of discussion of related models such as adaptive bilinear pooling and multi-domain pooling, and the lack of evaluations on datasets such as the large-scale iNaturalist. Given the above criticisms and the lack of an authors' response, this submission falls below the acceptance bar.",ICLR2022, +yjr2OGKbTVL_,1642700000000.0,1642700000000.0,1,ofLwshMBL_H,ofLwshMBL_H,Paper Decision,Reject,"This paper proposes an expansion strategy for both task-agnostic and task-boundary-aware CL. The authors demonstrate the quality of their method using two standard scenarios with the Split-MNIST and CIFAR datasets. + +Enabling task-agnostic and task-boundary-aware CL is important and an active area of research. The proposed approach is an interesting method that adds an expert for each new task. Experts are then combined (Mixture of Experts) for prediction. One disadvantage of a MoE approach is that the model size and compute will grow linearly with the number of tasks. This effect is partly limited in the paper as the authors show that experts can be small neural networks. 
+ +The main limitation that remains is regarding the experiments. I agree with the reviewers that the current experiments seem somewhat preliminary and showing results on larger scale datasets and/or compared to a wider diversity of baselines is important. Reviewer sgG4 made precise comments about this. Other minor comments by the reviewers including providing a detailed report of the memory usage and computational costs of the various methods (partly done in Figure 5.3). + +I think this method is interesting and could be impactful. I strongly encourage the authors to polish their manuscript and consider adding some of the additional empirical results that were suggested.",ICLR2022, +eA2eC42nsdE,1642700000000.0,1642700000000.0,1,lL3lnMbR4WU,lL3lnMbR4WU,Paper Decision,Accept (Poster),"the aim of this work is to produce an open-vocabulary detector. The approach is via knowledge distillation from existing large-scale V+L models, and the evaluation is based on novel classes with LVIS. The reviewers were generally happy with the work (approach and results), but there were substantial points of clarification during discussion that need to be properly integrated into the final manuscript.",ICLR2022, +HklRyXoZeE,1544820000000.0,1545350000000.0,1,HkgnpiR9Y7,HkgnpiR9Y7,Marginal novelty; the advantage over existing methods is not convincing enough,Reject,"The paper presents a method to learn inference mapping for GANs by reusing the learned discriminator's features and fitting a model over these features to reconstruct the original latent code z. R1 pointed out the connection to InfoGAN which the authors have addressed. R2 is concerned about limited novelty of the proposed method, which the AC agrees with, and lack of comparison to a related iGAN work by Zhu et al. (2016). The authors have provided the comparison in the revised version but the proposed method seems to be worse than iGAN in terms of the metrics used (PSNR and SSIM), though more efficient. The benefits of using the proposed metrics for evaluating GAN quality are also not established well, particularly in the context of other recent metrics such as FID and GILBO. +",ICLR2019,5: The area chair is absolutely certain +pQLbZ5t1cgI,1642700000000.0,1642700000000.0,1,OD_dnx57ksK,OD_dnx57ksK,Paper Decision,Reject,"This paper enhances Lagrangian neural networks by adding conservation of the angular and linear momenta. According to the reviewers, the technical contribution of the paper is marginal, it is a incremental change of an existing model, and it seems that there is some over claim on the generalization of the model to unseen systems. The theoretical contributions in the paper are not significant, and the experiments have not demonstrate the practical potential of the proposed model yet. After the reviewers provided their comments, the authors did not submit their rebuttals. Therefore, as a result, we do not think the paper is ready for publication at ICLR.",ICLR2022, +HNF76UX_5a,1576800000000.0,1576800000000.0,1,HJxDugSFDB,HJxDugSFDB,Paper Decision,Reject,"An actor-critic method is introduced that explicitly aims to learn a good representation using a stochastic latent variable model. There is disagreement among the reviewers regarding the significance of this paper. Two of the three reviewers argue that several strong claims made in the paper that are not properly backed up by evidence. 
In particular, it is not sufficiently clear to what degree the shown performance improvement is due to the stochastic nature of the model used, one of the key points of the paper. I recommend that the authors provide more empirical evidence to back up their claims and then resubmit.",ICLR2020, +rklZl5AbkN,1543790000000.0,1545350000000.0,1,r1xN5oA5tm,r1xN5oA5tm,Reject,Reject,All reviewers agree in their assessment that this paper does not meet the bar for ICLR. The area chair commends the authors for their detailed responses.,ICLR2019,4: The area chair is confident but not absolutely certain +GctiX4FllFE,1610040000000.0,1610470000000.0,1,pVwU-8cdjQQ,pVwU-8cdjQQ,Final Decision,Reject,"The reviewers appreciate the spatio-temporal formulation of amortised iterative inference. +However, the paper does not clearly state what is the end goal: if the end goal is video object segmentation, it should compared against other unsupervised object segmentation methods. If the goal is representation learning, it should evaluate the merit of the recovered representations, e.g. by fine-tuning them on some downstream task. ",ICLR2021, +OJtuCF7qFxR,1610040000000.0,1610470000000.0,1,AAes_3W-2z,AAes_3W-2z,Final Decision,Accept (Poster),This paper proposes a novel and interesting embedding of graphs emulating the Wasserstein distance. The experiments are good and the authors did a detailed answer taking into account the comments of the reviewer. The responses were appreciated and the AC recommends the paper to be accepted.,ICLR2021, +lDiHisdwflHU,1642700000000.0,1642700000000.0,1,4Ycr8oeCoIh,4Ycr8oeCoIh,Paper Decision,Accept (Poster),"This paper empirically studies when, why, and which pretrained GANs are useful. +All the reviewers are positive about this work, that they all consider very valuable for practitioners and the community. +First building intuition through toy examples, authors conduct a large-scale study of transfer learning in GANs (with the stylegan2). They propose a way to understand the relevance of a pre-trained generator and discriminator, as well as heuristics to select good initialization. +Overall, this paper makes a solid contribution that should to be accepted.",ICLR2022, +Bye5SPPWlV,1544810000000.0,1545350000000.0,1,SygQvs0cFQ,SygQvs0cFQ,accept,Accept (Poster),"as r1 and r2 have pointed out, this work presents an interesting and potentially more generalizable extension of the earlier work on introducing noise as regularization in autoregressive language modelling. although it would have been better with more extensive evaluation that goes beyond unsupervised language modelling and toward conditional language modelling, but i believe this is all fine for this further work to be left as follow-up. + +r3's concern is definitely valid, but i believe the existing evaluation set as well as exposition merit presentation and discussion at the conference, which was shared by the other reviewers as well as a programme chair.",ICLR2019,4: The area chair is confident but not absolutely certain +SkLkB1aHM,1517250000000.0,1517260000000.0,479,HkCvZXbC-,HkCvZXbC-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper presents a layered image generation model (e.g., foreground vs background) using GANs. The high-level idea is interesting, but novelty is somewhat limited. For example, layered generation with VAE/GAN has been explored in Yan et al. 2016 (VAEs) and Vondrick et al. 2016 (GANs). 
In addition, there are earlier works for unsupervised learning of foreground/background generative models (e.g., Le Roux et al., Sohn et al.). Another critical problem is that only qualitative results on relatively simple datasets (e.g., MNIST, SVHN, CelebA) are provided as experimental results. More quantitative evaluations and additional experiments on more challenging datasets will strengthen the paper. + +* N. Le Roux, N. Heess, J. Shotton, J. Winn; Learning a generative model of images by factoring appearance and shape; Neural Computation 23(3): 593-650, 2011. +** Sohn, K., Zhou, G., Lee, C., & Lee, H. Learning and selecting features jointly with point-wise gated Boltzmann machines. ICML 2013. +",ICLR2018, +aSYuNR89idm,1610040000000.0,1610470000000.0,1,fpJX0O5bWKJ,fpJX0O5bWKJ,Final Decision,Reject,"This paper provides a new uncertainty measure of examples called ""Variance of Gradients"" (VoGs); it demonstrates that VoGs are correlated with mistakes, and can be useful for guiding optimization. + +On the positive side, the reviewers generally think that the ideas of this paper is nice and contribute to the research thrust in gradient-based uncertainty. In addition, the paper provides valuable empirical insights. + +However, the reviewers also pointed out a few important limitations: +- A more thorough comparison to prior methods is needed to convince the readers for actual usage. There are many other methods (e.g. predicted entropy) for example difficulty estimation / classifier trustworthiness that need to be compared to. +- The stability of individual VoG scores needs to be investigated further + +The authors are encouraged to address these limitations in the next iteration.",ICLR2021, +pk1uEeM7vsUU,1642700000000.0,1642700000000.0,1,p4H9QlbJvx,p4H9QlbJvx,Paper Decision,Reject,"The paper's primary contributions are: +* Contrary to previous claims, the authors empirically show that inheriting the weights after pruning can be beneficial when using *larger* fine-tuning learning rates than previously done. +* As an explanation, the authors provide suggestive results showing that pruning breaks dynamical isometry, which they claim explains why larger learning rates are needed. +* They propose a regularization-based technique to recover dynamical isometry on modern residual CNNs. + +Generally, reviewers were positive about the ideas in the paper, however, even after the rebuttal 3/4 reviewers did not find the arguments were clear or strongly supported yet. One issue that came up several times is a request for more investigation of StrongReg+pruning. At this time, I have to recommend rejection, but I encourage the authors to follow up on the reviewers suggestions and submit to a future venue.",ICLR2022, +SkghMQAee4,1544770000000.0,1545350000000.0,1,HJedho0qFX,HJedho0qFX,weak novelty and major clarity issues,Reject,"The paper aims to study what is learned in the word representations by comparing SkipGram embeddings trained from a text corpus and CNNs trained from ImageNet. + +Pros: +The paper tries to be comprehensive, including analysis of text representations and image representations, and the cases of misclassification and adversarial examples. + +Cons: +The clarity of the paper is a major concern, as noted by all reviwers, and the authors did not come back with rebuttal to address reviewers' quetions. Also, as R1 and R2 pointed out the novelty over recent relevant papers such as (Dharmaretnam & Fyshe, 2018) is not clear. 
+ +Verdict: +Reject due to weak novelty and major clarity issues.",ICLR2019,5: The area chair is absolutely certain +06V352WC2Y,1576800000000.0,1576800000000.0,1,H1gax6VtDB,H1gax6VtDB,Paper Decision,Accept (Talk),"This paper presents an approach to learn state representations of the scene as well as their action-conditioned transition model, applying contrastive learning on top of a graph neural network. The reviewers unanimously agree that this paper contains a solid research contribution and the authors' response to the reviews further clarified their concerns.",ICLR2020, +SkewqaD1lV,1544680000000.0,1545350000000.0,1,rJl2E3AcF7,rJl2E3AcF7,Meta-review,Reject,"This work proposes a new approximation method for softmax layers with large number of classes. The idea is to use a sparse two-layer mixture of experts. This approach successfully reduces the computation requires on the PTB and Wiki-2 datasets which have up to 32k classes. However, the reviewers argue that the work lacks relevant baselines such as D-softmax and adaptive-softmax. The authors argue that they focus on training and not inference and should do worse, but this should be substantiated in the paper by actual experimental results.",ICLR2019,4: The area chair is confident but not absolutely certain +ZW1CqJ4IuZb,1610040000000.0,1610470000000.0,1,xPw-dr5t1RH,xPw-dr5t1RH,Final Decision,Reject,"While the paper studies an interesting and important problem, namely the language generation, it is poorly written, which makes it difficult to judge its value. The reviewers also expressed concern over the scope of the evaluation and the lack of comparison to SOTA.",ICLR2021, +t8QqhRsxCxn,1642700000000.0,1642700000000.0,1,nLb60uXd6Np,nLb60uXd6Np,Paper Decision,Reject,"This work studies the problem of building powerful representations of low-dimensional point clouds with permutation and rotational equivariance, with the motivation to tackle applications in the physical sciences. Their main technical contribution is the use of the so-called geometric algebra, a series of operations between scalar and vector quantities that respect rotational symmetries, which the authors then combine with attention mechanisms to provide permutation symmetry. + +Reviewers generally found this work full of interesting ideas, in particular the novel geometric algebra structure to deal with rotational symmetry. However, they also found several issues, such as lack of clarity and somewhat unclear experimental validation. In particular, the authors are encouraged to formalise the rotational equivariance property, and to further address the ""small"" aspect of the title. Taking all these considerations into account, the AC recommends rejection at this time, but encourages the authors to pursue this exciting line of research.",ICLR2022, +ILVrfoHgIk,1610040000000.0,1610470000000.0,1,4c3WeBTErrE,4c3WeBTErrE,Final Decision,Reject,"The authors propose a ""jumpy RNN"" to adaptively change the step size of an RNN to match the time scales of the system dynamics. Reviewers found merit in the simple and intuitive idea, but were less enthusiastic about the experimental results and the comparison to existing work. (Adaptive step size methods have been a subject of recent work in RNNs, not to mention in numerical methods for ODE solvers.) Overall, I think the additions the authors made in the discussion phase did strengthen the paper, but further work is necessary before publication. 
",ICLR2021, +9mEoCHOW1bJ,1642700000000.0,1642700000000.0,1,fHeK814NOMO,fHeK814NOMO,Paper Decision,Reject,"The work presents a modification to existing approaches of automatic learning rate adaptation (called TLR) via a second order approximation of the function mapping step size to the change in loss when optimized with SGD. This was easily the most controversial paper in the AC's stack, with 4 reviewers advocating accept and 2 reviewers strongly arguing for reject. The authors also went through considerable effort to address reviewer and AC concerns and uploaded multiple additional experiments and ablations to support the robustness and efficacy of the proposed method. Despite a long discussion and rebuttal period, reviewers were unable to reach a consensus. There were several different aspects of the work whose merits were thoroughly debated during the rebuttal period. + +The first aspect regarded what the exact contribution of the work was. Initial reviewers who were very high on the work believed that the entire derivation from equation (1) to equation (9) was novel. However, as other reviewers correctly pointed out (1) to (7) is a standard derivation of adaptive learning rates and has appeared in several prior works. Instead it is primarily equation (9) that is the contribution. Given that multiple reviewers initially believed that (1 -> 7) was a novel contribution, I feel it is safe to say that the authors do not adequately discuss their contributions with respect to prior work. However, all reviewers in the end agreed that equation (9) is novel and potentially interesting (though some remain skeptical of it's utility). + +The second topic of debate regarded the short horizon bias raised by reviewer hkZ3. The short horizon bias presents a fundamental barrier to meta optimization of the learning rate. To summarize, greedily selecting the step size to minimize the loss will result in the optimizer taking too small of steps in the flat directions of an ill conditioned loss surface. This results in faster training in the short term but slower training in the long run. The presented method seeks to greedily optimize the loss over short time scales and thus will be subject to the short horizon bias. The initial draft of the work did not include any discussion of this prior work. During the rebuttal, the authors initially argued that their method can help mitigate the short horizon bias before later concluding that it is a limitation of the method. There was debate between the AC and reviewers regarding whether or not existing methods of adaptive learning rate schedules were already at a fundamental barrier presented by the short horizon bias. One reviewer even mentioned that in their own research they have abandoned the general approach of adaptive learning rates because they cannot overcome this issue. This debate was never resolved, it's plausible to the AC that there is room for increasing the robustness of existing approaches while not addressing the short horizon issue. It is the AC's opinion however that the work would be significantly strengthened with experiments directly addressing the short horizon issue. + +The final item of debate regarded the strength of the considered baselines. The authors claim that the second order term in (eq 9) largely removes the need for tuning relative to Baydin et. al. and that the method outperforms multiple baselines across multiple workloads, including Adam (Kingma et. al.), SLS (Vaswani et. al.), and SPS (Loizou et. al.). 
Indeed, multiple plots are given showing that the authors have found a configuration of their method that consistently outperforms certain fixed configurations of the considered baselines. Furthermore, ablations are presented which suggest that it is indeed the addition of equation (9) that is responsible for this strong performance. Despite all of this presented evidence, some reviewers remained skeptical and believed that they could produce a different but fixed configuration of, say, Baydin et al. (or even Adam) that matched the proposed method on all of the considered workloads. There are compelling reasons for reviewers to consider the presented experiments with skepticism. Indeed, the deep learning optimization literature has for years struggled to make progress despite publishing hundreds of papers---see for example the results of [1], which performed an independent comparison of hundreds of published methods and found that none convincingly outperform Adam. Given this, it is clear that the current standard for evaluating optimizers in the literature is inadequate if we are to reliably make progress. + +To give a more relevant example of the difficulty of comparing optimization methods, suppose for the sake of argument that we were not evaluating the efficacy of TLR but instead the method of Vaswani et al. (SLS). Vaswani et al. make many similar claims to those of the proposed method, namely that the method consistently outperforms Adam across multiple workloads and enjoys a similar robustness to hyperparameters (e.g. their Figure 6). However, in Figures 3 and 4 of this work we see SLS no longer outperforms Adam on the considered workloads. If Vaswani et al. had argued for acceptance based on the authors' Figures 3 and 4, I don't think any reviewer would have recommended acceptance. This raises the obvious question: why does SLS consistently outperform Adam in the experiments run by Vaswani et al., but not in the experiments run in the considered paper? There are at least two possible answers here, both of which are concerning. Either the method of Vaswani et al. is yet another one that generally doesn't outperform Adam, or, in comparing SLS with TLR in this work, the authors did not properly tune SLS in their baselines. Furthermore, what is going to happen if future work tries to compare against TLR? Will TLR still look better than Adam, or will independent review find that TLR is yet another method that on average performs about as well as Adam? As a reviewer trying to compare the two papers, I see very similar evidence given in support of the two methods and am thus left with an unresolved contradiction. + +Given all of this, I am forced to conclude that there is insufficient evidence presented in this work that the proposed method generally outperforms related methods such as SLS and Adam. A natural question thus is: what would have been sufficient evidence? Indeed, the presented experiments seem to be about as convincing as what is shown in previously published methods such as SLS. In a sense, the AC is also arguing that Vaswani et al. presented insufficient evidence that SLS generally outperforms Adam (looking at Figures 3 and 4, perhaps SLS in fact isn't as useful as Vaswani et al. claim). In looking at the experiments presented in this work, related prior works, and the hundreds of methods considered in [1], a common recurring theme is that, when comparing with prior work, authors consistently run their own implementations of baselines on workloads of their choosing, rather than directly comparing with published results. 
In doing so, this leaves open the question regarding whether or not the authors (perhaps inadvertently!) are only considering workloads and hyperparameter settings which favor their own method rather than giving a realistic assessment of the efficacy of their own methods relative to others. Thus, if the authors wish to argue that TLR generally outperforms SLS, a strong piece of evidence the authors could provide is to run TLR directly on the open sourced code provided by Vaswani et. al. Show the reviewer how TLR compares when added directly to (for example) Vaswani et. al. Figure 4. In doing so, the authors will have addressed any concerns reviewers may have about how well represented SLS is, as the authors will be comparing against SLS in a setting where there were actual incentives to make SLS look good. + +1. Schmidt et. al. Descending in a crowded valley – Benchmarking Deep Learning Optimizers, https://arxiv.org/abs/2007.01547",ICLR2022, +Tb5ZiIQwT,1576800000000.0,1576800000000.0,1,SJlRF04YwB,SJlRF04YwB,Paper Decision,Reject,"The authors present a way for generating adversarial examples using discrete perturbations, i.e., perturbations that, unlike pixel ones, carry some semantics. Thus, in order to do so, they assume the existence of an inverse graphics framework. Results are conducted in the VKITTI dataset. Overall, the main serious concern expressed by the reviewers has to do with the general applicability of this method, since it requires an inverse graphics framework, which all-in-all is not a trivial task, so it is not clear how such a method would scale to more “real” datasets. A secondary concern has to do with the fact that the proposed method seems to be mostly a way to perform semantic data-augmentation rather than a way to avoid malicious attacks. In the latter case, we would want to know something about the generality of this method (e.g., what happens a model is trained for this attacks but then a more pixel-based attack is applied). As such, I do not believe that this submission is ready for publication at ICLR. However, the technique is an interesting idea it would be interesting if a later submission would provide empirical evidence about/investigate the generality of this idea. ",ICLR2020, +BfiTNTAh2MZ,1610040000000.0,1610470000000.0,1,lcNa5mQ-CSb,lcNa5mQ-CSb,Final Decision,Reject,"This submission tackles an important problem and presents interesting ideas. I am confident that the research will lead to good publications. However, in the particular situation here, AnonReviewer2 had serious concerns that are shared by me. The authors made a great effort to clarify the situation, but the current situation still leaves me uncertain about the presentation and correctness of everything. Because some issues were major, it is not easy to re-evaluate and take new conclusions in the short time of this process. I hope the authors do not take this too negatively, but given all the comments and discussions, it is best that another round of improvements and reviews be conducted.",ICLR2021, +pWy1CtB0fW,1610040000000.0,1610470000000.0,1,Im43P9kuaeP,Im43P9kuaeP,Final Decision,Reject,"While it’s commonly acknowledged that the paper is well written, the reviews are a bit split: R3 and R1 are mildly positive/negative, respectively, R2 and R4 both voted for reject. R2 asked many questions regarding experiments, which were addressed in the details in the rebuttal. R4 raised 6 questions regarding the bound, and the authors only answered some of them in the rebuttal. 
R4 felt “the method is lacking in a theoretic proof of a strict bound, which is the primary contribution of the paper”. Both R1 and R4 pointed out the proposed algorithm is not practical as expected, especially the results on larger scale such as ImageNet are missing. + +The AC cannot agree with the authors’ argument that the contribution of the paper is “a conceptual framework that it is possible to certify a watermark for neural networks” in responding to such criticisms. It’s indeed very important for this conceptual framework to be proven valuable through thorough experiments and solid comparisons. ",ICLR2021, +o_1eey6eO80,1610040000000.0,1610470000000.0,1,ZsZM-4iMQkH,ZsZM-4iMQkH,Final Decision,Accept (Poster),"This paper suggests an extension of previous implicit bias results on linear networks to a tensor formulation and arguably weakens some of the assumptions of previous works (e.g. loss going to zero is replaced with initialization assumptions). The reviewers were all positive about this work, saying it is clearly written and an original significant contribution. There were a few issues raised (e.g. the novelty of the proof techniques) and the authors responded. The reviewers did not clarify if this response satisfied these concerns, but did not change their positive scores. I will take this to indicate they still recommend acceptance.",ICLR2021, +kVCpuQFdq_cR,1642700000000.0,1642700000000.0,1,vkaMaq95_rX,vkaMaq95_rX,Paper Decision,Accept (Poster),"There are numerous known methods for memory reduction used in CNNs. +This paper takes two such---quantization (Q) and random projection (RP)---and applies them to GNNs. This is a novelty, but I agree with the reviewers: on its own this novelty would not be ""surprising"" enough to report at ICLR. + +The paper further goes to show empirically that these methods, when applied to a reasonable set of datasets, do indeed produce their predicted memory reductions (unsurprising) with a small ($\approx 0.5\%$) drop in accuracy (surprising, in the sense of not being something one could predict without doing the experiment). + +All of the above is in one sense ""just engineering"", with only a small inventive step. Any real-world deployment of GNNs would, if an army of engineers were available, naturally implement quantization and RP in order to see what kind of improvements they might make. This would be just two more hyperparameters (R,B) to add to the sweep, and the deployment would vary them until the required accuracy was achieved in the minimum time (OOM is a red herring - one would vary batch size, other compression, or ship values to CPU in order to make progress). + +However, ""simply adding two more hyperparameters"" is a significant increase in the deployment burden, which is where the paper's third contribution comes in: the theoretical analysis of the effects of the two processes, with straightforward but nonobvious calculations of the effect on gradient variance of the two processes, and, usefully, their interaction. The value of this theory is twofold: first, it gives us new tools to analyse such processes; and second, it allows us to be much more judicious in the selection of these hyperparameters. + +In all, the reviewers' objection of no great novelty in porting ideas from CNN to GNN is sustained; but the authors' claim as to the value of the theory is sustained, and no reviewer provides prior art to dispute the novelty of the theory calculations. 
+ +The revised paper has already expanded the key sections in Appendix E, and added welcome experiments which strengthen the paper. I would encourage a final copy (and certainly the poster presentation) to emphasize some of the insights over the raw experimental numbers. As the authors hint, those numbers are subject to vagaries of what PyTorch happens to implement, while the underlying analyses are a little longer lasting. + +Some other comments: + +A lot of discussion time was spent on the question of whether 0.5% is negligible. This is entirely application dependent, and is part of the hyperparameter/architecture tuning process. + + - the extra time overhead of swapping ""can go up to 80%, which is not feasible in practice"". + Not so: if choosing between OOM or 1.8x slowdown, I will of course choose the latter. + - ""for a fair comparison, we do not change any hyperparameters"" + Again, not relevant: in a real application (which is where this paper contributes), we of course change the learning rate when batch size changes. + - ""the accuracy drop of sampling may be greater than EXACT"" + Again, whether that drop is too much depends entirely on the actual application. + +And please do take a look at typos/grammar/English etc.",ICLR2022, +aEhOJTYPetz,1642700000000.0,1642700000000.0,1,ljxWpdBl4V,ljxWpdBl4V,Paper Decision,Accept (Poster),"The paper proposes to improve (generalized) zero-shot learning, by training a generator jointly with the classification task, such that it generates samples that reduce the classification loss. To achieve this, they use a zero shot model that has a (differentiable) closed form solution (ESZSL), so the full model can be optimized end-to-end. The approach is evaluated on the standard benchmarks of GZSL. + +Reviewers had some concerns regarding novelty compared with previous work and quality of experiments and evaluations. The authors answered most of these concerns in their rebuttal including discussion with previous work and additional evaluations. As a result, the paper would be interesting for the ICLR audience.",ICLR2022, +oV9WXWF198,1576800000000.0,1576800000000.0,1,rygwLgrYPB,rygwLgrYPB,Paper Decision,Accept (Poster),This paper presents an interesting and novel idea that is likely to be of interest to the community. The most negative reviewer did not acknowledge the author response. The AC recommends acceptance.,ICLR2020, +ry9mTGLul,1486400000000.0,1486400000000.0,1,Sks9_ajex,Sks9_ajex,ICLR committee final decision,Accept (Poster),"Important task (attention models), interesting distillation application, well-written paper. The authors have been responsive in updating the paper, adding new experiments, and being balanced in presenting their findings. I support accepting this paper.",ICLR2017, +rkgAg8pGeN,1544900000000.0,1545350000000.0,1,rygqqsA9KX,rygqqsA9KX,Novel perspective for learning latent multimodal representations,Accept (Poster),"This paper offers a novel perspective for learning latent multimodal representations. The idea of segmenting the information into multimodal discriminative and modality-specific generating factors is found to be intriguing by all reviewers and the AC. The technical derivations allow for an efficient implementation of this idea. + +There have been some concerns regarding the experimental section, but they have all been addressed adequately during the rebuttal period. Therefore the AC suggests this paper for acceptance. It is an overall nice and well-thought work. 
+",ICLR2019,4: The area chair is confident but not absolutely certain +n44ZBmrGGWG,1610040000000.0,1610470000000.0,1,pD9x3TmLONE,pD9x3TmLONE,Final Decision,Reject,"All reviewers recommend rejection due to limited novelty and insufficient experimental analysis. The author’s response has addressed several other questions raised by the reviewers, but it was not sufficient to eliminate the main concerns about novelty (as the method is a combination of existing techniques) and missing comparisons to justify the effectiveness of the proposed approach.",ICLR2021, +JVJayhEGu3,1576800000000.0,1576800000000.0,1,BJliakStvH,BJliakStvH,Paper Decision,Accept (Spotlight),The paper introduces a novel way of doing IRL based on learning constraints. The topic of IRL is an important one in RL and the approach introduced is interesting and forms a fundamental contribution that could lead to relevant follow-up work.,ICLR2020, +9WbgI1i28zD,1642700000000.0,1642700000000.0,1,7udZAsEzd60,7udZAsEzd60,Paper Decision,Accept (Poster),"The authors analyzing the VC-dimension of a class of neural networks +with hard thresholds at the hidden nodes that include a low-rank weight +matrix and hard-thresholds at hidden units. The bounds are independent +of the number of weights used to represent functions mapping a hidden +layer to the output. They also provided some experiments supporting the +practicality of networks like those treated in their theoretical analysis. + +There was some question about whether the VC-dimension continues to be relevant. +Also, while the upper bounds have attractive properties that were highlighted by the +authors, they also are not very strong in other respects. + +The consensus view overall, though, was that this is a ""nice result"", +a clean illustration of a generalization affect of the type that has +been of wide interest lately.",ICLR2022, +RVqlfGX4xCi,1610040000000.0,1610470000000.0,1,#NAME?,#NAME?,Final Decision,Accept (Poster),"Well-written paper that proposes a flow-based model for categorical data, and applies it to graph generation with good results. Extending flow-models to handle types of data that are not continuous is a useful contribution, and graph generation is an important application. Overall, the reviewers were positive about the paper, and only few negative points were raised, so I'm happy to recommend acceptance.",ICLR2021, +_3S-MwFwL,1576800000000.0,1576800000000.0,1,B1lLw6EYwB,B1lLw6EYwB,Paper Decision,Accept (Poster),"The authors propose a novel approach for measuring gradient staleness and use this measure to penalize stale gradients in an asynchronous stochastic gradient set up. Following previous work, they provide a convergence proof for their approach. Most importantly, they provide extensive evaluations comparing against previous approaches and show impressive gains over previous work. + +After the author response, the primary concerns from reviewers is regarding the gap between the proposed method and single worker SGD/synchronous SGD. I feel that the authors have made compelling arguments that ASGD is an important optimization paradigm to consider, so their improvements in narrowing the gap are of interest to the community. There were some concerns about the novelty of the theory, and my impression is that theorem is straightforward to prove based on assumptions and previous work, however, I view the main contribution of the paper as empirical. 
+ +This paper is borderline, but I think the impressive empirical results over existing work on ASGD is a worthwhile contribution and others will find it interesting, so I am recommending acceptance.",ICLR2020, +TuHGT5Iktm,1610040000000.0,1610470000000.0,1,RSU17UoKfJF,RSU17UoKfJF,Final Decision,Accept (Poster),The major criticism of this paper after the initial reviews was a lack of experimental results on deeper and more modern architectures that include skip connections. The authors added results to the paper to address these issues.,ICLR2021, +67LPmicHqO,1610040000000.0,1610470000000.0,1,eU776ZYxEpz,eU776ZYxEpz,Final Decision,Accept (Poster),"This paper was unanimously rated above the acceptance threshold by the +reviewers. While all reviewers agree it is worth accepting, they +differed in their enthusiasm. Most reviewers agree that major +limitations of the paper include that the paper provides no insight into why +Dale's principle exists and the actual results are not truly +state-of-the-art. Nevertheless there is agreement that the paper +presents results worth publicizing to the ICLR audience. The comparison +of the inhibitory network to normalization schemes is interesting. +Also, please reference the Neural Abstraction Pyramid work. + +",ICLR2021, +ByU2NJ6Hf,1517250000000.0,1517260000000.0,438,SyAbZb-0Z,SyAbZb-0Z,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a sensible, but somewhat incremental, generalization of neural architecture search. However, the experiments are only done in a single artificial setting (albeit composed of real, large-scale subtasks). It's also not clear that such an expensive meta-learning based approach is even necessary, compared to more traditional approaches. + +If this paper was less about proposing a single new extension, and more about putting that extension in a larger context, (either conceptually or experimentally), it would be above the bar. +",ICLR2018, +VCVaq72X7z96,1642700000000.0,1642700000000.0,1,xD3RiCCfsY,xD3RiCCfsY,Paper Decision,Reject,"The authors propose a random perturbation on top of a soft top-k operator that builds upon entropic regularized optimal transport (when applied to a 1D problem). The motivation of the paper is built around an approximation bound (proposed in the Xie et al '20 paper) that compares the true OT matrix from the regularized OT matrix in the case where some of the 1D entries from which one wishes to extract top-k values are very close (eg. x_{t} ~ x_{t+1}). The authors argue that this bound, with inverse dependencies in the closest element in the list, diverges. + +The authors state that this possible divergence is an issue, because values to be sorted/top-ked can be very close in practice. To solve this issue, the authors introduce instead a Gumbel noise mechanism that no longer makes the bound diverge, through a fairly long theoretical analysis. The approach now requires the recomputation for several noisy inputs of the same regularized OT estimator. The authors propose then to use these soft-top-k approaches to solve a combinatorial problem using gradient descent, namely a capacity constrained problem and clustering, including some tricks on controlling both entropy regularization and Gumbel noise magnitude. + +The paper has generated a long discussion among the AC and reviewers. 
While the paper has a few strong points that were appreciated (interest of empirical validation which seems to suggest some improvements over commercial solvers on considered setups), there remain a few issues. + +The theoretical side of the paper is bit blurry. The idea of introducing Gumbel noise on top of an already soft operator is not completely clear, since these perturbations are there to add differentiability to something (reg-OT) that was introduced itself to be differentiable. The theoretical motivation is unclear: the noise is introduced because the _upper bound_ diverges (and not the gap between the ""true"" OT and entropic OT, since it is always bounded). The perturbation mechanism is only motivated to improve the limitations of an upper bound, not of the original algorithm itself. What's more, it's not entirely clear why that gap should be decreased (between true and regularized OT) since it has to exist to obtain some differentiability. While the study of the gap itself was added during the discussion phase in Fig. 1""A toy example to explain Lemma 2"", one would expect better foundations for this idea. + +With a somewhat unclear theoretical motivation, the experiments should be very convincing. Reviewers have noted some issues related to comparing CPU/GPU times. While I am sympathetic to the problems encountered by the authors when running such comparisons, these issues should be properly reflected in their initial claims, and not appear in the rebuttal only. I also think experiments are still lacking in diversity. For instance, the k-means problem is studied in 2D (begging the natural question of whether such an improvement would remain in higher dimensions). I could not find a clear statement on the number of repeats carried out to obtain error bars. Since I don't envision either of the max-covering problem nor k-means to become the ""killer app"" of this paper, I would encourage the authors to consider problems that are less synthetic.",ICLR2022, +4r93aL6xQo,1576800000000.0,1576800000000.0,1,BylTta4YvB,BylTta4YvB,Paper Decision,Reject,"There is insufficient support to recommend accepting this paper. Generally the reviewers found the technical contribution to be insufficient, and were not sufficiently convinced by the experimental evaluation. The feedback provided should help the authors improve their paper.",ICLR2020, +rkBb2G8dl,1486400000000.0,1486400000000.0,1,B1PA8fqeg,B1PA8fqeg,ICLR committee final decision,Reject,"The reviewers were consistent in their review that they thought this was a strong rejection. + Two of the reviewers expressed strong confidence in their reviews. + The main arguments made by the reviewers against acceptance were: + Lack of novelty (R2, R3) + Lack of knowledge of literature and development history; particularly with respect to biological inspiration of ANNs (R3) + Inappropriate baseline comparison (R2) + Not clear (R1) + + The authors did not provide a response to the official reviews. Therefore I have decided to follow the consensus towards rejection.",ICLR2017, +SkshIyaHG,1517250000000.0,1517260000000.0,874,rJ6iJmWCW,rJ6iJmWCW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper proposes a method for accented speech generation using GANs. +The reviewers have pointed out the problems in the justification of the method (e.g. 
the need for using policy gradients with a differentiable objective) as well as its evaluation.",ICLR2018, +Y1Lpb4v0WBc,1642700000000.0,1642700000000.0,1,e-IkMkna5uJ,e-IkMkna5uJ,Paper Decision,Reject,"This paper expands the spectral bias, which has been studied in a constrained situation such as the fully-connected network, to a more practical situation of a multi-class classification situation, and proposes a novel technique that can measure the smoothness through linear interpolation of test examples. + +Two reviewers highly evaluated the importance of the research question considered in this study and the value of diverse experiments applying the proposed method in various directions, and suggested acceptance. On the other hand, two other reviewers suggested rejection due to the lack of rigor in writing and experiments. I strongly agree with the reviewer's concern that the method was only verified on CIFAR10 and the rigor of the experiment was lacking. Unlike the spectral bias paper, which is the basis of this study, this submission is not a theoretical paper, but rather an experimental paper. I admit that it is impossible to verify in various domains as mentioned by the author. However, I believe that verification on more diverse, especially larger-scale datasets is essential at least focusing on the image classification task.",ICLR2022, +fSkl0ZNAZm,1576800000000.0,1576800000000.0,1,H1lj0nNFwB,H1lj0nNFwB,Paper Decision,Accept (Poster),"The paper studies the role of depth on incremental learning, defined as a favorable learning regime in which one searches through the hypothesis space in increasing order of complexity. Specifically, it establishes a dynamical depth separation result, whereby shallow models require exponetially smaller initializations than deep ones in order to operate in the incremental learning regime. + +Despite some concerns shared amongst reviewers about the significance of these results to explain realistic deep models (that exhibit nonlinear behavior as well as interactions between neurons) and some remarks about the precision of some claims, the overall consensus -- also shared by the AC -- is that this paper puts forward an interesting phenomenon that will likely spark future research in this important direction. The AC thus recommends acceptance. ",ICLR2020, +Kiykz0q8Q4dS,1642700000000.0,1642700000000.0,1,1HxTO6CTkz,1HxTO6CTkz,Paper Decision,Accept (Spotlight),"The paper investigates various approaches, and a unifying framework, for sequence design. There were a variety of opinions about the paper. It was felt, after discussion, that the paper would benefit from a sharper focus, and somewhat suffers from being overwhelmed by various approaches, lacking a clear narrative. But overall all reviewers had a positive sentiment, and the paper makes a nice contribution to the growing body of work on protein design.",ICLR2022, +BL0tboSR-F,1576800000000.0,1576800000000.0,1,HyeG9lHYwH,HyeG9lHYwH,Paper Decision,Reject,"The paper proposes a method for lossy image compression. Based on the encoder-decoder framework, it replaces the discrete codes by continuous ones, so that the learning can be performed in an end-to-end way. The idea is interesting, but the motivation is based on a quantization ""problem"" that the authors show no evidence the competing method is actually suffering from. It is thus unclear how much does quantization in existing methods impact performance, and how much will fixing this benefit the overall system. 
Also, the authors may add some discussions on whether the proposed sampling of z_{c^\star} is indeed also a form of quantization. + +Experimental results are not convincing. The proposed method is only compared with one method. While it works only slightly worse at low bit-rate region, the gap becomes larger in higher bit rate regions. Another major concern is that the encoding time is significantly longer. Ablation study is also needed. Finally, the writing can be improved.",ICLR2020, +ONKkoXxAMSA,1610040000000.0,1610470000000.0,1,4IwieFS44l,4IwieFS44l,Final Decision,Accept (Poster),"The authors demonstrate that complete neural network verification methods that use limited precision arithmetic can fail to detect the possibility of attacks that exploit numerical roundoff errors. They develop techniques to insert a backdoor into networks enabling such exploitation, that remains undetected by neural network verifiers and a simple defence against this particular backdoor insertion. + +The paper demonstrates an important and often ignored shortcoming of neural network verification methods, getting around which remains a significant challenge. Particularly in adversarial situations, this is a significant risk and needs to be studied carefully in further work. + +All reviewers were in agreement on acceptance and concerns raised were adequately addressed in the rebuttal phase, hence I recommend acceptance. However, a few clarifications raised by the official reviewers and public comments should be addressed in the final revision: +1) Acknowledging that incomplete verification methods that rely on sound overapproximation do not suffer from this shortcoming. +2) Concerns around reproducibility of MIPVerify related experiments brought up in public comments.",ICLR2021, +By02HJ6Sf,1517250000000.0,1517260000000.0,661,BJ6anzb0Z,BJ6anzb0Z,ICLR 2018 Conference Acceptance Decision,Reject,"This work combines words and images from Tumblr to provide more fine-grained sentiment analysis than just positive-negative. The contribution is too slight, as a straightforward combination of existing architectures applied on an emotion classification task with conclusions that aren't well motivated and are not providing any comparison to existing related work on finer emotion classification.",ICLR2018, +BkZgafI_g,1486400000000.0,1486400000000.0,1,rkaRFYcgl,rkaRFYcgl,ICLR committee final decision,Reject,"The reviewers seem to agree that the framework presented is not very novel, something I agree with. + The experiments show that the low rank + diagonal parameterization can be useful, however. The paper could be improved by making a more tightened message, and clearer arguments. As it currently stands, however it does not seem ready for publication in ICLR.",ICLR2017, +P7wcJAFekV,1576800000000.0,1576800000000.0,1,rJxWxxSYvB,rJxWxxSYvB,Paper Decision,Accept (Poster),"All authors agree the paper is well written, and there is a good consensus on acceptance. The last reviewer was concerned about a lack of diversity in datasets, but this was addressed in the rebuttal.",ICLR2020, +SyxpW5ofeE,1544890000000.0,1545350000000.0,1,H1GaLiAcY7,H1GaLiAcY7,The idea has some merits.,Reject,"AR1 finds the paper overly lengthy and ill-focused on contributions of this work. Moreover, AR1 would like to see more results for G-ZSL. AR2 finds the paper is lacking in clarity, e.g. Eq. 9, and complete definition of the end-to-end decision pipeline is missing. 
AR2 points that the manuscript relies on GZSL and comparisons to it but other more recent methods could be also cited: +- Generalized Zero-Shot Learning via Synthesized Examples by Verma et al. +- Zero-Shot Kernel Learning by Zhang et al. +- Model Selection for Generalized Zero-shot Learning by Zhang et al. +- Generalized Zero-Shot Learning with Deep Calibration Network by Liu et al. +- Multi-modal Cycle-consistent Generalized Zero-Shot Learning by Felix et al. +- Open Set Learning with Counterfactual Images +- Feature Generating Networks for Zero-Shot Learning +Though, the authors are welcome to find even more relevant papers in google scholar. + +Overall, AC finds the paper interesting and finds the idea has some merits. Nonetheless, two reviewers maintained their scores below borderline due to numerous worries highlighted above. The authors are encouraged to work on presentation of this method and comparisons to more recent papers where possible. AC encourages the authors to re-submit their improved manuscript as, at this time, it feels this paper is not ready and cannot be accepted to ICLR. +",ICLR2019,4: The area chair is confident but not absolutely certain +r1ezaO4ZgE,1544800000000.0,1545350000000.0,1,SkloDjAqYm,SkloDjAqYm,Good applied paper ,Accept (Poster),This paper is about representation learning for calcium imaging and thus a bit different in scope that most ICLR submissions. But the paper is well-executed with good choices for the various parts of the model making it relevant for other similar domains.,ICLR2019,4: The area chair is confident but not absolutely certain +rJlV8GXMlV,1544860000000.0,1545350000000.0,1,ByxGSsR9FQ,ByxGSsR9FQ,"practical ideas for ensuring robustness, albeit in a limited attack model",Accept (Poster)," +* Strengths + +This paper studies adversarial robustness to perturbations that are bounded in the L2 norm. It is motivated by a theoretical sufficient condition (non-expansiveness) but rather than trying to formally verify robustness, it uses this condition as inspiration, modifying standard network architectures in several ways to encourage non-expansiveness while mostly preserving computational efficiency and accuracy. This “theory-inspired practically-focused” hybrid is a rare perspective in this area and could fruitfully inspire further improvements. Finally, the paper came under substantial scrutiny during the review period (there are 65 comments on the page) and the authors have convincingly answered a number of technical criticisms. + +* Weaknesses + +One reviewer and some commenters were concerned that the L2 norm is not a realistic norm to measure adversarial attacks in. There were also concerns that the empirical level of robustness of the network was too weak to be meaningful. In addition, while some parts of the experiments were thorough and some parts of the paper were well-presented, the quality was not uniform throughout. Finally, while the proposed changes improve adversarial robustness, they also decrease the accuracy of the network on clean examples (this is to be expected but may be an issue in practice). + +* Discussion + +There was substantial disagreement on whether to accept the paper. On the one hand, there has been limited progress on robustness to adversarial examples (even under simple norms such as the L2 norm) and most methods that do work are based on formal verification and therefore quite computationally expensive. 
On the other hand, simple norms such as the L2 norm are somewhat contrived and mainly chosen for convenience (although doing well in the L2 norm is a necessary condition for being robust to more general attacks). Moreover, the empirical results are currently too weak to confer meaningful robustness even under the L2 norm. + +* Decision + +While I agree with the reviewers and commenters who are skeptical of the L2 norm model (and would very much like to see approaches that consider more realistic threat models), I decided to accept the paper for two reasons: first, doing well in L2 is a necessary condition to doing well in more general models, and the ideas and approach here are simple enough that they might provide inspiration in these more general models as well. Additionally, this was one of the strongest adversarial defense papers at ICLR this year in terms of credibility of the claims (certainly the strongest in my pile) and contains several useful ideas as well as novel empirical findings (such as the increased success of attacks up to 1 million iterations).",ICLR2019,4: The area chair is confident but not absolutely certain +sHexrI1JN8I,1642700000000.0,1642700000000.0,1,RXQ-FPbQYVn,RXQ-FPbQYVn,Paper Decision,Accept (Poster),"This paper tackles the problem of exploration in Deep RL in settings with a large action space. To this end, the authors introduce an intrinsic reward inspired by the exploration bonus of LinUCB. This novel exploration method called anti-concentrated confidence bounds (ACB) provably approximates the elliptical exploration bonus of LinUCB by using an ensemble of least-squares regressors. This allows ACB to bypass costly covariance matrix inversion, which can be problematic for high-dimensional problems (hence allowing it to be used in large state spaces). Empirical experiments show that ACB enjoys near-optimal performance in linear stochastic bandits. However, experiments on Atari benchmark fail to show any practical advantage of ACB over current methods, neither computation nor performance-wise. That being said, the proposed ACB approach is theoretically transparent, which contributes to advancing our theoretical understanding of usable intrinsic rewards in deep RL and can inform theoretically motivated directions for improvement and further research, while being on par with SOTA. I believe that this makes the contribution of this work strong enough for acceptance.",ICLR2022, +w-BFRBRXkt,1576800000000.0,1576800000000.0,1,rylNJlStwB,rylNJlStwB,Paper Decision,Reject,"The majority of reviewers suggest rejection, pointing to concerns about design and novelty. Perhaps the most concerning part to me was the consistent lack of expertise in the applied area. This could be random bad luck draw of reviewers, but more likely the paper is not positioned well in the ICLR literature. This means that either it was submitted to the wrong venue, or that the exposition needs to be improved so that the paper is approachable by a larger part of the ICLR community. Since this is not currently true, I suggest that the authors work on a revision.",ICLR2020, +gvQ94s6q0S,1576800000000.0,1576800000000.0,1,r1xMH1BtvB,r1xMH1BtvB,Paper Decision,Accept (Poster),"This paper investigates the tasks used to pretrain language models. The paper proposes not using a generative tasks ('filling in' masked tokens), but instead a discriminative tasked (recognising corrupted tokens). 
The authors empirically show that the proposed method leads to improved performance, especially in the ""limited compute"" regime.

Initially, the reviewers had quite split opinions on the paper, but after the rebuttal and discussion phases all reviewers agreed on an ""accept"" recommendation. I am happy to agree with this recommendation based on the following observations:
- The authors provide strong empirical results including relevant ablations. Reviews initially suggested a limitation to classification tasks and a lack of empirical analysis, but those issues have been addressed in the updated version.
- The problem of pre-training language models is relevant for the ML and NLP communities, and it should be especially relevant for ICLR. The resulting method significantly outperforms existing methods, especially in the low-compute regime.
- The idea is quite simple, but at the same time quite novel. ",ICLR2020,
HO9yDzhP4U,1642700000000.0,1642700000000.0,1,-70L8lpp9DF,-70L8lpp9DF,Paper Decision,Accept (Oral),"This paper tackles a problem at the intersection of AutoML and trustworthiness that has not been studied much before, and provides a first solution, leaving much space for interesting future research. +All reviewers agree that this is a strong paper and clearly recommend acceptance. +I recommend acceptance as an oral since the paper opens the door for a lot of interesting follow-ups.",ICLR2022,
_QT4v_QA9pV,1642700000000.0,1642700000000.0,1,XWODe7ZLn8f,XWODe7ZLn8f,Paper Decision,Accept (Spotlight),"All the reviewers liked the paper. The proposed method contains the novel idea of learning a feature representation that maximizes the mutual information between the latent code and its corresponding observation for fine-grained class clustering. The model seems to successfully avoid mode collapse while training generators and is able to generate various objects (foregrounds) with varying backgrounds. The foreground and background control ability is an outstanding feature of the paper. Please incorporate the comments of the reviewers in the final version. + +BTW, the real score of this paper should be 7.0, as Reviewer 5wFE commented that he/she would raise the score from 5 to 6, but at the time of this meta review the score had not been raised. So the final score of the paper should be 8/8/6/6.",ICLR2022,
H1eOaY0JlN,1544710000000.0,1545350000000.0,1,S1x2Fj0qKQ,S1x2Fj0qKQ,solid idea and results,Accept (Poster),"The paper addresses normalisation and conditioning of GANs. The authors propose to replace class-conditional batch norm with whitening and class-conditional coloring. Evaluation demonstrates that the method performs very well, and the ablation studies confirm the design choices. After extensive discussion, all reviewers agreed that this is a solid contribution, and the paper should be accepted. ",ICLR2019,
KxonMIIbcFw,1642700000000.0,1642700000000.0,1,JGO8CvG5S9,JGO8CvG5S9,Paper Decision,Accept (Spotlight),"The paper studies the interesting question of whether neural networks can approximate the target function while keeping the output in the constraint set. The constraint set is quite natural for e.g. multi-class classification, where the output has to stay on the probability manifold. The challenge here is that traditional universal approximation theory only guarantees that $\hat{f}(x) \approx f(x)$, but cannot guarantee that $\hat{f}(x)$ lies exactly in the same constraint set as $f(x)$. 
+ +The paper makes a significant contribution to the theory of deep learning: it is shown that neural networks can indeed approximate any regular function while keeping the output in the regular constraint set. This provides solid backing for the representational power of neural networks in practice to represent target functions whose outputs lie in a certain constraint set (e.g. probabilities).",ICLR2022,
rJeRxEAZJE,1543790000000.0,1545350000000.0,1,Ske6wiAcKQ,Ske6wiAcKQ,Reject,Reject,All reviewers agree in their assessment that this paper is not ready for acceptance into ICLR.,ICLR2019,4: The area chair is confident but not absolutely certain
ol7NdtAi3,1576800000000.0,1576800000000.0,1,BJlLvnEtDB,BJlLvnEtDB,Paper Decision,Reject,"This paper aims to analyze CNN representations in terms of how well they measure the perceptual severity of image distortions. In particular, (a) sensitivity to changes in visual frequency and (b) orientation selectivity were used. Although the reviewers agree that this paper presents some interesting initial findings in a promising direction, the majority of the reviewers (three out of four) find that the paper is incomplete, raising concerns about the experimental settings and results. Multiple reviewers explicitly asked for additional experiments to confirm whether the presented empirical results can be used to improve the results of image generation. Responding to the reviews, the authors added a super-resolution experiment in the appendix, which the reviewers believe is the right direction but is still preliminary. + +Overall, we believe the paper reports interesting findings, but it will require substantial additional work to make it ready for publication.",ICLR2020,
c769NN_F9h,1576800000000.0,1576800000000.0,1,SJg2j0VFPB,SJg2j0VFPB,Paper Decision,Reject,"The paper provides a generalization error bound, which extends results from PU learning, for the problem of knowledge graph completion. The authors assume a missing-at-random setting, and provide bounds on the triples (two nodes and an edge) that could be mistakes. The paper then provides a maximum likelihood interpretation, as well as relations to existing knowledge graph completion methods. The problem setting is interesting, and the writing clear.

The discussion was extensive, with reviewers and authors following the spirit of ICLR and having a constructive exchange which resulted in improvements to the paper. However, some improvements still remain to be made in terms of clarity of presentation, as well as the precision of the theoretical arguments. 
+ +Unfortunately, there are many strong submissions, and the paper as it currently stands does not meet the quality threshold of ICLR.",ICLR2020,
Hyli0PpWx4,1544830000000.0,1545350000000.0,1,Byxr73R5FQ,Byxr73R5FQ,Meta-review,Reject,"Pros: +- simple, sensible subgoal discovery method +- strong intuitions, visualizations +- detailed rebuttal, 15 appendix sections + +Cons: +- moderate novelty +- lack of ablations +- assessments don't back up all claims +- ill-justified/mismatching design decisions +- inefficiency due to relying on a random policy in the first phase + +There is consensus among the reviewers that the paper is not quite good enough, and should be (borderline) rejected.",ICLR2019,
ULKoHfjLfb,1576800000000.0,1576800000000.0,1,S1xitgHtvS,S1xitgHtvS,Paper Decision,Accept (Spotlight),"The paper explores in more detail the ""RL as inference"" viewpoint and highlights some issues with this approach, as well as ways to address these issues. The new version of the paper has effectively addressed some of the reviewers' initial concerns, resulting in an overall well-written paper with interesting insights.",ICLR2020,
pezDiHjcGYd,1610040000000.0,1610470000000.0,1,S5S3eTEmouw,S5S3eTEmouw,Final Decision,Reject,"The paper proposes a method for offline meta-RL, where we meta-train on pre-collected offline data for several RL tasks and adapt to a new task with a small amount of data. The paper assumes that there is no interaction with the environment during either meta-training or meta-testing. In this setting, motivated by the idea of leveraging offline experience from multiple tasks to enable fast adaptation to new tasks, the paper introduces MACAW, which combines the consistent MAML and the popular offline AWR, improving upon them by adding capacity through parameterization and adding an extra objective in the policy update. As a result, the MACAW proposed for offline meta-RL has the desirable property of being consistent, i.e., converging to a good policy if enough time and data for the meta-test task are given, regardless of meta-training.


Pros: ++ Most of the experiments are well executed, using good baselines. Extensive ablations on the various modifications to MAML+AWR confirmed the utility of the approach for the fully offline meta-RL problem. ++ MACAW is a simple algorithm with theoretical guarantees; the modifications to the policy functions are backed by theory.


Cons: +- The reviewers have concerns about the formulation of offline meta-RL. One major contribution of the paper is to introduce offline meta-RL. However, the paper largely borrows the meta-RL formulation from the online setting, where task=MDP. The reviewers think that directly borrowing from regular meta-RL as the formulation of offline meta-RL might be misleading. The reviewers suggest including the behavior policy as part of the task definition in the offline meta-RL formulation.

- Several reviewers raised concerns that the fully offline setting might be unrealistic. Although the authors did add a motivation, the reviewers would be interested in seeing MACAW being adapted online at test time on in-distribution tasks.

- Unfortunately, the authors accidentally revealed their names in one of the modified versions. 
+",ICLR2021, +SJxCSBYegV,1544750000000.0,1545350000000.0,1,Hke4l2AcKQ,Hke4l2AcKQ,Meta-Review,Accept (Poster),"This paper proposes a solution for the well-known problem of posterior collapse in VAEs: a phenomenon where the posteriors fail to diverge from the prior, which tends to happen in situations where the decoder is overly flexible. + +A downside of the proposed method is the introduction of hyper-parameters controlling the degree of regularization. The empirical results show improvements on various baselines. + +The paper proposes the addition of a regularization term that penalizes pairwise similarity of posteriors in latent space. The reviewers agree that the paper is clearly written and that the method is reasonably motivated. The experiments are also sufficiently convincing.",ICLR2019,4: The area chair is confident but not absolutely certain +5QiVO7XdD1,1576800000000.0,1576800000000.0,1,rJx4p3NYDB,rJx4p3NYDB,Paper Decision,Accept (Poster),"The paper proposed an regret based approach to speed up counterfactural regret minimization. The reviewers find the proposed approach interesting. However, the method require large memory. More experimental comparisons and comparisons pointed out by reviewers and public comments will help improve the paper. ",ICLR2020, +iuSa-xWNed,1610040000000.0,1610470000000.0,1,jHefDGsorp5,jHefDGsorp5,Final Decision,Accept (Poster),"The authors appreciated this submission because (a) the aspect of explainability is novel, (b) its strong performance, (c) the clarity of the paper. I urge the authors to double check all of the reviewer comments to make sure they are all addressed in the updated version. I vote to accept.",ICLR2021, +rke4EaO-xE,1544810000000.0,1545350000000.0,1,Hkl-di09FQ,Hkl-di09FQ,meta review,Reject,"This paper proposes a new method for combining previous state representation learning methods and compares to end-to-end learning without without separately learning a state representation. The topic is important, and the authors have made an extensive effort to address the reviewer's concerns, particularly regarding clarity, related work, and accuracy of the drawn conclusions. The reviewers found that the main weakness of the paper was the experiments not being sufficiently convincing that the proposed approach is better than the alternatives. Hence, it does not currently meet the bar for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +dpWrpAY3HV,1576800000000.0,1576800000000.0,1,H1l2mxHKvr,H1l2mxHKvr,Paper Decision,Reject,"This paper tackles the interesting problem of meta-learning in problem spaces where training ""tasks"" are scarce. Two criticisms that seems to shared across reviewers are that (i) it is debatable how ""novel"" the space of meta learning with ""few"" tasks is, especially since there aren't established standard for how many training tasks should be available, and (ii) the paper could use more comparisons with baseline methods and ablations to understand the contributions. As an AC, I down-weight criticism (i) because I don't feel the paper has to be creating a new problem definition; it's acceptable to make advances within an existing space. However, criticism (ii) seems to remain. After conferring with reviewers it seems that the rebuttal was not strong enough to significantly alter the reviewer's opinions on this issue, and so the paper does not have enough support to justify acceptance. 
The paper certainly addresses interesting issues, and I look forward to seeing a revised/improved version at another venue. ",ICLR2020,
1mMED5Pmro,1576800000000.0,1576800000000.0,1,rke2HRVYvH,rke2HRVYvH,Paper Decision,Reject,"The consensus of the reviewers is that this paper is not acceptable in its present form, and the AC concurs.",ICLR2020,
wKUec0gRcvK,1642700000000.0,1642700000000.0,1,7gE9V9GBZaI,7gE9V9GBZaI,Paper Decision,Accept (Poster),"This paper demonstrates that deep networks can memorize adversarial examples of training data with completely random labels, which motivates some analyses of the convergence and generalization of adversarial training (AT). The authors identify a significant drawback of memorization in AT that could result in robust overfitting and propose a new algorithm to mitigate this drawback. Experiments on benchmark datasets validate the effectiveness of the proposed algorithm. One of the reviewers is concerned about (1) the validity of the stability analysis, where 80% of the data labels are noisy and the perturbation (64/255) is large, (2) the gap between theory and practice, and (3) novelty. The authors have made a great effort to address these concerns. Although there is still no consensus after the authors' response, the majority of the reviewers are in strong support. I therefore recommend acceptance.",ICLR2022,
gARs4WtxIWX,1642700000000.0,1642700000000.0,1,zPLQSnfd14w,zPLQSnfd14w,Paper Decision,Reject,"The paper provides two new generalization bounds for non-linear metric learning with deep neural networks, by extending the results of Bartlett et al. 2017 to the metric learning setting. The main contribution of the paper is extending the techniques of Bartlett et al. from a classification setting to the metric learning setting (which has very different objectives) and considering two regimes. In the first regime the techniques are fairly similar, but the second regime is more novel. However, the current version of the paper does not highlight the similarities and differences between its results and techniques and those of Bartlett et al. 2017; it also does not give sufficient intuition on how the metric learning setting is fundamentally different from the classification setting and how the paper leverages the difference to get improved bounds. All the reviewers had some confusion to different degrees, and the paper would be much stronger if it could explain the intuition and make more explicit comparisons.",ICLR2022,
KW13-Oq2ux,1576800000000.0,1576800000000.0,1,HJgBA2VYwH,HJgBA2VYwH,Paper Decision,Accept (Poster),"Overall, this paper got strong scores from the reviewers (2 accepts and 1 weak accept). The paper proposes to address the responsibility problem, enabling encoding and decoding sets without worrying about permutations. This is achieved using permutation-equivariant set autoencoders and an 'inverse' operation that undoes the sorting in the decoder. The reviewers all agreed that the paper makes a meaningful contribution and should be accepted. Some concerns regarding clarity of exposition were initially raised but were addressed during the rebuttal period. I recommend that the paper be accepted.",ICLR2020,
Bkxznf8dg,1486400000000.0,1486400000000.0,1,SkB-_mcel,SkB-_mcel,ICLR committee final decision,Accept (Poster),The proposed Central Moment Discrepancy criterion is well-described and supported in this paper. Its performance on domain adaptation tasks is good as well. 
The reviewers had several good comments and suggestions, and the authors have taken most of these into account and improved the paper considerably. The paper thus makes a nice contribution to the distribution matching literature and toolbox.,ICLR2017,4: The area chair is confident but not absolutely certain
qp_RSbYeAxx,1642700000000.0,1642700000000.0,1,olQbo52II9,olQbo52II9,Paper Decision,Reject,"The paper proposes an efficient RL-based approach for solving the weighted maximum cut problem. The proposed approach shares high-level insights with prior work such as ECO-DQN (Barrett et al.) and S2V-DQN; the key contribution is to demonstrate that the proposed cheap action decoding and stochastic policy strategy can improve scalability without sacrificing much of the quality of the solution on the tasks considered in this paper.

The reviewers in general find the paper well presented, and especially note the clear motivation for improving the efficiency of current GNN-based RL baselines, particularly represented by ECO-DQN.

A common concern among the reviewers is that the original title is misleading; the authors acknowledge that they should properly position the paper to avoid the confusion that they were addressing general combinatorial optimization problems (as the current title suggests). Notably, many combinatorial optimization problems can be reduced to max-cut, as suggested in the authors' responses; demonstrating the performance on (some of) these problems via a max-cut reduction would be helpful to support the significance of this work.

Beyond the title and positioning of this work, there was also initial confusion among the committee in terms of the choice of both (RL or supervised) learning-based and heuristic-based baselines. The authors did an excellent job in clarifying many of the questions in terms of related work and baselines (the clarity of the work has improved over the rebuttal phase). However, despite the additional ablation study and newly added baselines, there remain concerns/questions about the choice of task domains (lack of hard problem instances where existing solvers, learning- or heuristics-based, may fail due to (possibly higher) computational complexity). Given the empirical focus of the paper, this appears to be an important concern, and not all reviewers are convinced the current empirical results are significant enough to warrant acceptance of this work.",ICLR2022,
Byg5NvuQg4,1544940000000.0,1545350000000.0,1,B1VZqjAcYX,B1VZqjAcYX,new method with thin evaluation,Accept (Poster),"This method proposes a criterion (SNIP) to prune neural networks before training. The pro is that SNIP can find the architecturally important parameters in the network without full training. The con is that SNIP is only evaluated on small datasets (MNIST, CIFAR, Tiny-ImageNet), and it's uncertain if the same heuristic works on large-scale datasets. Small datasets can always achieve high pruning ratios, so evaluation on ImageNet is quite important for pruning work. The reviewers have a consensus on acceptance. The authors are recommended to compare with previous work [1][2] to make the paper more convincing. + +[1] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. NIPS, 2015. + +[2] Yiwen Guo, Anbang Yao, and Yurong Chen. Dynamic network surgery for efficient dnns. NIPS, 2016.",ICLR2019,5: The area chair is absolutely certain
6uz0Y2XqbB,1642700000000.0,1642700000000.0,1,Vog_3GXsgmb,Vog_3GXsgmb,Paper Decision,Accept (Poster),"The paper introduces a pipeline to discover PDEs from scarce and noisy data. Reviewers engaged in a very thoughtful discussion with the authors. I read the extensive rebuttal, and I believe the authors have addressed the major concerns raised by the reviewers. I ask the authors to make sure to include all the changes and additional experiments in the camera-ready version.",ICLR2022,
HkxrtRlLlV,1545110000000.0,1545350000000.0,1,Hk4fpoA5Km,Hk4fpoA5Km,Well written paper highlighting and fixing a common problem in Adversarial Imitation Learning algorithms,Accept (Poster),"This work highlights the problem of biased rewards present in common adversarial imitation learning implementations, and proposes adding absorbing states to fix the issue. This is combined with an off-policy training algorithm, yielding significantly improved sample efficiency, whose benefits are convincingly shown empirically. The paper is well written and clearly presents the contributions. Questions were satisfactorily answered during discussion and resulted in an improved submission, a paper that all reviewers now agree is worth presenting at ICLR. +",ICLR2019,4: The area chair is confident but not absolutely certain
6bctUxNZcJ,1642700000000.0,1642700000000.0,1,qQuzhbU3Gto,qQuzhbU3Gto,Paper Decision,Reject,"The paper proposes an edge-independent graph generative model that can capture heterophily. The authors propose a 3-stage process to obtain the node representations. The idea of factorization in the form of BB^T-CC^T is an interesting approach to model heterophily.

The paper can be improved in terms of writing to better motivate the need for a 3-stage algorithm and to explain how these individual steps are related to existing techniques in the literature. The authors should elaborate on the implications of the theorems and address the concerns raised by the reviewers in the body of the paper.

The algorithm faces scalability challenges, which are not studied well in the experiments. The reviewers have also raised concerns about degeneracy in the network reconstruction experiments. 
Overall, the paper needs further improvements for publication.",ICLR2022,
rjZ3LFLNmC,1576800000000.0,1576800000000.0,1,BylldnNFwS,BylldnNFwS,Paper Decision,Reject,"This paper studies the decision boundaries of a certain class of neural networks (piecewise linear, with non-linear activation functions) using tropical geometry, a subfield of algebraic geometry that leverages piecewise linear structures. Building on earlier work, such piecewise linear networks are shown to be representable as a tropical rational function. This characterisation is used to explain different phenomena of neural network training, such as the 'lottery ticket hypothesis', network pruning, and adversarial attacks.

This paper received mixed reviews, owing to its very specialized area. Whereas R1 championed the submission for its technical novelty, the other reviewers felt the current exposition is too inaccessible and some application areas are not properly addressed. The AC shares these concerns, recommends rejection, and strongly encourages the authors to address the reviewers' concerns in the next iteration.
",ICLR2020,
juQY51Sg-_Y,1642700000000.0,1642700000000.0,1,m22XrToDacC,m22XrToDacC,Paper Decision,Reject,"The paper provides a framework for recourse (i.e. counterfactual explanations) that is robust to model shifts. The setup for the proposed method is a min-max optimization problem, where the max is over a neighborhood around the distribution over model parameters. The model parameters are drawn from a mixture of K distributions, so that the neighborhood is specified by the Gelbrich distance on each component. The authors propose a finite-dimensional version of the robustified optimization problem, which can be optimized using projected gradient descent. They evaluate their approach on the German credit dataset, the Small Business Administration dataset, and the Student performance dataset, each of which demonstrates a different type of data distribution shift.

Strengths:

- Most existing work on recourse actions does not consider model change, so the problem addressed by the paper is relatively new
- The experimental results demonstrate the superiority of the proposed method over baselines.

Weaknesses:

- The solution provided is somewhat limited, as it relies heavily on the structural properties of the mixture distribution and the Gelbrich distance to reformulate the optimization problem.

Most of the reviewers voted initially for rejection. The paper is borderline, tending to rejection after the rebuttal. The authors have also considerably updated the paper with new results after the initial reviews. It seems therefore that the paper may benefit from another round of reviewing and, because of this, I recommend rejection and encourage the authors to use the reviewers' comments to improve the paper before resubmitting to another venue for another round of reviewing.",ICLR2022,
SJJ-vyTrf,1517250000000.0,1517260000000.0,931,HJJ0w--0W,HJJ0w--0W,ICLR 2018 Conference Acceptance Decision,Reject,"This paper addresses the increasingly studied problem of predictions over long-term horizons. 
Despite this, and the important updates from the authors, the paper is not yet ready; improvements identified include more careful control for fair comparisons and improved clarity of exposition.",ICLR2018,
S1Yfmkarz,1517250000000.0,1517260000000.0,92,rJNpifWAb,rJNpifWAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Thank you for submitting your paper to ICLR. The idea is simple, but easy to implement and effective. The paper examines the performance fairly thoroughly across a number of different scenarios, showing that the method consistently reduces variance. How this translates into final performance is complex, of course, but faster convergence is demonstrated, and the revised experiments in Table 2 show that it can lead to improvements in accuracy. ",ICLR2018,
S1YNmkarG,1517250000000.0,1517260000000.0,120,rkO3uTkAZ,rkO3uTkAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster)," +I am going to recommend acceptance of this paper despite being worried about the issues raised by reviewer 1. In particular, + +1: the best possible inception score would be obtained by copying the training dataset +2: the highest visual quality samples would be obtained by copying the training dataset +3: perturbations (in the hidden space of a convnet) of training data might not be perturbations in l2, and so one might not find a close nearest neighbor with an l2 search +4: it has been demonstrated in other works that perturbations of convnet features of training data (e.g. trained as auto-encoders) can make convincing ""new samples""; or more generally, paths between nearby samples in the hidden space of a convnet can be convincing new samples. + +These together suggest the possibility that the method presented is not necessarily doing a great job as a generative model or as a density model (it may be, we just can't tell...), but it is doing a good job at hacking the metrics (inception score, visual quality). This is not an issue with this paper alone, and I do not want to punish the authors of this paper for the failings of the field; but this work, especially because of its explicit use of training examples in the memory, nicely exposes the deficiencies in our community's methodology for evaluating GANs and other generative models. + +",ICLR2018,
HJx3F58uAm,1543170000000.0,1545350000000.0,1,r1ztwiCcYQ,r1ztwiCcYQ,"A good problem, but not well executed and communicated ",Reject,"This paper studies a variational formulation of loss minimization to find the solution that generalizes the most. An expectation of the loss w.r.t. a Gaussian distribution is minimized to find the mean and variance of the Gaussian distribution. As the variance goes to zero, we recover the original loss, but for a higher value of the variance, the loss may be convex. This is used to study the generalizability of the landscape.

Both the objective and the solutions of the paper are unclear and not communicated well. There are not enough citations to previous work (e.g., Gaussian homotopy considers exactly this problem, and there are papers that study the convexity of the expectation of the loss function). There are no experimental results either to confirm the theoretical findings.

All the reviewers struggled to understand both the problem and the solutions discussed in this paper. 
I believe that the paper could become useful if reviewers' feedback is taken seriously to improve the paper.",ICLR2019,5: The area chair is absolutely certain +W5MOblSDJ,1576800000000.0,1576800000000.0,1,HklOo0VFDH,HklOo0VFDH,Paper Decision,Accept (Poster),"This paper proposes an approximate inference approach for decoding in autoregressive models, based on the method of auxiliary coordinates, which uses iterative factor graph approximations of the model. The approach leads to nice improvements in performance on a text infilling task. The reviewers were generally positive about this paper, though there was a concern that more baselines are needed and discussion was very limited following the author responses. I tend to agree with the authors that their results are convincing on the infilling task. The impact of the paper is a bit limited by the lack of experiments on more standard decoding tasks, which, as the authors point out, would be challenging as their approach is computationally demanding. Overall I believe this would be an interesting contribution to the ICLR community.",ICLR2020, +S1Yfmkarz,1517250000000.0,1517260000000.0,92,rJNpifWAb,rJNpifWAb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Thank you for submitting you paper to ICLR. The idea is simple, but easy to implement and effective. The paper examines the performance fairly thoroughly across a number of different scenarios showing that the method consistently reduces variance. How this translates into final performance is complex of course, but faster convergence is demonstrated and the revised experiments in table 2 show that it can lead to improvements in accuracy. ",ICLR2018, +S1YNmkarG,1517250000000.0,1517260000000.0,120,rkO3uTkAZ,rkO3uTkAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster)," +I am going to recommend acceptance of this paper despite being worried about the issues raised by reviewer 1. In particular, + +1: the best possible inception score would be obtained by copying the training dataset +2: the highest visual quality samples would be obtained by copying the training dataset +3: perturbations (in the hidden space of a convnet) of training data might not be perturbations in l2, and so one might not find a close nearest neighbor with an l2 search +4: it has been demonstrated in other works that perturbations of convnet features of training data (e.g. trained as auto-encoders) can make convincing ""new samples""; or more generally, paths between nearby samples in the hidden space of a convnet can be convincing new samples. + +These together suggest the possibility that the method presented is not necessarily doing a great job as a generative model or as a density model (it may be, we just can't tell...), but it is doing a good job at hacking the metrics (inception score, visual quality). This is not an issue with only this paper, and I do not want to punish the authors of this papers for the failings of the field; but this work, especially because of its explicit use of training examples in the memory, nicely exposes the deficiencies in our community's methodology for evaluating GANs and other generative models. + +",ICLR2018, +kCHxyCLFCm6g,1642700000000.0,1642700000000.0,1,dK_t8oN8G4,dK_t8oN8G4,Paper Decision,Reject,"This work proposes an interesting approach for learning the relational constraints of a dataset and then generating according to those constraints. Learning the constraints via a constrained optimization problem is an interesting contribution. 
The application of constrained generation is also interesting and can be applied to several domains though only music and poetry is examined in this work. However, the music evaluation is unconvincing and the paper lacks clarity in the description of the approach such as building the GCN. Music evaluation could be improved with human evaluation since loss metrics don't paint a full picture. Finally, a more diverse set of experiments and datasets (rather than just one poetry collection and one folk song corpus) and more analysis on the learnt constraints could give a more complete story for this approach's effectiveness on sequence data.",ICLR2022, +Syl3DwRW1V,1543790000000.0,1545350000000.0,1,r1eJssCqY7,r1eJssCqY7,Reject,Reject,All reviewers agree in their assessment that this paper has merits but is not yet ready for acceptance into ICLR. The area chair commends the authors for their responses to the reviews.,ICLR2019,4: The area chair is confident but not absolutely certain +HylFQRVTkV,1544540000000.0,1545350000000.0,1,B1eXbn05t7,B1eXbn05t7,decision,Reject,"The paper is on the borderline. From my reading, the paper presents a reasonable idea with quite good results on novel image generation and one-shot learning. On the other hand, the comparison against the prior work (both generation task and one-shot classification task) is not convincing. I also feel that there are many work with similar ideas (I listed some below, but these are not exhaustive/comprehensive list), but they are not cited or compared, I am not sure about if the proposed concept is novel in high-level. Although some implementation details of this method may provide advantages over other related work, such comparison is not clear to me. + +Disentangling factors of variation in deep representations using adversarial training +https://arxiv.org/abs/1611.03383 +NIPS 2016 + +Rethinking Style and Content Disentanglement in Variational Autoencoders +https://openreview.net/forum?id=B1rQtwJDG +ICLR 2018 workshop + +Disentangling Factors of Variation by Mixing Them +http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Disentangling_Factors_of_CVPR_2018_paper.pdf +CVPR 2018 + +Separating Style and Content for Generalized Style Transfer +https://arxiv.org/pdf/1711.06454.pdf + +Finally, I feel that the writing needs improvement. Although the method is intuitive and has simple idea, the paper seems to lack full details (e.g., principled derivation of the model as a variant of the VAE formulation) and precise definitions of terms (e.g., second term of LF loss). +",ICLR2019,4: The area chair is confident but not absolutely certain +4CY8-PwSK5,1610040000000.0,1610470000000.0,1,sgJJjd3-Y3,sgJJjd3-Y3,Final Decision,Reject,"This paper addresses the real-world problem of semi-supervised learning where the distribution from which the labeled examples are drawn is different from the distribution from which the unlabeled examples are drawn. The task is motivated by structure-activity prediction for drug design (quantitative structure activity prediction, or QSAR). Examples represent molecules, and we wish to predict a real-valued measure of binding affinity. Exactly the general problem of data skew arose with exactly this task for example in one of the KDD Cup 2001 tasks. While the authors here mention that labeled data may be focused more on active molecules (those with a high continuous-valued response), in the KDD Cup 200`1 data the reverse was true, and the unlabeled test data were skewed to higher activity level. 
I say all this to agree with the authors about the real-world nature of the problem they address. Also, some reviewers felt more empirical evaluation was needed, so that may be an additional data set for the authors to consider using. + +Reviewer concerns including that the approach was simplistic, the empirical results were insufficient, and the claims were oversold. The author replies and revisions, and the discussion, moved the reviews to be more favorable but still not strong enough to justify acceptance yet. Nevertheless, the consensus is that the paper addresses an important problem and the revisions are headed in the right direction to make a strong future paper, and that the authors should be encouraged to continue this work.",ICLR2021, +NT43L6ytiJ_,1642700000000.0,1642700000000.0,1,K3uRhaKJuZg,K3uRhaKJuZg,Paper Decision,Reject,"The authors propose a new method for deepfake detection (ENST) which relies on high-frequency information, low-level/shallow features, and optical flow. In particular, EfficientNet-B5 is used to extract the high frequency info and shallow features, and a Swin Transformer to capture discrepancies between optical flows. Empirical validation on FaceForensics++ and Celeb-DF shows some improvements over the baselines. + +The reviewers found this to be a relevant and timely topic. The reviewers also found that integrating information from the frequency domain, the spatial domain, and optical flow is a promising approach. There were three reviewers suggesting rejection, and one suggesting acceptance. After the rebuttal and discussion phase, the following remaining issues were highlighted: +- **Limited technical novelty** (nearly all components used in this work were already expired in other work). +- Underwhelming empirical improvements given the fact that the model uses EfficientNet-B5 and the SwinTransformer. +- Many claims are still not supported by empirical evidence. For instance, to claim generalisation, an extensive analysis, including more datasets as well as competing methods should be carried out.",ICLR2022, +SkRTnMLOg,1486400000000.0,1486400000000.0,1,S11KBYclx,S11KBYclx,ICLR committee final decision,Accept (Poster),"The reviewers agreed that this is a good paper that proposes an interesting approach to modeling training curves. The approach is well motivated in terms of surrogate based (e.g. Bayesian) optimization. They are convinced that a great model of training curves could be used to extrapolate in the future, greatly expediting hyperparameter search methods through early stopping. None of the reviewers championed the paper, however. They all stated that the paper just did not go far enough in showing that the method is really useful in practice. While the authors seem to have added some interesting additional results to this effect, it was a little too late for the reviewers to take into account. + + It seems like this paper would benefit from some added experiments as per the authors' suggestions. Incorporating this within a Bayesian optimization methodology is certainly non-trivial (e.g. may require rethinking the modeling and acquisition function) but could be very impactful. 
The overall pros and cons are as follows: + + Pros: + - Proposes a neat model for extrapolating training curves + - Experiments show that the model can extrapolate quite well + - It addresses a strong limitation of current hyperparameter optimization techniques + + Cons + - The reviewers were underwhelmed with the experimental analysis + - The paper did not push the ideas far enough to demonstrate the effectiveness of the approach + +Overall, the PCs have established that, despite some of its weaknesses, this paper deserved to appear at the conference.",ICLR2017, +qp_RSbYeAxx,1642700000000.0,1642700000000.0,1,olQbo52II9,olQbo52II9,Paper Decision,Reject,"The paper proposes an efficient RL-based approach for solving the weighted maximum cut problem. The proposed approach shares high-level insights with prior work such as ECO-DQN (Barrett et al.) and S2V-DQN; the key contribution is to demonstrate that the proposed cheap action decoding and stochastic policy strategy can improve the scalability without sacrificing much of the quality of the solution on the tasks considered in this paper. + +The reviewers in general find the paper well presented, and especially note that the clear motivation for improving the efficiency of current GNN-based RL baselines, particularly represented by ECO-DQN. + +A common concern among the reviewers is that the original title is misleading; the authors acknowledge that they should properly position the paper to avoid confusion that they were to address general combinatorial optimization problems (as the current title suggests). Notably, many combinational optimization problems can be reduced to max-cut as suggested in the authors’ responses; demonstrating the performance in (some of) these problems via a max-cut reduction would be helpful to support the significance of this work. + +Beyond the title and positioning of this work, there were also initial confusions among the committee in terms of the choice of both (RL or supervised) learning-based and heuristic-based baselines. The authors did an excellent job in clarifying many of the questions in terms of related work and baselines (the clarity of the work has improved over the rebuttal phase). However, despite the additional ablation study and newly added baselines, there remain concerns/questions in the choice of task domains (lack of hard problem instances where existing solvers, learning- or heuristics -based may fail due to (possibly higher) computational complexity). Given the empirical focus of the paper, this appears to be an important concern, and not all reviewers are convinced the current empirical results are significant to warrant acceptance of this work.",ICLR2022, +52mcRK2WXRT,1642700000000.0,1642700000000.0,1,1QxveKM654,1QxveKM654,Paper Decision,Reject,"The paper demonstrates that one phase of de novo assembly, specifically the layout phase, can be replaced with graph-neural-network based methods. The paper clarifies in the rebuttal that it focuses on building a method for assembling high-quality long reads. + +All four reviewers rated the paper as below the acceptance threshold. The reviewers largely agree that the idea of using GNNs to assemble a genome from reads is novel, interesting, and has the potential to be very useful. +The reviewers raise the following concerns: The paper only considers synthetic data, and the synthetic reads used in the simulations are error-free. 
In practice, reads are not error-free, and thus simulations on real data or at the very least on reads with errors are needed. The authors acknowledge that, and state that they'll provide such experiments in future work. In summary, the reviewers found the experiments to be insufficient to support the claims, even though it is understood by the reviewers and me that the paper only presents a proof-of-concept idea. I agree with the reviewers that simulations on erroneous reads, ideally real data, would be needed for acceptance. + +I recommend to reject the paper, since the paper provides insufficient experiments to understand the merits of the proposed approach.",ICLR2022, +bwwgnKFYFxk,1610040000000.0,1610470000000.0,1,ePh9bvqIgKL,ePh9bvqIgKL,Final Decision,Reject,"The paper proposes the idea of searching parameterized activation functions, in contrast to the previous handcraft or learnable ones. It may be a counterpart of neural architecture search. + +Pros: +1. The idea is very interesting. +2. The paper is well written. +3. The experiments show improvements over baseline activation functions. + +Cons: +1. The AC fully agreed with Reviewer #4 that the whole literature of learnable activation function is neglected (Reviewer #2 also alluded to this issue). Although the authors added experiments with learnable baseline activation functionss, the literature review on learnable activation function was not included accordingly. +2. Although the idea of searching activation functions is interesting, the AC doubted the necessity. Since the rich literature of learnable activation functions is already there (note that it is more than introducing parameters to handcrafted ones), can we simply learn piecewise linear activation functions with more pieces so that it can approximate complex enough functions? This can be much more easily implemented (can go along with weight training on the standard deep learning platform) and the computation cost will be much lower. Such a comparison is absolutely necessary. +3. The AC was actually worried about the activation functions founded as they may be too complex, so the generalization issue (even numerical stability issue) may be a concern. More thorough testing is necessary (currently only tested on CIFAR-100 and three CNNs; and Reviewers #3 and #2 also concerned about this issue). + +Although Reviewer #2 raised his/her score, the final average score is still below threshold. So the AC decided to reject the paper.",ICLR2021, +Hye7mhWGlE,1544850000000.0,1545350000000.0,1,S1EHOsC9tX,S1EHOsC9tX,Interesting progress on more versatile robustness guarantees,Accept (Poster),"The paper presents a technique of training robust classification models that uses the input distribution within each class to achieve high accuracy and robustness against adversarial perturbations. + +Strengths: + +- The resulting model offers good robustness guarantees for a wide range of norm-bounded perturbations + +- The authors put a lot of care into the robustness evaluation + +Weaknesses: + +- Some of the ""shortcomings"" attributed to the previous work seem confusing, as the reported vulnerability corresponds to threat models that the previous work did not made claims about + +Overall, this looks like a valuable and interesting contribution. 
+",ICLR2019,5: The area chair is absolutely certain +qurX__LD4,1576800000000.0,1576800000000.0,1,r1gBOxSFwr,r1gBOxSFwr,Paper Decision,Reject,"This paper proposes a novel pruning method for use with transformer text encoding models like BERT, and show that it can dramatically reduce the number of non-zero weights in a trained model while only slightly harming performance. + +This is one of the hardest cases in my pile. The topic is obviously timely and worthwhile. None of the reviewers was able to give a high-confidence assessment, but the reviews were all ultimately leaning positive. However, the reviewers didn't reach a clear consensus on the main strengths of the paper, even after some private discussion, and they raised many concerns. These concerns, taken together, make me doubt that the current paper represents a substantial, sound contribution to the model compression literature in NLP. + +I'm voting to reject, on the basis of: + +- Recurring concerns about missing strong baselines, which make it less clear that the new method is an ideal choice. +- Relatively weak motivations for the proposed method (pruning a pre-trained model before fine-tuning) in the proposed application domain (mobile devices). +- Recurring concerns about thin analysis.",ICLR2020, +FaQTLflCSN,1576800000000.0,1576800000000.0,1,HketzTNYwS,HketzTNYwS,Paper Decision,Reject,"(I acknowledge reading authors' recent note on decaNLP.) + +This paper proposes a span extraction approach (SpExBERT) to unify question answering, text classification and regression. Paper includes a significant number of experiments (including low-resource and multi-tasking experiments) on multiple benchmarks. The reviewers are concerned about lack of support on author's claims from the experimental results due to seemingly insignificant improvements and lack of analysis regarding the results. Hence, I suggest rejecting the paper.",ICLR2020, +BJxoat1IxN,1545100000000.0,1545350000000.0,1,By40DoAqtX,By40DoAqtX,Needs rewriting,Reject,"All three reviewers expressed concerns about the writing of the paper. The AC thus recommends ""revise and resubmit"".",ICLR2019,4: The area chair is confident but not absolutely certain +Bke5LuGZlV,1544790000000.0,1545350000000.0,1,HketHo0qFm,HketHo0qFm,Meta-review,Reject,"Pros: +- an original idea: learn an additional inverse policy (that minimizes reward) to help find actions that should be avoided. + +Cons: +- not clearly presented +- conclusions are not not validated +- empirical evidence is weak +- no rebuttal + +The three reviewers reached consensus that the paper should be rejected in its current form, but make numerous suggestions for improving it for a future submission. +",ICLR2019,5: The area chair is absolutely certain +#NAME?,1642700000000.0,1642700000000.0,1,6q_2b6u0BnJ,6q_2b6u0BnJ,Paper Decision,Accept (Poster),"The paper investigates what we can learn from _suboptimal_ demonstrations for imitation learning. It suggests that we can learn about the structure of the environment by finding a factored dynamics model including a latent action space. It demonstrates both theoretically and empirically that this information can reduce sample requirements for downstream IL. + +The reviewers praised the simplicity of the method (including its minimal assumptions), the theoretical analysis, and the breadth of the experimental validation. The authors were helpful during the discussion period, and addressed any questions or concerns the reviewers raised. 
+ +Overall, this is an interesting idea and a well-executed paper.",ICLR2022, +0YFv99Y4U_K,1610040000000.0,1610470000000.0,1,3SqrRe8FWQ-,3SqrRe8FWQ-,Final Decision,Accept (Poster),"Most of the reviewers agree that this paper presents an interesting idea. Practically implementing a BNN that gains real world speedup is challenging, and as past work [1] showed, the bottleneck could shift into other layers(besides the accumulation). The paper would benefit from a thorough discussion about the practical impact in implementing the proposed method and relation to past work. + +The meta-reviewer decided to accept the paper given the positive aspects, and encourages the author to further improve the paper per review comments. + +Thank you for submitting the paper to ICLR. + +[1] Riptide: Fast End-to-End Binarized Neural Networks +",ICLR2021, +fo3bFRQd8p,1576800000000.0,1576800000000.0,1,rylztAEYvr,rylztAEYvr,Paper Decision,Reject,"This paper proposes a training scheme to enhance the optimization process where the outputs are required to meet certain constraints. The authors propose to insert an additional target augmentation phase after the regular training. For each datapoint, the algorithm samples candidate outputs until it find a valid output according the an external filter. The model is further fine-tuned on the augmented dataset. + +The authors provided detailed answers and responses to the reviews, which the reviewers appreciated. However, some significant concerns remained, and due to a large number of stronger papers, this paper was not accepted at this time.",ICLR2020, +bk3PWmwIW-k,1642700000000.0,1642700000000.0,1,AOn-gHymcx,AOn-gHymcx,Paper Decision,Reject,"All reviewers vote for rejecting this paper. The main points of criticism shared by the reviewers are missing novelty and missing/unclear significance of the contribution. There was no rebuttal, so this is a clear reject.",ICLR2022, +mFPVG-rTb7,1576800000000.0,1576800000000.0,1,Sye57xStvB,Sye57xStvB,Paper Decision,Accept (Poster),"This paper tackles hard-exploration RL problems. The idea is to learn separate exploration and exploitation strategies using the same network (representation). The exploration is driven by intrinsic rewards, which are generated using an episodic memory and a lifelong novelty modules. Several experiments (simple and Atari domains) show that the proposed approach compares favourably with the baselines. + +The work is novel both in terms of the episodic curiosity metric and its integration with the life-long curiosity metric, and the results are convincing. All reviewers being positive about this paper, I therefore recommend acceptance.",ICLR2020, +4cjMdXZJ0,1576800000000.0,1576800000000.0,1,r1gzdhEKvH,r1gzdhEKvH,Paper Decision,Reject,"Reviewers found the problem statement having merit, but found the solution not completely justifiable. Bandit algorithms often come with theoretical justification because the feedback is such that the algorithm could be performing horribly without giving any indication of performance loss. With neural networks this is obviously challenging given the lack of supervised learning guarantees, but reviewers remain skeptical and prefer not to speculate based on empirical results. ",ICLR2020, +fjSJ9keA-l,1576800000000.0,1576800000000.0,1,HklvMJSYPB,HklvMJSYPB,Paper Decision,Reject,This paper extends adversarial imitation learning to an adaptive setting where environment dynamics change frequently. 
The authors propose a novel approach with pragmatic design choices to address the challenges that arise in this setting. Several questions and requests for clarification were addressed during the reviewing phase. The paper remains borderline after the rebuttal. Remaining concerns include the size of the algorithmic or conceptual contribution of the paper.,ICLR2020, +S1rahG8ul,1486400000000.0,1486400000000.0,1,SkJeEtclx,SkJeEtclx,ICLR committee final decision,Reject,"There appears to be consensus among the reviewers that the paper appears the overstate its contributions: the originality of the proposed temporal modeler (TEM) is limited, and the experimental evaluation (which itself is of good quality!) does not demonstrate clear merits of the TEM architecture. As a result, the impact of this paper is expected to be limited.",ICLR2017, +H1t2My6Hz,1517250000000.0,1517260000000.0,20,H1tSsb-AW,H1tSsb-AW,ICLR 2018 Conference Acceptance Decision,Accept (Oral),The reviewers are satisfied that this paper makes a good contribution to policy gradient methods.,ICLR2018, +H1lMFvztlE,1545310000000.0,1545350000000.0,1,S1gWz2CcKX,S1gWz2CcKX,metareview,Reject,"The reviewers raise a number of concerns including limited methodological novelty, limited experimental evaluation (comparisons), and poor readability. Although the authors did address some of the concerns, the paper as is needs a lot of polishing and rewriting. Hence, I cannot recommend this work for presentation at ICLR.",ICLR2019,4: The area chair is confident but not absolutely certain +SJ18EkaSf,1517250000000.0,1517260000000.0,350,Bki4EfWCb,Bki4EfWCb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"Thank you for submitting you paper to ICLR. This paper provides an informative analysis of the approximation contributions from the various assumptions made in variational auto-encoders. The revision has demonstrated the robustness of the paper’s conclusions, however these conclusions are arguably unsurprising. Although the work provides a thorough and interesting piece of detective work, the significance of the findings is not quite great enough to warrant publication. + +Reviewer 1 was searching for a reference for work in similar vein to section 5.4: The second problem identified in the reference below shows examples where using an approximating distribution of a particular form biases the model parameter estimates to settings that mean the true posterior is closer to that form. + +R. E. Turner and M. Sahani. (2011) Two problems with variational Expectation Maximisation for time-series models. Inference and Learning in Dynamic Models. Eds. D. Barber, T. Cemgil and S. Chiappa, Cambridge University Press, 104–123, 2011.",ICLR2018, +FQpJtZAtKtC,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Accept (Spotlight),"The paper considers the setting of bi-level optimization and proposes a quasi-Newton scheme to reduce the cost of Jacobian inversion, which is the main bottleneck of bi-level optimization methods. The paper proves that the proposed scheme correctly estimates the true implicit gradient. The theoretical results are supported by numerical experiments, which are encouraging and show that the proposed method is either competitive with or outperforms the Jacobian Free method recently proposed in the literature. + +Even though the reviews expressed some initial concerns regarding the empirical performance of the proposed method, the authors adequately addressed those concerns and provided additional experiments. 
Thus, a consensus was reached that the paper should be accepted.",ICLR2022, +uVb3Ow-12JH,1610050000000.0,1610470000000.0,1,tGZu6DlbreV,tGZu6DlbreV,Final Decision,Accept (Poster),"There is a consensus among the reviewers that the work is interesting and the paper should be accepted. Nevertheless, several reviewers struggled with understanding the details. While the authors (largely successfully) addressed these concerns, I believe that the paper is still too dense and hard to follow, I would encourage the authors to invest more time into improving its readability. One important point which came late in the discussion is the provenance of baseline scores in the result tables (see the review by AnonReviewer3, the current manuscript claims that the numbers are taken from the original papers while in some cases, the numbers cannot be located in these papers). Unfortunately, the authors did not have a chance to respond to this criticism, and fortunately we could trace the key numbers and establish that the results are strong enough to warrant accepting the submission. Still, we would ask the reviewers to fix this issue in the final version.",ICLR2021, +B1xRUVBZg4,1544800000000.0,1545350000000.0,1,r1Gsk3R9Fm,r1Gsk3R9Fm,not ready for acceptance,Reject,"The paper discusses layer-wise training of deep networks. The authors show that it's possible to achieve reasonable performance by training deep nets layer by layer, as opposed to now widely adopted end-to-end training. While such a training procedure is not novel, the authors argue that this is an interesting result, considering that such a training procedure is often dismissed as sub-optimal and leading to inferior results. However, the results show exactly that, as the performance of the models is significantly worse than the state of the art, and it is unclear what other advantages such a training scheme can offer. The authors mention that layer-wise training could be useful for theoretical understanding of deep nets, but they don’t really perform such analysis in this submission, and it’s also unclear whether conclusions of such analysis would extend to deep nets trained end-to-end. + +In its current form, the paper is not ready for acceptance. I encourage the authors to make a more clear case for the method: either by improving results to match end-to-end training, or by actually demonstrating that layer-wise training has certain advantages over end-to-end learning. +",ICLR2019,5: The area chair is absolutely certain +rJOj2fUul,1486400000000.0,1486400000000.0,1,r1Usiwcex,r1Usiwcex,ICLR committee final decision,Reject,"This paper applies an existing idea (Yao's block Gibbs sampling of NADE) to a music model. There is also prior art for applying NADE to music. The main novel and interesting result is that block Gibbs sampling (an approximation) actually improves performance, highlighting problems with NADE. + + This work is borderline for inclusion. The paper is mainly an application of existing ideas. The implications of the interesting results could perhaps have been explored further for a paper at a meeting on learning representations.",ICLR2017, +Jy9Cuw_ym,1576800000000.0,1576800000000.0,1,HyxJhCEFDS,HyxJhCEFDS,Paper Decision,Accept (Poster),This paper studies the properties of adversarial training in the large scale setting. The reviewers found the properties identified by the paper to be of interest to the ICLR community - in particular the robustness community. 
We encourage the authors to release their models to help jumpstart future work building on this study.,ICLR2020, +GUuumxFCUS,1576800000000.0,1576800000000.0,1,Hke4_JrYDr,Hke4_JrYDr,Paper Decision,Reject,"This paper proposes a deep network architecture for learning to predict depth from images with sparsely depth-labeled pixels. + +This paper was subject to some discussion, since the reviewers felt that the approach was interesting and the problem well motivated. Some of the concerns about experimental evaluation (especially from R1) were resolved due to the authors' rebuttal, but ultimately the reviewers felt the paper was not yet ready for publication. ",ICLR2020, +TS9-KFMaBDo,1610040000000.0,1610470000000.0,1,IpPQmzj4T_,IpPQmzj4T_,Final Decision,Reject,"The paper seeks to increase receptive fields of GNNs by aggregating information beyond local neighborhoods with the idea of addressing oversmoothing and/or overfitting issues with message passing algorithms. The proposed method is simple and primarily makes use of node features and local structure similarities. In this sense the approach is related to Pei et al. Several concerns remained as articulated in the reviews, including: oversmoothing is not discussed/analyzed, performance gains are small, more extensive comparisons are needed. +",ICLR2021, +yUVSnvEF9T9,1642700000000.0,1642700000000.0,1,zNHzqZ9wrRB,zNHzqZ9wrRB,Paper Decision,Accept (Spotlight),"The paper proposes a rotationally equivariant transformer architecture for predicting molecular properties. The proposed architecture demonstrates good computational efficiency and good results on three benchmarks. + +All four reviewers recommend acceptance (two weak, two strong), citing the novelty of the architecture, the good computational efficiency of the model and the good empirical results as the main strengths of the paper. The reviewers expressed minor criticisms and recommendations for improvement, some of which were addressed by the authors during the reviewing process, which led to an increase in scores. + +Overall, this is a nice contribution of machine learning to science, and I'm happy to recommend acceptance to ICLR.",ICLR2022, +ZvQbhXE95G,1610040000000.0,1610470000000.0,1,GtiDFD1pxpz,GtiDFD1pxpz,Final Decision,Reject,"This paper was reviewed by 4 reviewers who scored the paper below the acceptance threshold even after the rebuttal. Reviewer 4 is concerned about motivation, Reviewer 2 rightly points out that there exist numerous works that use some form of spectral layers in a deep setting on challenging datasets - something lacking in this work. Reviewer 3 is concerned about limited discussion on Lie groups and the overall benefit of expm(.).
Reviewer 1 reiterates the same comments regarding insufficient experiments, comparisons and limited motivation. We encourage the authors to consider all pointers given by reviewers in any future re-submission.",ICLR2021, +OhkRUXxy_2,1576800000000.0,1576800000000.0,1,SylR6n4tPS,SylR6n4tPS,Paper Decision,Reject,"This paper proposes a cyclical training scheme for grounded visual captioning, where a localization model is trained to identify the regions in the image referred to by caption words, and a reconstruction step is added conditioned on this information. This extends prior work which required grounding supervision. + +While the proposed approach is sensible and grounding of generated captions is an important requirement, some reviewers (me included) pointed out concerns about the relevance of this paper's contributions. I found the authors’ explanation that the objective is not to improve the captioning accuracy but to refine its grounding performance without any localization supervision a bit unconvincing -- I would expect that better grounding would be reflected in overall better captioning performance, which seems to have happened with the supervised model of Zhou et al. (2019). In fact, even the localization gains seem rather small: “The attention accuracy for localizer is 20.4% and is higher than the 19.3% from the decoder at the end of training.” Overall, the proposed model is an incremental change on the training of an image captioning system, by adding a localizer component, which is not used at test time. The authors' claim that “The network is implicitly regularized to update its attention mechanism to match with the localized image regions” is also unclear to me -- there is nothing in the loss function that penalizes the difference between these two attentions, as the gradient doesn’t backprop from one component to another. Sharing the LSTM and Language LSTM doesn’t imply this, as the localizer is just providing guidance to the decoder, but there is no reason this will help the attention of the original model. + +Other natural questions left unanswered by this paper are: +- What happens if we use the localizer also at test time (calling the decoder twice)? Will the captions improve? This experiment would be needed to assess the potential of this method to help image captioning. +- Can we keep refining this iteratively? +- Can we add a loss term on the disagreement of the two attentions to actually achieve the said regularisation effect? + +Finally, the paper [1] (cited by the authors) seems to employ a similar strategy (encoder-decoder with reconstructor) with shown benefits in video captioning. + +[1] Bairui Wang, Lin Ma, Wei Zhang, and Wei Liu. Reconstruction network for video captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7622–7631, 2018. + +I suggest addressing some of these concerns in a revised version of the paper.",ICLR2020, +pwUugxED1k6,1610040000000.0,1610470000000.0,1,FOyuZ26emy,FOyuZ26emy,Final Decision,Accept (Poster),"The authors carefully study a class of unsupervised learning models called self-expressive deep subspace clustering (SEDSC) models, which involve clustering data arising from mixtures of complex nonlinear manifolds. The main contribution is to show that the SEDSC formulation itself suffers from fundamental degeneracies, and that the experimental gains reported in the literature may be due to ad-hoc preprocessing. + +The contributions are compelling, and all reviewers appreciated the paper.
Despite the paper being of somewhat narrow focus, my belief is that negative results of this nature are useful and timely. I recommend an accept.",ICLR2021, +X0nlKU13z,1576800000000.0,1576800000000.0,1,r1xNJ0NYDH,r1xNJ0NYDH,Paper Decision,Reject,"This paper introduces the concept of gradient confusion to show how the neural network architecture affects the speed of training. The reviewers' opinion on this paper varied widely, even after the discussion phase. The main disagreement is on the significance of this work, and whether the concept of gradient confusion adds something meaningful to the existing literature with respect to understanding deep networks. The strong disagreement on this paper suggests that the paper is not quite ready yet for ICLR, but that the authors should make another iteration on the paper to strengthen the case for its significance. +",ICLR2020, +X9ETHEEm8s,1576800000000.0,1576800000000.0,1,S1xI_TEtwS,S1xI_TEtwS,Paper Decision,Reject,"The paper proposes a modification for adversarial training in order to improve the robustness of the algorithm by developing an annealing mechanism for PGD adversarial training. This mechanism gradually reduces the step size and increases the number of iterations of PGD maximization. One reviewer found the paper to be clear and competitive with existing work, but raised concerns about novelty and significance. Another reviewer noted the significant improvements in training times but had concerns about small scale datasets. The final reviewer liked the optimal control formulation, and requested further details. The authors provided detailed answers and responses to the reviews, although some of these concerns remain. The paper has improved over the course of the review, but due to a large number of stronger papers, was not accepted at this time.",ICLR2020, +MMyr9n6HY2M,1610040000000.0,1610470000000.0,1,3FK30d5BZdu,3FK30d5BZdu,Final Decision,Reject,"This is a thought-provoking paper which describes a significant problem that plausibly occurs in deployed ML/RL models. +The paper is clearly written, describing claims using examples and developing small unit-tests to probe models. +However, as the reviews and discussion show, the exposition should be substantially re-worked so that the core contributions are more understandable -- the core message in the revised manuscript is still very nuanced and easy to mis-understand. + +Let's say we train an ML model using supervised learning to minimize a loss function on a dataset. Several models may have near-optimal loss as measured on a validation set -- a learning algorithm is free to return any one of them. Now in a deployed system, these ML models are not merely generating passive predictions; these predictions are driving system operation and potentially influencing future states/contexts/inputs that the model will be invoked on. It is well known that supervised learning makes an iid assumption between training and deployment which is violated in this setting -- that is not the main point of this paper. Consider again the set of models with near-optimal loss. Some of them, when deployed, may cause the distribution mismatch between training and deployment to be minuscule, while other models may introduce a vast mismatch. We may choose a learning algorithm which just so happens to pick models from the former category; and we may conclude that feedback effects induced by the ML model are not substantial.
We then change some unrelated detail in the learning algorithm (but not the objective, datasets, validation criteria, etc.) which just so happens to pick models from the latter category and suddenly witness a large distribution shift. What happened? And could we have developed tests to detect that our learning algorithms have these tendencies? The paper attempts to articulate such questions, and to design a first step in answering them. + +Moving to RL, where we routinely consider distribution shifts in states visited by different policies, does not fundamentally fix all these issues because the reward function is typically an engineered proxy to elicit desired behavior -- and we may again find that some RL algorithms have a tendency to find reward-maximizing policies that exploit gaps in reward specification as opposed to following intended behavior. + +The core question studied in this paper, scoped to the supervised learning setting, is closely related to that of strategic classification (see e.g., https://arxiv.org/pdf/1910.10362.pdf Strategic Classification is Causal Modeling in Disguise). The following sketch is inspired by that literature. + +We might hope to augment the training objective of ML/myopic RL/strategic RL to address the Auto-induced distribution shift problem as follows. +[Supervised learning for content recommendation] Let the training/validation data distribution be D. Assume for now that there is no exogenous factor in the environment that causes any distribution shifts in deployment -- so, the only shift is due to feedback effects from the predictions made by the model. For an ML model f, let the corresponding recommendation policy be pi_f, and let the long-term distribution of data seen from user interactions with pi_f be D[pi_f]. Then, what we want is: f* = argmin_f E_D[ L(f) ] subject to the constraint that D \approx D[pi_f]. +For a contextual bandit/myopic RL formulation of the problem, we could similarly constrain the learning problem as pi* = argmax_pi E_D[ R(pi) ] subject to the constraint that D \approx D[pi] (both objectives are typeset below for readability). +Essentially, both supervised ML and contextual bandit algorithms are assuming that the context distribution is unchanged -- so let us enforce that the context distribution is indeed unchanged as a consequence of the policy's actions. +It is unclear how to generalize this kind of thinking to situations where environmental changes also contribute to distribution shift. The authors call out precisely this flaw using the cryptic comment -- 'not trying to change X' is not the same as 'trying to not change X'. The formulation above does 'trying to not change X', but that is an insufficient band-aid in situations when the environment changes X. It's also unclear how one might estimate D[pi] or D[pi_f] and appropriately constrain the learning algorithm -- but these are all interesting questions to study. + +The paper in its current form is asking an important question. In supervised learning, the desired solution might actually coincide with strategic classification solution concepts. The paper may be asking a generalization of the phenomenon for myopic RL and RL. It may spark interesting discussions and follow-up work, but is not yet mature beyond a workshop poster.
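A possible typeset restatement of the two constrained objectives sketched above -- this is purely a reformulation of this review's informal expressions, reusing its symbols, with L and R standing in for the otherwise unspecified loss and reward:

\begin{align*}
f^{*} &= \operatorname*{arg\,min}_{f}\ \mathbb{E}_{x \sim D}\big[\,L(f;x)\,\big] \quad \text{subject to} \quad D \approx D[\pi_f], \\
\pi^{*} &= \operatorname*{arg\,max}_{\pi}\ \mathbb{E}_{x \sim D}\big[\,R(\pi;x)\,\big] \quad \text{subject to} \quad D \approx D[\pi].
\end{align*}

Here D[\pi] denotes the long-term data distribution induced by deploying \pi, so each constraint makes explicit the requirement that the learned model (approximately) leave the context distribution unchanged.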
+Generalizing the unit-tests, articulating the scope of situations where context-swapping may be a useful strategy, and even formalizing the problem and desired goal (as attempted above for the content recommendation example) will substantially strengthen the paper.",ICLR2021, +G9ObSPSc6eA,1610040000000.0,1610470000000.0,1,pHkBwAaZ3UK,pHkBwAaZ3UK,Final Decision,Reject,"Four knowledgeable referees lean towards rejection because of the missing detailed complexity analysis [R1,R2,R3], the choice of rather small datasets which hinders the rigorous evaluation of GNN models [R3,R4], missing state-of-the-art comparisons [R2] and ablations [R4]. The rebuttal addressed some of the concerns raised by the reviewers, in particular, clarifications request by R2, smoothness of the weights questions of R4, and the difference in performance of the baseline methods of R1. However, after discussion, the reviewers are still concerned with the missing ablations, comparisons, and complexity analysis. I agree with their assessment and therefore must reject. However, I agree with the reviewers that this is an interesting approach and encourage the authors to consider the reviewer's suggestions for future iterations of their work. ",ICLR2021, +cR32AjQwmV9,1610040000000.0,1610470000000.0,1,xEpUl1um6V,xEpUl1um6V,Final Decision,Reject,"The paper studies benchmarking of bias mitigation methods. The authors propose a synthetic dataset of images (alike colored-MNIST) that enables a controlled setup over different types of correlations between a binary sensitive attribute, dataset features, and a binary outcome label. The authors have evaluated 2K models that are the variants of three recently proposed debiasing methods using fair representation learning across various settings. + +While the reviewers acknowledged the importance of benchmarking fair learning methods in a systematic controlled setting, they have raised several concerns: +(1) the proposed benchmark is too abstract/unrealistic (R4, R2, R3); it is not clear whether the findings from this benchmark can be generalized to real-world data with real sensitive features, (2) the proposed benchmark is limited to pseudo sensitive attributes (R1) that are binary (R1, R2), (3) the paper lacks in-depth analysis on why certain methods work under certain conditions (R3). Among these, (2) did not have a major impact on the decision, but would be helpful to address in a subsequent revision, (3) was partially addressed in the rebuttal. However, (1) makes it very difficult to assess the benefits of the proposed benchmark, and was viewed by AC as a critical issue. + +The authors provided a detailed rebuttal addressing multiple of the reviewers’ concerns. AC can confirm that all four reviewers have read the author responses and have contributed to the discussion. A general consensus among reviewers and AC suggests, in its current state the manuscript is not ready for a publication. See R1 post-rebuttal encouragement and suggestions how to strengthen the work. We hope the detailed reviews are useful for improving the paper. +",ICLR2021, +_GqW6y8KdnP,1610040000000.0,1610470000000.0,1,e6hMkY6MFcU,e6hMkY6MFcU,Final Decision,Reject,"The authors propose a method for attacking neural NLP models based on individual word importance (""WordsWorth"" scores).  This is an interesting, timely topic and there may be some interesting ideas here, but at present the paper suffers from poor presentation which makes it difficult to discern the contribution. 
Presentation issues aside, it seems that the experimental setup is missing key baselines (an issue not sufficiently addressed by the author response). ",ICLR2021, +Qy8E9ksjm-,1576800000000.0,1576800000000.0,1,Hkxzx0NtDB,Hkxzx0NtDB,Paper Decision,Accept (Talk),"This paper uses an energy-based model to interpret standard discriminative classifiers and demonstrates that energy-based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while generating samples with better quality than GAN-based approaches. The reviewers are very excited about this work, and the energy-based perspective of generative and discriminative learning. There is a unanimous agreement to strongly accept this paper after author response.",ICLR2020, +rJeOxbBexN,1544730000000.0,1545350000000.0,1,HJxB5sRcFQ,HJxB5sRcFQ,Clear accept ratings from reviewers,Accept (Poster),"Reviewers agree the paper should be accepted. +See reviews below.",ICLR2019,4: The area chair is confident but not absolutely certain +NJt3vIbY4Q,1610350000000.0,1610470000000.0,1,n5go16HF_B,n5go16HF_B,Final Decision,Reject,"This paper proposes a new generation technique for multi-category marked temporal point processes. The paper was reviewed by three expert reviewers who expressed concerns about limited novelty of contributions, theoretical justification, and empirical evidence. The authors are encouraged to continue research, taking into consideration the detailed comments provided by the reviewers.",ICLR2021, +U8NA3EqwE6,1576800000000.0,1576800000000.0,1,rJxe3xSYDS,rJxe3xSYDS,Paper Decision,Accept (Poster),"The paper proposes a fast training method for extreme classification problems where the number of classes is very large. The method improves on negative sampling (a method which uses a uniform distribution to sample the negatives) by using an adversarial auxiliary model to sample negatives in a non-uniform manner. This has logarithmic computational cost and minimizes the variance in the gradients. There were some concerns about missing empirical comparisons with methods that use a sampled-softmax approach for extreme classification. While these comparisons will certainly add further value to the paper, the improvement over the widely used method of negative sampling and a formal analysis of improvement from hard negatives is a valuable contribution in itself that will be of interest to the community. Authors should include the experiments on small datasets to quantify the approximation gap due to negative sampling compared to full softmax, as promised.",ICLR2020, +iV_Cewp2j-,1576800000000.0,1576800000000.0,1,HJgzpgrYDr,HJgzpgrYDr,Paper Decision,Reject,"The authors present a self-supervised framework for learning a hierarchical policy in reinforcement learning tasks that combines a high-level planner over learned latent goals with a shared low-level goal-completing control policy. The reviewers had significant concerns about both problem positioning (w.r.t. existing work) and writing clarity, as well as the fact that all comparative experiments were ablations, rather than comparisons to prior work. While the reviewers agreed that the authors reasonably resolved issues of clarity, there was not agreement that concerns about positioning w.r.t. prior work and experimental comparisons were sufficiently resolved. 
Thus, I recommend rejecting this paper at this time.",ICLR2020, +8esEzrXw-YS,1642700000000.0,1642700000000.0,1,rX3rZYP8zZF,rX3rZYP8zZF,Paper Decision,Reject,"The paper introduces the CareGraph, a knowledge graph based recommendation approach. +CareGraph is a deep neural network-based recommender that can be used on a mobile healthcare platform +for nudge recommendation. The main motivation is to use the knowledge graph to +mitigate cold start problems when recommending nudge messages. + +The paper's main strength is the topic of interest. Research on recommender systems in the healthcare context is of great interest. +However, the reviews raised concerns that outweigh the strengths. +The majority of reviewers agree that the work is not ready for publication. +Main concerns focus on a weak experimental section and a lack of technical detail. + +I recommend that the authors incorporate all the reviewers' comments and make a +stronger submission to a future conference!",ICLR2022, +bvpCNULqJ,1576800000000.0,1576800000000.0,1,SJlRWC4FDB,SJlRWC4FDB,Paper Decision,Reject,"This paper shows a case study of an adversarial attack on a copyright detection system. The paper implements a music identification method with a simple convolutional neural network, and shows that it is possible to fool such a CNN with adversarial learning. After the discussion period, two of the three reviewers lean toward rejecting the paper. Although the majority of the reviewers agree that this is an interesting problem with an important application, they also find many of their concerns remain unaddressed. These include the generality of the finding, as the current paper is more like a proof-of-concept that black/white-box attacks can work for a copyright system. The reviewers are also concerned that the technical solution/finding is not novel as it is very similar to prior work in other domains (e.g., image classification). One reviewer was particularly concerned that the user study is missing, making it difficult to judge whether the quality of the modified audio is reasonable or not.",ICLR2020, +B1gmZOT-gN,1544830000000.0,1545350000000.0,1,B1ePui0ctQ,B1ePui0ctQ,Paper decision,Reject,"Reviewers mostly recommended to reject. Please take reviewers' comments into consideration to improve your submission should you decide to resubmit. +",ICLR2019,3: The area chair is somewhat confident +kGiBXCM-CIom,1642700000000.0,1642700000000.0,1,uxgg9o7bI_3,uxgg9o7bI_3,Paper Decision,Accept (Oral),"This paper proposes an efficient method for message passing that can incorporate structural information that is provably stronger than 1-WL. As compared to three strands of provably powerful (more than 1 WL) GNNs, the method has limited additional computational overhead, and can also show encouraging results on the oversmoothing problem. Overall, all the reviewers like this paper quite a lot, although they also raised some minor concerns. The paper also attracted some unofficial reviewers who provided quite a few related works. The authors did a good job in interacting with the reviewers and addressing their minor concerns. So, we believe the paper is worth accepting, and could be a significant work in the field of graph neural networks.",ICLR2022, +7mcOUUr9TQp,1642700000000.0,1642700000000.0,1,z7p2V6KROOV,z7p2V6KROOV,Paper Decision,Accept (Oral),"This paper presents U-WILDS, an extension of the multi-task, large-scale domain-shift dataset WILDS. 
The authors propose an extensive array of experiments evaluating the ability of a wide variety of algorithms to leverage the unlabelled data to address domain-shift problems. The vision behind it sounds quite ambitious and convincing to me: namely, that the proposed U-WILDS benchmark would be a useful and well-motivated resource for the ML community, and their experiments were very comprehensive. Although they did not introduce any new methods in this paper, U-WILDS significantly expands the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. + +The clarity, vision and significance are clearly above the bar of ICLR. While the reviewers had some concerns on the novelty, the authors did a particularly good job in their rebuttal. Thus, all of us have agreed to strongly accept this paper for publication! Please include the additional rebuttal discussion in the next version.",ICLR2022, +7Ptr9lukkj,1576800000000.0,1576800000000.0,1,rygUoeHKvB,rygUoeHKvB,Paper Decision,Reject,"There is insufficient support to recommend accepting this paper. The reviewers unanimously recommended rejection, and did not change their recommendation after the author response period. The technical depth of the paper was criticized, as was the experimental evaluation. The review comments should help the authors strengthen this work.",ICLR2020, +rJ4WpzIdx,1486400000000.0,1486400000000.0,1,Syfkm6cgx,Syfkm6cgx,ICLR committee final decision,Reject,The authors have withdrawn the submission.,ICLR2017, +ESN0GM5NVn,1576800000000.0,1576800000000.0,1,rJehf0VKwS,rJehf0VKwS,Paper Decision,Reject,"This paper presents a nice idea for transferring knowledge from larger sequence models to small models. However, all the reviewers find that the contribution is too limited and the experiments are insufficient. All the reviewers agree to reject.",ICLR2020, +SkZiNyaBM,1517250000000.0,1517260000000.0,420,SyVOjfbRb,SyVOjfbRb,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The reviewers think that the theoretical contribution is not significant on its own. The reviewers find the empirical aspect of the paper interesting, but more analysis of the empirical behavior is required, especially for large datasets. Even for small datasets with input augmentation (e.g. random crops in CIFAR-10) the pre-processing can become prohibitive. I recommend improving the manuscript for a re-submission to another venue and an ICLR workshop presentation.",ICLR2018, +rJQNLypSG,1517250000000.0,1517260000000.0,758,HkjL6MiTb,HkjL6MiTb,ICLR 2018 Conference Acceptance Decision,Reject,"Reviewers unanimous in assessment that manuscript has merits, but does not satisfy criteria for publication. + +Pros: +- Potentially novel application of neural networks to survival analysis with competing risks, where only one terminal event from one risk category may be observed. + +Cons: +- Incomplete coverage of other literature. +- Architecture novelty may not be significant. +- Small performance gains (though statistically significant)",ICLR2018, +K23TsejCZ,1576800000000.0,1576800000000.0,1,H1xJhJStPS,H1xJhJStPS,Paper Decision,Reject,"Main content: paper introduces a new variant of the equilibrium propagation algorithm that continually updates the weights, making it unnecessary to save steady states. + +Summary of discussion: +reviewer 1: likes the idea but points out many issues with the proofs. +reviewer 2: really likes the novelty of the paper, but the review is not detailed, particularly discussing pros/cons. 
+reviewer 3: likes the ideas but has questions on proofs, and also questions why MNIST is used as the evaluation task. +Recommendation: interesting idea but writing/proofs could be clarified better. Vote reject. + +",ICLR2020, +O0qn9S8g0n,1576800000000.0,1576800000000.0,1,Bkl2UlrFwr,Bkl2UlrFwr,Paper Decision,Reject,"The submission proposes a method for learning a graph structure and node embeddings through an iterative process. Smoothness and sparsity are both optimized in this approach. The iterative method has a stopping mechanism based on distance from a ground truth. + +The concerns of the reviewers were about scalability and novelty. Since other methods have used the same costs for optimization, as well as other aspects of this approach, there is little contribution other than the iterative process. The improvement over LDS, the most similar approach, is relatively minor. + +Although the paper is promising, more work is required to establish the contributions of the method. Recommendation is for rejection. +",ICLR2020, +SyAgE1pSz,1517250000000.0,1517260000000.0,285,Skp1ESxRZ,Skp1ESxRZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper proposes a method for training a neural network to operate a stack-based mechanism that acts as a CFG parser in order to, eventually, improve program synthesis and program induction systems. The reviewers agreed that the paper was compelling and well supported empirically, although one reviewer suggested that analysis of empirical results could stand some improvement. The reviewers were not able to achieve a clear consensus on the paper, but given that the most negative reviewer has also declared themselves the least confident in their assessment, I am happy to recommend acceptance on the basis of the median rather than mean score.",ICLR2018, +dmDTOSOpQvBE,1642700000000.0,1642700000000.0,1,N3KYKkSvciP,N3KYKkSvciP,Paper Decision,Reject,"The paper contributes a theoretical understanding of training over-parametrized deep neural networks using gradient descent with respect to square loss in the NTK regime. Besides giving guarantees on the classification accuracy using square loss, authors reveal several interesting properties in this regime including robustness and calibration. + +The problem studied here is exciting and very relevant. The current version, unfortunately, has some shortcomings. For example, under a margin assumption, the authors show that the least-squares solution finds something with the margin and, therefore, it yields “robustness.” There is no quantification of how “robust” the trained model is, what the threat model is, or what happens if the noise budget is larger than the attained margin. In general, the analysis lacks any careful finer characterization or quantification of the claimed properties. Besides, as was pointed out, the setting of the neural tangent kernel regime is somewhat limited and to some extent impractical. The assumptions under which the results hold further make the setting of the paper significantly restrictive. + +The writing can be improved with more emphasis on the novelty and significance of the contributions. Currently, all of the assumptions are buried in the appendix and the main paper is not even self-contained. I believe the comments from the reviewers have already helped improve the quality of the paper. 
I encourage the authors to further incorporate the feedback and work towards a stronger submission.",ICLR2022, +A-9-EuWRdpG,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Accept (Poster),"The paper proposes a new method for unsupervised text style transfer by assuming there exist some pseudo-parallel sentence pairs in the data. The method thus first mines and constructs a synthetic parallel corpus with certain similarity metrics, and then trains the model via imitation learning. Reviewers have found the method sound and the empirical results decent. The assumption on pseudo-parallel pairs would limit the application of the method in other settings where the source/target text distributions are very different. The authors have added discussion on this limitation during rebuttal.",ICLR2022, +a1zJS2QinU,1576800000000.0,1576800000000.0,1,Byxl-04KvH,Byxl-04KvH,Paper Decision,Reject,"This paper proposes a model architecture and training procedure for multiple nested label sets of varying granularities and shows improvements in efficiency over simple baselines in the number of fine-grained training labels needed to reach a given level of performance. + +Reviewers did not raise any serious concerns about the method that was presented, but they were also not convinced that it represented a sufficiently novel or impactful contribution to an open problem. Without any reviewer advocating for the paper, even after discussion, I have no choice but to recommend rejection. + +I'm open to the possibility that there is substantial technical value here, but I think this work would be well served by more extensive comparisons and a potentially revamped motivation to make the case for that value more directly.",ICLR2020, +tQfU3slxsgD,1642700000000.0,1642700000000.0,1,mF122BuAnnW,mF122BuAnnW,Paper Decision,Reject,"The authors develop a framework for improving robustness certificates obtained by randomly smoothed classifiers in settings with multiple outputs (segmentation or node classification), by combining local robustness certificates obtained for individual classifiers. They validate their results empirically and demonstrate gains from their approach. + +The reviewers were mostly in agreement that the authors make a novel and interesting contribution. However, there were a lot of technical concerns raised by reviewers that, while addressed during the discussion phase, would require a substantial revision of the paper to address adequately. Overall, I feel the paper is borderline but recommend rejection and encourage the authors to incorporate feedback from the reviewers and submit to a future venue.",ICLR2022, +6fFMbGIlbDy,1610040000000.0,1610470000000.0,1,b9PoimzZFJ,b9PoimzZFJ,Final Decision,Accept (Spotlight),"All reviewers seem in favour of accepting this paper, with the majority voting for marginally above the acceptance threshold. +The authors have taken special heed of the suggestions and improved the clarity of the paper. +From examination of the reviews, the paper achieves enough to warrant publication. +My recommendation is therefore to accept the manuscript. ",ICLR2021, +vfX6knnaa,1576800000000.0,1576800000000.0,1,BkgHWkrtPB,BkgHWkrtPB,Paper Decision,Reject,"This paper is full of ideas. However, a logical argument is only as strong as its weakest link, and I believe the current paper has some weak links. For example, the attempt to tie the behavior of SGD to free energy minimization relies on unrealistic approximations. 
Second, the bounds based on limiting flat priors become trivial. The authors' in-depth response to my own review was much appreciated, especially given its last-minute appearance. Unfortunately, I was not convinced by the arguments. In part, the authors argue that the logical argument they are making is not sensitive to certain issues that I raised, but this only highlights for me that the argument being made is not very precise. I can imagine a version of this work with sharper claims, built on clearly stated assumptions/conjectures about SGD's dynamics, RATHER THAN being framed as the consequences of clearly inaccurate approximations. The behavior of diffusions can be presented as evidence that the assumptions/conjectures (that cannot be proven at the moment, but which are needed to complete the logical argument) are reasonable. However, I am also not convinced that it is trivial to do this, and so the community must have a chance to review a major revision.",ICLR2020, +tBUJyAa-OMJm,1642700000000.0,1642700000000.0,1,SidzxAb9k30,SidzxAb9k30,Paper Decision,Accept (Spotlight),"This paper addresses the reward-free exploration problem with function approximation under the linear mixture MDP assumption. The analysis shows that the proposed algorithm is (nearly) minimax optimal. The proposed approach can work with any planning solver to provide an ($\epsilon + \epsilon_{opt}$)-optimal policy for any reward function. + +After reading the authors' feedback and discussing their concerns, the reviewers agree that the contributions in this paper are valuable and that this paper deserves publication. +I encourage the authors to follow the reviewers' suggestions as they prepare the camera-ready version.",ICLR2022, +rkgLIyhelN,1544760000000.0,1545350000000.0,1,r1efr3C9Ym,r1efr3C9Ym,Meta-Review for Interpolation-Predictions paper,Accept (Poster),"After much discussion, all reviewers agree that this paper should be accepted. Congratulations!!",ICLR2019,4: The area chair is confident but not absolutely certain +ovskqci4JH,1576800000000.0,1576800000000.0,1,SJg9z6VFDr,SJg9z6VFDr,Paper Decision,Reject,"This paper introduces a few ideas to potentially improve the performance of neural ODEs on graph networks. However, the reviewers disagreed about the motivations for the proposed modifications. Specifically, it's not clear that neural ODEs provide a more advantageous parameterization in this setting than standard discrete networks. + +It's also not clear at all why the authors are discussing graph neural networks in particular, as all of their proposed changes would apply to all types of networks. 
+ +Another major problem I had with this paper was the assertion that running the original system backwards leads to large numerical error. This is a plausible claim, but it was never verified. It's extremely easy to check (e.g. by comparing the reconstructed initial state at t0 with the true original state at t0, or by comparing gradients computed by different methods). It's also not clear if the authors enforced the constraints on their dynamics function needed to ensure that a unique solution exists in the first place.",ICLR2020, +SE5NfHJzyMn,1610040000000.0,1610470000000.0,1,IbFcpYnwCvd,IbFcpYnwCvd,Final Decision,Reject,"This was a borderline paper with a split recommendation from the reviewers. The authors took great care to answer the reviewer questions in detail, and the clarity and precision of the technical exposition was strengthened. However, substantial technical content was added to the paper during the rebuttal process, which the reviewers were not able to fully and properly assess. + +Overall, this is worthwhile research, but the paper is still maturing. The contribution was perceived as incremental in light of previous work using LTL and FSAs in RL, despite the authors extensively re-explaining the significance of the work in the rebuttal. A resubmission is more likely to resonate with reviewers and ultimately achieve higher impact. + +For completeness, it would help to also briefly acknowledge and compare to hierarchical RL work that also seeks to capture composable subtask structures, such as: + +Sohn et al. ""Hierarchical reinforcement learning for zero-shot generalization with subtask dependencies"", NeurIPS-2018 + +Sohn et al. ""Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies"", ICLR-2020",ICLR2021, +JcNn982pdpE,1642700000000.0,1642700000000.0,1,sRZ3GhmegS,sRZ3GhmegS,Paper Decision,Accept (Spotlight),"This paper introduces a new transformer architecture for representation learning in RL. The key ingredients of the proposed architectures are a novel combination of existing methods: (1) the use of LSTMs to reduce the need for large transformers and (2) a contrastive learning procedure that doesn't require human data augmentation. The resulting approach requires less prior knowledge and provides higher sample efficiency. The paper is convincing, with comprehensive experiments on multiple challenging and well-known benchmarks and an ablation study. The reviewers did express concerns that parts of the paper are a very difficult read and could use improvement, especially those relying on substantial external background. The intuition behind several components could be improved, and there are some clarity issues, as detailed in the individual reviews.",ICLR2022, +tZi9xSXy-Jj,1642700000000.0,1642700000000.0,1,qiukmqxQF6,qiukmqxQF6,Paper Decision,Reject,"This paper proposes a new autoregressive flow model with autoencoders to learn latent embeddings from time series. The authors conducted extensive comparative experiments, and the experimental results are very encouraging. However, the proposed method, as a combination of the encoder/decoder structure and autoregressive flows on the latent space, does not seem novel enough.",ICLR2022, +uKtHXmb2jX,1576800000000.0,1576800000000.0,1,BklDO1HYPS,BklDO1HYPS,Paper Decision,Reject,"This paper proposes a stochastic variance reduced extragradient algorithm. The reviewers had a number of concerns which I feel have been adequately addressed by the authors. 
+ +That being said, the field of optimizers is crowded and I could not be convinced that the proposed method would be used. In particular, (almost) hyperparameter-free methods are usually preferred (see Adam), which is not the case here. + +To be honest, this work is borderline and could have gone either way but was rated lower than other borderline submissions.",ICLR2020, +D25Q3Q2duw,1610040000000.0,1610470000000.0,1,vrCiOrqgl3B,vrCiOrqgl3B,Final Decision,Reject,"The paper proposes a novel approach to detect outliers using Optimal transport. The authors prove a very interesting relation between Outlier robust OT and solving OT with a thresholded loss. Numerical experiments show that the proposed approach indeed works for outlier detection. + +The paper had mixed reviews and the comments and changes from the authors were appreciated. The comments about recent (and contemporary) references were not taken into account in the final decision following ICLR guidelines. + +One major concern that appeared during discussion was the fact that while one important claimed contribution is the ability to perform outlier detection, the proposed method is never evaluated or compared to the numerous existing outlier detection methods. It works on a toy example and seems to provide a robust way to train a robust GAN, but the experiments are very limited. Also, the claim from the authors that the method scales is not really true. The proposed approach requires solving an exact OT of complexity O(N^3log(N)); while one can use an approximate entropic solver on the thresholded loss, it does not solve the ROBOT problem anymore and the relations between the problems do not exist anymore in this case (or are more similar to UOT). + +The concerns detailed above and the limited novelty of the contributions (most of the formulations proposed in the paper already exist in the literature) suggest that the paper in its current iteration is too borderline to be accepted at a selective venue such as ICLR. The method and the relations uncovered are interesting and the AC encourages the authors to continue work on the proposed method and provide more detailed experiments illustrating and comparing the method to baselines for outlier detection. +",ICLR2021, +cytEEyIkqxM,1610040000000.0,1610470000000.0,1,BIIwfP55pp,BIIwfP55pp,Final Decision,Reject,"The paper proposes a method that combines imitation learning and meta-learning, which aims to be able to explore beyond the provided demonstrations. + +While the paper addresses an important topic, and the authors are commended on a productive conversation, there is a consensus among the reviews that the work is not yet ready for publication. A future manuscript should reexamine the assumptions and improve the presentation.",ICLR2021, +9YeGsBVwgQ,1576800000000.0,1576800000000.0,1,H1lfwAVFwr,H1lfwAVFwr,Paper Decision,Reject,"This paper presents Capacity-Limited Reinforcement Learning (CLRL) which builds on methods in soft RL to enable learning in agents with limited capacity. + +The reviewers raised issues that were largely around three areas: there is a lack of clear motivation for the work, and many of the insights given lack intuition; many connections to related literature are missing; and the experimental results remain unconvincing. + +Although the ideas presented in the paper are interesting, more work is required for this to be accepted. 
Therefore, at this point, this is unfortunately a rejection.",ICLR2020, +53OErNxwTzR,1610040000000.0,1610470000000.0,1,xfmSoxdxFCG,xfmSoxdxFCG,Final Decision,Accept (Poster),"This work is likely to lead to more connections between machine learning and neuroscience at a fine-grained level where ML methods can help explain and understand neural circuits. + +To encourage this, it would be helpful if the authors described the biology of the PN-KC-APL network and the known constraints over possible formalizations of that network. The authors present one formalization, but little discussion is given toward the design space for such models. Are there other possible ways to describe the PN-KC-APL network? Are all alternate ways to do so equivalent to the model presented here? What properties are unknown and how could they affect the formalization presented here? + +Overall, reviewers agree this is a good submission.",ICLR2021, +8o_VKXKwa3,1610040000000.0,1610470000000.0,1,4jXnFYaDOuD,4jXnFYaDOuD,Final Decision,Reject,"The paper proposes an auto-encoder framework IMA, a scalable model that learns the importance of modalities along with robust multimodal representations through a novel cross-covariance based loss function, in an unsupervised manner. They have compared their approach to SOTA methods via multiple experiments and shown how IMA gives better performance. + +The authors have addressed some of the reviewers' feedback. However, as pointed out by the reviewers, the experimental section needs better analysis of results and comparison to other methods, and the modeling section needs to be better explained and motivated. The authors have made changes in their revision; however, the ICLR review process does not allow for checking the camera-ready. Since we cannot accept the paper in its current form (or with small variations) and there have been many competitive submissions, we would encourage the authors to make their revisions and resubmit to other venues.",ICLR2021, +lXLW5klyQP-,1610040000000.0,1610470000000.0,1,TYXs_y84xRj,TYXs_y84xRj,Final Decision,Accept (Poster),"This paper received overall positive scores. One reviewer (R3) recommended clear reject. + +All reviewers agree that the paper introduces a novel idea and its effectiveness is supported by the experimental results. There are concerns about clarity of presentation and certain missing analyses, which have been addressed by the authors in the rebuttal. Thus the ACs recommend acceptance. ",ICLR2021, +ky7vtFCOGGv,1642700000000.0,1642700000000.0,1,OIs3SxU5Ynl,OIs3SxU5Ynl,Paper Decision,Accept (Oral),"Wide agreement from the reviewers. Interesting theorems. Empirical work illustrates the theory. +Claim and insight: failure of VAEs is caused by the inherent limitations of ELBO learning with an inflexible encoder distribution. +Good discussion pointed out related work and insights from the experiments.",ICLR2022, +aZ1KSWOjdFc,1610040000000.0,1610470000000.0,1,7WwYBADS3E_,7WwYBADS3E_,Final Decision,Reject,The consensus recommendation is that the paper is not ready for publication at this time.,ICLR2021, +HyerSQiexV,1544760000000.0,1545350000000.0,1,H1MgjoR9tQ,H1MgjoR9tQ,"Clear study of an important problem, though improvements limited",Accept (Poster),"This paper presents CMOW—an unsupervised sentence representation learning method that treats sentences as the product of their word matrices. This method is not entirely novel, as the authors acknowledge, but it has not been successfully applied to downstream tasks before. 
This paper presents methods for successfully training it, and shows results on the SentEval benchmark suite for sentence representations and an associated set of analysis tasks. + +All three reviewers agree that the results are unimpressive: CMOW is no better than the faster CBOW baseline on most tasks, and the combination of the two is only marginally better than CBOW. However, CMOW does show some real advantages on the analysis tasks. No reviewer has any major correctness concerns that I can see. + +As I see it, this paper is borderline, but narrowly worth accepting: as a methods paper, it presents weak results, and it's not likely that many practitioners will leap to use the method. However, the method is so appealingly simple and well known that there is some value in seeing this as an analysis paper that thoroughly evaluates it. Because it is so simple, it will likely be of interest to researchers beyond just the NLP domain in which it is tested (as CBOW-style models have been), so ICLR seems like an appropriate venue. It seems like it's in the community's best interest to see a method like this be evaluated, and since this paper appears to offer a thorough and sound evaluation, I recommend acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +qgQCJN2MA,1576800000000.0,1576800000000.0,1,SJgMK64Ywr,SJgMK64Ywr,Paper Decision,Accept (Poster),"The submission applies architecture search to find effective architectures for video classification. The work is not terribly innovative, but the results are good. All reviewers recommend accepting the paper.",ICLR2020, +GU5dn4luyo,1576800000000.0,1576800000000.0,1,B1guLAVFDB,B1guLAVFDB,Paper Decision,Accept (Poster),The authors propose a way to recover latent factors implicitly constructed by a neural net with black-box access to the net's output. This can be useful for identifying possible adversarial attacks. The majority of reviewers agree that this is a solid technical and experimental contribution.,ICLR2020, +iRj4XZlcNhh,1642700000000.0,1642700000000.0,1,FASW5Ed837,FASW5Ed837,Paper Decision,Reject,"The reviewers have the following remaining concerns: +1. The bounded function value assumption is strong. Note that the previous works for SGD and SGD-M for other LR schemes do not necessarily need this assumption, hence it may be unfair to compare with existing results and say that this work has improvements for non-monotonic schemes. The authors also agree that it is not easy to prove and remove this assumption. +2. The novelty is limited, and the contributions are somewhat incremental. The bandwidth step size scheme was already introduced in a previous work with a very similar setting. The convergence rate for the proposed LR scheme is the same as previous works for other schemes (or only better by a logarithmic term), which makes the results incremental. +3. Some of the claims are not well supported. 
For example, the reviewers comment that it is not clear how the proposed bandwidth step size can help to escape local minima. Although the authors aim to show this empirically, the toy setting is not strong enough to conclude the superior performance of the proposed scheme. + +We encourage the authors to improve their paper and resubmit to another venue. Here are the related suggestions: +1. The authors might try to investigate and provide a rigorous proof of how the non-monotonic step size can help to escape local minima. It also helps to characterize the effectiveness of each cyclic rule (cosine/triangular or any other) and make clear what property (cosine/linear rules or bandwidth or non-monotonicity) contributes most to the good performance of an LR scheme. +2. It is better if the assumption on the bounded function value can be removed. In addition, a theoretical/empirical analysis on the generalization performance of the proposed scheme might also be helpful.",ICLR2022, +haRPG8eQB6W,1642700000000.0,1642700000000.0,1,hOaYDFpQk3g,hOaYDFpQk3g,Paper Decision,Reject,"This paper proposes an algorithm called LightWaveS to improve the ROCKET (and mini-ROCKET) algorithm for multivariate time series classification, by using wavelet scattering instead of the kernel function. More than the usual number of reviewers were invited to provide independent reviews on the paper. + +A concern was raised regarding the lack of hyperparameter search in the paper. The authors responded that this was intentional to avoid overfitting the solution to the tested datasets. This response is not convincing. Note that other important reasons to vary the hyperparameter values (as commonly adopted by ML researchers) are to study the sensitivity of the proposed method to hyperparameter settings and to perform a more holistic performance comparison with other methods. + +Other concerns on both novelty and significance have also been raised. + +Although 2 of the 7 reviews show weak support for acceptance, other reviewers have pointed out legitimate concerns that make this paper not ready for publication in ICLR in its current form. We appreciate the authors for clarifying some points in their responses and discussions and even including further results, but addressing all the concerns raised really needs a more substantial revision of the paper. We hope the comments and suggestions made by us can help the authors prepare a revised version that will be more ready for publication.",ICLR2022, +BJUt2GUOe,1486400000000.0,1486400000000.0,1,HyecJGP5ge,HyecJGP5ge,ICLR committee final decision,Reject,"This paper combines simple heuristics to adapt the size of a dictionary during learning. The heuristics are intuitive: augmenting the dictionary size when correlation between reconstruction and inputs falls below a certain pre-determined threshold, reducing the dictionary size by adding a group-sparsity L1,2 penalty on the overall dictionary (the L1,2 penalty for pruning models had appeared elsewhere before, as the authors acknowledge). + The topic of online adjustment of model complexity is an important one, and work like this is an interesting and welcome direction. The simplicity of the heuristics presented here would be appealing if there were more empirical support to demonstrate the usefulness of the proposed adaptation. 
However, the empirical validation falls short here: + - the claim that this work offers more than existing off-line model adaptation methods because of its online setting is the crux of the value of the paper, but the experimental validation does not offer much in terms of how to adjust hyperparameters to a truly nonstationary setting. Here, the hyperparameters are set rather arbitrarily, and the experimental setting is a single switch between two datasets (where off-line methods would do well), so this obscures the challenges that would be presented by a less artificial level of non-stationarity + - an extensive set of experiments is presented, but all these experiments are confined to a strange experimental setting that is divorced from any practical metrics: the ""state-of-the-art"" method of reference is Mairal et al 2009, which was focused on speeding up the learning of dictionaries. Here, the metrics offered are correlation / reconstruction error, and classification, over full images instead of patches as is usually done for large images, without justification. This unnecessarily puts the work in a vacuum in terms of evaluation and comparison to existing art. In fact, the reconstructions in the appendix Fig. 17 and 18 look pretty bad, and the classification performance 1) does not demonstrate superiority of the method over standard dictionary learning (both lines are within the error bars and virtually indistinguishable if using more than 60 elements, which is not much), 2) performance overall is pretty bad for a 2-class discrimination task, again because of the unusual setting of using full images. + In summary, this paper could be a very nice paper if the ideas were validated in a way that shows true usefulness and true robustness to the challenges of realistic nonstationarity, but this is unfortunately not the case here. + PS: two recent papers that could be worth mentioning in terms of model adaptation are diversity networks for pruning neurons (Diversity Networks, Mariet and Sra ICLR 2016, https://arxiv.org/abs/1511.05077), and augmenting neural networks with extra parameters while retaining previous learning (Progressive Neural Networks, Rusu et al 2016, https://arxiv.org/pdf/1606.04671v3.pdf).",ICLR2017, +106frcdWYAj,1610040000000.0,1610470000000.0,1,CHLhSw9pSw8,CHLhSw9pSw8,Final Decision,Accept (Poster),"This is an unusual, but interesting submission. 
Can we use a simple ""quantum computer"" (in fact, a physical system) to solve classification problems in ML? A single photon passes through a screen. Its state is described by a complex vector. A quantum computer makes a unitary linear transformation on this state in such a way that it maximizes the overlap with a corresponding class. Such a model can be parametrized by conventional means, trained, and later possibly realized by a quantum system. + +Pros: +1. The area of QC is very important, and such papers shed new light on the subject. +2. Inspiration for the ICLR community to work in this area. +3. Technically correct. + + +Cons: +1. The accuracies are far from SOTA and use very toy datasets. It is not clear how to get to the accuracies needed in practice. +2. The actual computational speed of inference is not clear. +3. A discussion of more complicated models and their feasibility is necessary. +4. There are quite a few misprints in the text that need to be fixed in the final version. + + + +",ICLR2021, +obw5Nz56s6l,1610040000000.0,1610470000000.0,1,TVjLza1t4hI,TVjLza1t4hI,Final Decision,Accept (Poster),"The approach is novel and, according to the reviewers' comments, addresses a relevant and important problem in EEG data analysis. Differences from related work are discussed. The methods and experimental results are sound. The authors have provided a comprehensive response to the reviews. +",ICLR2021, +FwSk7xRhS,1576800000000.0,1576800000000.0,1,HklZUpEtvr,HklZUpEtvr,Paper Decision,Reject,"This paper provides a novel approach for addressing ill-posed inverse problems based on a formulation as a regularized estimation problem and showing that this can be optimized using the CycleGAN framework. While the paper contains interesting ideas and has been substantially improved from its original form, the paper still does not meet the quality bar of ICLR due to a critical gap between the presented theory and applications. The paper will benefit from a revision and resubmission to another venue.",ICLR2020, +qC-gDRaMcI,1576800000000.0,1576800000000.0,1,r1eU1gHFvH,r1eU1gHFvH,Paper Decision,Reject,"This paper studies when hidden units provide local codes by analyzing the hidden units of trained fully connected classification networks under various architectures and regularizers. The reviewers and the AC believe that the paper in its current form is not ready for acceptance to ICLR-2020. Further work and experiments are needed in order to identify an explanation for the emergence of local codes. This would significantly strengthen the paper.",ICLR2020, +iYi7pI4Bg7,1576800000000.0,1576800000000.0,1,BJeUs3VFPH,BJeUs3VFPH,Paper Decision,Reject,"Three reviewers have scored this paper as 1/1/3, and they have not increased their ratings after the rebuttal and the paper revision. The main criticism revolves around the choice of datasets, missing comparisons with the existing methods, complexity, and practical demonstration of speed. Other concerns touch upon a loose bound and a weak motivation regarding the low-rank mechanism in connection to DA. On balance, the authors resolved some issues in the revised manuscript, but reviewers remain unconvinced about plenty of other aspects; thus this paper cannot be accepted to ICLR2020.",ICLR2020, +fayyflEqMF0,1642700000000.0,1642700000000.0,1,lf0W6tcWmh-,lf0W6tcWmh-,Paper Decision,Reject,"All but one of the reviewers recommended rejecting this submission. 
The reviewer recommending acceptance (PBhC) was not confident in their assessment and was unwilling to champion the paper during the discussion phase, making it very difficult for me to unilaterally overrule the de facto reviewer consensus and recommend accepting the submission. Although some of the reviewers recommending rejecting the submission made relatively weak arguments, others raised more compelling points in favor of rejecting the paper. The discussion and reviews convinced me that the preponderance of the evidence indicated that I should recommend rejecting on the merits of the case anyway. Ultimately, I am recommending rejecting this submission, primarily because I do not believe the empirical contributions are strong enough, nor are they polished enough. Holistically, it is hard to see what impact this work can have without improved empirical evidence, given how little guidance the theoretical results give to practitioners. That said, I hope the authors iterate some more on the experiments and refocus the narrative a bit in that direction. + +The paper exhibits a problem where gradient descent with momentum provably generalizes better than gradient descent without momentum. Given that momentum does not universally improve the out-of-sample error of neural networks trained with gradient descent, we should strongly suspect that there also exist problems where adding momentum to gradient descent degrades out-of-sample performance. Therefore, what actionable insights do we have? The paper suggests that perhaps the details of the problem (constructed in the submission) where momentum helps give us the ability to predict when momentum will be helpful in practice, but we would need to see several more successful predictions of this form on typical datasets from the literature or other real (non-synthetic) datasets. Furthermore, have the literature and this submission even demonstrated convincingly enough that momentum improving out-of-sample error for the same training loss is a common occurrence? And has this submission even made a convincing empirical case on CIFAR10, let alone a larger selection of problems? A negative answer to the latter question would be sufficient to reject the submission, but resolving it favorably would not, in my view, be sufficient to accept the submission without more evidence for the prevalence of this momentum generalization phenomenon or without demonstrating successful predictions about relative generalization performance on more problems. + +Has the literature established that gradient descent or minibatch stochastic gradient descent often generalizes better when using momentum? The paper says ""While these works shed light on how momentum acts on neural network training, they fail to capture the generalization improvement induced by momentum (Sutskever et al., 2013)."", but Sutskever et al., to my recollection, only measures training set loss and never properly considers questions of generalization. Certainly, in many places in the literature we see momentum get better validation error, but rarely do we get information on whether it does so for the same training loss, and a priori we should suspect optimization speed is the primary effect at play. The paper also claims ""Although it is well accepted that Momentum improve generalization in deep learning..."", but the submission does not provide enough evidence that this is well accepted. The results of Leclerc & Madry (2020) are equivocal and may well be confounded by batch norm, but would need to be investigated further. 
So no, at least with the citations in this submission, it is far from well-established that momentum often improves generalization performance, i.e. that momentum results in better validation loss for the same training loss. Of course it won't always do this, but we should observe it regularly in the wild (the more dramatically the better) for this to be interesting. + +OK, but what about the experiments on CIFAR10? These experiments are hard to interpret because they seem to compare misclassification error (zero-one loss) with the actual optimization objective of cross entropy error. These issues may be resolvable, but in their current form they leave open too many loose ends. Just because two training runs both get zero classification errors on the training set does not mean that they do not differ in the log loss, and even a small difference in log loss might explain a large difference in out-of-sample classification error. Although we often use these quantities as proxies for each other, that isn't quite safe, and a better way to conduct this measurement would be to select an iterate of GD without momentum that has a training cross entropy loss almost identical to (but slightly better than) that of a specific iterate of GD with momentum and then compare the cross entropy loss on the validation set, repeating for many different runs and iterates. + +In the final analysis, stochastic gradient descent without momentum rarely gets used in practice and full gradient descent even more rarely, so this submission needs to do a better job of making a case for the impact it will have on researchers in this field. Perhaps a stronger case can be made, but I do not quite find the current version sufficiently compelling.",ICLR2022, +1NkNGqpYWoh,1610040000000.0,1610470000000.0,1,2K5WDVL2KI,2K5WDVL2KI,Final Decision,Reject,"The paper introduces a model-agnostic heuristic for batch active learning. There was agreement among the reviewers that it's a good approach to try and report about, but the paper was ultimately rejected after calibration. + +There were two concerns raised in the reviews, and the authors are encouraged to address them in a revision: + +1) Several reviewers commented on issues with readability, affecting the paper's reproducibility (see reviews for details). The reviewers would have also liked to see more evidence of empirical robustness to the various choices made. + +2) For the paper to be compelling, it should either compare with gradient-based approaches like BADGE (Ash et al. 2019) or include experiments with a representation where BADGE can't be applied (to support the model-agnostic distinction the authors are making). The core motivation is the same, with both approaches trying to explicitly incorporate predictive uncertainty and sample diversity, and it would be interesting to see a comparison.",ICLR2021, +qRePqegUbM,1610040000000.0,1610470000000.0,1,jsM6yvqiT0W,jsM6yvqiT0W,Final Decision,Reject,"# Paper Summary + +This paper considers calibrating the output of a multiclass classifier in such a way that the output probabilities are approximately ""correct"". They observe that if such a method is able to re-order the logits, then it will change the accuracy of the classifier. Therefore, if they use a calibrator that is constrained to be monotonic in the input logits, then they can train it to optimize any metric they choose, without impacting the accuracy. 
+ +They propose minimizing ECE (expected calibration error, which is essentially a binned approximation to the L1 distance between the model's confidence in its top label and the probability that the top label is correct). Borrowing an idea from local temperature scaling, they also allow their calibrator to see the input features, by also taking the top hidden layer (the layer before the logits) as an input. + +Their experiments are, with one glaring exception, comprehensive: they have a good selection of datasets, a reasonable choice of metrics, and they dig pretty deep into what the results mean. Reviewer 2, however, believes that the baselines are far from state-of-the-art, and two of the other reviewers (and I) agree. + +# Pros + +1. Well-organized and well-written (not exceptionally so, but above the bar) +1. Good insight overall. In particular, the observation that imposing a monotonicity constraint enables one to optimize any metric, including ECE, was considered both original and significant by the reviewers +1. Well-thought-out experiments. They were generally praised, aside from the (unfortunately crucial) question of whether the baselines are state-of-the-art + +# Cons + +1. The paper seems to mainly discuss related work coming from the temperature-scaling ""tradition"". The reviewers would like to see a more comprehensive discussion of other calibration approaches (Reviewer 2 provided a number of references) +1. There are some misstatements (e.g. that LTS is ""state-of-the-art""), and incorrect implications (e.g. that temperature-scaling-like methods are dominant). Reviewer 2 listed several of these, all of which are fairly minor, but more care should be taken +1. The TS and LTS baselines are not state-of-the-art. The reviewers were generally impressed with the experiments, but the lack of a strong baseline is a fatal flaw + +# Conclusion + +This was a paper that initially received mostly positive reviews, but Reviewer 2 raised several concerns that were not adequately addressed in the author response, causing two other reviewers to lower their scores. Ultimately, three of the four reviewers recommended rejection. + +The general consensus is that this is a well-written paper, with good insight and well-thought-out experiments (except for the baselines), and that it overall makes a worthwhile contribution. The main issues, all of which were raised by Reviewer 2, are eminently fixable: (i) adding a more thorough discussion of related work, especially work unrelated to temperature scaling, (ii) being more careful to avoid misstatements or seeming to imply incorrect statements (e.g. that temperature-scaling-like methods are dominant), and (iii) adding a couple of new state-of-the-art baselines to the experiments.",ICLR2021, +nmT4w5S5Q0s,1642700000000.0,1642700000000.0,1,R9Ht8RZK3qY,R9Ht8RZK3qY,Paper Decision,Reject,"This work proposes a federated version of the classical $\chi^2$ correlation test. The key new step is the use of stable projection to reduce the computational overheads associated with the use of secure multi-party protocols. Overall, while the contribution is of interest, the novelty is rather limited. I also consider the work to be somewhat outside the scope of ICLR. It would be more suitable for a security- or statistics-focused venue. 
Therefore I do not recommend acceptance.",ICLR2022, +bbNFL0a58M,1610040000000.0,1610470000000.0,1,b4Phn_aTm_e,b4Phn_aTm_e,Final Decision,Reject,"All reviewers agree that the paper overclaims its contributions both in the main text and in the title, and given also the limited novelty and scope, it is not suggested for publication.",ICLR2021, +BJlUkwHxeV,1544730000000.0,1545350000000.0,1,Syl8Sn0cK7,Syl8Sn0cK7,Exciting work,Accept (Poster),"This paper presents an RL agent which progressively synthesizes programs according to syntactic constraints, and can learn to solve problems with different DSLs, demonstrating some degree of transfer across program synthesis problems. Reviewers agreed that this was an exciting and important development in program synthesis and meta-learning (if that word still has any meaning to it), and were impressed with both the clarity of the paper and its evaluation. There were some concerns about missing baselines and benchmarks, some of which were resolved during the discussion period, although it would still be good to compare to out-of-the-box MCTS. + +Overall, everyone agrees this is a strong paper and that it belongs in the conference, so I have no hesitation in recommending it.",ICLR2019,4: The area chair is confident but not absolutely certain +FnMT1IXDqui,1610040000000.0,1610470000000.0,1,XLfdzwNKzch,XLfdzwNKzch,Final Decision,Accept (Poster),"The paper proposes a novel method for greedy layer-wise training by considering the learning signal from either backprop or from additional auxiliary losses. SEarching for DecOupled Neural Architecture learns to identify the decoupled blocks by learning gating parameters similar to gradient-based architecture search algorithms, such as DARTS. The empirical experiments demonstrated the effectiveness of SEDONA on CIFAR and TinyImageNet using various ResNet architectures. Several issues of clarity and the correctness of the main theoretical result were addressed during the rebuttal period in a way that satisfied the reviewers. The ideas in this paper are interesting and are broadly applicable. Additional experiments / discussions on the tradeoff between initial search cost and accuracy should be included in the final version. ",ICLR2021, +3hWVGj0Q4bQ,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Accept (Spotlight),"The authors address a very important question pertaining to the relevance of morphological complexity in the ability of transformer-based conditional language models. Through extensive (controlled) experiments using 6 languages, they answer as well as raise very interesting questions about the role of morphology/segmentation/vocab size which may spawn more work in this area. + +All the reviewers were positive about the paper and agreed that the paper made significant contributions which would be useful to the community. More importantly, the authors and reviewers engaged in meaningful and insightful discussions throughout the discussion phase. The authors did a thorough job of addressing all reviewer concerns and changing the draft of the paper accordingly. + +I have no hesitation in recommending that this paper should be accepted.",ICLR2022, +5VpqUV68Elp,1642700000000.0,1642700000000.0,1,Uy6YEI9-6v,Uy6YEI9-6v,Paper Decision,Reject,"All three reviewers recommend borderline rejection based on limited novelty, missing comparisons with other methods, and runtime inefficiency. The authors’ response helped clarify other questions but did not eliminate the main concerns about the paper. 
The AC agrees with the reviewers that, in its current form, the paper does not pass the acceptance bar of ICLR. The reviews have detailed comments and suggestions that should help the authors to improve the work for another conference.",ICLR2022, +ByAS7JTSz,1517250000000.0,1517260000000.0,137,B1mvVm-C-,B1mvVm-C-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"All reviewers recommend accepting this paper, and this AC agrees. ",ICLR2018, +HJraH16rG,1517250000000.0,1517260000000.0,667,HJXyS7bRb,HJXyS7bRb,ICLR 2018 Conference Acceptance Decision,Reject,"While using self-play for training a goal-oriented dialogue system makes sense, the contribution of this paper compared to previous work (that the paper itself cites) seems too minor, and the limitations of using toy synthetic data further weaken the work. +",ICLR2018, +KqhrZYFyJCR,1642700000000.0,1642700000000.0,1,X8cLTHexYyY,X8cLTHexYyY,Paper Decision,Accept (Spotlight),"One might assume that the k-means problem has already been beaten to +death, but this paper shows there are still remaining questions. And +rather interesting ones at that, with a novel angle of having +additional help from a prediction algorithm for cluster +memberships. This connects to learning-augmented algorithms research. + +The reviewers agreed that the problem is interesting and offers a novel +angle, with its appeal stemming from both the novelty and the ability to +""escape"" from NP-hardness. + +The reviewers and authors had nice discussions about details and +conclusions, on how limiting it is that the authors focus on +reasonably accurate predictors, for instance, and where the +predictors could come from. This is a good paper, and hopefully the +discussion helped make it even better.",ICLR2022, +oZfRXJSQw0,1576800000000.0,1576800000000.0,1,r1glygHtDB,r1glygHtDB,Paper Decision,Reject,"The paper proposes an architecture for semantic instance segmentation learnable from coarse annotations and evaluates it on two microscopy image datasets, demonstrating its advantage over the baseline. While the reviewers appreciate the details of the architecture, they note the lack of evaluation on any of the popular datasets and the lack of comparisons with baselines that would be closer to the state of the art. The authors do not address this criticism convincingly. It is not clear why, e.g., the Cityscapes or VOC Pascal datasets, which both have reasonably accurate annotations, cannot be used for the validation of the idea. If the focus is on the precision of the result near the boundaries, then one can always report the error near boundaries (this is a standard thing to do). Note that the performance of the baseline models is far from saturated near boundaries (i.e. the errors are larger than mistakes of annotation). + +At this stage, the paper lacks convincing evaluation and comparison with prior art. Given that this is first and foremost an application paper, lacking some very novel ideas (as pointed out by e.g. Rev1), better evaluation is needed for acceptance.",ICLR2020, +2ohlSmPGnD,1576800000000.0,1576800000000.0,1,Bkx1mxSKvB,Bkx1mxSKvB,Paper Decision,Reject,"The paper investigates the trainability and generalization of deep networks as a function of hyperparameters/architecture, while focusing on wide nets of large depth; it aims to characterize regions of hyperparameter space where networks generalize well vs where they do not; empirical observations are demonstrated to support theoretical results. 
However, all reviewers agree that, while the topic of the paper is important and interesting, more work is required to improve the readability and clarify the exposition to support the proposed theoretical results. + ",ICLR2020, +iAzlAZe5iGr,1642700000000.0,1642700000000.0,1,SZRqWWB4AAh,SZRqWWB4AAh,Paper Decision,Reject,"This paper presents a batch active learning approach (where in each active learning round, instead of a single input, we wish to select several inputs to be labeled). The paper attempts to solve this problem by posing it as a sparse approximation problem and shows that the approach performs favorably as compared to some of the existing methods such as BALD and Bayesian Coresets for batch active learning. + +While the reviewers appreciated the basic idea and the general framework, there were several concerns from the reviewers (as well as myself upon reading the manuscript). Firstly, the idea of batch active learning as a sparse subset selection problem is not new (Pinsler et al., 2019). While previous methods such as (Pinsler et al., 2019) have used ideas such as Coresets, this paper uses sparse optimization techniques such as Greedy and IHT. Moreover, there were concerns about experimental settings relying on various heuristics, and a lack of a more extensive and thorough comparison with important baselines, such as BatchBALD and others, which the authors acknowledged. + +The reviewers have read the authors' response and engaged in discussion, but their assessment remained unchanged. Based on their assessment and my own reading of the manuscript, the paper does not seem to be ready for publication. The authors are advised to consider the points raised in the reviews, which I hope will help strengthen the paper for a future submission.",ICLR2022, +qfz85mSlBco,1642700000000.0,1642700000000.0,1,KTF1h2XWKZA,KTF1h2XWKZA,Paper Decision,Reject,"The main identified issues were the limited contribution and use cases, poor writing, missing baseline comparisons, and the need for more experiments. These issues were not addressed satisfactorily by the rebuttal, and hence I believe the paper should be revised by the authors and undergo another review process at another conference. I therefore recommend rejection.",ICLR2022, +hx35vk72ND,1610040000000.0,1610470000000.0,1,V3o2w-jDeT5,V3o2w-jDeT5,Final Decision,Reject,"The paper has been actively discussed in the light of the authors' response. Even though the paper was, overall, found quite clear with solid theoretical support, the reviewers listed several concerns that remained unsolved after the rebuttal, e.g., + +* The proposed approach may not be properly scoped/positioned and evaluated as an HPO method, a concern unanimously shared across the reviewers. Although this is not a concern impossible to overcome, the reviewers believed it could not be achieved as part of a simple revision of the paper. +* The lack of challenging baselines to fully assess the performance of the proposed method (e.g., see the list suggested by Reviewer 1) +* Along the lines of the previous point, the experiments focus on small-scale settings, which does not make it possible to completely assess the performance of the approach +* Some discrepancy between the theoretical analysis and the actual experimental settings (e.g., the assumption about bounded losses not being valid with the squared loss) + +As illustrated by its scores, the paper is extremely borderline. Given the mixed perspectives of pros and cons, the paper is eventually recommended for rejection. 
+This list, together with the detailed comments of the reviewers, highlights opportunities to improve the manuscript for a future resubmission. +",ICLR2021, +_YPF0JVTJCX,1610040000000.0,1610470000000.0,1,UH-cmocLJC,UH-cmocLJC,Final Decision,Accept (Oral),"This paper studies how (two-layer) neural nets extrapolate. The paper is beautifully written, and the authors very successfully answered all the questions. They managed to update the paper, clarify the assumptions, and add additional experiments. ",ICLR2021, +_gWkZrT8pMU,1610040000000.0,1610470000000.0,1,CPfjKI8Yzx,CPfjKI8Yzx,Final Decision,Reject,"The reviewers highly appreciated the replies and the additional experiments. We also had a private discussion on the paper. To summarize: the replies alleviated quite a few concerns; however, the consensus was that the paper still does not meet the bar for a highly competitive conference like ICLR. + +The idea of combining MPC (on a 'wrong' model) with a learned cost function is very interesting and a promising direction. On the downside, the reviewers are still not entirely convinced about the contribution and believe that the paper requires a significant re-write to incorporate the discussed points as well as an additional round of reviews.",ICLR2021, +Uas4tmMyCWO,1642700000000.0,1642700000000.0,1,OUz_9TiTv9j,OUz_9TiTv9j,Paper Decision,Accept (Poster),"This paper presents a method, called Zest, to measure the similarity between two supervised machine learning models based on their model explanations computed by the LIME feature attribution method. The technical novelty and significance are high, and the results are strong. Reviewers had clarifying questions regarding the experiments and suggestions to add experiments involving additional domains (text and audio) and different families of classifiers, and more context based on the prior literature. These were adequately addressed by the authors. Overall, this paper deserves borderline acceptance.",ICLR2022, +vlc0JagntR,1576800000000.0,1576800000000.0,1,ryg7vA4tPB,ryg7vA4tPB,Paper Decision,Reject,"A somewhat new approach to growing sparse networks. Experimental validation is good, focussing on ImageNet and CIFAR-10, plus experiments on language modelling. Though efficient in computation and storage size, the approach does not have a theoretical foundation. That does not agree with the intended scope of ICLR. I strongly suggest the authors submit elsewhere.",ICLR2020, +DXps8Zvbdr,1610040000000.0,1610470000000.0,1,u4WfreuXxnk,u4WfreuXxnk,Final Decision,Reject,"The paper deals with adversarial attacks on graph neural networks, a new and promising field in graph representation learning. The paper analyzes a new extreme setting of attack for a single node, and presents important insights, albeit not new algorithms. + +The reviewers were not particularly enthusiastic and complained about +- limited novelty in light of Zuegner et al +- missing baselines +- doubts about the attack setting with a selected attacker node + +The authors provided an elaborate rebuttal, including specific responses to the above questions; however, the final scores are not quite above the bar, especially having in mind the sheer number of submissions on graph deep learning this year. We therefore recommend rejection and encourage the authors to publish the paper elsewhere. 
+",ICLR2021, +uVdykxpUZD,1610040000000.0,1610470000000.0,1,9p2ekP904Rs,9p2ekP904Rs,Final Decision,Accept (Poster),"The manuscript proposes a causal interpretation of the self-supervised representation learning problem. The data is modeled as being generated from two independent latent factors: style and content, where content captures all information necessary for downstream tasks, and style captures everything that is affected by data augmentations (e.g. rotation, grayscaling, translation, cropping). The main contribution is a specific regularizer for self-supervised contrastive learning, motivated by the assumptions about the data generation. + +Reviewers agreed that the manuscript is oversold on the causal jargon, as was noted, the manuscript does not perform any causal inference. Nevertheless, they think that there is an interesting interpretation of self-supervised learning and the results are noteworthy. +",ICLR2021, +zEkjv6kkVZE,1642700000000.0,1642700000000.0,1,JeSIUeUSUuR,JeSIUeUSUuR,Paper Decision,Reject,"The reviewers generally agreed that the ideas presented in the paper are interesting and novel. However, all reviewers also agreed that the paper is quite preliminary in its current form: the particular approach, while sensible, appears to be somewhat heuristic, and the evaluations are not as complete as necessary to fully evaluate the proposed approach. + +Generally, my sense is that there is something quite interesting in this work, but the present paper is too preliminary for publication. I would encourage the authors to take the reviewer comments into account and improve the work into a more complete submission for a future venue.",ICLR2022, +uGyAa9aqSv5,1610040000000.0,1610470000000.0,1,Ao2-JgYxuQf,Ao2-JgYxuQf,Final Decision,Reject,"This paper introduces a method to estimate dynamics parameters in recurrent structured models during the learning process. All three reviewers agreed that the idea is interesting and the proposed method could be potentially useful. However, two of the three reviewers have a serious concern about the lack of comparison with other approaches. I agree with these two reviewers; due to the lack of discussion and comparison with existing studies, I cannot recommend accepting this submission in its current form. ",ICLR2021, +S1POhGLdg,1486400000000.0,1486400000000.0,1,Bk0MRI5lg,Bk0MRI5lg,ICLR committee final decision,Reject,The reviewers unanimously recommend rejecting the paper.,ICLR2017, +RvVfkb3AGE,1642700000000.0,1642700000000.0,1,Vjki79-619-,Vjki79-619-,Paper Decision,Accept (Poster),"The paper presents interesting new results for pruning random convolutional networks to approximate a target function. It follows a recent line of work in the topic of pruning by learning. The results are novel, and the techniques interesting. There are some technical issues that are easy to fix within the camera ready timeline (see comments of reviewers below). I would also suggest refining the title of the paper: the lottery ticket hypothesis has an algorithmic component too, which clearly is not covered by existence results.",ICLR2022, +rJxP9921eV,1544700000000.0,1545350000000.0,1,SkgCV205tQ,SkgCV205tQ,Issues with the presentation,Reject,"Dear authors, + +All reviewers commented that the paper had issues with the presentations and the results, making it unsuitable for publication to ICLR. 
Please address these comments should you decide to resubmit this work.",ICLR2019,5: The area chair is absolutely certain +s9YdQ50qQ7,1642700000000.0,1642700000000.0,1,wgR0BQfG5vi,wgR0BQfG5vi,Paper Decision,Reject,"The paper proposes an approach to performing label smoothing, with the amount of smoothing being sample-dependent and guided by the model's prediction (similar to self-distillation). While the reviewers find the studied problem relevant and important, they find the contributions (in their current state) to be borderline, mainly on the basis of a lack of novelty and a missing discussion of some related papers. While the authors' response was able to partially resolve these concerns, in the end none of the reviewers was a strong advocate for accepting the paper, and all scores remained at the borderline (although on the positive side). In concordance with the reviewers, I believe this submission can be made much stronger by digging a bit deeper into the problem, and also making broader connections with the existing literature. + +As a concrete example/suggestion (among many other possibilities for strengthening this work), the authors may want to go a bit deeper into the theoretical analysis. Currently, their analysis shows the approach is able to reduce the model's confidence, which is what happens in label smoothing and self-distillation. However, self-distillation is more than confidence reduction, and the information contained in the ""dark knowledge"" can provide a much stronger regularization than a sole confidence reduction argument. There are already some papers in the literature on the regularization/generalization effects of self-distillation, which the authors might want to use as a stepping-stone.",ICLR2022, +S1e42MU_g,1486400000000.0,1486400000000.0,1,Hk4kQHceg,Hk4kQHceg,ICLR committee final decision,Invite to Workshop Track,"The paper presents a new way of doing multiplicative / tensored recurrent weights in RNNs. The multiplicative weights are input dependent. Results are presented on language modeling (PTB and Hutter). We found the paper to be clearly written, and the idea well motivated. However, as pointed out by the reviewers, the results were not state of the art. We feel that this is because the authors did not make a strong attempt at regularizing the training. Better results on a larger set of tasks would have probably made this paper easier to accept. + + Pros: + - interesting idea, and reasonable results + Cons: + - only shown on language modeling tasks + - results were not very strong when compared to other methods (which typically used strong regularization and training techniques like batch normalization etc.). + - reviewers did not find the experiments convincing enough, and felt that a fair comparison would be to compare with dynamic weights on the competing RNNs.",ICLR2017, +Ou9PI3ZC2As,1610040000000.0,1610470000000.0,1,RovX-uQ1Hua,RovX-uQ1Hua,Final Decision,Accept (Poster),"The paper is well written, clear, and concise. The idea of learning to generate text from off-policy demonstrations is interesting. The experimental results are good. The authors seem to address the concerns raised by the reviewers during the rebuttal.",ICLR2021, +ylxUWAkgM1L,1642700000000.0,1642700000000.0,1,tiQ5Zh2S3zV,tiQ5Zh2S3zV,Paper Decision,Reject,"The authors propose a graph multi-domain splitting framework, called GMDS, to detect anomalies in datasets with temporal information. 
The reviewers agree that the paper studies an important and interesting problem, but they think that the paper should be improved significantly before being accepted. + +In particular, the reviewers feel that the authors should provide more technical details and insights into the design of the proposed solution, and that the proposed method should be compared with other (even simple) baselines for the same problem.",ICLR2022, +2ObW6uOB10J,1642700000000.0,1642700000000.0,1,d7-GwtDWNNJ,d7-GwtDWNNJ,Paper Decision,Reject,"The paper addresses the problem of recovering a graph structure from empirical observations. The proposed approach consists of formulating the problem as an inverse problem, and then unrolling a proximal gradient descent algorithm to generate a solution. + +Whereas the paper definitely has some merit, it received borderline reviews, with three borderline rejects and one borderline accept. The reviewers appreciated the clarifications and discussions provided by the rebuttal, and one reviewer went up from reject to borderline reject. More precisely, this reviewer agrees that the paper has become stronger, but he/she believes that the paper requires additional experimental work (see section ""After rebuttal"" from his/her review). Another reviewer who was active during the rebuttal/discussion stage was not convinced by the rebuttal, after raising issues about identifiability. The area chair agrees that solving the identifiability issue is not a key requirement for this paper; however, this raises legitimate questions about the guarantees/properties of the returned solutions. + +Overall, this is a borderline paper, which introduces an interesting idea, but which requires additional experimental work and discussions about the properties of the solutions. Unfortunately, the area chair agrees with the majority of the reviewers and follows their recommendation. The two previous points should be addressed if the paper is resubmitted elsewhere.",ICLR2022, +c4e976xo1,1576800000000.0,1576800000000.0,1,rJeBJJBYDB,rJeBJJBYDB,Paper Decision,Reject,"This paper proposes to use more varied geometric structures of latent spaces to capture the manifold structure of the data, and provides experiments with synthetic and real data that show some promise in terms of approximating manifolds. +While reviewers appreciate the motivation behind the paper and see that angle as potentially resulting in a strong paper in the future, they have concerns that the method is too complicated, that the experimental results are not fully convincing that the proposed method is useful, and that there are not enough ablation studies. The authors provided some additional results and clarified explanations in their revisions, but reviewers still believe there is more work required to deliver a submission warranting acceptance in terms of justifying the complicated architecture experimentally. +Therefore, we do not recommend acceptance.",ICLR2020, +1CJ3_lw7_p,1610040000000.0,1610470000000.0,1,_IM-AfFhna9,_IM-AfFhna9,Final Decision,Accept (Poster),"Three of four reviewers are in favour of accepting the paper. Some reviewers raised valid criticism regarding the derivations, the interpretation of the mathematical analysis, and the experimental results. So clearly some aspects of the paper could and should be clarified in accordance with the points raised by the reviewers. However, all in all the paper contains enough contributions to warrant publication. 
",ICLR2021, +SytWI3pWPer,1610040000000.0,1610470000000.0,1,wbQXW1XTq_y,wbQXW1XTq_y,Final Decision,Reject,"The paper addresses the problem of sim2real (training with synthetic data and then applying the learned model on real data) in the context of scene graph generation. The paper was reviewed by four expert reviewers who identified the following pros and cons of the method. + +> Pros: +- Paper addresses a relevant and important problem [R1, R2, R4] +- Paper contains compelling results with respect to a number of baselines [R2, R4] + +> Cons: +- Lack of clear motivation, focus, and explanation of novelty [R4] +- Missing details, which makes the paper hard to follow [R3, R4] +- Lack of explanations for baselines [R4] +- Lack of focused analysis of specific contributions [R4] +- Lack of discussion on the limitations of the approach [R1, R2] +- Presented evidence is largely on toy data or very domain specific [R2] + +A number of the shortcomings were addressed by the authors during the rebuttal through revisions. However, the opinion of reviewers on the paper remained split, with the paper receiving the following scores: + +- 5: Marginally below acceptance threshold +- 6: Marginally above acceptance threshold +- 7: Good paper, accept +- 5: Marginally below acceptance threshold + +Overall, all reviewers and the AC agree that the paper addresses an important and interesting problem. At the same time, the AC agrees with R2 and others who point out that there are significant limitations in terms of applicability of the approach in more complex scenarios, where a readily available simulator may not exist. On balance, and considering the large number of high-quality submissions to ICLR this year, the paper was deemed marginally below the acceptance threshold. +",ICLR2021, +QyUgtyCb7z,1610040000000.0,1610470000000.0,1,L4n9FPoQL1,L4n9FPoQL1,Final Decision,Reject,"The paper studies the problem of leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data simultaneously in one learning framework. Major review concerns on the weaknesses include limited novel technical contributions, poor presentation, and weak experimental results (e.g., experiments were mostly conducted on small toy datasets). Overall, the paper has some interesting ideas, but the work is clearly below the ICLR acceptance bar. ",ICLR2021, +whvFGJzwf2x,1642700000000.0,1642700000000.0,1,per0G3dnkYh,per0G3dnkYh,Paper Decision,Reject,"This paper addresses the performance of normalizing flows in the tail of the distribution. It does this by controlling tail properties in the marginals of the high-dimensional distribution. 
The paper is well-motivated, and the key theoretical insight has merit. However, the general perspective and methodology appear to be incremental relative to past results. Furthermore, some concerns over correctness remain after discussion with the authors. Also, clear baselines and more realistic settings are lacking in the experimental results. Thus, while the paper generally has promising ideas on a pertinent topic, it appears not to be developed enough to merit dissemination.",ICLR2022, +kB_taeGi8rs,1610040000000.0,1610470000000.0,1,LhY8QdUGSuw,LhY8QdUGSuw,Final Decision,Accept (Poster),"This paper studies how layer-wise representation and task semantics affect catastrophic forgetting in continual learning. It presents two findings: 1. the higher layers contribute more to forgetting than the lower layers, 2. intermediate-level similarity between tasks causes the maximal forgetting. It also indicates that existing methods employ either feature reuse or orthogonality to mitigate forgetting. + +Pros: +- The layer-wise analysis of catastrophic forgetting and the investigation of different forgetting-mitigation methods are important and interesting. +- The paper is well-motivated and well-written. +- The results can potentially help to suggest new approaches for developing and measuring mitigation methods. + +Cons before rebuttal: +- The paper is missing a discussion of, and takeaways from, the findings. +- How general are the findings? There is a different observation by Kirkpatrick et al. 2017. +- Limited diversity of experiments, because the experiments are only done on image classification tasks with CIFAR10 and CIFAR100. + +The authors conducted more experiments and updated the paper with added explanations and results. The reviewers found the new evidence and arguments in the rebuttal to be convincing, and the authors addressed most concerns. + +In summary, the findings from this paper will help researchers better understand and address catastrophic forgetting, and will be of interest to the community. +Hence, I recommend acceptance. +",ICLR2021, +uEzgwFZ-8G8,1642700000000.0,1642700000000.0,1,0SiVrAfIxOe,0SiVrAfIxOe,Paper Decision,Reject,"All reviewers agreed that the paper contains interesting experiments. However, as this paper is a systems paper without much algorithmic contribution, all reviewers felt that the paper fell short in terms of describing the results, has too many unsupported claims, and that it is unclear how the presented results transfer to slightly different domains. I therefore agree with the reviewers and recommend rejection of the paper.",ICLR2022, +E-Uvjl5BMPD,1610040000000.0,1610470000000.0,1,7nfCtKep-v,7nfCtKep-v,Final Decision,Reject,"The paper presents novel model stealing attacks against the BERT API. The attacks are split into two phases. In the first phase, the black-box BERT model is recovered by submission of specially crafted data. In the second phase, the inferred model can be used to identify sensitive attributes or to generate adversarial examples against the basic BERT model. + +Despite the novelty of the presented attacks against BERT models, the current version of the paper has some problems with clarity and motivation. The presentation of the attacks is very short, and some technical details are not adequately covered. The practical motivation of adversarial example transfer attacks is not very clear, and the authors' response on this issue did not provide a convincing clarification. 
Furthermore, the creation of surrogate models for generating adversarial examples is a well-known technique, and the difference between the proposed AET attack and this conceptual approach is not clear. + +Overall, the paper presents solid and interesting work, but a substantial revision would be necessary to make it suitable for the ICLR audience. ",ICLR2021, +_jlqGvwoJx,1576800000000.0,1576800000000.0,1,SJxy5A4twS,SJxy5A4twS,Paper Decision,Reject,"This paper proposes to integrate codes based on multiple hashing functions with Transformer networks to reduce vocabulary sizes in the input and output spaces. Compared to non-hashed models, it enables training more complex and powerful models with the same number of overall parameters, thus leading to better performance. +Although the technical contribution is limited, considering that the hash-based approach itself is rather well-known and straightforward, all reviewers agree that some findings in the experiments are interesting. On the cons side, two reviewers were concerned about the unclear presentation regarding the details of the method. More importantly, the proposed method is only evaluated on non-standard tasks without comparison to other previous methods. Considering that the main contribution of the paper is on the empirical side, I agree it is necessary to evaluate the method on more standard benchmarking tasks in NLP, where there should be many other state-of-the-art methods of model compression. For these reasons, I’d like to recommend rejection. ",ICLR2020, +2Y3D3zGBpG,1642700000000.0,1642700000000.0,1,uHq5rHHektz,uHq5rHHektz,Paper Decision,Reject,"This manuscript proposes an information fusion approach to improve adversarial robustness. Reviewers agree that the problem studied is timely and the approach is interesting. However, they note concerns about the novelty compared to closely related work, the quality of the presentation, and the strength of the evaluated attacks compared to the state of the art, among other issues. There is no rebuttal.",ICLR2022, +jcYNpUbyAeh,1642700000000.0,1642700000000.0,1,czmQDWhGwd9,czmQDWhGwd9,Paper Decision,Reject,"This paper aims to relate brain activity (of people reading computer +code) to properties of the computer code. They relate the found +representations to those obtained from ML computational language +models applied to the same programs. The paper is clearly written and +presents an interesting idea. + +There was a lot of discussion, and the author(s) updated their paper a +lot. Program length as a potential confound was raised and +successfully rebutted. The extent of novelty relative to Ivanova et al 2020 +was also discussed and successfully rebutted. In the end, the main +issues the reviewers had were 1) that the paper had been updated +substantially since submission (and would therefore benefit from a +thorough re-review) and 2) whether the results provide enough new +insights about the brain or about ML language models. + +To summarize, the authors spent a lot of time addressing issues in the +rebuttal phase and the paper got a lot better with the reviewers' +suggestions, but reviewers agreed it would benefit from more work and +further review before acceptance. I agree with this assessment.",ICLR2022, +HkO_BJ6BM,1517250000000.0,1517260000000.0,599,H139Q_gAW,H139Q_gAW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper proposes to combine Depthwise separable convolutions developed for 2d grids with recent graph convolutional architectures. 
The resulting architecture can be seen as learning both node and edge features, the latter encoding node similarities with learnt weights. +Reviewers agreed that this is an interesting line of work, but that further work is needed on both the presentation and experimental fronts before publication. In particular, the paper should also compare against recent models (such as the MPNN from Gilmer et al) that also propose edge feature learning. Therefore, the AC recommends rejection at this time. +",ICLR2018, +BJxYy9ieg4,1544760000000.0,1545350000000.0,1,HJf7ts0cFm,HJf7ts0cFm,RBFN is back; a bit more work necessary,Reject,"The authors propose to incorporate an additional layer between the consecutive steps in LSTM by introducing a radial basis function layer (with dot product kernel and softmax) followed by a linear layer, to make LSTM similar to or better at (by being more explicit) capturing DFA-like transitions. The motivation is relatively straightforward, but it does not really resolve the issue of whether existing formulations of RNNs cannot capture such transitions. Since this was shown neither theoretically nor intuitively, it is important for the empirical evaluations to be thorough and clearly show that the proposed approach does indeed outperform the vanilla LSTM (with peepholes) when the capacity (e.g., the number of parameters) matches. Unfortunately, it has been the consensus among the reviewers that more thorough comparisons on more conventional benchmarks are needed to convince them of the merit of the proposed approach.",ICLR2019,4: The area chair is confident but not absolutely certain +-NP6xc-7K,1576800000000.0,1576800000000.0,1,B1xIj3VYvr,B1xIj3VYvr,Paper Decision,Accept (Poster),"The paper proposes a weakly supervised learning algorithm, motivated by its application to histopathology. Similar to the multiple instance learning scenario, labels are provided for bags of instances. However, instead of a single (binary) label per bag, the paper introduces the setting where the training algorithm is provided with the number of classes in the bag (but not which ones). Careful empirical experiments on semantic segmentation of histopathology data, as well as simulated labelling from MNIST and CIFAR, demonstrate the usefulness of the method. The proposed approach is similar in spirit to works such as learning from label proportions and UU learning (both of which solve classification tasks). +http://www.jmlr.org/papers/volume10/quadrianto09a/quadrianto09a.pdf +https://arxiv.org/abs/1808.10585 + +The reviews are widely spread, with a low-confidence reviewer rating (1). However, it seems that the high-confidence reviewers are also providing higher scores and better comments. The authors addressed many of the reviewer comments, and sought clarification for certain points, but the reviewers did not engage further during the discussion period. + +This paper provides a novel weakly supervised learning setting, motivated by a real-world semantic segmentation task, and provides an algorithm to learn from only the number of classes per bag, which is demonstrated to work in empirical experiments. It is a good addition to the ICLR program.",ICLR2020, +Zw_NlFjQCG,1610040000000.0,1610470000000.0,1,F438zjb-XaM,F438zjb-XaM,Final Decision,Reject,"The authors investigate different tokenization methods for the translation between French and Fon (an African low-resource language). Low-resource machine translation is a very important topic and it is great to see work on African languages - we need more of this! 
+ +Unfortunately, the reviewers unanimously agree that this work might be better suited for a different conference, for example LREC, since the machine learning contributions are small. The AC encourages the authors to consider submitting this work to LREC or a similar conference.",ICLR2021, +NRPZobaUUg,1576800000000.0,1576800000000.0,1,BJeAHkrYDS,BJeAHkrYDS,Paper Decision,Accept (Talk),"This work uses a variational autoencoder-based approach to combine the benefits of recent methods that learn policies with behavioral diversity with the advantages of successor representations, addressing the generalization and slow inference problems of competing methods such as DIAYN. After discussion of the author rebuttal, the reviewers all agreed on the significant contribution of the paper and that concerns about clarity were sufficiently addressed. Thus, I recommend this paper for acceptance.",ICLR2020, +_SVN71XkNN4,1642700000000.0,1642700000000.0,1,QJb1-8NH2Ux,QJb1-8NH2Ux,Paper Decision,Reject,"The paper investigates the very interesting problem of the connections between adversarial detection and adversarial classification. Theoretically, the authors show that one can always (ideally) construct a robust classifier from a robust detector that has equivalent robustness, and vice versa. This theorem is only correct without considering the computational complexity. However, the authors did not provide any approximate results for the reduction steps to verify the feasibility of the theorems in practice, which is the main concern of all reviewers. So the paper serves as a reminder to the community that we need to be careful about detection results, but it does not provide any evidence that they are overclaimed (only a conjecture based on the theorem in the paper), which greatly limits the contribution of the paper. Due to the competitiveness of ICLR, I cannot recommend accepting it.",ICLR2022, +zpu8wy-iGen,1610040000000.0,1610470000000.0,1,S6AtYQLzXOY,S6AtYQLzXOY,Final Decision,Reject,"This paper presents an approach to use spatio-temporal self-similarity (STSS) as a feature for a convolutional neural network for video understanding. The proposed approach extracts STSS as a descriptor capturing similarities between local spatio-temporal regions, and adds conventional layers such as soft-argmax, fully connected layers, and conv. layers on top of it. + +On one hand, all of the reviewers agree that the novelty of the paper is limited. On the other hand, most of the reviewers (except R1) appreciated the thoroughness of the experiments and ablations. In the end, the reviewers gave three ratings marginally above the acceptance threshold and one rating marginally below the threshold. + +The AC views this paper as a borderline paper. None of the reviewers are excited about the paper, and it is a typical ""Nice experiments with limited novelty"" (by R1) paper. The concept of STSS itself was already proposed in prior studies, as mentioned in the paper and by the reviewers, and this paper 'engineers' a new way to take advantage of STSS without further theoretical or conceptual justification of why it should work. The newly added Kinetics and HMDB results in the rebuttal are nice, but the impact of STSS seems to be minimal in these results. 
+ +Overall, the AC finds the paper slightly lacking to be considered for ICLR.",ICLR2021, +HJlk_1rJgV,1544670000000.0,1545350000000.0,1,rke4HiAcY7,rke4HiAcY7,Important cautions about the information bottleneck in typical learning settings,Accept (Poster),"This paper considers the information bottleneck Lagrangian as a tool for studying deep networks in the common case of supervised learning (predicting label Y from features X) with a deterministic model, and identifies a number of troublesome issues. (1) The information bottleneck curve cannot be recovered by optimizing the Lagrangian for different values of β because in the deterministic case, the IB curve is piecewise linear, not strictly concave. (2) Uninteresting representations can lie on the IB curve, so information bottleneck optimality does not imply that a representation is useful. (3) In a multilayer model with a low probability of error, the only tradeoff that successive layers can make between compression and prediction is that deeper layers may compress more. Experiments on MNIST illustrate these issues, and supplementary material shows that these issues also apply to the deterministic information bottleneck and to stochastic models that are nearly deterministic. There was a substantial degree of disagreement between the reviewers of this paper. One reviewer (R3) suggested that all the conclusions of the paper are the consequence of P(X,Y) being degenerate. The authors responded to this criticism in their response and revision quite effectively, in the opinion of the AC. Because R3 failed to participate in the discussion, this review has been discounted in the final decision. The other two reviewers were considerably more positive about the paper, with one (R1) having basically no criticisms and the other (R2) expressing some doubts about the novelty of the observations being made in the paper and their importance for practical machine learning scenarios. Following the revision and discussion, R2 expressed general satisfaction with the paper, so the AC is recommending acceptance. The AC thinks that the final paper would be clearer if the authors were to carefully distinguish between ground-truth labels used in training and the labels estimated by the model for a given input. At the moment, the symbol Y appears to be overloaded, standing for both. Perhaps the authors should place a hat over Y when it is standing for estimated labels?",ICLR2019,4: The area chair is confident but not absolutely certain +r1xDD0cLl4,1545150000000.0,1545350000000.0,1,rJgTciR9tm,rJgTciR9tm,meta-review,Reject,"The reviewers reached a consensus that the paper is not ready for publication in ICLR. (See more details in the reviews below.)",ICLR2019,4: The area chair is confident but not absolutely certain +uEZufbCr_tn,1642700000000.0,1642700000000.0,1,OcvjQ3yqgTG,OcvjQ3yqgTG,Paper Decision,Reject,"This paper presents an approach ""ImpressLearn"" to continual learning using the idea of task-specific masks. The idea builds upon another idea - SupSup (Wortsman 2020) - which uses a backbone network shared by all the tasks and binary task-specific masks. However, the number of parameters for an approach like SupSup can become excessively large when the number of tasks is very large. This paper presents a solution by having a small number of basis-masks and learning a weighted combination of these basis-masks to use as the task-specific mask for each task. The experimental results show that ImpressLearn yields significant parameter savings as compared to SupSup. 
+
+There were several concerns shared by all the reviewers, such as (1) Limited novelty as compared to SupSup, and (2) Limited experimental evaluation and not having enough baselines. From my own reading of the paper, I largely agree with the assessment of the other reviewers.

The authors responded to the original reviews and acknowledged some of the concerns raised by the reviewers. The reviewers read the authors' response but their assessment has remained unchanged.

The basic motivation and the idea are nice but offer limited novelty (especially as compared to SupSup). If the authors could improve the experimental evaluation (more baselines, larger datasets/networks, etc.), it would be a much stronger paper. However, in its current shape, I as well as the other reviewers do not think that the paper is ready for publication.",ICLR2022,
mB8fsy5XVE,1576800000000.0,1576800000000.0,1,SyevDaVYwr,SyevDaVYwr,Paper Decision,Reject,"While two reviewers rated this paper as an accept, reviewer 3 strongly believes there are unresolved issues with the work as summarized in their post-rebuttal review. This work seems very promising and while the AC will recommend rejection at this time, the authors are strongly encouraged to resubmit this work.",ICLR2020,
FLqbcZkAZuU,1642700000000.0,1642700000000.0,1,FWiwSGJ_Bpa,FWiwSGJ_Bpa,Paper Decision,Reject,"The reviewers acknowledge that the paper is well written and contains interesting ideas to combine adaptive control and learning. However, they identified issues regarding the claims about transient tracking and the STL formula. Moreover, the significance of the presented learning rule was unclear to one reviewer. While the authors could respond well to the identified transient tracking issue, they also needed to weaken their claims, limiting the contribution of the paper. The reviewers therefore stayed with a reject rating.",ICLR2022,
H1e98F5Nx4,1545020000000.0,1545350000000.0,1,SJe2so0qF7,SJe2so0qF7,Misses the point of privacy,Reject,"This paper addresses data sanitization, using a KL-divergence-based notion of privacy. While an interesting goal, the use of average-case as opposed to worst-case privacy misses the point of privacy guarantees, which must protect all individuals. (Otherwise, individuals with truly anomalous private values may be the only ones who opt for the highest levels of privacy, yet this situation will itself leak some information about their private values).

",ICLR2019,5: The area chair is absolutely certain
ltM7zqpqtx0,1610040000000.0,1610470000000.0,1,XbJiphOWXiU,XbJiphOWXiU,Final Decision,Reject,"All the reviewers unanimously agree that the paper should be rejected. The main concern is well summarized by R1's comment ""While the problem is interesting, I found the paper difficult to read as the task is ill-defined in section 3 where many notation definitions are missing and some notations are reused in different contexts with different definitions"". Also, as R4 mentions, the proposed method can be reduced to reward engineering and doesn't provide any scientific or methodological advancement to the problem of hypothesis testing. The authors did not provide any rebuttal. ",ICLR2021,
aA6pb5liVcx,1642700000000.0,1642700000000.0,1,1T5FmILBsq2,1T5FmILBsq2,Paper Decision,Reject,"This paper considers the exploding gradient problem in RNNs. The proposed network SGORNN can be seen as an extension to the FastRNN model by adding orthogonal weight matrices.

I recommend rejection for this paper mainly for two reasons. 
+
+First, as mentioned in the reviews of Reviewer 815o and Reviewer W7nS, adding orthogonal constraints into FastRNN should not be considered a significant technical contribution.

Second, more importantly, the experiments of the paper are not that convincing. All reviewers raise concerns about this issue. I also do not see the point of comparing the proposed model with a baseline LSTM model of much larger parameter size. I can’t think of a reason to do so. Also, I think the small datasets will not give you a lot of meaningful insights in comparing the models – PTB, for example, is a rather small dataset for language modeling, and the results presented there are far from good. The numbers look really bad, reflecting the quality of how these experiments were done (https://arxiv.org/pdf/1707.05589.pdf).",ICLR2022,
ByxrNul4gV,1544980000000.0,1545350000000.0,1,Hk4dFjR5K7,Hk4dFjR5K7,Area chair recommendation,Accept (Poster),"The submission proposes a method to construct adversarial attacks based on deforming an input image rather than adding small perturbations. Although deformations can also be characterized by the difference of the original and deformed image, this is qualitatively and quantitatively different, as a small deformation can result in a large difference.

On the positive side, this paper proposes an interesting form of adversarial attack, whose success can give additional insights on the forms of existing adversarial attacks. The experiments on MNIST and ImageNet are reasonably comprehensive and allow interesting interpretation of how the image deforms to allow the attack. The paper is also praised for its clarity, and cleaner formulation compared to Xiao et al. (see below). Additional experiments during the rebuttal phase partially answered reviewer concerns, and provided more information e.g. about the effect of the smoothness of the deformation.

There were some concerns that the paper primarily presents one idea, and perhaps missed an opportunity for deeper analysis (R1). R2 would have appreciated more analysis on how to defend against the attack.

A controversial point is the relation / novelty with respect to Xiao et al., ICLR 2018. As e.g. pointed out by R1: ""The paper originates from a document provably written in late 2017, which is before the deposit on arXiv of another article (by different authors, early 2018) which was later accepted to ICLR 2018 [Xiao and al.]. This remark is important in that it changes my rating of the paper (being more indulgent with papers proposing new ideas, as otherwise the novelty is rather low compared to [Xiao and al.]).""

On balance, all three reviewers recommended acceptance of the paper. Regarding novelty over Xiao et al., even ignoring the arguable precedence of the current submission, the formulation is cleaner and will likely advance the analysis of adversarial attacks.",ICLR2019,4: The area chair is confident but not absolutely certain
3QRlH-tVV_,1576800000000.0,1576800000000.0,1,SJlRUkrFPS,SJlRUkrFPS,Paper Decision,Accept (Poster),"The paper proposes an algorithm for learning a transport cost function that accurately captures how two datasets are related by leveraging side information such as a subset of correctly labeled points. The reviewers believe that this is an interesting and novel idea. There were several questions and comments, which the authors adequately addressed. 
I recommend that the paper be accepted.",ICLR2020, +RZDvWBsQG_t,1642700000000.0,1642700000000.0,1,U9zTUXVdoIr,U9zTUXVdoIr,Paper Decision,Reject,"This paper proposes a more generalized form of certified robustness and attempts to provide new results on applying randomized smoothing to semantic transformations such as different types of blurs or distortions. The main idea is to use an image-to-image neural network to approximate semantic transformations, and then certify robustness based on bounds on that neural network. The authors provide empirical results on standard benchmark datasets like MNIST and CIFAR showing that their method can achieve improved results on some transformations compared to prior work. + +The review committee appreciates the authors taking the time to attempt to respond to the concerns of all reviewers, and for updating and improving their work during the rebuttal process. The committee is glad to see that they do provide empirical evidence of improvement to common-corruption robustness, compared to AugMix (one of the state-of-the-art approaches for standard common-corruption robustness) and TSS. + + +However, the reviewers still have concerns about the novelty of the paper. The main novelty is not improvement for resolvable transformations (prior works that the authors cite perform about the same or better), but rather, is the ability to handle non-resolvable transformations. The reviewers agree that robustness to non-resolvable transformations is important; however, the reviewers think certified robustness to non-resolvable transformations is not meaningful, because they are only being certified with respect to a neural network that is trained to approximate those non-resolvable transformations. Without MTurk studies to confirm how good the neural network's non-resolvable transforms are, the reviewers do not find certified robustness here meaningful.",ICLR2022, +7nWDfJUor,1576800000000.0,1576800000000.0,1,BJlRs34Fvr,BJlRs34Fvr,Paper Decision,Accept (Spotlight),"This paper makes the observation that, by adjusting the ratio of gradients from skip connections and residual connections in ResNet-family networks in a projected gradient descent attack (that is, upweighting the contribution of the skip connection gradient), one can obtain more transferable adversarial examples. This is evaluated empirically in the single-model black box transfer setting, against a wide range of models, both with and without countermeasures. + +Reviewers praised the novelty and simplicity of the method, the breadth of empirical results, and the review of related work. Concerns were raised regarding a lack of variance reporting, strength of the baselines vs. numbers reported in the literature, and the lack of consideration paid to the threat model under which an adversary employs an ensemble of source models, as well as the framing given by the original title and abstract. All of these appear to have been satisfactorily addressed, in a fine example of what ICLR's review & revision process can yield. It is therefore my pleasure to recommend acceptance.",ICLR2020, +g0RvFiw63t,1576800000000.0,1576800000000.0,1,rylkma4twr,rylkma4twr,Paper Decision,Reject,"This paper proposes convergence results for zeroth-order optimization. + +One of the main complaints was that ZO has limited use in ML. I appreciate the authors' response that there are cases where gradients are not easily available, especially for black-box attacks. 
+ +However, I find the limited applicability an issue for ICLR and I encourage the authors to find a conference that is more suited to that work.",ICLR2020, +BkJpQkpHG,1517250000000.0,1517260000000.0,229,SyoDInJ0-,SyoDInJ0-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The reviewers are unanimous in accepting the paper. They generally view it as introducing an original approach to online RL using bandit-style selection from a fixed portfolio of off-policy algorithms. Furthermore, rigorous theoretical analysis shows that the algorithm achieves near-optimal performance. + +The only real knock on the paper is that they use a weak notion of regret i.e. short-sighted pseudo regret. This is considered inevitable, given the setting.",ICLR2018, +RQIVVZPpOZ,1576800000000.0,1576800000000.0,1,Syxss0EYPS,Syxss0EYPS,Paper Decision,Reject,"The authors propose an agent that can act in an RL environment to verify hypotheses about it, using hypotheses formulated as triplets of pre-condition, action sequence, and post-condition variables. Training then proceeds in multiple stages, including a pretraining phase using a reward function that encourages the agent to learn the hypothesis triplets. + +Strengths: Reviewers generally agreed it’s an important problem and interesting approach + +Weaknesses: There were some points of convergence among reviewer comments: lack of connection to existing literature (ie to causal reasoning and POMDPs), and concerns about the robustness of the results (which were only reporting the max seeds). Two reviewers also found the use of natural language to unnecessarily complicate their setup. Overall, clarity seemed to be an issue. Other comments concerned lack of comparisons, analyses, and suggestions for alternate methods of rewarding the agent (to improve understandability). + +The authors deserve credit for their responsiveness to reviewer comments and for the considerable amount of additional work done in the rebuttal period. However, these efforts ultimately didn’t satisfy the reviewers enough to change their scores. Although I find that the additional experiments and revisions have significantly strengthened the paper, I don't believe it's currently ready for publication at ICLR. I urge the authors to focus on clearly presenting and integrating these new results in a future submission, which I look forward to. +",ICLR2020, +9vWLSWYvh1,1576800000000.0,1576800000000.0,1,HJlWWJSFDH,HJlWWJSFDH,Paper Decision,Accept (Spotlight),All three reviewers are consistently positive on this paper. Thus an accept is recommended.,ICLR2020, +SJov8Jprf,1517250000000.0,1517260000000.0,807,r1Oen--RW,r1Oen--RW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper showcases how saliency methods are brittle and cannot be trusted to obtain robust explanations. They define a property called input invariance that they claim all reliable explanation methods must possess. The reviewers have concerns regarding the motivation of this property in terms of why is it needed. This is not clear from the exposition. Moreover, even after having the opportunity to update the manuscript they seem to have not touched upon this issue other than providing a generic response.",ICLR2018, +_g6_LAWdmR3,1610040000000.0,1610470000000.0,1,BUlyHkzjgmA,BUlyHkzjgmA,Final Decision,Accept (Poster),"High-quality theoretical paper that studies the connection between concentration of the data distribution and adversarial robustness. 
It contributes a method for more accurate estimation of concentration, which allows drawing stronger conclusions about adversarial robustness compared to previous work. The paper is highly technical, but written clearly and precisely. All reviewers give positive scores, with only minor negative comments.

One minor concern I have is that the potential audience of the paper might be small, given its highly technical nature and the very specialized line of research it follows. Still, I believe it's a solid contribution, so I'm happy to recommend acceptance.",ICLR2021,
GXVFHnVTI6B,1642700000000.0,1642700000000.0,1,mniwiEAuzL,mniwiEAuzL,Paper Decision,Reject,"This paper proposed algorithms (based on natural actor-critic methods) to solve two-player zero-sum Markov games. The authors established theoretical support for the convergence properties--and hence sample complexity--of the proposed methods. The authors claimed, based on their theoretical results, that the proposed methods are sample-efficient.

As the reviewers pointed out, the original submission focused on the dependency on epsilon without explicit dependency on other important parameters like S, A, B, etc. The revised version has made explicit the dependencies on all these problem parameters, which I appreciated. However, the sample complexity presented in the new version scales as either S^3 max{A,B} or S^4 * \max{A,B}^6 in the sizes of the state space and action spaces, which are all huge. What is more, the sample complexities also rely on additional parameters like rho, x, y, which could all depend on S, A, B, etc. As a result, the resulting sample complexity bounds do not seem to imply sample efficiency. In addition, Assumptions 1 and 2 are somewhat unnatural to make.",ICLR2022,
LO6qrJtyCZ,1576800000000.0,1576800000000.0,1,SkxMjxHYPS,SkxMjxHYPS,Paper Decision,Reject,"This paper examines how different distributions of the layer-wise number of CNN filters, as partitioned into a set of fixed templates, impact the performance of various baseline deep architectures. Testing is conducted from the viewpoint of balancing accuracy with various resource metrics such as number of parameters, memory footprint, etc.

In the end, reviewer scores were partitioned as two accepts and two rejects. However, the actual comments indicate that both nominal accept reviewers expressed borderline opinions regarding this work (e.g., one preferred a score of 4 or 5 if available, while the other explicitly stated that the paper was borderline acceptance-worthy). Consequently, in aggregate there was no strong support for acceptance and non-dismissable sentiment towards rejection.

For example, consistent with reviewer comments, a primary concern with this paper is that the novelty and technical contribution is rather limited, and hence, to warrant acceptance the empirical component should be especially compelling. However, all the experiments are limited to cifar10/cifar100 data, with the exception of a couple extra tests on tiny ImageNet added after the rebuttal. But these latter experiments are not so convincing since the base architecture has the best accuracy on VGG, and only on a single MobileNet test do we actually see clear-cut improvement. Moreover, these new results appear to be based on just a single trial per data set (this important detail is unclear), and judging from Figure 2 of the revision, MobileNet results on cifar data can have very high variance, blurring the distinction between methods. 
It is therefore hard to draw firm conclusions at this point, and these two additional tiny ImageNet tests notwithstanding, we don't really know how to differentiate phenomena that are intrinsic to cifar data from other potentially relevant factors. + +Overall then, my view is that far more testing with different data types is warranted to strengthen the conclusions of this paper and compensate for the modest technical contribution. Note also that training with all of these different filter templates is likely no less computationally expensive than some state-of-the-art pruning or related compression methods, and therefore it would be worth comparing head-to-head with such approaches. This is especially true given that in many scenarios, test-time computational resources are more critical than marginal differences in training time, etc.",ICLR2020, +1bl_Ht56c_,1642700000000.0,1642700000000.0,1,OhytAdNSzO-,OhytAdNSzO-,Paper Decision,Reject,"This paper explores strategies for scaling vision transformers that can be transferable across hardware devices and ViT variants. While it presents some interesting observations as well as a useful practical guide, multiple reviewers expressed major concerns over the novelty and significance of the methods and findings. Besides novelty and significance, there are also some concerns about comparison with existing work as well as clarity of the presentation.",ICLR2022, +2fWbvW_zrAH,1610040000000.0,1610470000000.0,1,Dtahsj2FkrK,Dtahsj2FkrK,Final Decision,Reject,"This paper proposes a testing procedure to determine whether a policy is better than another policy with respect to long-term treatment effects. The reviewers found the problem interesting and saw a lot of value in this work. One of the key concerns was the lack of clarity throughout the paper. The reviews helped the authors actively revise the paper, improving the paper's overall readability throughout the discussion phase. However, the reviewers did not change their ratings. While I agree that this work has merits, since there are many legibility issues, I cannot recommend its acceptance at this stage. + +",ICLR2021, +AbjKCGwpnf,1576800000000.0,1576800000000.0,1,HJxKhyStPH,HJxKhyStPH,Paper Decision,Reject,"The paper analyses the effect of different loss functions for TransE and argues that certain limitations of TransE can be mitigated by choosing more appropriate loss functions. The submission then proposes TransComplEx to further improve results. This paper received four reviews, with three recommending rejection, and one recommending weak acceptance. A main concern was in the clarity of motivating the different models. Another was in the relatively low performance of RotatE compared with [1], which was raised by multiple reviewers. The authors provided extensive responses to the concerns raised by the reviewers. However, at least the implementation of RotatE remains of concern, with the response of the authors indicating ""Please note that we couldn’t use exactly the same setting of RotatE due to limitations in our infrastructure."" On the balance, a majority of reviewers felt that the paper was not suitable for publication in its current form.",ICLR2020, +rknlEJ6Sf,1517250000000.0,1517260000000.0,283,B1DmUzWAW,B1DmUzWAW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"An interesting new approach for doing meta-learning incorporating temporal convolution blocks and soft attention. Achieves impressive SOTA results on few shot learning tasks and a number of RL tasks. 
I appreciate the authors doing the ablation studies in the appendix as that raises my confidence in the novelty aspect of this work. I thus recommend acceptance, but do encourage the authors to perform the ablation experiments promised to Reviewer 1 (especially the one to ""show how much SNAILs performance degrades when TCs are replaced with this method [of Vaswani et al.]."")",ICLR2018, +IfSpHqQ7tHJ,1642700000000.0,1642700000000.0,1,PQTW3iG4sC-,PQTW3iG4sC-,Paper Decision,Accept (Poster),"This paper studies optimization of over-parametrized neural networks in the mean-field scaling. Specifically, when the input dimension in larger than the number of training samples, the paper shows that the training loss converges to 0 at a linear rate under gradient flow. It's possible to extend the result by random feature layers to handle the case when input dimension is low. Empirically the dynamics in this paper seems to achieve better generalization performance than the NTK counterpart, but no theoretical result is known. Overall this is a solid contribution to the hard problem of analyzing the training dynamics of mean-field regime. There was some debate between reviewers on what is the definition of ""feature learning"" and I recommend the authors to give an explicit definition of what they mean (and potentially use a different term).",ICLR2022, +BjKeuvNg4H4,1642700000000.0,1642700000000.0,1,HUeyM2qVey2,HUeyM2qVey2,Paper Decision,Reject,"First this is the seed for a very good paper on approximating manifolds and densities using injective flows. + +Reviewers have done an admirable effort reviewing the paper giving detailed reviews and suggestions to improve the theory and corrections that resulted in an improvement of the paper during the rebuttal/ revision phase. + +Unfortunately the paper still needs major rewriting and organization to be accessible by other readers, and should undergo another round of review in its last polished version to further vet the correctness of some of its claims as explained below . + +The paper was discussed at length among reviewers and the AC and here are the suggestions to improve the paper. + +* Implementing Reviewer 4sjW suggestion w.r.t to the narrative and adding explanations to improve the readability and accessibility of the paper. + +* Another concerns were raised by reviewer eR1p in the discussion regarding the correctness of Theorem 1 and Corollary 1. "" The proof of Corollary 1 is so rough that I could not confirm its correctness. For example, the functions $r$ and $f$ are undefined."" Please revisit the proof of this Corollary. Theorem 1 builds on Lemma 7 point 5. In point 5 of Lemma 7 :""The embedding $r$ depends on $\epsilon$ , hence so is the measure $\mu$.Therefor the statement $W_2(g \mu',f \mu)< B_{K,W}(f,g) + \epsilon$ for all $\epsilon$, does not imply that $W_2(g \mu',f \mu)< B_{K,W}(f,g)$. One solution can be by building a sequence of measures that would converge to that measure and see if the argument goes through. + + We encourage the authors to implement all the feedback and suggestions of the reviewers and to submit this interesting work to an upcoming venue.",ICLR2022, +uZruT68ZS_i,1642700000000.0,1642700000000.0,1,ab7lBP7Fb60,ab7lBP7Fb60,Paper Decision,Reject,"This paper continues the investigation on fairness and privacy in the context of federated learning. We appreciate the detailed response from the authors. 
During the rebuttal period, the authors have largely updated the set of experiments, since there was an identified bug in the previous implementation. Another drawback that the AC identified is that there is a lack of formulation and formal guarantees in the paper. In particular, is the proposed algorithm trying to satisfy example-level or client-level data privacy? The resulting noise scale can be quite different. Unlike prior work (e.g. Jagielski et al), the proposed algorithm does not seem to provide any fairness guarantee. Thus, it is not clear why the proposed approach is justified (even under some assumptions). In a similar vein, perhaps the authors could consider a more in-depth discussion that compares their approach with prior work and articulates what advantages their new method offers. Overall, the paper is not ready for publication at ICLR.",ICLR2022,
Mrk_B8jDEOE,1642700000000.0,1642700000000.0,1,nioAdKCEdXB,nioAdKCEdXB,Paper Decision,Accept (Poster),"The paper presents a new computational framework, grounded in Forward-Backward SDEs theory, for the log-likelihood training of the Schrödinger Bridge and provides theoretical connections to score-based generative models. The presentation of the results is not satisfactory (the algorithm should be clarified in several places and the notation is not accurate, which raises doubts about the soundness of the method). The paper is thus very hard to read for non-experts on the subject. Furthermore, some reviewers raise concerns about the similarity of this method to other algorithms that were never cited in the paper. Finally, the empirical analysis, as of now, is limited.

In the rebuttal the authors carefully addressed many of the comments. However, the paper's presentation still needs to be substantially improved (de-densification of the paper would be extremely important since now the main narrative is very convoluted). The authors made several changes in the manuscript, but a detailed discussion regarding training time complexity still seems to be missing (main body and the Appendix) in the new version of the manuscript, even though this was one of the main raised concerns. Overall, the manuscript requires major rewriting. Since the comments regarding the content were successfully addressed (the reviewers are satisfied with the detailed answers given by the authors), the paper satisfies the conference bar and can be accepted.",ICLR2022,
r179HyTBf,1517250000000.0,1517260000000.0,623,HJMN-xWC-,HJMN-xWC-,ICLR 2018 Conference Acceptance Decision,Reject,"I am inclined to agree with R1 that there is an extensive literature on learning architectures now, and I have seen two others as part of my area chairing. This paper does not offer comparisons to existing methods for architecture learning other than very basic ones, and that reduces the strength of the paper significantly. 
Further the broad exploration over 17 tasks is more overwhelming, than adding to an insight into the methods.",ICLR2018, +tCfjc5UM8eX,1642700000000.0,1642700000000.0,1,aOX3a9q3RVV,aOX3a9q3RVV,Paper Decision,Accept (Poster),"This paper explores addition of a version of divisive normalization to AlexNets and +compares performance and other measures of these networks to those +with more commmonly used normalization schemes (batch, group, and +layer norm). Various tests are performed to explore the effect of +their divisive normalization. + +Scores were initially mixed but after clarifications for design and +experiment decisions, and experiments run in response to comments by +the reviewers the paper improved significantly. While reviewers still +had several suggestions for further improvements, after the authors' +revisions reviewers were in favor of acceptance which I support.",ICLR2022, +-2u2Au4kY4,1576800000000.0,1576800000000.0,1,r1gIa0NtDH,r1gIa0NtDH,Paper Decision,Reject,"The paper proposed an autoregressive model with a multiscale generative representation of the spectrograms to better modeling the long term dependencies in audio signals. The techniques developed in the paper are novel and interesting. The main concern is the validation of the method. The paper presented some human listening studies to compare long-term structure on unconditional samples, which as also mentioned by reviewers are not particularly useful. Including justifications on the usefulness of the learned representation for any downstream task would make the work much more solid. ",ICLR2020, +EQWJGfxtIl,1576800000000.0,1576800000000.0,1,BkxGAREYwB,BkxGAREYwB,Paper Decision,Reject,"The authors propose to use numerical differentiation to approximate the Jacobian while estimating the parameters for a collection of Hidden Markov Models (HMMs). Two reviewers provided detailed and constructive comments, while unanimously rated weak rejection. Reviewer #1 likes the general idea of the work, and consider the contribution to be sound. However, he concerns the reproducibility of the work due to the niche database from e-commerce applications. Reviewer #2 concerns the poor presentation, especially section 3. The authors respond to Reviewers’ concerns but did not change the rating. The ACs concur the concerns and the paper can not be accepted at its current state.",ICLR2020, +rxBB1m4Kk6,1576800000000.0,1576800000000.0,1,HylxE1HKwS,HylxE1HKwS,Paper Decision,Accept (Poster),"The authors propose a new method for neural architecture search, except it's not exactly that because model training is separated from architecture, which is the main point of the paper. Once this network is trained, sub-networks can be distilled from it and used for specific tasks. + +The paper as submitted missed certain details, but after this was pointed out by reviewers the details were satisfactorily described by the authors. + +The idea of the paper is original and interesting. The paper is correct and, after the revisions by authors, complete. 
In my view, this is sufficient for acceptance.",ICLR2020, +HklZlzmtx4,1545310000000.0,1545350000000.0,1,HyesW2C9YQ,HyesW2C9YQ,metareview,Reject,"The reviewers raised a number of concerns including the usefulness of the presented dataset given that the collected data is acted rather than naturalistic (and the large body of research in affective computing explains that models trained on acted data cannot generalise to naturalistic data), no methodological novelty in the presented work, and relatively uninteresting application with very limited real-world application (it remains unclear whether having better empathetic dialogues would be truly crucial for any real-life application and, in addition, all work is based on acted rather than real-world data). The authors’ rebuttal addressed some of the reviewers’ concerns but not fully (especially when it comes to usefulness of the data). Overall, I believe that the effort to collect the presented database is noble and may be useful to the community to a small extent. However, given the unrealism of the data and, in turn, very limited (if any) generalisability of the presented to real-world scenarios, and lack of methodological contribution, I cannot recommend this paper for presentation at ICLR.",ICLR2019,5: The area chair is absolutely certain +1hPAy5wXO,1576800000000.0,1576800000000.0,1,SJlPOCEKvH,SJlPOCEKvH,Paper Decision,Reject,"This work explores weight pruning for BERT in three broad regimes of transfer learning: low, medium and high. + +Overall, the paper is well written and explained and the goal of efficient training and inference is meaningful. Reviewers have major concerns about this work is its technical innovation and value to the community: a reuse of pruning to BERT is not new in technical perspective, the marginal improvement in pruning ratio compared to other compression method for BERT, and the introduced sparsity that hinders efficient computation for modern hardware such as GPU. The rebuttal failed to answer a majority of these important concerns. + +Hence I recommend rejection.",ICLR2020, +SPoGClV2DZ,1576800000000.0,1576800000000.0,1,BkexaxBKPB,BkexaxBKPB,Paper Decision,Reject,"The general consensus amongst the reviewers is that this paper is not quite ready for publication. The reviewers raised several issues with your paper, which I hope will help you as you work towards finding a home for this work.",ICLR2020, +mjNr-GzqxY3,1610040000000.0,1610470000000.0,1,9CG8RW_p3Y,9CG8RW_p3Y,Final Decision,Reject,"This paper studies an interesting information-theoretic trade-off between accuracy and invariance by posing it as a minimax problem. The results are of theoretical nature. However, the implications of the results are not clear. Also, the model/assumptions authors consider are not completely justified. Therefore, the paper at this stage is not recommended for acceptance. However, I highly encourage the authors to improve upon their existing work and resubmit to the next ML conference. ",ICLR2021, +c-idfmzxFfP,1642700000000.0,1642700000000.0,1,r8S93OsHWEf,r8S93OsHWEf,Paper Decision,Reject,"This paper proposes a method to improve the robust accuracy of classifiers using test-time training. The reviewers all agree that the method is interesting, and many reviewers had a positive view of the method. However, two main criticisms remain: (i) the method increases the runtime of inference, and (ii) comparisons to other related methods were lacking. The authors responded to (i) by reporting runtimes for their method in the rebuttal. 
Some reviewers were concerned that the runtime increase of the method is not acceptable, however I am not very concerned with this issue since I think the paper contains an interesting methodology even if it’s not ready for deployment at the industrial scale. However, issue (ii) does not seem to have been adequately addressed. The comparison to SOAP is a welcome addition the reviewers acknowledge, but a number of other methods, for example masking and cleansing, are closely related (but different) and so comparisons should be provided.",ICLR2022, +UFtX-Xgy_qP,1610040000000.0,1610470000000.0,1,LSFCEb3GYU7,LSFCEb3GYU7,Final Decision,Accept (Spotlight),"The paper proposes a recurrent neural network architecture for abstract rule learning. An LSTM is augmented with a two-stream memory structure: one block is populated with visual representations, and the other is populated by hidden state vectors from the RNN controller. + +The authors also introduce a set of tasks that require a simple symbolic reasoning on visual inputs and strong extrapolation ability. They show that previous memory-augmented neural networks fail on these tasks, whereas their model exhibits excellent generalization with limited training data. + +Pro: The work addresses an important and open question in neuroscience and deep learning. The proposed solution is simple and effective. The manuscript is well-written. It was also improved in a revised version after the first review round. + +Con: The main criticism raised by the reviewers was that the considered tasks may be a too simplified synthetic task. It would have been good to consider other the more complex tasks involving symbolic reasoning such as CLEVR or bAbI. + +While this is a valid criticism, all reviewers agreed that this is an interesting and important work worth publishing. In particular, the considered question is of pivotal importance for the community and the work presents a significant progress.",ICLR2021, +Jk2soSiADLT,1642700000000.0,1642700000000.0,1,bVT5w39X0a,bVT5w39X0a,Paper Decision,Reject,"The paper extends the FNP model to multimodal settings using the mixture of graphs. However, there are legitimate concerns about the quality of experiments, such as baselines, as the reviewers mention. For example, mRNP is supervised, and comparison to DeepIMV is not fair. I encourage the authors to address them appropriately in the next version of the paper. + +The authors can significantly improve the presentation of ideas. Please avoid making hyperbole and excessively bold statements, as the reviewers have pointed out. This way, there will be room for a better demonstration of the novel parts of the paper. For example, the authors misuse the term ""generative"" for the proposed mRNP. There are multiple hand-waving statements about the role of uncertainty that are not well-supported in the current draft. I believe this paper can be a good paper by addressing the reviewers' comments.",ICLR2022, +efyk8XdIiW,1576800000000.0,1576800000000.0,1,r1e7M6VYwH,r1e7M6VYwH,Paper Decision,Reject,"All of the reviewers agree the paper has an interesting idea (using rotations of the representation as regularization). However, the reviewers also agree the empirical gains are too insignificant. While the paper shows results on CIFAR, the reviewers mentioned a few other ways to improve performance, such as more complex and unconstrained datasets. 
These additional experiments would make the effectiveness of proposed approach more convincing.",ICLR2020, +#NAME?,1576800000000.0,1576800000000.0,1,r1xapAEKwS,r1xapAEKwS,Paper Decision,Reject,"This paper presents a method for merging a discriminative GMM with an ARD sparsity-promoting prior. This is accomplished by nesting the ARD prior update within a larger EM-based routine for handling the GMM, allowing the model to automatically remove redundant components and improve generalization. The resulting algorithm was deployed on standard benchmark data sets and compared against existing baselines such as logistic regression, RVMs, and SVMs. + +Overall, one potential weakness of this paper, which is admittedly somewhat subjective, is that the exhibited novelty of the proposed approach is modest. Indeed ARD approaches are now widely used in various capacities, and even if some hurdles must be overcome to implement the specific marriage with a discriminative GMM as reported here, at least one reviewer did not feel that this was sufficient to warrant publication. Other concerns related to the experiments and comparison with existing work. For example, one reviewer mentioned comparisons with Panousis et al., ""Nonparametric Bayesian Deep Networks with Local Competition,"" ICML 2019 and requested a discussion of differences. However, the rebuttal merely deferred this consideration to future work and provided no feedback regarding similarities or differences. In the end, all reviewers recommended rejecting this paper and I did not find any sufficient reason to overrule this consensus.",ICLR2020, +RmVD7aLDwxf,1610040000000.0,1610470000000.0,1,Uf_WNt41tUA,Uf_WNt41tUA,Final Decision,Reject,"The paper proposes a method for the interesting task of dialog summarisation which is slowly getting attention from the research community. In particular, they propose a method which first generates a summary draft and then a final draft. + +Pros: +1) The paper is well written +2) Addresses an interesting problem +3) SOTA results + +Cons: +1) Lack of novelty +2) No quantitative analysis of the summary draft though it is as an important part of the proposed solution +3) Human evaluations are not adequate (the authors have said they will expand on this but clear details are not provided) +4) The BART model seems to have some advantage as it is pre-trained on XSUM data whereas some of the other models are not (the authors haven't clarified this sufficiently in the rebuttal) + +Overall, the reviewers were not completely happy with the work and there was not clear champion. ",ICLR2021, +nNE6-uGq4u,1610040000000.0,1610470000000.0,1,WC04PD6dFrP,WC04PD6dFrP,Final Decision,Reject,"The paper considers the OPE problem under the contextual bandit model with continuous action. They studied the model of a piecewise constant value function according to the actions. The assumption is new, though still somewhat restrictive as it requires the piecewise constant partitions to be the same for all x. The proposed algorithm estimates the partitions, and then used it to build a doubly robust estimator with stratified importance sampling (fitting an MLP for each partition separately). + + +The reviewers have mixed views about the paper. The following is the AC's evaluation based on reading the paper and consolidating the reviewers' comments and the authors' responses. 
+
+Pros:

- The algorithm is new and it makes sense for the new problem setup (though computationally intractable)
- The experimental results outperform the baseline and reinforce the theory. But it's a toy example at best.

Cons:

- The method is called ""Q-learning"" but it is somewhat disappointing to see that it actually applies only to the contextual bandit model (without dynamics). There is quite a bit of a branding issue here. I suggest the authors revise it to reflect the actual problem setup.

- The estimator is assumed to be the arg min, but the objective function is non-convex and cannot be solved efficiently in general, e.g., (3) involves searching over all partitions... and (4) involves solving neural network partitions. In other words, the result applies to a hypothetical minimizer that the practical solvers may or may not obtain (the authors cited Scikit-Learn for the optimization algorithm and claim that the optimization problem can be solved, which is not the case ... the SGD algorithm can be applied to solve it, but it does not necessarily find you the solution).

- The theory is completely asymptotic and generic. There is no rate of convergence specified, and no dependence on the number of jumps |D_0| at all in Theorem 1.

- Theorem 3 is obnoxiously sloppy. The assumptions are not made explicit (do you need Assumptions 1 and 2? What is the choice of \rho?) The notion of ""minimax rate"" is not defined at all. Usually the minimax rate is a property of the problem setting, i.e., Min over all algorithms, and Max over all problems within a family. However, in the way the authors described the results in Theorem 3, it says ""the minimax convergence rate of kernel-based estimator is Op(n^{−1/3})"", which seems to be restricting the algorithms instead. Such non-typical choices require clear definitions and justification. Based on what is stated, it really appears that the authors are just comparing upper bounds of the two methods.

I looked at the appendix and while there is a ""lower bound analysis"", the bound is not information-theoretic, but rather a fixed example where an unspecified family of algorithms (I think it is a specific kernel smoothing method with an arbitrary choice of the bandwidth parameter h) will fail.

Suggestions to the authors:

- Instead of a piecewise constant (and uniformly bounded) function, why not consider the total variation class, which is strictly more general and comes with the same rate?

- For formalizing the lower bound, I suggest the authors look into classical lower bounds for linear smoothers, e.g., Donoho, Liu, MacGibbon (1990), which clearly illustrate that kernel smoothing-type methods do not achieve the minimax rates, and that wavelet-based approaches, locally adaptive regression splines, and the fused lasso (you can think of the Haar wavelets as basis functions for piecewise constant functions) do.

The authors can improve the paper by ensuring that the theoretical parts are clearly and rigorously presented, and perhaps by ironing out a more useful finite-sample analysis that depends on the model parameters of interest.",ICLR2021,
H11kDJ6Sf,1517250000000.0,1517260000000.0,904,H1pri9vTZ,H1pri9vTZ,ICLR 2018 Conference Acceptance Decision,Reject,"The idea of extending deep nets to infinite dimensional inputs is interesting but, as the reviewers noted, the execution does not have the quality we can expect from an ICLR publication. 
I encourage the authors to consider the meaningful comments that were made and modify the paper accordingly.",ICLR2018, +ujcWT8zNslN,1642700000000.0,1642700000000.0,1,BGvt0ghNgA,BGvt0ghNgA,Paper Decision,Accept (Poster),The paper introduces unsupervised skill discovery using Lipschitz-constrained skills. It is well-written and demonstrates the advantages in a solid experimental section.,ICLR2022, +b-hNKhM8xdx,1610040000000.0,1610470000000.0,1,FKotzp6PZJw,FKotzp6PZJw,Final Decision,Reject,"This paper is rejected. + +I and the reviewers appreciate the changes made by the authors. The paper presents: +* An analysis (based on techniques from previous work) of double Q-learning which shows that in an analytic model, double Q-learning can have multiple sub-optimal ""approximated"" fixed points. +* Propose a modification of the update that uses collected trajectories to lower bound the optimal value. +* Experiments on several Atari games. + +While the theoretical results on double Q-learning are interesting, the authors provide little theoretical analysis of their proposed approach. Doing so will significantly strengthen the paper. Additionally, reviewers had concerns about the experiments. R2 questions the parameter setting in the multi-step experiments. ",ICLR2021, +xLOiH_IJi,1576800000000.0,1576800000000.0,1,Hkl_bCVKDr,Hkl_bCVKDr,Paper Decision,Reject,"(1) the authors emphasize the theoretical contribution and claims the bound are tighter. However, they did not directly compare with any certified robust methods, or previous bounds to support the argument. +HM, not sure, need to check this + +(2) The empirical results look suboptimal. The authors did not convince me why they sampled 1000 images for test for a small CIFAR-10 dataset. The proposed method is 10% less robust comparing to Madry's in table 1. +Seems ok, understand authors response + +1) The theoretical analysis are not terribly new, which is just a straightforward application of first-order Taylor expansion. This idea could be traced back to the very first paper on adversarial examples FGSM (Goodfellow et al 2014). +True + +2) The novelty of the paper is to replace exact gradient (w.r.t input) by their finite difference and use it as a regularization. However, there is a misalignment between the theory and the proposed algorithm. The theory only encourages input gradient regularization, regardless to how it is evaluated, and previous studies have shown that this is not a very effective way to improve robustness. According to the experiments, the main empirical improvement comes from the finite difference implementation but the benefit of finite difference is not justified/discussed by the theory. Therefore, the empirical improvement are not supported by the theory. Authors have briefly respond to this issue in the discussion but I believe a more rigorous analysis is needed. +This seems okay based on author response + +3) Moreover, the empirical performance does not achieve state-of-the-art result. Indeed, there is a non-negligible gap (12%) between the obtained performance and some well-known baseline. Thus the empirical contribution is also limited. +Yea, for some cases",ICLR2020, +EFj5MHtyvT,1576800000000.0,1576800000000.0,1,B1l5m6VFwr,B1l5m6VFwr,Paper Decision,Reject,The paper proposed new version of LSTM which is claimed to abandon the redundancies in LSTM. It is weak both in theory and experiments. 
All reviewers gave clear rejects and the AC agree.,ICLR2020, +yZXKfKU2Bg,1576800000000.0,1576800000000.0,1,BJxI5gHKDr,BJxI5gHKDr,Paper Decision,Accept (Poster),"The paper points out pitfalls of existing metrics for in-domain uncertainty quantification, and also studies different strategies for ensembling techniques. + +The authors also satisfactorily addressed the reviewers' questions during the rebuttal phase. In the end, all the reviewers agreed that this is a valuable contribution and paper deserves to be accepted. + +Nice work!",ICLR2020, +rJgDczdXlE,1544940000000.0,1545350000000.0,1,rygunsAqYQ,rygunsAqYQ,Metareview,Reject,"The manuscript proposes a novel estimation technique for generative models based on fast nearest neighbors and inspired by maximum likelihood estimation. Overall, reviewers and AC agree that the general problem statement is timely and interesting, and the subject is of interest to the ICLR community + +The reviewers and ACs note weakness in the evaluation of the proposed method. In particular, reviewers note that the Parzen-based log-likelihood estimate is known to be unreliable in high-dimensions. This makes a quantitative evaluation of the results challenging, thus other metrics should be evaluated. Reviewers also expressed concerns about the strengths of the baselines compared. Additional concerns are raised with regards to scalability which the authors address in the rebuttal. ",ICLR2019,4: The area chair is confident but not absolutely certain +tShDrcBhTIh,1642700000000.0,1642700000000.0,1,i--G7mhB19P,i--G7mhB19P,Paper Decision,Reject,"*Summary:* Study inductive bias of natural gradient flow. + +*Strengths:* +- Some reviewers found the invariance to reparametrization insightful, a good way to better understand the interaction of the learning rule and parametrization. +- Experiments support the theory. + +*Weaknesses:* +- Unclear takeaway message. +- Comparison with Euclidean case not comprehensive. +- Insufficient distinction between reparametrization (invertible) and different parametrization. No experiments on actual dataset/architecture. +- Some reviewers found the the cases considered in the paper are already well understood. + +*Discussion:* + +2Yhk found that although the author responses and other reviews clarified some of their concerns, particularly about reparametrization conditions, the result provided in the paper is not strong enough and could be further clarified. The authors found that this reviewer might have misunderstood the paper. Following the discussion period, the reviewer raised his/her score and lowered his/her confidence. In response to CBDn the authors added demonstration of NGD being worse than EGD on matrix completion. In one of the responses, the authors summarize their contribution as: ''replacing EGD with NGD … disturbs the second mechanism [dynamics and GD trajectories]''. I find the question really is what kind of quantitative conclusions can be made. gWo5 pointed out important related work that was not discussed in the initial submission. Authors added discussion. VPSX misses applications to less well understood settings. Authors however only offer to keep this in mind for future work. VPSX also asks to emphasize the insights into the inductive bias of NGD. Authors added some discussion, however mostly pertaining previous works and not specific enough in my opinion. 
+ +*Conclusion:* + +One reviewer found this work marginally above the acceptance threshold and four other reviewers found that it does not reach the bar for acceptance. I find the topic worthwhile and that it deserves a thorough investigation. However, I concur with the reviewers that some concepts require a clearer presentation and that it would be desirable to see more general results and more comprehensive discussions. Several suggestions were made by the reviewers and acknowledged by the authors, but many of these were left for future work. In summary, I think that the article makes a good start but needs more work. Therefore I am recommending a rejection at this time. I encourage the authors to revise and resubmit.",ICLR2022, +MfoSlTdAxB,1576800000000.0,1576800000000.0,1,rklraTNFwB,rklraTNFwB,Paper Decision,Reject,"The paper examines whether it is possible to train agents to follow synthetic instructions that perceives and modifies a 3D scene based on a first-person viewpoint, and have the trained agents follow natural language instructions provided by humans. + +The paper received two weak rejects and one weak accept. The main concerns voiced by the reviewers are: +1. Lack of variety in natural language +One of the key claims of the paper is that previous work on instruction following can only handle instructions generated from templates and cannot handle ambiguous expressions used by real people, and that the contribution of this work is that it can handle such expresssions. However, as pointed out by R1, the language considered in this work is very simplistic in form (close to being template based) with the main variation coming from synonyms. Even the free-form natural instructions that are collected, are done so with very specific instructions that restrict diversity of language (e.g don't use colors or other properties of the object). R1 also point out that there are prior work that handles much more diverse language. + +2. Limited technical novelty and questions about how much the proposed CMSA method actually contribute + +3. Overclaims and lack of precision when using terminology +There is concern that the task that is addressed is not actually that complex. The environments are simple (with just 2 objects) and not that realistic. Tackling 2 tasks is barely ""multi-task"", and commonly, ""manipulation"" refers to low-level grasping/picking up of objects which is not how it is used here. + +While the paper has many strong elements and is mostly well written, considerable improvements still need to be made for the paper to have claims it can support. It is currently below the bar for acceptance. The authors are encouraged to improve their paper and resubmit to an appropriate venue. +",ICLR2020, +B1e_KGRllV,1544770000000.0,1545350000000.0,1,HyMnYiR9Y7,HyMnYiR9Y7,A potentially interesting idea but 2/3 reviewers share strong concerns about the empirical results and overall clarity of the paper.,Reject,"This paper investigates a data selection framework for domain adaptation based on reinforcement learning. + +Pros: +The paper presents an approach that can dynamically adjust the data selection strategy via reinforcement learning. More specifically, the RL agent gets reward by selecting a new sample that makes the source training data distribution closer to the target distribution, where the distribution comparison is based on the feature representations that will be used by the prediction classifier. 
While the use of RL for data selection is not entirely new, the specific method proposed by the paper is reasonably novel and interesting. + +Cons: +The use of RL is not clearly motivated and justified (R1,R3) and the method presented in this paper is rather hard to follow might be overly complex (R1). One fair point R1 raised is more clean-cut empirical evaluation that demonstrates how RL performs clearly better than greedy optimization. The authors came back with additional analysis in Section 4.2 to address this question, but R1 feels the new analysis (e.g., Fig 3) is not clear how to interpret. A more thorough ablation study of the proposed model might have addressed the reviewer's question more clearly. In addition, all reviewers felt that baselines are not convincingly strong enough, though each reviewer pointed out somewhat different aspects of baselines. R3 is most concerned about baselines being not state-of-the-art, and the rebuttal did not address R3's concern well enough. + +Verdict: +Reject. A potentially interesting idea but 2/3 reviewers share strong concerns about the empirical results and overall clarity of the paper.",ICLR2019,4: The area chair is confident but not absolutely certain +t3TV3FqukD,1576800000000.0,1576800000000.0,1,BJlisySYPS,BJlisySYPS,Paper Decision,Reject,"The paper examines the idea that real world data is highly structured / lies on a low-dimensional manifold. The authors show differences in neural network dynamics when trained on structured (MNIST) vs. unstructured datasets (random), and show that ""structure"" can be captured by their new ""hidden manifold"" generative model that explicitly considers some low-dimensional manifold. + +The reviewers perceived a lack of actionable insights following the paper, since in general these ideas are known, and for MNIST to be a limited dataset, despite finding the paper generally clear and correct. + +Following the discussion, I must recommend rejection at this time, but highly encourage the authors to take the insights developed in the paper a bit further and submit to another venue. E.g. trying to improve our algorithms by considering the inductive bias of structure of the hidden manifold, or developing a systematic and quantifiable notion of structure for many different datasets that correlate with difficulty of training would both be great contributions.",ICLR2020, +H5Yq1pYmtj3,1610040000000.0,1610470000000.0,1,IfEkus1dpU,IfEkus1dpU,Final Decision,Reject,"Overall the review is borderline: R2 and R4 are slightly positive and R3 is slightly negative. All the reviewers like the novel shading consistency loss proposed in the paper and, improved DIP that produces consistent image decomposition inferences, and good experimental results. However, reviewers also shared concerns about speed and the thoroughness of the evaluation, and human tolerance of shading inaccuracy. These points were addressed in details in the rebuttal, and reviewers didn’t change their initial scores. + +The AC is concerned about the cut-and-paste neural rendering results. Because there are no cast shadows, the rendering doesn’t look realistic under the lighting conditions in the new image. It’s unclear that the proposed method would lead to a promising direction of copying and pasting contents into images for photorealistic editing. Consequentially, the paper is not ready for publication at its current form. 
+",ICLR2021, +SyfRsz8ug,1486400000000.0,1486400000000.0,1,Hy3_KuYxg,Hy3_KuYxg,ICLR committee final decision,Reject,"The area chair agrees with the reviewers that this paper is not ready for ICLR yet. There are significant issues with the writing, making it difficult to follow the technical details. Writing aside, the technique seems somewhat limited in its applicability. The authors also promised an updated version, but this version was never delivered (latest version is from Nov 13).",ICLR2017, +8_KzC5k5-fO,1642700000000.0,1642700000000.0,1,7MV6uLzOChW,7MV6uLzOChW,Paper Decision,Accept (Poster),"This paper presents a method to turn a pretrained unconditional VAE into a conditional VAE by training an encoder to predict the unconditional VAE latents given conditional input. On a variety of image tasks, the method is shown to perform competitively with GANs, yielding good sample quality and diversity, and resulting in training time that improves on direct conditional generation approaches. While the technical novelty is limited, the strong empirical results and relevance given the growing availability of pretrained unconditional models lead me to recommend accepting this paper. + +Ethics concerns have been raised for this paper. In particular, there were concerns with respect to the application of generative models, which inherit biases from the dataset, to guide medical imaging. It would be good to discuss this issue in more depth. A second point that was raised by the ethics committee is the fact that chest X-rays are usually not taken in a sequential manner. We ask the authors to either provide evidence that X-rays can be taken sequentially (one can think of situations where that's the case, e.g., X-rays of teeth in the mouth), preferably in the context of chest X-rays; if that's not possible, please highlight that the application, as described in the paper, is unrealistic (at the moment), and that it only serves as an illustration. + +The key point we therefore ask the authors to address is to ensure that the paper clearly states how realistic the application is and what potential problems may arise when using generative models in this particular domain.",ICLR2022, +rysBI1TSf,1517250000000.0,1517260000000.0,779,SJky6Ry0W,SJky6Ry0W,ICLR 2018 Conference Acceptance Decision,Reject,"PROS: +1. All the reviewers thought that the work was interesting and showed promise +2. The paper is relatively well written + +CONS: +1. Limited experimental evaluation (just MNIST) + +The reviewers were all really on the fence about this but in the end felt that while the idea was a good one and the authors were responsive in their rebuttal, the experimental evaluation needed more work. ",ICLR2018, +JPxZUZn7-pK,1610040000000.0,1610470000000.0,1,01olnfLIbD,01olnfLIbD,Final Decision,Accept (Poster),"The paper presents a scalable data poisoning algorithm for targeted attacks, using the idea of designing poisoning patterns which ""align"" the gradients of the real objective and the adversarial objective. This intuition is supported by theoretical results, and the paper presents convincing experimental results about the effectiveness of the model. + +The reviewers overall liked the paper. However, they requested a number of clarifications and some additional work, which should be incorporated in the final version (however, the authors are not required to use the wording as poison integrity/ poison availability). 
In particular, it would be great to see the experiment the authors suggested in their response to Reviewer 2 about the effectiveness of their method for multiple targets (this is important to better understand the limitations of the proposed approach).",ICLR2021,
Hye-0cdmeN,1544940000000.0,1545350000000.0,1,ryGDEjCcK7,ryGDEjCcK7,Well-written paper but the empirical results seem to be not fully convincing,Reject,"This paper introduces a technique called EquiNorm, which normalizes the weights of convolutional layers in order to control covariate shift. The paper is well-written and the reviewers agree that the solution idea is elegant. However, the reviewers also agree that the experiments presented in the work were insufficient to prove the method's superiority. Reviewer 2 also expressed concerns about the poor results on ImageNet, which calls into question the significance of the proposed method.",ICLR2019,4: The area chair is confident but not absolutely certain
ryknMk6Hz,1517250000000.0,1517260000000.0,11,HkL7n1-0b,HkL7n1-0b,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"This paper proposes a new generative model that has the stability of variational autoencoders (VAE) while producing better samples. 
The authors clearly compare their work to previous efforts that combine VAEs and Generative Adversarial Networks with similar goals. The authors show that the proposed algorithm is a generalization of the Adversarial Autoencoder (AAE) and minimizes the Wasserstein distance between the model and target distributions. The paper is well written with convincing results. Reviewers agree that the algorithm is novel and practical, and close connections of the algorithm to related approaches are clearly discussed with useful insights. Overall, the paper is strong and I recommend acceptance.",ICLR2018,
B7wAguWj0_,1610040000000.0,1610470000000.0,1,xcd5iTC6J-W,xcd5iTC6J-W,Final Decision,Reject,"There is consensus that the submission is not yet ready for publication. The reviews contain multiple comments and suggestions and I hope they can be useful for the authors.",ICLR2021,
MA2OHJj-7FH,1610040000000.0,1610470000000.0,1,rRFIni1CYmy,rRFIni1CYmy,Final Decision,Accept (Poster),"In this paper, the authors combine ideas from SLAM (using an Extended Kalman Filter and a state with nonlinear transitions and warping) and differentiable memory networks that store a spherical representation of the state (from the ego-centric point of view of an RL agent moving in an environment) with depth and visual features stored at each pixel and dynamics transitions corresponding to warping.

The main idea in the paper is very simple and elegant, but I will concur with the reviewers that the writing of the first version of the paper was extremely hard to understand and that the experimental section was too dense. Two subsequent revisions of the paper have dramatically improved the paper.

Given the spread of scores (R1: 6, R2: 7 and R3: 4) and the fact that only R1 and R2 have acknowledged the revisions, I will veer towards acceptance.
",ICLR2021,
DPNAoVqto,1576800000000.0,1576800000000.0,1,rkxuWaVYDB,rkxuWaVYDB,Paper Decision,Reject,"This paper studies the problem of devising optimal attacks in deep RL to minimize the main agent's average reward. In the white-box attack setting, optimal attacks amount to solving a Markov Decision Process, while in the black-box setting, optimal attacks can be trained using RL techniques. The empirical efficiency of the attacks was demonstrated. The paper makes valuable contributions to the study of adversarial robustness in deep RL. However, the current motivation and setup need to be made clearer, and so the paper is not being accepted at this time. 
We hope for these comments to help improve a future version.",ICLR2020,
SkOGV1arf,1517250000000.0,1517260000000.0,308,SkA-IE06W,SkA-IE06W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Dear authors,

The reviewers all appreciated your work and agree that this is a very good first step in an interesting direction.",ICLR2018,
SyfO71aSz,1517250000000.0,1517260000000.0,168,BySRH6CpW,BySRH6CpW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Well written paper on a novel application of the local reparametrisation trick to learn networks with discrete weights. The approach achieves state-of-the-art results.

Note: I appreciate that the authors added a comparison to the Gumbel-softmax continuous relaxation approach during the review period, following the suggestion of a reviewer. 
This additional comparison strengthens the paper.",ICLR2018, +1QTqY2y8w1,1576800000000.0,1576800000000.0,1,BkgF4kSFPB,BkgF4kSFPB,Paper Decision,Reject,"The submission presents an approach to visual planning. The work builds on semi-parametric topological memory (SPTM) and introduces ideas that facilitate zero-shot generalization to new environments. The reviews are split. While the ideas are generally perceived as interesting, there are significant concerns about presentation and experimental evaluation. In particular, the work is evaluated in extremely simple environments and scenarios that do not match the experimental settings of other comparable works in this area. The paper was discussed and all reviewers expressed their views following the authors' responses and revision. In particular, R1 posted a detailed justification of their recommendation to reject the paper. The AC agrees that the paper is not ready for publication in a first-tier venue. The AC recommends that the authors seriously consider R1's recommendations.",ICLR2020, +_JFG7ta0io_,1642700000000.0,1642700000000.0,1,FqRHeQTDU5N,FqRHeQTDU5N,Paper Decision,Reject,"The reviewers are split about this paper and did not come to a consensus: on one hand they agreed that the paper has valuable theoretical contributions and addresses an important problem in current ML literature, on the other hand they would have liked to see empirical results on a real-world problem setting. After going through the paper and the discussion I have decided to vote to reject for the following reason: I believe the reviewers' concerns about empirical results is not just a request for applying this to more datasets (which is easy to satisfy and I don't think is grounds for rejection), but is actually for a clearer connection for how this work would be used in the machine learning problems described in the introduction and related work sections. What would really help this paper is a real-world running example, in place of the blue plus example, in Figure 1 (I think the blue plus problem is still a useful experimental tool and should be evaluated, but it doesn't clarify the real-world use-cases of this work. This led the reviewers to look to the experimental section for clarification on this, but this wasn't clarified there either. The authors' response to these concerns was an out-of-scope argument: the goal of this paper is to derive/test theoretical results, and there are a number of possible use cases we could apply this to. The authors argue that the current work sends 'a strong signal to the ICLR community that the Prover-Verifier Game is interesting and promising'. I'm sorry but I disagree here: the authors need to do more to convince the ICLR community that this is a framework that will solve outstanding problems in ML. This is solved if the authors (a) run their approach on a real-world dataset in a paper they cite in the related work, (b) they include baselines in this experiment, and (c) if they add this as a running example throughout with a figure that explains this real-world example. With these additions the paper will be a much stronger submission.",ICLR2022, +BklaVbdZgE,1544810000000.0,1545350000000.0,1,B1ggosR9Ym,B1ggosR9Ym,Better suited for another venue,Reject,"The reviewers highlighted that the application in the paper is interesting, but note a lack of new methodology, and also highlight serious flaws in the testing methodology. 
Specifically, the reviewers are discouraged by the straightforward reuse of Siamese networks without clear modifications. Further, the testing setup might be unfairly easy, since chemical families are represented in both training and test sets, while in true application of the method would be exposed to previously unseen chemical families. + +The authors did not participate in the discussion, and address concerns. The reviewer consensus is a rejection.",ICLR2019,5: The area chair is absolutely certain +DUkmbaQhu,1576800000000.0,1576800000000.0,1,B1ltfgSYwS,B1ltfgSYwS,Paper Decision,Reject,"The authors present a combination of few-shot learning with one-class classification model of problems. The authors use the existing MAML algorithm and build upon it to present a learning algorithm for the problem. As pointed out by the reviewers, the technical contributions of the paper are quite minimal and after the author response period the reviewers have not changed their minds. However, the authors have significantly changed the paper from its initial submission and as of now it needs to be reviewed again. I recommend authors to resubmit their paper to another conference. As of now, I recommend rejection.",ICLR2020, +nUcCSGxpTEC,1610040000000.0,1610470000000.0,1,mb2L9vL-MjI,mb2L9vL-MjI,Final Decision,Reject,"This paper empirically investigates the gradient dynamic of two-layer network nets with ReLU activations on synthetic datasets under $L^2$ loss. The empirical results show that for a specific type of initialization and less overparametrized neural nets, the gradient dynamics experience two phases: a phase that follows the random features model where all the neurons are *quenched* and another phase where there are a few *activated* neurons. As pointed out by Reviewer 1, this paper lacks mathematical support and did not distinguish between *random features model* and *neural tangent model*. Reviewer 3 and Reviewer 4 also complained that the paper is purely experimental. Therefore, this paper may benefit from proposing an at least heuristic or high-level conjecture/interpretation/argument that tries to explain the empirical results. ",ICLR2021, +SkOGV1arf,1517250000000.0,1517260000000.0,308,SkA-IE06W,SkA-IE06W,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Dear authors, + +The reviewers all appreciated your work and agree that this a very good first step in an interesting direction.",ICLR2018, +X5-r6QtWkX,1642700000000.0,1642700000000.0,1,qsZoGvFiJn1,qsZoGvFiJn1,Paper Decision,Accept (Poster),"The paper proposes a framework for object detection on lidar scans, with query of scene feature extracted offline from previous traversals. Overall there is good agreement among reviewers, with three recommending accepting the paper and one marginally accepting it -- to me the authors satisfactorily addressed most aspect raised in reviewing.",ICLR2022, +KDSvmfQs-a,1642700000000.0,1642700000000.0,1,FPCMqjI0jXN,FPCMqjI0jXN,Paper Decision,Accept (Oral),"Three experts reviewed this paper and all recommended acceptance. The reviewers liked that the work addressed a common problem in prior related work that it is hard to quantitatively evaluate slide discovery methods. Moreover, the proposed method achieves superior performance over prior arts. Based on the reviewers' feedback, the decision is to recommend the paper for acceptance. 
The reviewers did raise some valuable concerns, such as paper clarity, significance of the textual descriptions, that should be addressed in the final camera-ready version of the paper. The authors are encouraged to make the necessary changes to the best of their ability. We congratulate the authors on the acceptance of their paper!",ICLR2022, +B7wAguWj0_,1610040000000.0,1610470000000.0,1,xcd5iTC6J-W,xcd5iTC6J-W,Final Decision,Reject,There is consensus that the submission is not yet ready for publication. The reviews contain multiple comments and suggestions and I hope they can be useful for the authors.,ICLR2021, +uhQlKYXk3B,1576800000000.0,1576800000000.0,1,SylR-CEKDS,SylR-CEKDS,Paper Decision,Reject,"The authors explore different ways to generate questions about the current state of a “Battleship” game. Overall the reviewers feel that the problem setting is interesting, and the program generation part is also interesting. However, the proposed approach is evaluated in tangential tasks rather than learning to generate question to achieve the goal. Improving this part is essential to improve the quality of the work. + +",ICLR2020, +kXKxC_civf,1576800000000.0,1576800000000.0,1,BkxDthVtvS,BkxDthVtvS,Paper Decision,Reject,"This paper proposes a way to construct group equivariant neural networks from pre-trained non-equivariant networks. The equivarification is done with respect to known finite groups, and can be done globally or layer-wise. The authors discuss their approach in the context of the image data domain. The paper is theoretically sound and proposes a novel perspective on equivarification, however, the reviewers agree that the experimental section should be strengthened and connections with other approaches (e.g. the work by Cohen and Welling) should be made clearer. The reviewers also had concerns about the computational cost of the equivarification method proposed in this paper. While the authors’ revision addressed some of the reviewers’ concerns, it was not enough to accept the paper this time round. Hence, unfortunately I recommend a rejection.",ICLR2020, +jySC3zH1LJT,1610040000000.0,1610470000000.0,1,Shjmp-QK8Y-,Shjmp-QK8Y-,Final Decision,Reject,"This paper proposes to incorporate additional prior knowledge into transformer architectures for machine translation tasks. The definition of problem is reasonable, despite the fact that there is a long thread of work on adding knowledge of different types into neural architectures of NMT. The proposed model, however, needs to be better motivated, as to why the same thing cannot be done in a simpler way in the framework of transformers. Judging from the exposition and the experiments, the proposed model is neither novel or empirically significant enough. The writing needs to be greatly improved to get rid of the grammatical errors and notational inconsistency. + +I’d suggest to reject this paper +",ICLR2021, +r1Z2Q1pHz,1517250000000.0,1517260000000.0,217,BJij4yg0Z,BJij4yg0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"I'm inclined to recommend accepting this paper, although it is borderline given the strong dissenting opinion. The revisions have addressed many of the concerns about quality, clarity, and significance. The paper gives an end to end explanation in Bayesian terms of generalization in neural networks using SGD. + +However, it is my opinion that Bayesian statistics is not, at present, a theory that can be used to explain why a learning algorithm works. 
The Bayesian theory is too optimistic: you introduce a prior and model and then trust both implicitly. Relative to any particular prior and model (likelihood), the Bayesian posterior is the optimal summary of the data, but if either part is misspecified, then the Bayesian posterior carries no optimality guarantee. The prior is chosen for convenience here. And the model (a neural network feeding into cross entropy) is clearly misspecified. + +However, there are ways to sidestep both these issues using a frequentist theory closely related to Bayes, which can explain generalization. Indeed, you cite a recent such paper by Dzugate and Roy who use PAC-Bayes. However, you citation is disappointingly misleading: a reader would never know that these authors are also responding to Zhang, have already proposed to explain ""broad minima"" in (PAC-)Bayesian terms, and then even get nonvacuous bounds. (The connection between PAC-bayes and marginl likelihood is explained by Germain et al. ""PAC-Bayesian Theory Meets Bayesian Inference""). Dzugate et al don't propose to explain why SGD finds such ""good"" minima. So I would say, your work provides the missing half of their argument. This work deserves more prominent placement and shouldn't be buried on page 5. Indeed, it should appear in the introduction and a proper description of the relationship should be given. ",ICLR2018, +hYshZWVBFE,1576800000000.0,1576800000000.0,1,H1gS364FwS,H1gS364FwS,Paper Decision,Reject,"This paper performs event extraction from Amharic texts. To this end, authors prepared a novel Amharic corpus and used a hybrid system of rule-based and learning-based systems. +Overall, while all reviewers admit the importance of addressing low-resource language and the value of the novel Amharic corpus, they are not satisfied with the quality of the current paper as a scientific work. +Most importantly, although the attempt of even extraction might be new on Amharic, there have been many works on other languages. It should be clearly presented what are the non-trivial language-specific challenges on Amharic and how they are solved, otherwise it seems just an engineering of existing techniques on a new dataset. Also, all reviewers are fairly concerned about the presentation and clarity of the paper. Unfortunately, no revised paper is uploaded and we cannot confirm how authors' response is reflected. For those reasons, I would like to recommend rejection. +",ICLR2020, +w7yCvVN-vG,1576800000000.0,1576800000000.0,1,ryGWhJBtDB,ryGWhJBtDB,Paper Decision,Reject,"Authors provide an empirical evaluation of batch size and learning rate selection and its effect on training and generalization performance. As the authors and reviewers note, this is an active area of research with many closely related results to the contributions of this paper already existing in the literature. In light of this work, reviewers felt that this paper did not clearly place itself in the appropriate context to make its contributions clear. Following the rebuttal, reviewers minds remained unchanged. ",ICLR2020, +D8xAU35HQKa,1642700000000.0,1642700000000.0,1,ezbMFmQY7L,ezbMFmQY7L,Paper Decision,Reject,"This work proposes to use a transformer model and language model inspired self-supervised training techniques to generate local modifications of organic molecules. The use of IUPAC names coupled with language inspired pre-training is indeed an interesting idea worthy of exploration. 
The paper has a lot of promises in this regard but needs more work to deliver it through the finish line. In the rebuttal, the authors have provided strong arguments toward the advantages of using IUPAC representation. While these arguments make sense, they are more or less conceptual and better and more clear empirical evidences are required to back them up.",ICLR2022, +N--KvRY8J34,1642700000000.0,1642700000000.0,1,cWlMII1LwTZ,cWlMII1LwTZ,Paper Decision,Reject,"This work considers the problem of how to predict on sensitive user points while preserving their privacy. It proposes a fairly straightforward way to create a local randomizer that optimizes loss for a given model subject to preserving LDP. The work also gives theoretical analysis of the randomizer for least squares linear regression. +The problem formulation is different from the standard LDP framework where privacy of training data points needs to be preserved. The submission does not motivate this setting and I don't see a good motivation for this problem either. More importantly, it does not sufficiently emphasize that the problem is entirely different from prior work. Indeed all reviewers were confused about various aspects of comparison with previous work. Therefore, in my opinion, the submission is not sufficiently well motivated and clearly presented to be accepted.",ICLR2022, +B12lQyarz,1517250000000.0,1517260000000.0,66,rkmu5b0a-,rkmu5b0a-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper presents an analysis of using multiple generators in a GAN setup, to address the mode-collapse problem. R1 was generally positive about the paper, raising the concern on how to choose the number of generators, and also whether parameter sharing was essential. The authors reported back on parameter sharing, showing its benefits yet did not have any principled method of selecting the number of generators. R2 was less positive about the paper, pointing out that mixture GANs and multiple generators have been tried before. They also raised concern with the (flawed) Inception score as the basis for comparison. R2 also pointed out that fixing the mixing proportions to uniform was an unrealistic assumption. The authors responded to these claims, clarifying the differences between this paper and the previous mixture GAN/multiple generator papers, and reporting FID scores. R3 was generally positive, also citing some novelty concerns similar to that of R2. I acknowledge the authors detailed responses to the reviews (in particular in response to R2) and I believe that the majority of concerns expressed have now been addressed. I also encourage the authors to include the FID scores in the final version of the paper.",ICLR2018, +Byxc9SDlgE,1544740000000.0,1545350000000.0,1,HylVB3AqYm,HylVB3AqYm,Good empirical results. Novelty is limited.,Accept (Poster),"This paper integrates a bunch of existing approaches for neural architecture search, including OneShot/DARTS, BinaryConnect, REINFORCE, etc. Although the novelty of the paper may be limited, empirical performance seems impressive. The source code is not available. I think this is a borderline paper but maybe good enough for acceptance. 
+",ICLR2019,4: The area chair is confident but not absolutely certain +9Rg-TPeivME,1642700000000.0,1642700000000.0,1,UseMOjWENv,UseMOjWENv,Paper Decision,Accept (Oral),"This paper proposed MIDI-DDSP, a structured hierarchical generative model which offers both detailed expressive controls (as in traditional synthesizers) as well as the realistic audio quality (as in black-box neural audio synthesis). Overall the reviews are very positive. All the reviewers unanimously agree that the paper is very well-written and presented a very convincing model and a meaningful step-up from the earlier work of DDSP. The authors also presented a well-documented website for the project and promised to release the source code. The reviewers raised some clarifying questions and minor corrections which the authors addressed during the response. Therefore, I vote for accept.",ICLR2022, +P0VAa99gwo_D,1642700000000.0,1642700000000.0,1,57PipS27Km,57PipS27Km,Paper Decision,Accept (Spotlight),"This paper addresses a continuous-time formulation of gradient-based meta-learning (COMLN) where the adaptation is the solution of a differential equations. In general, outer loop optimization requires backpropagating over trajectories involving gradient updates in the inner loop optimization. It is claimed that one of main advantages of COMLN is able to compute the exact meta-gradients in a memory-efficient way, regardless of the length of adaptation trajectory. To this end, the forward-mode differentiation is used, with exploiting the Jacobian matrix decomposition. All the reviewers agree that the derivation of memory-efficient forward-mode differentiation is a significant contribution in the few-shot learning. The paper is well written and has interesting contributions. Authors did a good job in responding to reviewers’ comments during the discussion period. What is missing in this paper is the discussion of some limitations of the proposed method. This can be improved in the final version. All reviewers agree to champion this paper. Congratulations on a nice work.",ICLR2022, +xm72BYwlEBT,1642700000000.0,1642700000000.0,1,ZUinrZwKnHb,ZUinrZwKnHb,Paper Decision,Reject,"This paper proposes a bottom-up multi-person pose estimation method using a Transformer model. There is consensus among the reviewers that this paper is not ready for acceptance/publication. Although some reviewers find the proposed idea interesting (some find it lacking novelty though), all the reviewers agree that the quantitative experimental results are not promising. Some reviewers explicitly criticized lacking empirical accuracy compared to state-of-the-arts. The authors provided additional details and results in the rebuttal, but they were not sufficient to change the opinions of the reviewers. + +We recommend rejecting the paper.",ICLR2022, +MdGTMh6oilx,1642700000000.0,1642700000000.0,1,f-KGT01Qze0,f-KGT01Qze0,Paper Decision,Reject,"The paper proposes a data augmentation approach that extends Mixup with high- and low-pass filtering operations, in order to regularize deep networks towards focusing on low frequency components of the input signal. Reviewers are unconvinced about the significance of the contribution. Reviewer 5zdd notes that the method does not improve over standard Mixup in the absence of corruption error. 
Reviewer 3E2o notes that ""the idea of spectral mixing itself is not particularly novel"", and also asks for ablation studies concerning the hyperparameters of the method; the author response unfortunately does not provide enough detail on ablation experiments. The AC agrees with the reviewers and does not believe the author response has addressed the weaknesses in a satisfactory manner.",ICLR2022,
CqtQCrzRPCh,1642700000000.0,1642700000000.0,1,agBJ7SYcUVb,agBJ7SYcUVb,Paper Decision,Reject,"This paper presents a package for ""Dynamic Fine-grained Structured Sparse Attention Mechanism"" (DFSSATTEN), which aims to improve the computational efficiency of attention mechanisms by leveraging the specific sparse pattern supported by the sparse tensor cores of the NVIDIA A100. DFSSATTEN shows a theoretical and empirical advantage in terms of performance and speedup compared to various baselines, with 1.27~1.89x speedup over the vanilla attention network across different sequence lengths.

Reviewers praised the simplicity of the method and the clean code implementation. Speeding up attention mechanisms is an important problem, and leveraging sparse tensor cores for attention speedup is a sensible idea. The practical speedups are significant (1.27~1.89x over the vanilla attention across different sequence lengths). However, they also pointed out some weaknesses: the fact that the proposed method is very specific to the particular sparse pattern offered by the NVIDIA A100, and not easily generalizable to other future hardware; the fact that the method focuses on inference acceleration and not training from scratch (not completely clear in the paper), which limits its scope; and the fact that the method still has O(N^2) complexity (it still requires the computation of QK^T, which has quadratic memory and computation cost), and therefore it does not really address the quadratic bottleneck of transformers, unlike other existing work on efficient transformers for long sequences.

I tend to agree with the reviewers and, even though the package can be potentially useful to other researchers, the scope seems limited and the paper seems a bit thin to deserve publication at ICLR.

Other comments and suggestions:
- When talking about linear transformers, you should cite [1], which predates Performers
- It is not clear to me why 1:2 and 2:4 are called ""fine-grained *structured* sparsity""
- Citations for the systems in Tab 4 are missing
- When comparing to other methods, it would be useful to include their Pareto curves, since those methods have tradeoffs in terms of sparsity / approximation error (or downstream accuracy).

[1] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret (https://arxiv.org/abs/2006.16236)
+While this iteration improved the paper substantially, it is still not ready for publication in its current form. In particular: +- The paper is still not self-contained enough +- The reviewers are still not convinced about the statistical significance +- More tasks should be added +- PER needs to be added as a baseline +The authors promised those changes for the final version, but those are so substantive that the paper will need to go thorough another complete review cycle. Hence, we'd like to encourage the authors to re-submit at a different venue. + +P.S.: Careful with double-blind submissions, acknowledgements should not be included.",ICLR2022, +I98PzGatAbx,1642700000000.0,1642700000000.0,1,inA3szzFE5,inA3szzFE5,Paper Decision,Reject,"The reviewers all generally appreciated the idea in the paper. However, the nature of this contribution necessitates an empirical evaluation, and the reviewers generally found this to not be sufficient convincing. My assessment is that this idea can likely result in a successful publication, but will require additional empirical evaluation and analysis as suggested by reviewers. While the authors did add some additional results during the response period, they do not seem to be sufficient to fully address reviewer concerns.",ICLR2022, +0q80eRUwlH,1610040000000.0,1610470000000.0,1,hx0D7wn6qIy,hx0D7wn6qIy,Final Decision,Reject,"The paper proposes an approach that generates pseudo-labels along with confidence to help semi-supervised learning. Then, selected pseudo-labels are used to update the model. Moreover, the authors include a variation of mixup for data augmentation to train a more calibrated model. Experimental results justify the validity of the proposed approach. + +Several reviewers believe that the paper is somewhat well-written. The main concern is on the novelty of the work. In particular, many works have discussed selected treatment of unlabeled data, data augmentation for semi-supervised learning, and label confidence estimation. Those works deserve more discussions/comparisons. The paper can also be improved with deeper experimental studies that better justify the main assumption and merits of the proposed approach. +",ICLR2021, +SkxCMF7Fg4,1545320000000.0,1545350000000.0,1,rke41hC5Km,rke41hC5Km,metareview,Reject,"The reviewers raised a number of major concerns including the incremental novelty of the proposed (WGANs are applied to a new domain), and, most importantly, insufficient and unconvincing experimental evaluation presented (including the lack of comparative studies). The authors’ rebuttal failed to fully alleviate reviewers’ concerns. Hence, I cannot suggest this paper for presentation at ICLR.",ICLR2019,5: The area chair is absolutely certain +bqtUAvmNo8,1576800000000.0,1576800000000.0,1,ryxB2lBtvH,ryxB2lBtvH,Paper Decision,Accept (Poster),"This paper deals with multi-agent hierarchical reinforcement learning. A discrete set of pre-specified low-level skills are modulated by a conditioning vector and trained in a fashion reminiscent of Diversity Is All You Need, and then combined via a meta-policy which coordinates multiple agents in pursuit of a goal. The idea is that fine control over primitive skills is beneficial for achieving coordinated high-level behaviour. + +The paper improved considerably in its completeness and in the addition of baselines, notably DIAYN without discrete, mutually exclusive skills. 
Reviewers agreed that the problem is interesting and the method, despite involving a degree of hand-crafting, showed promise for informing future directions. + +On the basis that this work addresses an interesting problem setting with a compelling set of experiments, I recommend acceptance.",ICLR2020, +dAFeMtS9T8y,1642700000000.0,1642700000000.0,1,Dl4LetuLdyK,Dl4LetuLdyK,Paper Decision,Accept (Oral),"The paper proposes a general framework to reason about fine-grained distribution shifts, evaluating a large set of different approaches in a variety of settings. All reviewers recommend acceptance. While concerns were raised, including questions about the generality of the framework, unsurprising “tips”, and unclear take-home messages, all reviewers find the work strong, with an elegant formulation, and useful insights. The AC agrees with the reviewers that this work addresses a very important problem, proposes an interesting unified framework and benchmark for domain shift analysis, and should be a valuable tool for the community to pursue further research in this area.",ICLR2022, +fHagPsJ-vD,1576800000000.0,1576800000000.0,1,r1xMnCNYvB,r1xMnCNYvB,Paper Decision,Reject,"The paper is about a software library that allows for relatively easy simulation of molecular dynamics. The library is based on JAX and draws heavily from its benefits. + +To be honest, this is a difficult paper to evaluate for everyone involved in this discussion. The reason for this is that it is an unconventional paper (software) whose target application centered around molecular dynamics. While the package seems to be useful for this purpose (and some ML-related purposes), the paper does not expose which of the benefits come from JAX and which ones the authors added in JAX MD. It looks like that most of the benefits are built-in benefits in JAX. Furthermore, I am missing a detailed analysis of computation speed (the authors do mention this in the discussion below and in a sentence in the paper, but this insufficient). Currently, it seems that the package is relatively slow compared to existing alternatives. + +Here are some recommendations: +1. It would be good if the authors focused more on ML-related problems in the paper, because this would also make sure that the package is not considered a specialized package that overfits to molecular dynamics. +2. Please work out the contribution/delta of JAX MD compared to JAX. +3. Provide a thorough analysis of the computation speed +4. Make a better case, why JAX MD should be the go-to method for practitioners. + +Overall, I recommend rejection of this paper. A potential re-submission venue could be JMLR, which has an explicit software track.",ICLR2020, +XsRIyxQJzk,1576800000000.0,1576800000000.0,1,SklfY6EFDH,SklfY6EFDH,Paper Decision,Reject,"The reviewers found the aim of the paper interesting (to connect representation quality with adversarial examples). However, the reviewers consistently pointed out writing issues, such as inaccurate or unsubstantiated claims, which are not appropriate for a scientific venue. The reviewers also found the experiments, which are on simple datasets, unconvincing.",ICLR2020, +rkUNNy6Bf,1517250000000.0,1517260000000.0,331,H1sUHgb0Z,H1sUHgb0Z,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"This paper provides an important discussion about the relationship between training efficiency and label redundancy. The updates to the paper will improve the paper further. 
Reviewers found the paper interesting, well written, and addresses and important problem.",ICLR2018, +mTKuvbsl5u4,1642700000000.0,1642700000000.0,1,vxlAHR9AyZ6,vxlAHR9AyZ6,Paper Decision,Reject,"This manuscript proposes and analyses a weighting approach to improve the conformance of adversarial training in federated learning. The authors observe that adversarial training seems to degrade during the late stages of training, and suggest that this degradation is a consequence of exacerbated cross-device bias in federated averaging. They suggest and analyze a weighted scheme to fix this issue. + +During the review, the main concerns are related to the novelty of the work compared to existing work, the clarity of the technical contributions, and unclear technical statements. The authors respond to these concerns and partially satisfy the reviewers. After discussion, reviewers remain mixed, with multiple weak rejects and one strong accept. No fatal flaws are noted. + +The opinion of the area chair is that while there are no fatal flaws, there is very limited enthusiasm for this paper. This limited enthusiasm seems to be a result of intuition for observed phenomena that seem incorrect or insufficient to reviewers. Overall, I think this paper outlines and addresses an interesting issue of real concern. Flaws in the intuition building/explanation, and issues with clarity of presentation need to be improved for this work to have some impact.",ICLR2022, +YGG3wayhq-,1576800000000.0,1576800000000.0,1,HJewxlHFwH,HJewxlHFwH,Paper Decision,Reject,"While the reviewers generally appreciated the ideas presented in the paper and found the overall aims and motivation of the paper to be compelling, there were too many questions raised about the experiments and the soundness of the technical formulation to accept the paper at this time, and the reviewers did not feel that the authors had adequately addressed these issues in their responses. The main concerns were (1) with the correctness and rigor of the technical derivation, which the reviewers generally found to be somewhat questionable -- while the main idea seems reasonable, the details have a few too many question marks; (2) the experimental results have a number of shortcomings that make it difficult to fully understand whether the method really works, and how well.",ICLR2020, +TGt6hB_MHN,1576800000000.0,1576800000000.0,1,Hke0lRNYwS,Hke0lRNYwS,Paper Decision,Reject,"This paper proposes to reintroduce bipartite attractor networks and update them using ideas from modern deep net architectures. + +After some discussions, all three reviewers felt that the paper did not meet the ICLR bar, in part because of an insufficiency of quantitative results, and in part because the extension was considered pretty straightforward and the results unsurprising, and hence it did not meet the novelty bar. I therefore recommend rejection. ",ICLR2020, +o0xQT4uPRpZ,1610040000000.0,1610470000000.0,1,KxUlUb26-P3,KxUlUb26-P3,Final Decision,Reject,"This paper first makes the observation that incidental supervisory data can be used to define a new prior from which to calculate a PAC-Bayes generalization guarantee. This observation can be applied to any setting where there is unsupervised or semi-supervised pre-training followed by fine-tuning on labeled data. The PAC-Bayes bound is valid when applied to the fine-tuning. For example, one could use an L2 bound (derived from PAC-Bayes) on the difference between the fine-tuned parameters and pre-trained parameters. 
+ +But the paper proposes evaluating the value of pre-training before looking at any labeled data. Let $\pi_0$ be the prior before unsupervised or semi-supervised training and let $\tilde{\pi}$ be the prior after pre-training. The paper proposes using the entropy ratio $H(\pi_0)/H(\tilde{\pi})$ as a measure of the value of the pre-training. As the reviewers note, this is not really related to PAC-Bayes bounds. Furthermore, it is clearly possible that the pre-training greatly focuses the prior but in a way that is detrimental to learning the task at hand. + +I have to side with the reviewers that feel that this is below threshold.",ICLR2021, +zvhfGspnu1,1576800000000.0,1576800000000.0,1,B1eZYkHYPS,B1eZYkHYPS,Paper Decision,Reject,"The proposed algorithm is found to be a straightforward extension of the previous work, which is not sufficient to warrant publication in ICLR2020.",ICLR2020, +5qUMV1MbX5,1576800000000.0,1576800000000.0,1,rJlTXxSFPr,rJlTXxSFPr,Paper Decision,Reject,"This paper provides a method (loss function) for training GAN model for generation of discrete text token generation. The aim of this loss method to control the trade off between quality vs diversity while generating the text data. + +The paper is generally well written, but the experimental section is not overly good: Interpretation of the results is missing; error bars are missing. ",ICLR2020, +SyXhU1TrG,1517250000000.0,1517260000000.0,867,HJOQ7MgAW,HJOQ7MgAW,ICLR 2018 Conference Acceptance Decision,Reject,"The paper performs an ablation analysis on LSTM, showing that the gating component is the most important. There is little novelty in the analysis, and in its current form, its impact is rather limited.",ICLR2018, +9KqOYKW7Eu9,1642700000000.0,1642700000000.0,1,tG8QrhMwEqS,tG8QrhMwEqS,Paper Decision,Reject,"The paper proposes a strategy for incrementally pruning deep learning models based on activation values. The approach can satisfy different kinds of requirements, trading off between accuracy and sparsity. + +The approach seems promising and seems to have competitive performance. However, the method is described by reviewers as a combination of ideas that have been proposed in the literature, and the experimental evaluation relies too much on a dataset considered too small to be reliable in such experiments --- CIFAR10. We do not expect substantial experiments within the rebuttal period: such comparisons with relevant SOTA methods should have been present in the submission. Moreover, the strategy proposed for selecting a threshold seems to rely on some doubtful assumptions, and there are no benchmarks on actual runtime. + +The writing has improved based on reviewer input, and the reviewers are satisfied with this aspect. I would still add that I would prefer some clarity in the method presentation: is there a quantity being optimized? is there a value we can monitor to ensure our reimplementation is correct? etc. In addition I would like to ask authors in the next revision to be mindful to the difference between `\citet` and `\citep` in author-year citations -- see e.g. the first two ones in 3.1.",ICLR2022, +Hj9jnTeJCq,1642700000000.0,1642700000000.0,1,S3qhbZwzq3H,S3qhbZwzq3H,Paper Decision,Reject,"The paper propose a value-aware transformer for sparse multivariate time series data. While to approach is well motivated and the problem well-motivated from a clinical viewpoint, the comparison with related work brought up by reviewer qFRi and reviewer ph4X would really make it clear where this paper stands. 
The authors attempt to diffuse this issue in their replies, but empirical comparisons in the paper would guide practitioners more. This is especially important as the paper is motivated by a real-world problem.",ICLR2022, +luJckO2TdF4,1610040000000.0,1610470000000.0,1,fgpXAu8puGj,fgpXAu8puGj,Final Decision,Reject,"This paper considers the problem of searching over the joint space of hardware and neural architectures to trade-off accuracy and latency. + +Reviewers raised some valid questions about the following aspects: +1. Low technical novelty +2. Prior work on hardware and neural architecture co-design, and closely related work are not addressed +3. Lacking details on hardware platform and discussion on physical constraints to determine invalid hardware designs (addressed somewhat, but the response is not satisfactory) + +One additional comment: if we care about latency for a particular hardware platform, it is possible to automatically configure adaptive inference techniques to meet the latency constraints. + +Overall, my assessment is that the paper requires more work before it is ready for publication.",ICLR2021, +LJGXz1U5RwJ,1610040000000.0,1610470000000.0,1,6YEQUn0QICG,6YEQUn0QICG,Final Decision,Accept (Poster),"The paper addresses the problem of batch normalization (BN) in federated learning, which is of great interest to the community including practitioners. The proposed method here simply excludes the BN parameters from the aggregation, and evolves them locally. + +As a main contribution, reviewers particularly liked the solid justification of the proposed scheme, both with substantial theory and extensive experiments. Presentation style can be slightly improved, the usage at test time can be clarified more, and some references mentioned by R3 should be added, but this overall does not affect the strong level of contributions present in the work, and the discussion phase with the authors was already constructive.",ICLR2021, +A_JEcD8dgh,1642700000000.0,1642700000000.0,1,9xhgmsNVHu,9xhgmsNVHu,Paper Decision,Accept (Poster),The reviewers unanimously appreciated the quality of the experiments. The main point raised was about the related work by Wang et al. but that was addressed by the authors in the rebuttal. I thus encourage the authors to make sure that discussion is reflected in the final version of their work.,ICLR2022, +r0jLXkBdG6JB,1642700000000.0,1642700000000.0,1,WPI2vbkAl3Q,WPI2vbkAl3Q,Paper Decision,Accept (Poster),"The work presented in this study gives a theoretical finite-sample generalisation performance of stochastic gradient descent on linear models, for different batch-sizes and feature structures. This approach enable the authors to predict the training and test losses of neural networks on real data. + +While there were some parts that were initially mis-understood by some reviewers in the initial version of the papers, the extensive discussions between the authors and the reviewers led to several updates, both in the reference to prior work, but also in the presentation clarity. The wide impact and relevance to ICLR of this type of contribution made us recommend this work for acceptance at ICLR.",ICLR2022, +0mdh_C8Rn,1576800000000.0,1576800000000.0,1,rkg1ngrFPr,rkg1ngrFPr,Paper Decision,Accept (Poster),"I've gone over this paper carefully and think it's above the bar for ICLR. + +The paper proves a relationship between the eigenvalues of the Fisher information matrix and the singular values of the network Jacobian. 
The main step is bounding the eigenvalues of the full Fisher matrix in terms of the eigenvalues and singular values of individual blocks using Gersgorin disks. The analysis seems correct and (to the best of my knowledge) novel, and relationships between the Jacobian and FIM are interesting insofar as they give different ways of looking at linearized approximations. The Gersgorin disk analysis seems like it may give loose bounds, but the analysis still matches up well with the experiments. + +The paper is not quite as strong when it comes to relating the anslysis to optimization. The maximum eigenvalue of the FIM by itself doesn't tell us much about the difficulty of optimization. E.g., if the top FIM eigenvalue is increased, but the distance the weights need to travel is proportionately decreased (as seems plausible when the Jacobian scale is changed), then one could make just as fast progress with a smaller learning rate. So in this light, it's not too surprising that the analysis fails to capture the optimization dynamics once the learning rates are tuned. But despite this limitation, the contribution still seems worthwhile. + +The writing can still be improved. + +The claim about stability of the linearization explaining the training dynamics appears fairly speculative, and not closely related to the analysis and experiments. I recommend removing it, or at least removing it from the abstract. +",ICLR2020, +sTAzO-WkbD,1576800000000.0,1576800000000.0,1,BkghKgStPH,BkghKgStPH,Paper Decision,Reject,"The paper adapts a previously proposed modular deep network architecture (SHDL) for supervised learning in a continual learning setting. One problem in this setting is catastrophic forgetting. The proposed solution replays a small fraction of the data from old tasks to avoid forgetting, on top of a modular architecture that facilitates fast transfer when new tasks are added. The method is developed for image inputs and evaluated experimentally on CIFAR-100. + +The reviews were in agreement that this paper is not ready for publication. All the reviews had concerns about the lack of explanation of the proposed solution and the experimental methods. The reviewers were concerned about the choice of metrics not being comparable or justified: Reviewer4 wanted an apples-to-apples comparison, Reviewer1 suggested the paper follow the evaluation paradigm used in earlier papers, and Reviewer2 described the absence of an explained baseline value. Two reviewers (Reviewer4 and Reviewer2) described the lack of details on the parameters, architecture, and training regime used for the experiments. The paper did not not justify which aspects of the modular system contributed to the observed performance (Reviewer4 and Reviewer1). Several additional concerns were also raised. + +The authors did not respond to any of the concerns raised by the reviewers. +",ICLR2020, +rkx_gGEMxN,1544860000000.0,1545350000000.0,1,HJglg2A9FX,HJglg2A9FX,problems with experiments and assumptions; post-deadline revision too large,Reject,"This paper addresses the problem of learning with outliers, which many reviewers agree is an important direction. However, reviewers point to issues with the experiments (missing baselines, ablations, etc.) and are concerned that the assumptions in the theoretical analysis are too strong. 
These were potentially addressed in a revised version of the paper, but the revisions are so major that I do not think it is appropriate to consider them in the review process (and it is hard to assess to what extent they address the issues without asking reviewers to do a thorough re-appraisal, which goes beyond the scope of their duties). I encourage the authors to take reviewer comments into account and prepare a more polished version of the manuscript for future submission.",ICLR2019,5: The area chair is absolutely certain +MA2OHJj-7FH,1610040000000.0,1610470000000.0,1,rRFIni1CYmy,rRFIni1CYmy,Final Decision,Accept (Poster),"In this paper, the authors combine ideas from SLAM (using an Extended Kalman Filter and a state with nonlinear transitions and warping) and differentiable memory networks that store a spherical representation of the state (from the ego-centric point of view of an RL agent moving in an environment) with depth and visual features stored at each pixel and dynamics transitions corresponding to warping. + +The main idea in the paper is very simple and elegant, but I will concur with the reviewers that the writing of the first version of the paper was extremely hard to understand and that the experimental section was too dense. Two subsequent revisions of the paper have dramatically improved the paper. + +Given the spread of scores (R1: 6, R2: 7 and R3: 4) and the fact that only R1 and R2 have acknowledged the revisions, I will veer towards acceptance. +",ICLR2021, +lYR00S88maP,1642700000000.0,1642700000000.0,1,UcDUxjPYWSr,UcDUxjPYWSr,Paper Decision,Accept (Oral),"The paper considers the problem of learning both the physical design (morphology and parameters) of a robot together with the corresponding control policy to optimize performance at a target task. Unlike several contemporary methods that formulate this as two separate, but coupled, optimization problems, the paper unifies these decisions into a single decision-making framework. More specifically, a conditional policy learns to first change an agent's physical design (i.e., the morphology/skeletal structure and its associated parameters), and then to control the design. The policy is formulated as a graph neural network, enabling a single policy to simultaneously control robots with different morphologies (and, in turn, different action spaces). Experimental results demonstrate that the approach outperforms recent baselines on a variety of simulated control tasks. + +The paper considers an interesting and challenging problem, that of jointly optimizing an agent's physical design and its control policy, an area of research that has received renewed attention of-late. As the reviewers note, the idea of treating design and control in the context of a single decision-making process is novel. The approach is principled and the experimental results largely justify the significance of the contributions. The reviewers agree that the approach is described clearly and that the paper is well written. The reviewers initially raised a few concerns regarding the experimental evaluation, including the desire for more in-depth evaluations and the need for more random seeds. They also questioned some of the claims made in the initial submission. The authors provided a detailed response to each of these points and made changes to the paper to resolve most of the concerns. 
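(To fix ideas about the architecture described above — a toy sketch with illustrative names and shapes, not the paper's implementation: one shared network is run at every joint of the morphology graph, so the same policy applies to robots with different numbers of joints.)

import torch
import torch.nn as nn

class SharedJointPolicy(nn.Module):
    # Every joint (graph node) runs the same network on its own features plus
    # a sum of messages from its neighbors, so the action dimension grows with
    # the number of joints without any retraining.
    def __init__(self, dim_obs, dim_h=32):
        super().__init__()
        self.msg = nn.Linear(dim_obs, dim_h)
        self.act = nn.Sequential(nn.Linear(dim_obs + dim_h, dim_h), nn.Tanh(),
                                 nn.Linear(dim_h, 1))  # one torque per joint

    def forward(self, obs, adj):
        # obs: (num_joints, dim_obs); adj: (num_joints, num_joints) 0/1 matrix
        m = adj @ torch.relu(self.msg(obs))           # aggregate neighbor messages
        return self.act(torch.cat([obs, m], dim=-1))  # (num_joints, 1) actions

policy = SharedJointPolicy(dim_obs=8)
obs = torch.randn(5, 8)                     # a 5-joint design
adj = (torch.rand(5, 5) < 0.3).float()
print(policy(obs, adj).shape)               # torch.Size([5, 1])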
+ +In summary, the paper proposes a novel approach to an interesting problem with convincing results.",ICLR2022, +CqtQCrzRPCh,1642700000000.0,1642700000000.0,1,agBJ7SYcUVb,agBJ7SYcUVb,Paper Decision,Reject,"This paper presents a package for ""Dynamic Fine-grained Structured Sparse Attention Mechanism"" (DFSSATTEN), which aims to improve the computational efficiency of attention mechanisms by leveraging the specific sparse pattern supported by sparse tensor cores of NVIDIA A100. DFSSATTEN shows theoretical and empirical advantage in terms of performance and speedup compared to various baselines, with 1.27~1.89x speedup over the vanilla attention network across different sequence lengths. + +Reviewers praised the simplicity of the method and the clean code implementation. Speeding up attention mechanisms is an important problem is leveraging sparse tensor cores for attention speedup is a sensible idea. The practical speedups are significant (1.27~1.89x over the vanilla attention across different sequence lengths). However, they also pointed out some weaknesses: the fact that the proposed method is very specific to the particular sparse pattern offered by NVIDIA A100, and not easily generalizable to other future hardware; the fact that the method focuses on inference acceleration and not training from scratch (not completely clear in the paper), which limits its scope; and the fact that the method still has O(N^2) complexity (it still requires the computation of QK^T, which has quadratic memory and computation cost), and therefore it does not really address the quadratic bottleneck of transformers, unlike other existing work in efficient transformers for long sequences. + +I tend to agree with the reviewers and, even though the package can be potentially useful to other researchers, the scope seems limited and the paper seems a bit thin to deserve publication at ICLR. + +Other comments and suggestions: +- When talking about linear transformers, you should cite [1], which predates Performers +- It is not clear to me why 1:2 and 2:4 are called ""fine-grained *structured* sparsity"" +- Citations for the systems in Tab 4 are missing +- When comparing to other methods, it would be include to include their Pareto curves since those methods have tradeoffs in terms of sparsity / approximation error (or downstream accuracy). + +[1] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret (https://arxiv.org/abs/2006.16236)",ICLR2022, +Xr8Y-NwyZi,1576800000000.0,1576800000000.0,1,S1ldO2EFPr,S1ldO2EFPr,Paper Decision,Accept (Spotlight),"The paper provides a theoretical analysis of graph neural networks, as the number of layers goes to infinity. For the graph convolutional network, they relate the expressive power of the network with the graph spectra. In particular for Erdos-Renyi graphs, they show that very deep graphs lose information, and propose a new weight normalization scheme based on this insight. + +The authors responded well to reviewer comments. It is nice to see that the open review nature has also resulted in a new connection. Unfortunately one of the reviewers did not engage further in the discussion with respect to the author rebuttals. + +Overall, the paper provides a nice theoretical analysis of a widely used graph neural network architecture, and characterises its behaviour on a popular class of graphs. 
The fact that the theory provides a new approach for weight normalization is a bonus.",ICLR2020,
tDgK7z-ny8,1576800000000.0,1576800000000.0,1,rJeU_1SFvr,rJeU_1SFvr,Paper Decision,Reject,"The authors propose to overcome challenges in GAN training through latent optimization, i.e. updating the latent code, motivated by natural gradients. The authors show improvement over previous methods. The work is well-motivated, but in my opinion, further experiments and comparisons need to be made before the work can be ready for publication.

The authors write that ""Unfortunately, SGA is expensive to scale because computing the second-order derivatives with respect to all parameters is expensive"" and further ""Crucially, latent optimization approximates SGA using only second-order derivatives with respect to the latent z and parameters of the discriminator and generator separately. The second-order terms involving parameters of both the discriminator and the generator – which are extremely expensive to compute – are not used. For latent z's with dimensions typically used in GANs (e.g., 128–256, orders of magnitude less than the number of parameters), these can be computed efficiently. In short, latent optimization efficiently couples the gradients of the discriminator and generator, as prescribed by SGA, but using the much lower-dimensional latent source z which makes the adjustment scalable.""

However, this is not true. Computing the Hessian vector product is not that expensive. In fact, it can be computed at a cost comparable to gradient evaluations using automatic differentiation (Pearlmutter (1994)). In frameworks such as PyTorch, this can be done efficiently using double backpropagation, so only twice the cost (a minimal sketch of this construction appears below). Based on the above, one of the main claims of improvement over existing methods, which is furthermore not investigated experimentally, is false.

It is unacceptable that the authors do not compare with SGA: both in terms of quality and computational cost since that is the premise of the paper. The authors also miss recent works that successfully ran methods with Hessian-vector products: https://arxiv.org/abs/1905.12103 https://arxiv.org/abs/1910.05852",ICLR2020,
L4pCe0ZYGT,1610040000000.0,1610470000000.0,1,TMUR2ovJfjE,TMUR2ovJfjE,Final Decision,Reject,"The paper considers generalization in setups in which the training sample
may be generated by a different distribution than the one generating the test data.
This sounds much like transfer learning, and similarly sounding considerations
of a space of possible generating distributions, ways of measuring the statistical complexity
of such spaces and implied error generalization results were analyzed in e.g.,
Jonathan Baxter's ""Theoretical models of learning to learn"" 1998 and
S Ben-David, R Schuller ""Exploiting task relatedness for multiple task learning""
S Ben-David, RS Borbely ""A notion of task relatedness yielding provable multiple-task learning guarantees""
Machine learning 73 (3), 273-287

The current submission does not mention these earlier works.

Furthermore, the paper suffers from mathematical sloppiness. The model under which the generalization theorems
hold is not clearly defined. For example, Theorem 2, Theorem 3 and Theorem 4 do not state what are the probability spaces to which the ""probability p > 1-\delta"" quantifications refer.
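(A minimal sketch of the Pearlmutter-style Hessian-vector product via double backpropagation referenced in the latent-optimization decision above — illustrative code, roughly the cost of two gradient evaluations regardless of parameter count.)

import torch

def hvp(loss, params, vec):
    # First backward pass keeps the graph so it can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Differentiating <grad, vec> gives the Hessian-vector product H @ vec.
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

# Sanity check on a quadratic, where the Hessian is known in closed form.
x = torch.randn(5, requires_grad=True)
A = torch.randn(5, 5)
A = A @ A.t()                 # symmetric, so the Hessian of x^T A x is 2A
loss = x @ A @ x
(hv,) = hvp(loss, [x], [torch.ones(5)])
print(torch.allclose(hv, 2 * A @ torch.ones(5), atol=1e-5))  # True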
+ + +",ICLR2021, +Eqx_KLJF4Qi0,1642700000000.0,1642700000000.0,1,2hMEdc35xZ6,2hMEdc35xZ6,Paper Decision,Reject,"The paper proposes a GAN based method for synthesizing various types of defects as foreground on different product images (background). The method builds upon StarGANv2, and adds the cycle/content consistency loss and classification loss between foreground and background. While the paper considers an important problem/application, the reviewers found it lacking sufficient novelty for publication. The paper will be more suited for publication at an application oriented venue.",ICLR2022, +HyY4hM8Oe,1486400000000.0,1486400000000.0,1,H1GEvHcee,H1GEvHcee,ICLR committee final decision,Reject,"This paper identifies a joint distribution for an RBM variant based on leaky-ReLU activations. It also proposes using a sequence of distributions, both as an annealing-based training method, or to estimate log(Z) with AIS. + + This paper was borderline. While there is an interesting idea, the reviewers weren't generally as excited by the work as for other papers. One limitation is that unit-variance Gaussian RBMs aren't a strong baseline for comparison, although that is the focus of the main body of the paper. An update to the paper has results for binary visibles in an appendix, although I'm not sure exactly what was done, if the results are comparable, or if there is a large cost of projection here.",ICLR2017, +HJlqtuYllE,1544750000000.0,1545350000000.0,1,S1lqMn05Ym,S1lqMn05Ym,intuitive idea & theoretical connections; solid experimental results,Accept (Poster),"Strengths + +The paper introduces a promising and novel idea, i.e., regularizing RL via an informationally asymmetric default policy +The paper is well written. It has solid and extensive experimental results. + +Weaknesses + + +There is a lack of benefit on dense-reward problems as a limitation, which the authors further +acknowledge as a limitation. There also some similarities to HRL approaches. +A lack of theoretical results is also suggested. To be fair, the paper makes a number of connections +with various bits of theory, although it perhaps does not directly result in any new theoretical analysis. +A concern of one reviewer is the need for extensive compute, and making comparisons to stronger (maxent) baselines. +The authors provide a convincing reply on these issues. + +Points of Contention + +While the scores are non-uniform (7,7,5), the most critical review, R1(5), is in fact quite positive on many +aspects of the paper, i.e., ""this paper would have good impact in coming up with new +learning algorithms which are inspired from cognitive science literature as well as mathematically grounded."" +The specific critiques of R1 were covered in detail by the authors. + +Overall + +The paper presents a novel and fairly intuitive idea, with very solid experimental results. +While the methods has theoretical results, the results themselves are more experimental than theoretic. +The reviewers are largely enthused about the paper. The AC recommends acceptance as a poster. +",ICLR2019,5: The area chair is absolutely certain +yt03zUuVYZY,1642700000000.0,1642700000000.0,1,bERaNdoegnO,bERaNdoegnO,Paper Decision,Accept (Spotlight),"The paper presents improvements to AlphaZero and MuZero for settings where one is restricted in the number of rollouts. 
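(For readers unfamiliar with the device: the Gumbel machinery discussed below is, at its core, the standard Gumbel-top-k trick for sampling k distinct actions without replacement — a generic sketch, not the paper's code.)

import numpy as np

def gumbel_top_k(logits, k, rng=np.random.default_rng(0)):
    # Adding i.i.d. Gumbel noise to the logits and taking the top-k is
    # equivalent in distribution to sampling k items without replacement
    # from the softmax over the logits.
    g = rng.gumbel(size=logits.shape)
    return np.argsort(logits + g)[::-1][:k]

logits = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
print(gumbel_top_k(logits, k=2))  # two distinct actions; likely ones appear more often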
The initial response from reviewers was generally favorable but the reviewers wanted more details and clarifications of multiple parts of the paper, and further intuition about the Gumbel distribution. The authors’ responses were detailed and convinced or maintained strong positive support of most reviewers. The authors also stated that they plan to provide a release of the code and also provided a policy improvement proof. Overall this is an interesting approach that is likely to be of significant interest to many.",ICLR2022, +ryeQvoYbx4,1544820000000.0,1545350000000.0,1,HyGySsAct7,HyGySsAct7,Limited novelty compared to previous works,Reject,"The authors propose an algorithm for generating adversarial examples for ASR systems treating them as black boxes. + +Strengths +- One of the early works to demonstrate black box attacks on ASR system that recognize phrases instead of isolated words. + +Weaknesses +- The approach assumes that the logits are available, which may not be realistic for most ASR systems when they are used in practice -- typically only the final transcription is available. +- Although the technique is applied to continuous speech, algorithmic improvements over prior work of Alzanot et al. is minimal. +- Evaluation is weak. For example, cross correlation cannot completely capture the adversarial nature of a generated audio sample. +- The authors use a genetic algorithm for generating new set of examples which are pruned and mutated. It’s not clear what guarantees exist that the algorithm will eventually succeed. + +The reviewers agree that the presented work puts forth an interesting research direction. But given the deficiencies of the current submission as pointed out by the reviewers, the recommendation is to reject the paper.",ICLR2019,5: The area chair is absolutely certain +yv4bQGf4pU37,1642700000000.0,1642700000000.0,1,y0VvIg25yk,y0VvIg25yk,Paper Decision,Accept (Poster),"The reviewers agree that the paper is addressing an interesting problem, and provides a valuable contribution for the learning of quasimetrics and would be useful for many real world applications.",ICLR2022, +6atETMnvzg0,1642700000000.0,1642700000000.0,1,VPjw9KPWRSK,VPjw9KPWRSK,Paper Decision,Accept (Poster),"This paper presents a method for inference in state-space models with non-linear dynamics and linear-Gaussian observations. Instead of parameterizing a generative model, the paper proposes to parameterize the conditional distribution of current latent states given previous latent states and observations using locally linear transitions, where the parameters of the linear mappings are given by neural networks. Under fairly standard conditionally-independence assumptions, the paper uses known Bayesian filtering/smoothing tricks to derive a recursive estimation algorithm and a parameter-estimation method based on a simple maximum likelihood objective. + +Overall, the reviewers found the idea to be novel and interesting and I agree. They also found the relation to the noise2noise objective worth highlighting. Several concerns were raised during the discussion period, which I believe the authors addressed satisfactorily. However, I think the authors should bring the assumed distinction between ‘supervised’, ‘self-supervised’ and ‘unsupervised’ upfront, as usually these types of models are trained using the noisy data (to which the authors refer to as unsupervised). 
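(To make the parameterization concrete — a minimal sketch with hypothetical names and shapes, not the paper's code: a small network emits the entries of a per-step transition matrix, so the dynamics stay linear in the latent state but depend on where that state is.)

import torch
import torch.nn as nn

class LocallyLinearTransition(nn.Module):
    # Mean of z_t given z_{t-1}: A(z_{t-1}) z_{t-1} + b(z_{t-1}); a full model
    # would add a (possibly learned) process-noise covariance Q.
    def __init__(self, dim_z, dim_h=64):
        super().__init__()
        self.dim_z = dim_z
        self.net = nn.Sequential(nn.Linear(dim_z, dim_h), nn.Tanh(),
                                 nn.Linear(dim_h, dim_z * dim_z + dim_z))

    def forward(self, z_prev):
        out = self.net(z_prev)
        A = out[..., : self.dim_z ** 2].reshape(*z_prev.shape[:-1], self.dim_z, self.dim_z)
        b = out[..., self.dim_z ** 2 :]
        return (A @ z_prev.unsqueeze(-1)).squeeze(-1) + b

trans = LocallyLinearTransition(dim_z=4)
z = torch.randn(8, 4)       # batch of previous latents
print(trans(z).shape)       # torch.Size([8, 4])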
+ +Given the large body of literature on dynamical systems, filters and smoothers, I believe the paper will benefit significantly from more comparisons across a wider range of (and more realistic) datasets.",ICLR2022, +H1lInzIug,1486400000000.0,1486400000000.0,1,HyTqHL5xg,HyTqHL5xg,ICLR committee final decision,Accept (Poster),"The paper provides a clear application of variational methods for learning non-linear state-space models, which is of increasing interest, and of general relevance to the community.",ICLR2017, +AuNz1mtAU_L,1642700000000.0,1642700000000.0,1,qyzTEWWM0Pp,qyzTEWWM0Pp,Paper Decision,Reject,"The paper proposes multiresolution and equivariant generative models. Experimental results for several applications are shown. + +Pros: +- A first hierarchical generative model with multiresolution and equivariance. +- Extensive experiments + +Cons: +- Marginal novelty (multiresolution and permutation equivalence each is not novel for graph neural networks. +- State-of-the-art methods are not compared as baselines. +- Some standard metrics are not evaluated, and the used metrics are questionable (some generated molecules might not be stable although the chemical validity is 100%). +- Time/space complexity evaluation is missing. + +The authors did not address some of the serious concerns in the rebuttal.",ICLR2022, +rJ-bByTrf,1517250000000.0,1517260000000.0,503,rJ695PxRW,rJ695PxRW,ICLR 2018 Conference Acceptance Decision,Reject,"The problem of discovering ordering in an unordered dataset is quite interesting, and the authors have outlined a few potential applications. However, the reviewer consensus is that this draft is too preliminary for acceptance. The main issues were clarity, lack of quantitative results for the order discovery experiments, and missing references. The authors have not yet addressed these issues with a new draft, and therefore the reviewers have not changed their opinions.",ICLR2018, +FZm68JrejkK,1610040000000.0,1610470000000.0,1,bGZtz5-Cmkz,bGZtz5-Cmkz,Final Decision,Reject,"The reviewers liked the overall idea presented in this paper. Although the idea as well as relevant tooling for incorporating constraints in the latent space has been studied a lot in the past, the authors differentiate their work by applying it in a new interesting problem. At the same time, some confusions about relation to prior work remain after rebuttal. Firstly, the theoretical additions to prior work (Srinivas et al. 2010) are still unclear in terms of significance - they feel more like observations made on top of an existing theorem rather than fresh significant insights. Furthermore, even if prior work has not considered exactly the same set-up, it would still be needed to understand what the performance would be when considering prior models or prior datasets used in similar domains (e.g. suggestions by R2, R3). The latter would be desirable especially since the experimental set-up used in this paper is deemed by the reviewers too simple (while the motivation of the paper is to solve an issue essentially manifesting in complex scenarios).",ICLR2021, +Syg3jNhEJE,1543980000000.0,1545350000000.0,1,BJgTZ3C5FX,BJgTZ3C5FX,"lack of novelty, variance in high dimensions",Reject,"This method proposes a primal approach to minimizing Wasserstein distance for generative models. It estimates WD by computing the exact WD between empirical distributions. 
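(For equal-size, uniformly weighted samples this exact computation reduces to a linear assignment problem, since the optimal coupling is then a permutation — a minimal sketch, not the submission's code.)

import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def empirical_wasserstein(x, y, p=2):
    # Exact p-Wasserstein distance between two equal-size empirical samples:
    # solve the assignment problem on the pairwise cost matrix.
    cost = cdist(x, y, metric="euclidean") ** p
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean() ** (1.0 / p)

x = np.random.randn(128, 2)
y = np.random.randn(128, 2) + 1.0       # shifted batch
print(empirical_wasserstein(x, y))      # estimate varies a lot across minibatches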
+ +As the reviewers point out, the primal approach has been studied by other papers (which this submission doesn't cite, even in the revision), and suffers from a well-known problem of high variance. The authors have not responded to key criticisms of the reviewers. I don't think this work is ready for publication in ICLR. +",ICLR2019,5: The area chair is absolutely certain +juTVWSCgBPX,1610040000000.0,1610470000000.0,1,lQdXeXDoWtI,lQdXeXDoWtI,Final Decision,Accept (Poster),"This paper provides an interesting analysis on the research on Domain Generalization with main principles and limitations. The authors provide a strong rebuttal to address some comments pointed by reviewers. All the reviews are very positive. +Hence, I recommend acceptance.",ICLR2021, +epBxu1aoZut,1642700000000.0,1642700000000.0,1,PgNEYaIc81Q,PgNEYaIc81Q,Paper Decision,Accept (Poster),"This paper proposes a new dataset called ComPhy to evaluate the ability of models to infer physical properties of objects and to reason about their interactions given these physical properties. The paper also presents an oracle model (named oracle because it requires gold property labels at training time) that is modular and carefully hand designed, but shows considerable improvement over a series of baselines. The reviewers for this submission had several concerns including: +(a) [VByS] ""concerns are about the complexity that the proposed method can handle""\ +(b) [VByS] ""the method is only demonstrated on a simple synthetic dataset""\ +(c) [8BUA] ""I am struggling to see any direct application""\ +(d) [8BUA] ""choosing 4-videos as reference"" -- why use ref videos, why use 4\ +(e) [8BUA] ""Baselines showing results with ground-truth object properties should be reported""\ +(f) [3cQE] ""no innovation in the type or structure of questions asked""\ +(g) [3cQE] ""neither the CPL framework nor the implementation of any module is novel""\ +(h) [DJEq] ""The only difference is that this paper infers hidden properties instead of collisions""\ +(i) [DJEq] ""The dataset is not comprehensive enough"" -- only 2 properties and simplistic and synthetic videos\ + +The authors have provided detailed responses to these concerns and I discuss these below. + +The authors have addressed (c),(d) and (e) well in their rebuttal. + +I don't think (a) is concerning. The proposed model is not expected to solve the dataset entirely inspite of having access to gold properties at training time. As the authors mention, this indicates the complexity of the task at hand. + +The authors also address (f) well. I dont think there is any need for innovation in the structure of questions asked. QA is merely a mechanism to probe the model, and using CLVERER style questions seems appropriate. + +I disagree with the sentiment behind (g). The proposed oracle model clearly inherits modules from past works and assembles them to suit the needs of the dataset. It is this assembly that differentiates it from past works. This is true of most papers in our field, including ones that are widely acknowledged to be important papers. The underlying modules in proposed networks are rarely novel, but their assembly can lead to improvements on benchmarks. Furthermore, the oracle model, isnt the central contribution of this work. The dataset is, and hence, the requirement for novelty is reduced. The oracle is meant to serve as a guideline to show what one may achieve given gold labels at training, and it serves that purpose well. 
+ +Re (h), my takeaway is that inferring properties based on their dynamics and without any link to their appearance is an important step, and past datasets do not exhibit this characteristic. And thus, in spite of being a limited differentiation from CLEVERER, I think this is interesting. + +Re: (b) and (i) I do agree with some aspects of these, with the reviewers. +I think its still valuable to have a dataset with synthetic videos, given that models today are unable to solve this dataset. Moving to more realistic videos is a next step. +However, as the reviewer [DJEq] points out, it would be desirable to add more physical properties and add more complex scene elements like ramps. That would have added a lot more diversity to the dataset -- visually, with regards to physical properties and with regards to the types of reasoning required. + +Having said that, I believe that the dataset in its present form is still valuable to the community, and hence I recommend acceptance. +I think adding more physical properties and scene elements will have made this a much stronger submission.",ICLR2022, +rJxk6d1bx4,1544780000000.0,1545350000000.0,1,H1emus0qF7,H1emus0qF7,Strong paper on hierarchical RL with very strong reviews from people expert in this subarea that I know well.,Accept (Poster),"Strong paper on hierarchical RL with very strong reviews from people expert in this subarea that I know well. +",ICLR2019,4: The area chair is confident but not absolutely certain +HdUguuXAPJi,1642700000000.0,1642700000000.0,1,yzDTTtlIlMr,yzDTTtlIlMr,Paper Decision,Reject,"Following a recent line of work on the implicit bias of learning algorithms, the authors consider optimization methods that incorporate momentum. The reviewers found the topic timely and interesting, and generally appreciated the novelty of the technical contributions in the work. However, several critical issues concerning the presentation quality and the positioning of the paper have been raised. In addition, some of the reviewers felt that parts of the paper were somewhat rushed and potentially misleading (mainly, those concerning deterministic\stochastic ADAM and the complexity of the models considered in the paper), and others believed that the experimental section should be made more solid to properly corroborate the theoretical analysis provided in the paper. The authors are encouraged to incorporate the instructive feedback provided by the reviewers in future revisions of the paper.",ICLR2022, +dJPzTpLBXnI,1610040000000.0,1610470000000.0,1,YQVjbJPnPc9,YQVjbJPnPc9,Final Decision,Reject,"Multiple reviewers point out the interesting improvement to mix attention maps at different layers via convolution based prediction modules. This module is sufficient to show improvements only on encoder side while comparing to concurrent work Synthesizer. +However, the novelty of the work is limited as compared to other papers and the results though improved did not convince the reviewers fully to gain a strong accept. +",ICLR2021, +AEPFs0kn1_,1576800000000.0,1576800000000.0,1,Hklz71rYvS,Hklz71rYvS,Paper Decision,Accept (Spotlight),"This is a very interesting paper which extends natural gradient to output space metrics other than the Fisher-Rao metric (which is motivated by approximating KL divergence). It includes substantial mathematical and algorithmic insight. 
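(For reference, the baseline being generalized here is the standard natural-gradient update, which preconditions by the Fisher information induced by the KL divergence on the model's outputs — in LaTeX notation:

\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[ \nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^{\top} \right],

and the generalization swaps the KL-induced Fisher-Rao metric for other output-space metrics.)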
The method is shown to outperform various other optimizers on a neural net optimization problem that's artificially made ill-conditioned; while it's not clear how practically meaningful this setting is, it seems like a good way to study optimization. I think this paper will be of interest to a lot of researchers and could open up new research directions, so I recommend acceptance as an Oral.
",ICLR2020,
SJxEiqi_xV,1545280000000.0,1545350000000.0,1,r1xwqjRcY7,r1xwqjRcY7,limited novelty and limited experimental evaluation,Reject,"mnist and small picture variants are not that impressive.
it is a minor extension of VAEs which also are not common in sota systems.",ICLR2019,5: The area chair is absolutely certain
NDQa6IS26L,1642700000000.0,1642700000000.0,1,OxgLa0VEyg-,OxgLa0VEyg-,Paper Decision,Reject,"This paper considers the idea of meta-learning the loss function for domain generalization. It's a simple idea that seems to work reasonably well. Although, as pointed out by the reviewers, the margin is actually quite modest when compared to the strongest baselines (not ERM).

On a positive note, many reviewers agree that the idea was simple, novel, and interesting. The insight that cross-entropy can be improved for domain generalization is interesting. On the other hand, many reviewers pointed out that, despite some careful empirical work, it's not clear why this idea works.
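(To make the search space concrete — a toy sketch of a polynomial loss family of the kind discussed below, with illustrative names and coefficients: since -log p = sum_{k>=1} (1-p)^k / k near p = 1, cross-entropy is approximately recovered for coefficients 1, 1/2, 1/3, ...)

import torch

def polynomial_loss(logits, target, coeffs):
    # Loss is a low-order polynomial in the probability assigned to the true
    # class; the coefficients are the (meta-)learnable parameters.
    p = torch.softmax(logits, dim=-1).gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return sum(c * (1 - p) ** (i + 1) for i, c in enumerate(coeffs)).mean()

logits = torch.randn(4, 10, requires_grad=True)
target = torch.randint(0, 10, (4,))
coeffs = [1.0, 0.5, 1.0 / 3.0]     # truncated Taylor series of -log p
print(polynomial_loss(logits, target, coeffs))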
I read the paper myself, and I agree that the paper could use a bit more work before it is ready for publication. Specifically, I agree with Reviewer eZ71, who asked for a clear justification of the proposed idea. The idea seems sensible, but there is some burden on the paper to provide insight, and not simply present an idea. Here are some specific suggestions that came up during discussion, which could strengthen the paper: +- A more comprehensive discussion of the limitations of this approach. +- It would be good to understand how critical was the specific choice of parametric loss family. Here are some questions that would be good to address: does the parametric family interact with the type of domain shift in the datasets? Why are Taylor polynomials preferable or beneficial for domain generalization compared to, e.g., a linear combination of standard loss functions? +- Is the dataset on which you learn your ITL loss critical? I.e., how critical was the choice of rotated MNIST for learning the ITL loss? Does it generalize to very different and more diverse domain shift tasks, like those in the WILDS benchmark? It would be particularly interesting to see if loss functions meta-trained on distinct datasets learn similar parameters. +- More broadly, evaluation on larger and more diverse domain shift tasks, like those in the WILDS benchmark, would further strengthen the conclusions in the paper.",ICLR2022, +rJeFBQnHx4,1545090000000.0,1545350000000.0,1,BylctiCctX,BylctiCctX,Interesting idea for which the presented evaluation is too narrow,Reject,"This submission proposes an interesting new approach on how to evaluate what features are the most useful during training. The paper is interesting and the proposed approach has the potential to be deployed in many applications, however the work as currently presented is demonstrated in a very narrow domain (stability prediction), as noted by all reviewers. Authors are encouraged to provide stronger experimental validation over more domains to show that their approach can truly improve over existing multitask frameworks.",ICLR2019,4: The area chair is confident but not absolutely certain +mPxhN7IUJa0,1610040000000.0,1610470000000.0,1,r6I3EvB9eDO,r6I3EvB9eDO,Final Decision,Reject,"The reviewers were clearly excited by the novel application of group theory to the problem of composition, and think the core idea is good. However, the reviewers also expressed concern about the clarity of the paper, stating that in several places examples might help. Reviewers were also interested in seeing the work tied to real world applications, and how the work expands our existing knowledge about the composition of learned representations. I hope their suggestions will help the authors to turn this into a stronger, clearer paper.",ICLR2021, +CJxzxuzknN,1576800000000.0,1576800000000.0,1,ryeN5aEYDH,ryeN5aEYDH,Paper Decision,Reject,"The reviewers all believe that this paper is not yet ready for publication. All agree that this is an important application, and an interesting approach. The methodological novelty, as well as other parts of exposition, involving related work, or further discussion of what this solution means for patients, is right now not completely convincing to reviewers. 
My recommendation is to work on making sure the exposition best explains the methodology, and making sure this venue is the best for the submitted line of work.",ICLR2020, +YNqMAmaOaC9,1610040000000.0,1610470000000.0,1,F8whUO8HNbP,F8whUO8HNbP,Final Decision,Accept (Poster),"The paper raised a natural question: why good synthetic images can be not so good at training/fine-tuning models for downstream tasks (e.g., classification and segmentation)? This problem is named synthetic-to-real (domain) generalization (where syn/real images are regarded as from the source/target domain), and it is of practical importance when using GAN-like methods given limited real images for training. The authors found that the answer to the question is the diversity of the learned feature embeddings, and argued/advocated that we should encourage such diversity when training on syn images in order to better approximate training on real images. To this end, a novel contrastive synthetic-to-real generalization framework was proposed and shown effective in the well designed experiments. + +Overall, the quality is above the bar. While some reviewers had some concerns about the applicability and the motivations for the algorithm design, the authors have done a particularly good job in the rebuttal. After the rebuttal, we all think the paper should be accepted for publication. + +I have some comments on the writing. The introduction claiming so many things has only 4 citations, especially the first two paragraphs have no citation. While I do think what claimed there are correct, the authors should include certain supportive evidences after each claim by themselves. Moreover, while I do think the problem hunting part is well motivated, the problem solving part needs its own motivation/justification. When two or more components are combined in a proposal, why this component is chosen and is there other choice that can achieve the same purpose (this concern has also been raised by reviewers)? I believe the components are not randomly chosen among possible candidates (e.g., ""we further enhance the CSG framework with attentional pooling""), but for writing a paper, the authors should explain the motivation for the algorithm design because we cannot know the motivation unless they tell us.",ICLR2021, +S10d8kaSM,1517250000000.0,1517260000000.0,823,SyhcXjy0Z,SyhcXjy0Z,ICLR 2018 Conference Acceptance Decision,Reject,"Reviewers are unanimous that this is a reject. +A ""class project"" level presentation. +Errors in methodology and presentation. +No author rebuttal or revision",ICLR2018, +5YO54WW3vje,1610040000000.0,1610470000000.0,1,30I4Azqc_oP,30I4Azqc_oP,Final Decision,Reject,"The work proposed to learn causal structure of the environment and use the average causal effect of different categories of the environment, between the current and next state after performing an action as intrinsic reward to assist policy learning. While the reviewers find the ideas presented in the paper interesting and of potential, there are some concerns regarding proper introduction and comparison to related works, and clarity of the algorithm itself. While the two experimental results presented in the paper do show the potential of the work, it is missing an important baseline to disentangle the effectiveness of introducing the causal structure alone vs intrinsic reward. 
For example, how would A2C with curiosity or surprised based intrinsic reward, which also introduce the surprisingness of the next state as a result of performing an action as additional reward perform on these tasks? +",ICLR2021, +QeqNVU4_bZc,1642700000000.0,1642700000000.0,1,6-lLt2zxbZR,6-lLt2zxbZR,Paper Decision,Reject,"This paper argues several loosely-related points about the evaluation of pretrained models on commonsense reasoning datasets in the Winograd style, and presents experiments with existing models on several datasets, including a novel 20-example benchmark. All four reviewers struggled to find a clear contribution or theme in this paper that is novel and thorough enough to meet the bar for publication at a selective general-ML venue. + +I'd urge the authors to focus in on just one of these points and expand, and to consider submitting to a venue that more narrowly focuses on methods for commonsense reasoning in NLP.",ICLR2022, +H1gbe4TxgV,1544770000000.0,1545350000000.0,1,rklQas09tm,rklQas09tm,Meta Review,Reject,"This paper attempts at modeling text matching and also generating rationales. The motivation of the paper is good. + +However there is some shortcomings of the paper, e.g. there is very little comparison with prior work, no human evaluation at scale and also it seems that several prior models that use attention mechanism would generate similar rationales. No characterization of the last aspect has been made here. Hence, addressing these issues could make the paper better for future venues. + +There is relative consensus between the reviewers that the paper could improve if the reviewers' concerns are addressed when it is submitted to future venues.",ICLR2019,4: The area chair is confident but not absolutely certain +bQt6dSPT1R-,1642700000000.0,1642700000000.0,1,v3LXWP63qOZ,v3LXWP63qOZ,Paper Decision,Reject,"Although all reviewers had many positive comments on the paper, and the authors engaged nicely in the discussion period, at the moment there is a consensus among the reviewers that the central claims of the paper (related to minimal representations / information bottleneck) are not adequately supported by the current experiments. In particular, there were concerns that performance gains could be due to diversity of predictors, rather than minimal representations, which would need to be addressed. It's suggested that the reviewers take all of these comments and discussion into account when preparing a revised version of the paper.",ICLR2022, +NmNoI5qrXcw,1642700000000.0,1642700000000.0,1,EJKLVMB_9T,EJKLVMB_9T,Paper Decision,Reject,"There were genuine differences of opinion here. I saw reviews of 8,6,5,5. +In these cases, I do try to check if the 8 has a really compelling argument and err on the side of accepting, but here I think both the positive and negative reviews have fair points, so I am inclined to recommend rejection here. + +I think the good news is that a lot of the negative stuff was around scoping/writing/related-work, and so it should be (relatively) easy to shore up this submission into something that will get better reviews in the next conference cycle.",ICLR2022, +Sx0-BssqxN,1576800000000.0,1576800000000.0,1,HJeYSxHFDS,HJeYSxHFDS,Paper Decision,Reject,"The paper extends Gauge invariant CNNs to Gauge invariant spherical CNNs. The authors significantly improved both theory and experiments during the rebuttal and the paper is well presented. 
However, the topic is somewhat niche, and the bar for ICLR this year was very high, so unfortunately this paper did not make it. We encourage the authors to resubmit the work including the new results obtained during the rebuttal period.",ICLR2020, +ixOQerMM-a,1576800000000.0,1576800000000.0,1,ryljMpNtwr,ryljMpNtwr,Paper Decision,Reject,"This paper proposes a benchmark for assessing the impact of image quality degradation (e.g. simulated fog, snow, frost) on the performance of object detection models. The authors introduce corrupted versions of popular object detection datasets, namely PASCAL-C, COCO-C and Cityscapes-C, and an evaluation protocol which reveals that the current models are not robust to such corruptions (losing as much as 60% of the performance). The authors then show that a simple data augmentation scheme significantly improves robustness. The reviewers agree that the manuscript is well written and that the proposed benchmark reveals major drawbacks of current detection models. However, two critical issues with the paper paper remain, namely lack of novelty in light of Geirhos et al., and how to actually use this benchmark in practice. I will hence recommend the rejection of this paper in the current state. Nevertheless, we encourage the authors to address the raised shortcomings (the new experiments reported in the rebuttal are a good starting point). ",ICLR2020, +y8Yz3g8YNc,1576800000000.0,1576800000000.0,1,SJe_D1SYvr,SJe_D1SYvr,Paper Decision,Reject,"The paper introduces the concept of an Expert Induced MDP (eMDP) to address imitation learning settings where environment dynamics are part known / part unknown. Based on the formulation a model-based imitation learning approach is derived and the authors obtain theoretical guarantees. Empirical validation focuses on comparison to behavior cloning. Reviewers raised concerns about the size of the contribution. For example, it is unclear to what degree the assumptions made here would hold in practical settings.",ICLR2020, +r1f1hGI_e,1486400000000.0,1486400000000.0,1,ryT9R3Yxe,ryT9R3Yxe,ICLR committee final decision,Reject,"The contribution of this paper generally boils down to adding a prior to the latent representations of the paragraph in the Paragraph Vector model. An especially problematic point about this paper is the claim that the original paper considered only the transductive setting (i.e. it could not induce representations of new documents). It is not accurate, they also used gradient descent at test time. Though I agree that regularizing the original model is a reasonable thing to do, I share the reviewers' feeling that the contribution is minimal. There are also some serious issues with presentation (as noted by the reviewers), I am surprised that the authors have not addressed them during the review period.",ICLR2017, +62VNT5ZQHT,1642700000000.0,1642700000000.0,1,ldkunzUzRWj,ldkunzUzRWj,Paper Decision,Reject,"The reviewers remained concerned about the overall novelty of the paper, finding the contributions somewhat incremental. The authors are encouraged to better substantiate design choices that they make, to improve the overall presentation, and to contrast with the works/line of research brought up by the reviewers.",ICLR2022, +GSaVwFa2pka,1642700000000.0,1642700000000.0,1,zzk231Ms1Ih,zzk231Ms1Ih,Paper Decision,Accept (Poster),"The paper takes a creative step in the theory of tournaments, and it seems plausible that this could lead to interesting follow-ups. 
The reviewers made many excellent comments and I highly encourage the authors to take ALL of them into account in the revision, it will make the paper much stronger.",ICLR2022, +6ANqyrrzTm,1642700000000.0,1642700000000.0,1,nD9Pf-PjTbT,nD9Pf-PjTbT,Paper Decision,Reject,"All reviewers are very critical about the submitted paper regarding novelty of results, insufficient placement with respect to existing results, and clarity of presentation. The authors also did not submit a rebuttal. Hence I am recommending rejection of the paper.",ICLR2022, +L5AizINUYGB,1610040000000.0,1610470000000.0,1,ujmgfuxSLrO,ujmgfuxSLrO,Final Decision,Accept (Poster),"This paper presents some innovations to transformers allowing some significant reductions in parameter count. While some reviewers were concerned that the proposed innovations seem incremental and may not stand the test of time, all reviewers recommended acceptance after engaging in a rich and interactive author discussion. Given the clear importance of making transformers more efficient I think this paper will be of interest to the community and is worthy of acceptance at ICLR. ",ICLR2021, +913RlcGax,1576800000000.0,1576800000000.0,1,BJgdOh4Ywr,BJgdOh4Ywr,Paper Decision,Reject,"The main concern raised by reviewers is limited novelty, poor presentation, and limited experiments. All the reviewers appreciate the difficulty and importance of the problem. The rebuttal helped clarify novelty, but the other concerns remain.",ICLR2020, +rJ2PNJpHf,1517250000000.0,1517260000000.0,377,S1lN69AT-,S1lN69AT-,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"The authors present a thorough exploration of large-sparse models that are pruned down to a target size and show that these models can perform better than small dense models. Results are shown on a variety of datasets with as conv models and seq2seq. The authors even go so far as to release the code. I think the authors are to be thanked for their experimental contributions. +However, in terms of accepting the paper for a premier machine learning conference the method holds little surprise or non-obviousness. I think the paper is a good experimental contribution, and would make a good workshop paper instead but it offers little contribution by way of machine learning methods. +",ICLR2018, +qeOmpkc78i,1642700000000.0,1642700000000.0,1,jeLW-Fh9bV,jeLW-Fh9bV,Paper Decision,Accept (Poster),"The main contribution of this paper lies in the novel setting that is being considered: offline data without rewards is combined with meta-training tasks to quickly adapt to new long-horizon tasks at meta-test time. Within this setting, it is shown that the combination of SPiRL and PEARL outperforms the individual algorithms. The technical contribution is limited, as now new methods are introduced. Nevertheless, the setting considered is interesting and the empirical evaluation is solid. For these reasons, I recommend acceptance.",ICLR2022, +suA_qhEthA,1576800000000.0,1576800000000.0,1,S1eL4kBYwr,S1eL4kBYwr,Paper Decision,Reject,"This submission proposes an approach to pre-train general-purpose image and text representations that can be effective on target tasks requiring embeddings for both modes. The authors propose several pre-training tasks beyond masked language modelling that are more suitable for the cross-modal context being addressed, and also investigate which dataset/pretraining task combinations are effective for given target tasks. 
+ +All reviewers agree that the empirical results that were achieved were impressive. + +Shared points of concern were: +- the novelty of the proposed pre-training schemes. +- the lack of insight into the results that were obtained. + +These concerns were insufficiently addressed after the discussion period, particularly the limited novelty. Given the remaining concerns and the number of strong submissions to ICLR, this submission, while promising, does not meet the bar for acceptance. +",ICLR2020, +bnkwgnW_ayu4,1642700000000.0,1642700000000.0,1,0kPL3xO4R5,0kPL3xO4R5,Paper Decision,Accept (Poster),"Initially, we had some borderline scores for this paper. After the (indeed very convincing!) rebuttal and a the end of the discussion phase, however, all reviewers agreed that this is a very solid piece of work, with significant methodological and practical contributions. I fully share this positive impression of the paper!.",ICLR2022, +TtAjBPDPY,1576800000000.0,1576800000000.0,1,H1lZJpVFvr,H1lZJpVFvr,Paper Decision,Accept (Poster),"Earlier work suggests that adversarial examples exploit local features and that more robust models rely on global features. The authors propose to exploit this insight by performing data augmentation in adversarial training, by cutting and reshuffling image block. They demonstrate the idea empirically and witness interesting gains. I think the technique is an interesting contribution, but empirically and as a tool. +",ICLR2020, +xvsHnwx-0e7,1642700000000.0,1642700000000.0,1,VCD05OEn7r,VCD05OEn7r,Paper Decision,Reject,"This paper proposes the framework CAGE (causal probing of deep generative models) for estimating counterfactuals and unit-level causal effects in deep generative models. CAGE employs geometrical manipulations within the latent space of a generative model to estimate the counterfactual quantities. The estimator is written in potential outcome language and assumes unconfoundedness, positivity, stable unit treatment value assumption (SUTVA), and linear separability in semantic attributes of the latent space. Furthermore, the framework considers only the case of binary treatments. + +One major concern raised by reviewers TgM5 and xP5d is that the method is based on a trained generative model, which may not be the true data-generating model. In this case, the paper appears to address statistical dependencies instead of the actual causal relationships in the real world. The authors claim to empirically show that their framework can probe unit-level (individual) causal effects. However, the reviewers are concerned that no theoretical support for the correctness of the method is provided. In other words, the problem is assumed away once a probabilistic model is assumed to be equal to the true generative model, which is almost never the case in practice and is well-known in the field. We want to encourage the authors to provide a more detailed theoretical justification, perhaps with proofs and/or references, that the proposed method can infer causal and counterfactual relationships given the underlying assumptions. + +After all, reviewers were interested but somewhat skeptical about the method's ability to learn causal and counterfactual relationships. Unfortunately, the paper is not ready for publication yet. 
Still, we would like to encourage the authors to take the reviews seriously and try to improve the manuscript accordingly.",ICLR2022, +2oj_M5PBi8,1576800000000.0,1576800000000.0,1,SyxTZ1HYwB,SyxTZ1HYwB,Paper Decision,Reject,"This paper proposes a sensor placement strategy based on maximising the information gain. Instead of using Gaussian process, the authors apply neural nets as function approximators. A limited empirical evaluation is performed to assess the performance of the proposed strategy. +The reviewers have raised several major issues, including the lack of novelty, clarity, and missing critical details in the exposition. The authors didn’t address any of the raised concerns in the rebuttal. I will hence recommend rejection of this paper.",ICLR2020, +uHj_NNBUJI,1610040000000.0,1610470000000.0,1,m4PC1eUknQG,m4PC1eUknQG,Final Decision,Reject,"Well, this paper has achieved something remarkable in this review process: The initial scores came in at fairly low scores (4, 5, 3, 6). However, as the discussions / rebuttals went back and forth, the reviewers were able understand and see the merits of the proposed methodology. Namely, the setting of L2E (Learning to Exploit), which makes use of a novel method called Opponent Strategy Generation, to quickly generate very different types of opponents to play against. One more pertinent component is the use of MMD (maximum mean discrepancy regularization) which can remove the necessity of dealing with task distributions, and does a better job in creating diverse opponents. + +Having understood the technical approach, three of the reviewers decided to substantially increase their scores. R4 increased 4->6, R5 increased 5->6, R3 increased 3->4, while R2 held steady with a score of 6. It was also good to see empirical favorable results compared to other baseline methods: L2E had the best return against unclear opponents, such as Rocks opponent and Nash opponent. + +Without any reviewer arguing strongly for acceptance, the program committee decided that the paper in its current form does not quite meet the bar, and also that it would benefit from another revision. ",ICLR2021, +H4gEJbYZv_Y0,1642700000000.0,1642700000000.0,1,m716e-0clj,m716e-0clj,Paper Decision,Reject,"The paper proposes a ''communicate-then-adapt'' framework for decentralized optimization, with both theoretical and empirical analysis. The reviewers' main concern is the comparison in theory with prior methods like the GT-DAdam. The convergence to a stationary point of GT-DAdam seems to be faster than the proposed method in the important non-convex optimization. The reviewers are not convinced by the strong claim that ''communicate-then-adapt'' is better than ''adapt-then-communicate'' as such ''adapt-then-communicate'' method can also achieve same or better rates, possibly with less hyper-parameter tuning. I would suggest the authors to make more proper comparison with related methods.",ICLR2022, +WEPKty-bK_i,1642700000000.0,1642700000000.0,1,lycl1GD7fVP,lycl1GD7fVP,Paper Decision,Reject,"*Summary:* Study generalization in kernel regression discussing the NTK case and experiments on finite width nets. + +*Strengths:* +- Mix of theoretical and empirical results in an important topic. +- Advances a promising recent line of work. + +*Weaknesses:* +- Concerns about novelty and lack of comparison with existing works. +- Concerns about insufficient contextualization of new notion of learnability. +- Concerns about scope of results in relation to claims. 
+ +*Discussion:*

Reviewer gb7t (3) found their concerns about lack of novelty and comparison with prior works not sufficiently addressed in the authors’ responses. 7tiq (6) found the line of investigation promising, but also issues with presentation and found the theoretical results incremental. Mosm (5) finds that the theoretical part pertaining to kernels does not offer much novelty and that the paper should have focused on the empirical study that links the NTK spectrum to generalization. q2g8 (8) confidently considers this a good paper. In their view it provides a nice theoretical analysis of generalization in the setting of kernel regression, and they find the metric of learnability intuitive. However, they also found the datasets in the experiments very artificial, and considered problematic the article's desire to extend the regime of the results to make claims about deep learning.

At the end of the discussion period, the official reviewer ratings are mixed 3,5,6,8, indicating various strengths and weaknesses (also in case of the most favorable reviews). From the reviews and discussion, I infer that the topic is worthwhile and relevant, but at the same time that the paper might not be sufficiently convincing in its current form. Therefore I lean towards rejecting the paper. To arrive at a clear conclusion, I consulted two additional researchers.

*Additional assessment 1:*

The first additional assessment found the work 'underwhelming' but admitted there is a chance they might not have fully understood the work.

*Additional assessment 2:*

The second additional assessment provided the following comments: The paper's first contribution (conservation law), I didn't see it elsewhere but I think it's quite expected. The testing performance of low-frequency target functions and high-frequency target functions are averaged. Thus the average performance is constant which is independent of the kernel. However, in practice, kernel learning performs well because real target functions have low frequency. And the very high-frequency functions are unrealistic.

About the paper's second contribution, I think the paper needs to explain how the result is different from Bordelon et al. (2020). I noticed that the method they use is different but the result seems quite similar. The paper also considers the noiseless case and gives an approximation for the MSE.

Also the paper should explain more about the approximation being used. For example, how much error the approximation introduces and how the approximation is different from Bordelon's approximation. I see that in the proof Φ is approximated by a matrix where each element is standard Gaussian. For me I can't understand why the approximation is reasonable.

I read through the reviews and rebuttals. I didn't see the discussion of the issue of approximation. But I think it's a major issue and the approximation is a very strong assumption. The appendix states: ""we have made an approximation using the central limit theorem assuming that Φ is random with entries sampled i.i.d. from N (0, 1)"". Here Φ is the matrix of eigenfunctions. Hence it is not clear how to apply the central limit theorem.

*Conclusion:*

I conclude that although the paper presents some interesting ideas on a relevant subject, it still has much room for improvement. Hence I recommend rejecting this article.
I encourage the authors to revise taking the above comments into consideration.",ICLR2022, +Teuck8hyTB,1576800000000.0,1576800000000.0,1,SJleNCNtDH,SJleNCNtDH,Paper Decision,Accept (Poster),"The authors address the important issue of exploration in reinforcement learning. In this case, they propose to use reward shaping to encourage joint-actions whose outcomes deviate from the sequential counterpart. Although the proposed intrinsic reward is targeted at a particular family of two-agent robotic tasks, one can imagine generalizing some of the ideas here to other multi-agent learning tasks. + +The reviewers agree that the paper is of interest to the ICLR audience.",ICLR2020, +HkQeBJaBf,1517250000000.0,1517260000000.0,490,rylejExC-,rylejExC-,ICLR 2018 Conference Acceptance Decision,Reject,"The paper studies subsampling techniques necessary to handle large graphs with graph convolutional networks. The paper introduces two ideas: (1) preprocessing for GCNs (basically replacing dropout followed by linear transformation with linear transformation followed by dropout); (2) adding control variates based on historical activations. Both ideas seem useful (but (1) is more empirically useful than (2), Figure 4*). The paper contains a fair bit of math (analysis / justification of the method). + +Overall, the ideas are interesting and can be useful in practice. However, not all reviewers are convinced that the methods constitute a significant contribution. There is also a question of whether the math has much value (strong assumptions - also, from interpretation, may be too specific to the formulation of Kipf & Welling, making it a bit narrow?). Though I share these feelings and recommend rejection, I think that reviewers 2 and 3 were a bit too harsh, and the scores do not reflect the quality of the paper. + +*Potential typo: Figure 4 -- should it be CV +PP rather than CV? + ++ an important problem ++ can be useful in practical applications ++ generally solid and sufficiently well written +- significance not sufficient +- math seems not terribly useful + +",ICLR2018, +K3tRBTyOr,1576800000000.0,1576800000000.0,1,rklB76EKPr,rklB76EKPr,Paper Decision,Accept (Poster),This paper studies the effect of clipping on mitigating label noise. The authors demonstrate that standard gradient clipping does not suffice for achieving robustness to label noise. The authors suggest a noise-robust alternative. In the discussion the reviewers raised some interesting questions and technical details but mostly agreed that the paper is well-written with nice contributions. I concur with the reviewers that this is a nicely written paper with good contributions. I recommend acceptance but recommend the authors continue to improve their paper based on the reviewers' suggestions.,ICLR2020, +bqQIdMDJP2,1576800000000.0,1576800000000.0,1,SkxQp1StDH,SkxQp1StDH,Paper Decision,Accept (Poster),"The paper proposes an embedding for nodes in a directed graph, which takes into account the asymmetry. The proposed method learns an embedding of a node as an exponential distribution (e.g. Gaussian), on a statistical manifold. The authors also provide an approximation for large graphs, and show that the method performs well in empirical comparisons. + +The authors were very responsive in the discussion phase, providing new experiments in response to the reviews. This is a nice example where a good paper is improved by several extra suggestions by reviewers.
I encourage the authors to provide all the software for reproducing their work in the final version. + +Overall, this is a great paper which proposes a new graph embedding approach that is scalable and provides nice empirical results.",ICLR2020, +Skz-SkTBz,1517250000000.0,1517260000000.0,504,SyuWNMZ0W,SyuWNMZ0W,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree that the problem being addressed is interesting; however, there are concerns with novelty and with the experimental results. An experiment beyond dealing with class imbalance would help strengthen this paper, as would experiments with other kinds of GANs.",ICLR2018, +oT3S6iW68u3,1610040000000.0,1610470000000.0,1,4qR3coiNaIv,4qR3coiNaIv,Final Decision,Accept (Poster),"The reviewers agree that the submitted paper is of high quality and provides a promising approach/framework for Bayesian IRL. Certain concerns regarding details of the implementation and evaluation have already been addressed by the authors during the rebuttal phase, and also the title of the paper was adjusted in line with discussions with the reviewers. For the final paper, the authors should make sure to clearly highlight the advances of inferring a distribution over rewards (this is already partly done by the added grid world experiments) and discuss relations to VAEs as they initially had in mind and even in the paper title. Beyond that, they should of course also address other reviewers’ comments.",ICLR2021, +Yw6Tpflcm,1576800000000.0,1576800000000.0,1,ryxGuJrFvS,ryxGuJrFvS,Paper Decision,Accept (Poster),"This paper proposes distributionally robust optimization (DRO) to learn robust models that minimize worst-case training loss over a set of pre-defined groups. They find that increased regularization is necessary for worst-group performance in the overparametrized regime (something that is not needed for non-robust average performance). + +This is an interesting paper and I recommend acceptance. The discussion phase suggested a change in the title which slightly overstated the paper's contributions (a comment which I agree with). The authors agreed to change the title in the final version. +",ICLR2020, +G_AN7-BgEc,1576800000000.0,1576800000000.0,1,BylldxBYwH,BylldxBYwH,Paper Decision,Reject,"The authors present a physics-aware model for inpainting fluid data. In particular, the authors extend the vanilla U-net architecture and add losses that explicitly bias the network towards physically meaningful solutions. + +While the reviewers found the work to be interesting, they raised a few questions/objections which are summarised below: + +1) Novelty: The reviewers largely found the idea to be novel. I agree that this is indeed novel and a step in the right direction. +2) Experiments: The main objection was to the experimental methodology. In particular, since most of the experiments were on simulated data the reviewers expected simulations where the test conditions were a bit more different than the training conditions. It is not very clear whether the training and test conditions were different and it would have been useful if the authors had clarified this in the rebuttal. The reviewers have also suggested a more thorough ablation study. +3) Organisation: The authors could have used the space more effectively by providing additional details and ablation studies. + +Unfortunately, the authors did not engage with the reviewers and respond to their queries.
I understand that this could have been because of the poor ratings, which would have made the authors believe that a discussion wouldn't help. The reviewers have asked very relevant questions and made some interesting suggestions about the experimental setup. I strongly recommend that the authors consider these during subsequent submissions. + +Based on the reviewer comments and the lack of response from the authors, I recommend that the paper cannot be accepted. ",ICLR2020, +KueUc5lryc,1576800000000.0,1576800000000.0,1,HJxJdp4YvS,HJxJdp4YvS,Paper Decision,Reject,"The authors present a deep model for probabilistic clustering and extend it to handle time series data. The proposed method beats existing deep models on two datasets and the representations learned in the process are also interpretable. + +Unfortunately, despite detailed responses by the authors, the reviewers felt that some of their main concerns were not addressed. For example, the authors and the reviewers are still not converging on whether SOM-VAE uses a VAE or an autoencoder. Further, the discussion about the advantages of VAE over AE is still not very convincing. Currently the work is positioned as a variational clustering method but the reviewers feel that it is a clustering method which uses a VAE (yes, I understand that this difference is subtle but it needs to be clarified). + +The reviewers read the responses of the authors and, during discussions with the AC, suggested that they were still not convinced about some of their initial questions. Given this, at this point I would prefer going by the consensus of the reviewers and recommend that this paper cannot be accepted.",ICLR2020, +HkHYUJprG,1517250000000.0,1517260000000.0,829,HkPCrEZ0Z,HkPCrEZ0Z,ICLR 2018 Conference Acceptance Decision,Reject,The paper has some potentially interesting ideas but it feels very preliminary. The experimental section in particular needs a lot more work.,ICLR2018, +YAIIKJV2CN-,1642700000000.0,1642700000000.0,1,_HFPHFbJrP-,_HFPHFbJrP-,Paper Decision,Reject,"The authors study robustness properties of arbitrary smoothing measures with bounded support using the Wasserstein distance and the total variation distance. Reviewers pointed out several weaknesses of this work. In particular, they mentioned the paper is not well-organized, comparison with prior work is lacking, the conclusion of the theoretical analysis is not novel, and the experiments are not comprehensive. I suggest the authors take these comments into account when improving their work.",ICLR2022, +rklejgP4xN,1545000000000.0,1545350000000.0,1,SyMWn05F7,SyMWn05F7,meta-review,Accept (Poster),"The authors have proposed an approach for directly learning a spatial exploration policy which is effective in unseen environments. Rather than use external task rewards, the proposed approach uses an internally computed coverage reward derived from on-board sensors. The authors use imitation learning to bootstrap the training and then fine-tune using the intrinsic coverage reward. Multiple experiments and ablations are given to support and understand the approach. The paper is well-written and interesting. The experiments are appropriate, although further evaluations in real-world settings really ought to be done to fully explore the significance of the approach. The reviewers were divided, with one reviewer finding fault with the paper in terms of the claims made, the positioning against prior art, and the chosen baselines.
The other two reviewers supported publication even after considering the opposition of R1, noting that they believe that the baselines are sufficient, and the contribution is novel. After reviewing the long exchange and discussion, the AC sides with accepting the paper. Although R1 raises some valid concerns, the authors defend themselves convincingly and the arguments do not, in any case, detract substantially from what is a solid submission.",ICLR2019,4: The area chair is confident but not absolutely certain +YX7BI905-jo,1642700000000.0,1642700000000.0,1,LdlwbBP2mlq,LdlwbBP2mlq,Paper Decision,Accept (Oral),"This paper analyzes local SGD under the random reshuffling data selection setting. As is the case for standard random reshuffling, better rates are shown for local SGD when random reshuffling is used. This would already be a nice contribution to a line of work on random shuffling methods—but the paper goes beyond that by showing a matching lower bound and designing a (theoretically) better variant algorithm. The reviewers were all in agreement that this paper should be accepted (as a result not much further discussion happened after the original reviews), and I agree with this consensus. The modification seems to improve the paper, although I did not look through it in detail.",ICLR2022, +zD0rk_qegV,1576800000000.0,1576800000000.0,1,BylEqnVFDB,BylEqnVFDB,Paper Decision,Accept (Poster),"The paper presents a novel graph convolutional network by integrating curvature information (based on the concept of Ricci curvature). The key idea is well motivated and the paper is clearly written. Experimental results show that the proposed curvature graph network methods outperform existing graph convolution algorithms. One potential limitation is the computational cost of computing the Ricci curvature, which is discussed in the appendix. Overall, the concept of using curvature in graph convolutional networks seems like a novel and promising idea, and I also recommend acceptance.",ICLR2020, +wDX9rqyC4A0,1642700000000.0,1642700000000.0,1,lNreaMZf9X,lNreaMZf9X,Paper Decision,Reject,"The paper studies the effect of different design choices related to learning a dynamics model. The reviewers uniformly agree that the topic of the paper, systematically studying different design choices, is important. Furthermore, the paper is very well written. However, there are a number of weaknesses as well that limit the relevance of this work. Arguably, the main weakness is that the results are inconclusive: there is no single design choice that is better, a conclusion that provides little guidance for researchers working in this space. Another weakness is that the study focuses on only 4 domains. And while performing such a study on a much broader set of domains can be prohibitively expensive, that doesn't take away from the fact that it is hard to draw strong conclusions from such a small set of tasks. For these reasons, I recommend rejection.",ICLR2022, +iVwwjAuOa3T,1610040000000.0,1610470000000.0,1,C4-QQ1EHNcI,C4-QQ1EHNcI,Final Decision,Reject,"This paper proposes an approach to efficient Bayesian deep learning by applying Laplace approximations to sub-structures within a larger network architecture. In terms of strengths, scalable approximate Bayesian inference methods for deep learning models are an important and timely topic. The paper includes an extensive set of experiments with promising results.
+ +In terms of issues, the reviewers originally raised many concerns and the authors provided a large update to the paper. However, following that update and the discussion, several concerns remain. First, the reviewers noted that the originally submitted draft made claims about the optimality of the sub-network selection procedure that were incorrect due to the use of a diagonal approximation. The authors subsequently retracted these claims and re-focused on the idea that the subset selection approach is a theoretically well-motivated heuristic that performs well empirically. Following the discussion, the reviewers continued to express concerns about the heuristic nature of this procedure. + +A second point has to do with scalability. The reviewers noted that the authors had only evaluated their approach on small data sets, leaving open the question of how scalable the method is. The authors responded by adding experiments on the same data sets using larger models, which does not squarely address the issue raised. Third, an additional point was raised regarding the lack of control of resource use in the experiments. The authors note that their approach can use more resources when available while many other methods cannot. However, some methods including deep ensembles can also expand to use more resources, as can posterior ensembles produced using MCMC methods like SGLD and SGHMC. The authors need to consider quantifying space-performance and time-performance trade-offs in the same units for different approaches to satisfactorily address this issue. While the authors added one set of experiments looking at deep ensembles in isolation, their conclusions that performance saturates for these models at low ensemble sizes seem to be hasty in some cases (e.g., deep ensembles show continued improvement for large corruptions in Figure 5(right) despite the claim by the authors that the models saturate after 15 epochs). + +In summary, this appears to be a promising approach. While the authors made significant efforts to correct issues and address questions with the original draft, the majority view of the reviewers following discussion is that this paper requires additional work to more carefully expand on the revised results and to address the heuristic status of the sub-network selection approach.",ICLR2021, +H1gayYnexN,1544760000000.0,1545350000000.0,1,r1xurn0cKQ,r1xurn0cKQ,"Novel approach, but needs stronger comparisons.",Reject,"This is a difficult decision, as the reviewers are quite polarized on this paper, and did not come to a consensus through discussion. The positive elements of the paper are that the method itself is a novel and interesting approach, and that the performance is clearly state of the art. While impressive, the fact that a relatively simple task module trained on the features from Zhu et al. can match the performance of GAZSL suggests that it is difficult to compare these methods in an apples-to-apples way without using consistent features. There are two ways to deal with this: train the baseline methods using the features of Zhu, or train correction networks using less powerful features from other baselines. + +Reviewer 3 pointed this out, and asked for such a comparison. The defense given by the authors is that they use the same features as the current SOTA baselines, and therefore their comparison is sound. I agree to an extent; however, it should be relatively simple to either elevate other baselines, or compare correction networks with different features.
Otherwise, most of the rows in Table 1 should be ignored. Running correction networks with different features in an ablation study would also demonstrate that the gains are consistent. + +I think the authors should run these experiments, and if the results hold then there will be no doubt in my mind that this will be a worthy contribution. However, in their absence, I can’t say with certainty how effective the proposed method really is. +",ICLR2019,4: The area chair is confident but not absolutely certain +ByxWNcoHgE,1545090000000.0,1545350000000.0,1,B1gHjoRqYQ,B1gHjoRqYQ,Many questions - not convincing enough at this time ,Reject,"The paper proposes a new method for adversarial attacks, MarginAttack, which finds adversarial examples with small distortion and runs faster than the CW baseline, but slower than other methods. The authors provide theoretical guarantees and a broad set of experiments. + +In the discussion, a consistent concern has been that, experimentally, the method does not perform noticeably better than previous approaches. The authors mention that the lines are too thick to reveal the difference. It has been pointed out that this might be related to the way the experiments are conducted, but the proposed method still does better than other methods. AnonReviewer1 mentions that the assumptions needed for the theoretical part might be too strong, meaning that the main contribution of the paper is on the experimental side. + +The comparisons with other methods and the assumptions made in the theorems seem to have caused quite some confusion and there was a fair amount of discussion. Following the discussion session, AnonReviewer1 updated his rating from 5 to 6 with high confidence. + +The referees all rate the paper as not very strong, with one marginally above the acceptance threshold and two marginally below the acceptance threshold. + +Although the paper seems to propose valuable ideas, and it appears that the discussion has clarified many questions from the initial submission, the paper has not provided a clear, convincing selling point at this time. ",ICLR2019,4: The area chair is confident but not absolutely certain +iMY2l_UXcIO,1610040000000.0,1610470000000.0,1,YhhEarKSli9,YhhEarKSli9,Final Decision,Reject,"This work proposes a method for identifying appropriate graphical models through enumeration, pruning of redundant dependencies, and neural network conditionals. While structure learning is an interesting application and there are some promising results, there were a number of concerns around experimental evaluation, computational complexity of the method, clarity of the presentation, and connections to prior work. In particular, R1's concerns around the large field of structure learning in Bayesian Networks, and the unwillingness to use the established terminology (and compare to methods there), were not sufficiently addressed in the rebuttal.",ICLR2021, +B1g_fLwBlE,1545070000000.0,1545350000000.0,1,Skgge3R9FQ,Skgge3R9FQ,reject,Reject,The reviewers agree the paper is not ready for publication at ICLR.,ICLR2019,5: The area chair is absolutely certain +DayvB7pms39,1610040000000.0,1610470000000.0,1,NECTfffOvn1,NECTfffOvn1,Final Decision,Accept (Spotlight),"All reviewers are for accepting the paper: in particular, R1 and R3 found the rebuttal sufficiently convincing to increase their scores from their initial assessment leaning towards rejection.
+ +Strengths: ++ Clarity ++ Simplicity of the proposed approach ++ Convincing experiments outperforming reasonable baselines across all problem instances + +Weaknesses: ++ Scale (as noted by R2 and R3) to larger problem sizes, beyond the setting of less than a dozen. + +I agree with some hesitation that the paper is narrow in scope (both in interest from the community and scale---and ultimately whether it would interest the overall quantum computing audience). However, I think the paper makes significant advances toward the area of adiabatic quantum computation.",ICLR2021, +ppONLmvUA,1576800000000.0,1576800000000.0,1,SJlVVAEKwS,SJlVVAEKwS,Paper Decision,Reject,"This paper proposes to use a generative adversarial network to train a substitute that replicates (imitates) a learned model under attack. It then shows that the adversarial examples for the substitute can be effectively used to attack the learned model. The proposed approach leads to better attack success rates than other substitute-training approaches that require more training examples. The condition to get a well-trained imitation model is that a sufficient number of queries are obtained from the target model. This paper has valuable contributions by developing an imitation attacker. However, some key issues remain. In particular, I agree with R1 that the average number of queries per image is relatively high, even during training. In the rebuttal, the authors made the assumption that “suppose their method could make an infinite number of queries for target models”, which is unfortunately not realistic. Another point that I found confusing: at testing, I don’t see how you can use the imitation model D to generate adversarial samples (D is a discriminative model, not a generator); it should be G, right? +",ICLR2020, +SyxywkpBG,1517250000000.0,1517260000000.0,905,rJma2bZCW,rJma2bZCW,ICLR 2018 Conference Acceptance Decision,Reject,"Dear authors, + +The reviewers agreed that the theoretical part lacked novelty and that the paper should focus on its experimental part, which at the moment is not strong enough to warrant publication. + +Regarding the theoretical part, here are the main concerns: +- Even though it is used in previous works, the continuous time approximation of stochastic gradient overlooks its practical behaviour, especially since a good rule of thumb is to use as large a stepsize as possible (without reaching divergence), as for instance mentioned in The Marginal Value of Adaptive Gradient Methods in Machine Learning by Wilson et al. +- The isotropic approximation is very strong and I don't know settings where this would hold. Since it seems central to your statements, I wonder what can be deduced from the obtained results. +- I do not think the Gaussian assumption is unreasonable and I am fine with it. Though there are clearly cases where this will not be true, it will probably be OK most of the time. + +I encourage the authors to focus on the experimental part in a resubmission.",ICLR2018, +Mc4YmrEgS0,1576800000000.0,1576800000000.0,1,Sygn20VtwH,Sygn20VtwH,Paper Decision,Reject,"This paper proposes a recurrent architecture based on a recursive gating mechanism. The reviewers leaned towards rejection on the basis of questions regarding novelty, analysis, and the experimental setting. Surprisingly, the authors chose not to engage in discussion, as all reviewers seemed pretty open to having their minds changed.
If none of the reviewers will champion the paper, and the authors cannot be bothered to champion their own work, I see no reason to recommend acceptance.",ICLR2020, +wAiFOn5o-hS,1642700000000.0,1642700000000.0,1,YigKlMJwjye,YigKlMJwjye,Paper Decision,Accept (Poster),"This paper offers an alternative formulation of demographic parity, named GDP, which makes it amenable to easier computation when the sensitive attribute is continuous. Analytically, the paper relates GDP to other notions, offers ways to estimate GDP from data, and establishes the convergence of these estimators. Experimentally, the paper adds the estimated GDP as a learning regularizer and establishes the accuracy-fairness tradeoff that results by using this method versus others. + +The need to handle continuous sensitive attributes is well-motivated since they are ubiquitous. The direction of the paper is thus very pertinent. The experimental exploration of the paper is also strong, though reviewers initially raised questions of clarity of the relationship of GDP with adversarial debiasing. These are mostly addressed by the authors. One weakness of the paper that largely remains is whether the paper offers new conceptual insights. Indeed, demographic parity is simply a notion of independence between an algorithm’s output and sensitive attributes. Other independence metrics are dismissed in the paper as unreliable to compute. However, one reviewer correctly raises the concern that *under similar regularity conditions* to the ones establishing the convergence of the kernel GDP estimator, it is also possible to establish convergence of other independence metrics, that would equally capture demographic parity. Another reviewer also points out that such convergence would follow using standard non-parametric statistics techniques. Smoothed estimators of mutual-information are indeed available in the literature, with convergence guarantees even in the high-dimensional regime. The authors do not satisfactorily address this, casting doubt on the overall significance of the contribution. + +That said, given the strong motivation behind the paper and the overall promise of the methodology, it may be worth sharing with the community. The authors are urged to address the above. Additionally, they are urged to be transparent about what the theory offers and what it doesn’t. For instance, the convergence results of GDP only tell us that we can use these estimators to audit the fairness of existing models. In other words, although the paper is touted as showing that GDP can be successfully used for learning, the evidence there is purely empirical: there is no learning guarantee simultaneously on the accuracy and fairness of GDP-penalized risk minimization.",ICLR2022, +YM7d5VIXRo,1576800000000.0,1576800000000.0,1,H1leCRNYvS,H1leCRNYvS,Paper Decision,Reject,"This paper introduces a probabilistic generative model which mixes a variational autoencoder (VAE) with an energy based model (EBM). As mentioned by all reviewers (i) the motivation of the model is not well justified (ii) experimental results are not convincing enough. In addition (iii) handling sets is not specific to the proposed approach, and thus claims regarding sets should be revised. +",ICLR2020, +SJjdXy6rf,1517250000000.0,1517260000000.0,175,HJ94fqApW,HJ94fqApW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper received scores either side of the borderline: 6 (R1), 5 (R2), 7 (R3). R1 and R3 felt the idea to be interesting, simple and effective. 
R2 raised a number of concerns which the rebuttal addressed satisfactorily. Therefore the AC feels the paper can be accepted.",ICLR2018, +GOMRRo1WHiVz,1642700000000.0,1642700000000.0,1,oOuPVoT1kA5,oOuPVoT1kA5,Paper Decision,Reject,"The reviewers agree that the problem tackled is important but raise several substantial issues that justify not to accept the paper in its current form. I would encourage the authors to clarify further the crypto part of the paper (dHiP, 2., 4.) and work on how to relax or improve the model assumptions (NGrb). Also, the author's reply to chCc, point 2. becomes more disputable as federated learning is further developed. The argument can be refined. + +On a personal note, the statement of Theorem 3.3 could be made clearer, in particular in simplifying (while weakening a bit) the probability bound. + +AC.",ICLR2022, +v8m2HGCwxhRf,1642700000000.0,1642700000000.0,1,6Q52pZ-Th7N,6Q52pZ-Th7N,Paper Decision,Accept (Poster),"This paper proposes a pseudo-labeled data selection method for semi-supervised pose estimation. The investigated task in this paper is practical and useful. The framework is well designed and reasonable, and extensive ablation studies are conducted to test the efficacy of the method. After discussion, all the reviewers recommend accept of this paper.",ICLR2022, +0pqkdyrXlq,1576800000000.0,1576800000000.0,1,rJxFpp4Fvr,rJxFpp4Fvr,Paper Decision,Reject,"The authors propose a notion of feature robustness, provide a straightforward decomposition of risk in terms of this robustness measure, and then provide some empirical evidence for their perspective. Across the board, the reviewers raised issues with missing related work, which the authors then addressed. I will point out that some things the authors say about PAC-Bayes are false. E.g., in the rebuttal the authors say that PAC-Bayes is limited to 0-1 error. It is generally trivial to obtain bounds for bounded loss. For unbounded loss functions, there are bounds based on, e.g., sub gaussian assumptions. + +Despite improvements in connections with related work, reviewers continued to find the theoretical contributions to be marginal. Even the empirical contributions were found to be marginal.",ICLR2020, +frFK50InPSD,1610040000000.0,1610470000000.0,1,LMslR3CTzE_,LMslR3CTzE_,Final Decision,Reject,"This paper proposes an interesting approach for learning to decide whether a query graph is isomorphic to a subgraph within the target graph. The approach has a number of interesting aspects from the machine learning perspective, e.g. the anchored graphs and the order embeddings. Empirical results show promise in ablation studies and against a few baselines. + +However this paper also has a number of issues as pointed out by the reviewers, placing it right on the borderline. Most notably the clarity of the presentation could be improved as it seems to confuse a few reviewers at various points. Another thing that I’d like to highlight is that the way to convert the pairwise scores f(z_q, z_u) into the final decision about G_T and G_Q seems worthy of a longer discussion. Is a simple average across all pairs the best we can do? I imagine if the query graph is small but the target graph is large then even if the G_Q does match a subgraph of G_T the average score can be quite low. 
+ +Overall I do like the ideas proposed in this paper, but also recognize that the paper can benefit from more improvement, so I’d like to recommend rejection but encourage the authors to submit again in the next round.",ICLR2021, +gcLWokhC_fy,1642700000000.0,1642700000000.0,1,bl9zYxOVwa,bl9zYxOVwa,Paper Decision,Reject,"The paper argues that adversarial training increases inter-class similarities, therby increasingly the misclassification of some classes and lowering accuracy parity across classes. It proposes to combine existing adversarial training methodologies, PGD-AT and TRADES, with a maximum entropy term to improve the classification fairness while remaining robust. + +While they agree that the problem is timely and important, the reviewers identify the following issues that place the current iteration of the paper below the bar of acceptance: the comparison to other works on fair robust training and accuracy parity is incomplete; experimental evaluation is conducted only on CIFAR10, making the generalizability of the paper's claims about performance unclear; and the proposed methodology has low technical novelty.",ICLR2022, +H1e6IzcbxV,1544820000000.0,1545350000000.0,1,HyEtjoCqFX,HyEtjoCqFX,Interesting contribution that improves on the widely used entropy regularized algorithms,Accept (Poster),"The paper proposes a new RL algorithm (MIRL) in the control-as-inference framework that learns a state-independent action prior. A connection is provided to mutual information regularization. Compared to entropic regularization, this approach is expected to work better when actions have significantly different importance. The algorithm is shown to beat baselines in 11 out of 19 Atari games. + +The paper is well written. The derivation is novel, and the resulting algorithm is interesting and has good empirical results. A few concerns were raised in initial reviews, including certain questions about experiments and potential negative impacts of the use of nonuniform action priors in MIRL. The author responses and the new version were quite helpful, and all reviewers agree the paper is an interesting contribution. + +In a revised version, the authors are encouraged to + (1) include a discussion of when MIRL might fail, and + (2) improve the related work section to compare the proposed method to other entropy regularized RL (sometimes under a different name in the literature), for example the following recent works and the references therein: + https://arxiv.org/abs/1705.07798 + http://proceedings.mlr.press/v70/asadi17a.html + http://papers.nips.cc/paper/6870-bridging-the-gap-between-value-and-policy-based-reinforcement-learning + http://proceedings.mlr.press/v80/dai18c.html",ICLR2019,4: The area chair is confident but not absolutely certain +gxqc5iNpa6,1576800000000.0,1576800000000.0,1,SJx37TEtDH,SJx37TEtDH,Paper Decision,Reject,"This paper tries to explain why Adam is better than sgd for training attention model. In specific, it first provides some empirical and theoretical evidence that a heavy-tailed distribution of the noise in stochastic gradients is the cause of SGD's worse performance. Then the authors studied a clipped variant of SGD that circumvents this issue, and revisited Adam through the lens of clipping. Overall, this paper conveys some interesting ideas. On the other hand, the theorems proved in this paper do not provide additional insight besides the intuition and the experiments are weak (hyperparameters are not carefully tuned). 
So even after author response, it still does not gather sufficient support from the reviewers. This is a borderline paper, and due to a rather limited number of papers the conference can accept, I encourage the authors to improve this paper and resubmit it to future conference. +",ICLR2020, +jSwukAaPzq,1576800000000.0,1576800000000.0,1,SklcyJBtvB,SklcyJBtvB,Paper Decision,Reject,"This paper tackles the problem of learning off-policy in the contextual bandit problem, more specifically when the available data is deficient (in the sense that it does not allow to build reasonable counterfactual estimators). To address this, the authors introduce three strategies: 1) restricting the action space; 2) imputing missing rewards when lacking data; 3) restricting the policy space to policies with ""enough"" data. All three approaches are analyzed (statistical and computational properties) and evaluated empirically. Restricting the policy space appears to be particularly effective in practice. + +Although the problem being solved is very relevant, it is not clear how this work is positioned with respect to approaches solving similar problems in RL. For example, Batch constrained Q-learning ([1]) restricts action space, while Bootstrapping Error Accumulation ([2]) and SPIBB ([3]) restrict the policy class in batch RL. A comparison with these techniques in the contextual bandit settings, in addition to recent state-of-the-art off-policy bandit approaches (Liu et al. (2019), Xie et al. (2019)) is lacking. Moreover, given the newly added results (DR method by Tang et al. (2019)), it is not clear how the proposed approach improves over existing techniques. This should be clarified. I therefore recommend to reject this paper. +",ICLR2020, +yPuDDh114w,1576800000000.0,1576800000000.0,1,Skxuk1rFwB,Skxuk1rFwB,Paper Decision,Accept (Poster),"This paper presents a method that hybridizes the strategies of linear programming and interval bound propagation to improve adversarial robustness. While some reviewers have concerns about the novelty of the underlying ideas presented, the method is an improvement to the SOTA in certifiable robustness, and has become a benchmark method within this class of defenses.",ICLR2020, +3osi_BvwGbA,1642700000000.0,1642700000000.0,1,edONMAnhLu-,edONMAnhLu-,Paper Decision,Accept (Poster),"The paper proposes an interesting and well-motivated improvement of Sharpness Aware Minimization. Overall the AC and reviewers are satisfied by the author feedback in improving the solidity and rigor of the theoretical results. + +The points made by the authors in response to the reviewers initial concerns are essential, especially those regarding interpretation of Corollary 5.2.1, making the proofs rigorous, and fixing the potential for crude convergence bounds. It is therefore critical that the authors incorporate them into their manuscript.",ICLR2022, +SylGxNNegN,1544730000000.0,1545350000000.0,1,ByeMB3Act7,ByeMB3Act7,Good paper,Accept (Poster),"This paper introduces an approach for improving the scalability of neural network models with large output spaces, where naive soft-max inference scales linearly with the vocabulary size. The proposed approach is based on a clustering step combined with per-cluster, smaller soft-maxes. It retains differentiability with the Gumbel softmax trick. The experimental results are impressive. There are some minor flaws, however there's consensus among the reviewers the paper should be published. 
+",ICLR2019,4: The area chair is confident but not absolutely certain +r1edbrPEeV,1545000000000.0,1545350000000.0,1,Hke-JhA9Y7,Hke-JhA9Y7,Well written paper on learning concise representations for regression with strong empirical evaluation,Accept (Poster),"The reviewers all feel that the paper should be accepted to the conference. The main strengths that they noted were the quality of writing, the wide applicability of the proposed method and the strength of the empirical evaluation. It's nice to see experiments across a large number of problems (100), with corresponding code, where baselines were hyperparameter tuned as well. This helps to give some assurance that the method will generalize to new problems and datasets. Some weaknesses noted by the reviewers were computational cost (the method is significantly slower than the baselines) and they weren't entirely convinced that having more concise representations would directly lead to the claimed interpretability of the approach. Nevertheless, they found it would make for a solid contribution to the conference.",ICLR2019,4: The area chair is confident but not absolutely certain +rEmwTBSm2J,1576800000000.0,1576800000000.0,1,rkgg6xBYDH,rkgg6xBYDH,Paper Decision,Accept (Poster),"This paper presents a generalization bound for RNNs based on matrix-1 norm and Fisher-Rao norm. As the initial bound relies on non-signularity of input covariance, which may not always hold in practice, the authors present additional analysis by noise injection to ensure covariance is positive definite. Through the resulted bound, the paper discusses how weight decay and gradient clipping in the training can help generalization. There were some concerns raised by reviewers, including rigorous report of the experiment results, claims on generalization in IMDB experiment, claims of no explicit dependence on the size of networks, and the relationship of small eigenvalues in input covariance to high frequency features. The authors responded to these and also revised their draft to address most of these concerns (in particular, authors added a new section in the appendix that includes additional experimental results). Reviewers were mainly satisfied with the responses and the revision, and they all recommend accept. +",ICLR2020, +3gT5RbYGKdN,1610040000000.0,1610470000000.0,1,mQPBmvyAuk,mQPBmvyAuk,Final Decision,Accept (Poster),I agree with the reviewers' positive comments about the paper. The BREEDS approach to generating benchmarks seems to be a useful one and addresses an important problem in the space. This approach could be the start of a nice direction of inquiry that will give us new insights into subpopulation shift. And most of the reviewers' negative concerns were addressed by the revision.,ICLR2021, +Kb166EVlfaq,1642700000000.0,1642700000000.0,1,RdJVFCHjUMI,RdJVFCHjUMI,Paper Decision,Accept (Poster),"This submission introduces a theoretical model to explain how ""in-context learning"" (i.e. the ability to output a correct prediction based on inputs for a task that the model was not explicitly trained on) is possible. The model uses a mixture of HMMs and shows that in-context learning is a natural consequence of Bayesian inference under that model. Overall, reviewers agreed that the contribution was useful and timely, and were somewhat convinced by the theoretical arguments. However, there was some broad concern with the framing of the paper. Namely, +1) The paper claims that prompted data is OOD w.r.t. the pre-training distribution. 
In fact, this is almost certainly not the case for many tasks and datasets. Indeed, it is highly plausible that data very similar to the example given by the paper (identifying the nationality of different celebrities) appears in the pre-training dataset of large LMs. Other examples include the popular ""tldr;"" task format for summarization, which is incredibly common on the internet, etc. +2) The paper does not sufficiently distinguish between insights gained in the toy setting considered by the theoretical model and insights that can be applied to large LMs. Most reviewers were concerned that there might not be any reason to think that the insights gained from the theoretical model would apply to large LMs. The paper, however, very much frames itself as developing insight into the behavior of large LMs. + +I will recommend acceptance of this paper, but will stipulate that the above two issues should be fixed in the camera-ready version. Namely, I would suggest that the authors do not refer to prompted forms of tasks/datasets as ""OOD"", and I would suggest that any claims about different insights are not applied to large LMs.",ICLR2022, +gcLWokhC_fy,1642700000000.0,1642700000000.0,1,CzceR82CYc,CzceR82CYc,Paper Decision,Accept (Poster),"The paper develops a diffusion-process based generative model that perturbs the data using a critically damped Langevin diffusion. The diffusion is set up through an auxiliary velocity term like in Hamiltonian dynamics.
The idea is that picking a process that diffuses faster will lead to better results. The paper then constructs a new score matching objective adapted to this diffusion, along with a sampling scheme for critically damped Langevin score-based generative models. The idea of a faster diffusion to make generative models is a good one. The paper is a solid accept. + +Reviewer tK3A was lukewarm, as evidenced by their original 2 for empirical novelty that moved to a 3. From my look, it felt like a straightforward application of ideas in one domain, sampling, to another, generative modeling. It's a good paper, but it does not stand out relative to other accepts.",ICLR2022, +GOMRRo1WHiVz,1642700000000.0,1642700000000.0,1,oOuPVoT1kA5,oOuPVoT1kA5,Paper Decision,Reject,"The reviewers agree that the problem tackled is important but raise several substantial issues that justify not accepting the paper in its current form.
This limitation, which is an increasingly serious problem in machine-learning papers, should be solved before the paper should be published. A very recent extension of the simulations to 16 GRUs improves this, but a rigorous analysis of higher-dimensional systems is pending and poses a considerable block for acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +ryQZDk6SM,1517250000000.0,1517260000000.0,935,ry5wc1bCW,ry5wc1bCW,ICLR 2018 Conference Acceptance Decision,Reject,,ICLR2018, +iGX8LHiF_,1576800000000.0,1576800000000.0,1,S1l6ITVKPS,S1l6ITVKPS,Paper Decision,Reject,"This paper proposes a model that can learn predicates (symbolic relations) from pixels and can be trained end to end. They show that the relations learned generate a representation that generalizes well, and provide some interpretation of the model. + +Though it is reasonable to develop a model with synthetic data, the reviewers did wonder if the findings would generalize to new data from real situations. The authors argue that a new model should be understood (using synthetic data) before it can reasonably be applied to natural data. I hope the reviews have shown the authors which areas of the paper need further explanation, and that the use of a synthetic dataset needs to strong justification, or perhaps show some evidence that the method will probably work on real data (e.g. how it could be extended to natural images).",ICLR2020, +96gEFCpR1xw,1610040000000.0,1610470000000.0,1,JHx9ZDCQEA,JHx9ZDCQEA,Final Decision,Reject,"This paper proposes a novel problem of polymer retrosynthesis, and a method to solve it. The authors formally define the polymer retrosynthesis optimization problem as a constrained problem to identify the monomers and the unit polymer, with the recursive and stability constraints. Further, since the main challenge with polymer retrosynthesis is the extremely scarce training data, the authors propose a domain adaptation technique that can utilize a single-step retrosynthesis model trained on a large amount of data. The authors also use Retro* [Chen et al. 20] for synthesizability check of the monomers. The proposed method, PolyRetro, is validated against few naive baselines for top-k recovery performance, and is shown to outperform them. + +All reviewers found the problem of polymer retrosynthesis tackled to be important as well as novel, and the paper to be very well-written. However, all reviewers had a common concern on the limited technical novelty and meaningless baselines that makes it difficult to evaluate the significance of the results. Some of the reviewers were also concerned with the insignificant performance gain with the proposed domain adaptation technique (PolyRetro vs. PolyRetro-USPTO in Figure 4), and its limited applicability to a condensation polymerization. The authors provided new results with more baselines, which fine-tune the single-step retrosynthesis model (MLP, seq-to-seq) trained on USPTO. + +The below is the summary of pros and cons: + +**Pros** +- The tackled polymer retrosynthesis problem is novel and practically important. +- The proposed problem formulation and constraints are interesting and make sense. +- The paper is well-written and easy to follow even for non-domain experts. + +**Cons** +- The proposed solution with recursive and stability constraints is rather straightforward, as well as the use of Retro* for screening out the monomers. 
+- The domain adaptation technique, which is advertised as an important contribution to combat extreme data scarcity, is both straightforward and yields small performance gain. +- The baselines in the original version of the paper are simply meaningless strawmans, and the new baseline (seq2seq-retro) in Section D of the Appendix seems quite strong, making it difficult to validate the effectiveness of the proposed method. + +The paper received split reviews, with three leaning toward acceptance and one leaning toward rejection. After the interactive discussion period with the authors, the reviewers had an in-depth discussion, where all reviewers agreed that the technical novelty or contribution to the general machine learning field, or general applicability to polymer synthesis is limited. The reviewers did not reach a consensus, which makes the paper a borderline case, and after the discussion with the program chairs, we decided to reject the paper due to the unresolved concerns. + +I believe that the proposed problem-specific solution is adequate, although it has little technical novelties, since this is an application paper. However what is more problematic is the inconclusive experimental validation results due to lack of meaningful baselines. I suggest the authors to compare against seq2seq retro + Retro* in order to properly validate the effectiveness of the proposed method. Also, results in Figure 7, or the polymerization examples in Section A of the appendix should be incorporated into the main paper. I also suggest that the authors drop domain adaptation from the title since it constitutes a small part of the method and thus is misleading. ",ICLR2021, +ByxT9_zfgE,1544850000000.0,1545350000000.0,1,HJlLKjR9FQ,HJlLKjR9FQ,An interesting contribution (that requires more polished exposition),Accept (Poster),"+ the ideas presented in the paper are quite intriguing and draw on a variety of different connections +- the presentation has a lot of room for improvement. In particular, the statement of Theorem 1, in its current form, requires rephrasing and making it more rigorous. + +Still, the general consensus is that, once these presentation shortcomings are address, this will be an interesting paper. +",ICLR2019,3: The area chair is somewhat confident +r1gLmvhWlV,1544830000000.0,1545350000000.0,1,HJePRoAct7,HJePRoAct7,difficult case,Reject,"The authors supplied an updated paper resolving the most important reviewer concerns after the deadline for revisions. In part, this was due to reviewers requesting new experiments that take substantial time to complete. + +After discussion with the reviewers, I believe that if the revised manuscript had arrived earlier, then it should be accepted. Without the new results I would recommend rejecting since I believe the original submission lacked important experiments to justify the approach (inductive setting experiments are very useful). + +The community has an interest in uniform application of the rules surrounding the revision process. It is not fair to other authors to consider revisions past the deadline and we do not want to encourage late revisions. Better to submit a finished piece of work initially and not assume it will be possible to use up a lot of reviewer time and fix during the review process. + +We also don't want to encourage shoddy, rushed experimental work. 
However, the way we typically handle requests from reviewers that require a lot of work to complete is by rejecting papers and encouraging them to be resubmitted sometime in the future, typically to another similar conference. + +Thus I am recommending rejecting this paper on policy grounds, not on the merits of the latest draft. I believe that we should base the decision on the state of the paper at the same deadline that applies to all other authors. + +However, I am asking the program chairs to review this case since ultimately they will be the final arbiters of policy questions like this.",ICLR2019,4: The area chair is confident but not absolutely certain +VSxAOLM6zB,1642700000000.0,1642700000000.0,1,bM45i3LQBdl,bM45i3LQBdl,Paper Decision,Reject,"The paper considers the natural class of algorithms, namely Aggregators with Gaussian noise for distributed SGD with differential privacy (DP) and Byzantine resilience (BR). Previous results shows VN->BR-> convergence of SGD. The authors first show that aggregators with Gaussian noise algorithms satisfy DP but violates VN necessarily, so approximate VN is proposed. Theorem 2 shows approximate VN->convergence. Proposition 2 shows the above algorithms satisfies approximate VN with certain parameters. With the combined bound Corollary 1, the authors observe (and then verify by experiments) that larger batch size is beneficial and in particular more beneficial than when DP or BR is enforced alone. In the formulation, an important baseline of robust mean aggregation [Diakonikolas,Kamath,Kane,Li,Moitra,,Stewart'2016] and even more relevant baseline of robust and DP mean aggregation[Liu,kong,Kakade,Oh,'21] are somehow missing. One would assume that directly applying these well-known techniques might give the desired DP and robust SGD. The field at the intersection of differential privacy and robustness has evolved quite a bit recently and tremendous technical innovations are happening. Given the relveance of the proposed problem to this line of work, one should make the connections precise and explain the differences.",ICLR2022, +OovmtzsYN3,1576800000000.0,1576800000000.0,1,r1eowANFvr,r1eowANFvr,Paper Decision,Accept (Poster),"This paper introduces T-NAS, a neural architecture search (NAS) method that can quickly adapt architectures to new datasets based on gradient-based meta-learning. It is a combination of the NAS method DARTS and the meta-learning method MAML. + +All reviewers had some questions and minor criticisms that the authors replied to, and in the private discussion of reviewers and AC all reviewers were happy with the authors' answers. There was unanimous agreement that this is a solid poster. + +Therefore, I recommend acceptance as a poster.",ICLR2020, +rJSywkarz,1517250000000.0,1517260000000.0,909,Sk0pHeZAW,Sk0pHeZAW,ICLR 2018 Conference Acceptance Decision,Reject,"Dear authors, + +I agree with the reviewers that the paper tries to do several things at once and the results are not that convincing. Overall, this work is mostly incremental, which is fine if there is no issue in the execution. Thus, I regret to inform you that this paper will not be accepted to ICLR.",ICLR2018, +S5ef-z71dxm,1642700000000.0,1642700000000.0,1,gJLEXy3ySpu,gJLEXy3ySpu,Paper Decision,Accept (Poster),"Thank you for your submission to ICLR. The reviewers ultimately have mixed opinions on this paper, but reading in a bit more depth I don't feel that the critical comments raised by the sole negative reviewer really raise valid points. 
Specifically, the fact that this reviewer directly asks, e.g., for comparisons to Levine and Feizi 2019, when the paper (before its revisions) contains an entire section devoted to exactly this comparison, suggests to me that the review was not sufficiently thorough. + +However, while I'm thus going to recommend the paper for acceptance (it does present a notable, if somewhat minor, advance upon the state of the art in randomized smoothing), I also feel the paper is generally rather borderline for more straightforward reasons. Specifically, given the _very_ narrow focus of the proposed improvements (improvements to the bounds of randomized smoothing, for L0 perturbations, for Top-k accuracy), I ultimately don't think the paper presents that significant an advance in the field. The paper could have gone the other way, though definitely not because of the issues that the sole critical reviewer raises.",ICLR2022, +Been5IDwUH0,1642700000000.0,1642700000000.0,1,ijygjHyhcFp,ijygjHyhcFp,Paper Decision,Reject,"This paper on 'anarchic' federated learning (FL) envisions an FL framework where edge clients can act independently instead of their participation being controlled by a central server. The idea is certainly promising; however, the reviewers pointed out the following main issues: +1) Technical gaps in the theoretical analysis need to be addressed +2) Bounded delay assumptions are too strong and conflict with the 'anarchic' goal of the framework +3) The linear speed-up claim should be better explained and justified. +The paper generated a lot of post-rebuttal discussion. However, the concerns about the theoretical analysis still remain, and therefore I recommend rejection. I hope the authors will take these constructive comments into account when revising the paper.
Then it builds a modified TSP-Net with preference augmentation to solve all the constrained problems. The method is empirically compared with multi-objective genetic algorithms and a DRL-based approach, and is shown to be competitive in the approximation of the Pareto front. + +After reading the authors' feedback and discussing their concerns, the reviewers reached a consensus that the paper is still not ready for publication. The authors need to improve their experimental evaluation in order to make it more robust and fair.",ICLR2022, +rJgU3zOvyV,1544160000000.0,1545350000000.0,1,Sygx4305KQ,Sygx4305KQ,"a sensible proposal, but little evidence of optimization benefit",Reject,"The proposal is a scheme for using implicit matrix-vector products to exploit curvature information for neural net optimization, roughly based on the adaptive learning rate and momentum tricks from Martens and Grosse (2015). The paper is well-written, and the proposed method seems like a reasonable thing to try. + +I don't see any critical flaws in the methods. While there was a long discussion between R1 and the authors on many detailed points, most of the points R1 raises seem very minor, and the authors' response to the conceptual points seems satisfactory. + +In terms of novelty, the method is mostly a remixing of ideas that have already appeared in the neural net optimization literature. There would be sufficient novelty to justify acceptance if there were strong experimental results, but in my opinion not enough for the conceptual contributions to stand on their own. + +There is not much evidence of a real optimization improvement. The per-epoch improvement over SGD is fairly small, and (as the reviewers point out) probably outweighed by the factor-of-2 computational overhead, so it's likely there is no wall-clock improvement. Other details of the experimental setup seem concerning; e.g., if I understand right, the SGD training curve flatlines because the SGD parameters were tuned for validation accuracy rather than training accuracy (as is reported). The only comparison to another second-order method is to K-FAC on an MNIST MLP, even though K-FAC and other methods have been applied to much larger-scale models. + +I think there's a promising idea here which could make a strong paper if the theory or experiments were further developed. But I can't recommend acceptance in its current form.
They train on identities (sin^2(x) + cos^2(x) = 1) and also on ground expressions (+(1,2) = 3). The second idea is to help the system learn the interpretation map for numerals, linking symbols like the ""2"" in ""cos^2(x)"" with the actual number 2. They do this by including equations which relate decimals to their base-10 expansion. For example, 2.5 = 2*10^0 + 5*10^(-1). The ""2.5"" is (I think) treated as a number and handled by the number node in the network. The RHS leaves are treated as symbols and handled by the symbol node of the network. This lets them learn to represent decimals using just the 10 digits in their grammar and ties the interpretation of the symbols to what is required for a correct evaluation (in terms of their model this means ""aligning"" the node for symbol with the node for number). + +3. Results are good relative to what seem to us to be reasonable baselines. + +CONS: + +1. The architecture isn't new, and the idea of representing expression trees in a hierarchical network isn't new either. + +2. The writing, to me, is a bit unclear in places, and I think they still have some work to do to follow the reviewers' advice in this area. + +I really wrestled with this one, and I appreciate the arguments that say it's not novel enough, but I feel that there is something interesting in here, and if the authors do a clean-up before final submission it will be OK.
It is also worth noticing that the claim of ""task agnostic"" in this paper does not fully hold: in the downstream tasks, the soft labels of the teacher model are required to train the compressed model. To be fully ""task agnostic"", the results on downstream tasks should be solely based on training with the ground-truth labels, as in the MobileBERT paper. If the exact task-agnostic experimental protocol were followed, the reported performance in this paper might be significantly lower. ",ICLR2021, +mk7OngGDtlLp,1642700000000.0,1642700000000.0,1,YpSxqy_RE84,YpSxqy_RE84,Paper Decision,Accept (Poster),"The paper considers a learning problem to determine the best low-precision configuration within a memory budget. It is an interesting problem that could be of interest to the community. Overall, the reviewers were fairly positive about the paper and believe it gives interesting insights into how to use limited memory for learning.",ICLR2022, +HyxCKpI4lN,1545000000000.0,1545350000000.0,1,r1lWUoA9FQ,r1lWUoA9FQ,Interesting contribution to our understanding of adversarial examples,Accept (Poster),"There's precious little work asking existential questions about adversarial examples, and so this work is most welcome. The work connects with deep results in probability to make simple and transparent claims about the inevitability of adversarial examples under some assumptions. The authors have addressed the reviewers' key criticisms around clarity.",ICLR2019,5: The area chair is absolutely certain
The AC encourages a resubmission of this paper after these results have been addressed.",ICLR2021, +SRWSPBC7gdW,1642700000000.0,1642700000000.0,1,NB0czpQ3-m,NB0czpQ3-m,Paper Decision,Reject,"This paper introduces a technique to measure the *expected* robustness of a +neural network by measuring the probability that random input perturbations will +cause the model to make a mistake. + +The reviewers are not convinced by the results in this paper. The methods +are not carefully evaluated against prior work, and it is not exactly +clear what lesson one can draw from the resulting statistical evaluation. +The experimental setup is not clearly explained in several places, making the +paper difficult to fully follow. + +Since the authors did not respond to the reviewers' concerns, there was no +opportunity to address them.",ICLR2022, +rJgFR-6nJV,1544500000000.0,1545350000000.0,1,B1fbosCcYm,B1fbosCcYm,metareview,Reject,"The paper received mixed and divergent reviews. As a paper on an unusual topic for ICLR, the presentation of this work would need improvement. For example, it is difficult to understand what's the overall objective function, why a specific design choice was made, etc. It's nice to see that the authors did quite a bit of engineering to make their model work for classification and drawing tasks, but (as an ML person) it's difficult to get a clear rationale on why the method works other than that it's biologically motivated. In addition, the proposed model (at a functional level) looks quite similar to Mnih et al.'s ""Recurrent Models of Visual Attention"" work (for classification) and Gregor et al.'s DRAW model (for generation) in that all these models use sequential/recurrent attention/glimpse mechanisms, but no direct comparisons are made. For classification, the method achieves strong performance on MNIST, but this may be due to a better architecture choice compared to Mnih's model rather than to the difference in the memory mechanism. For image generation/reconstruction, the proposed method seems to achieve quite good results, but they are not as good as those from the DRAW method. Overall, the paper is on the borderline, and while this work has some merits and might be of interest to some subset of the ICLR audience, there are many issues to be addressed in terms of presentation and comparisons. Please see the reviews for other detailed comments.
There is therefore a consensus that the paper is not ready for ICLR in its current form, but we hope that the reviews and discussion will help the authors prepare a revised version in the future.",ICLR2022, +r10wHJaSM,1517250000000.0,1517260000000.0,591,HJPSN3gRW,HJPSN3gRW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper was reviewed by 3 expert reviewers and received largely negative reviews, with concerns about the toy-ish nature of the 2D environments and limited novelty. + +Since ICLR18 received multiple papers on similar topics, we took additional measures to ensure that similar papers were judged under the same criteria. Specifically, we asked the reviewers of (a) this paper and (b) a concurrent submission that also studies language grounding in 2D environments to provide opinions on (b) and (a) respectively. Unfortunately, while the two may be on a similar topic and both work in 2D environments, we received unanimous feedback that (b) was much higher quality (""comparison with multiple baselines, better literature review, no bold claims about visual attention, etc.""). We realize this may be disappointing, but we encourage the authors to incorporate reviewer feedback to make their manuscript stronger. ",ICLR2018, 
(2) and the paragraph before it indeed show that the authors treat the global and the local information independently. Moreover, the disentanglement of the global information (the whole graph) and the local information (the patch/sub-graph) is not well-defined. In my opinion, for the MNIST digits, the angle and the thickness (or something else) of strokes can be disentangled (not independent) factors that influence different properties of the data. In this work, if my understanding is correct, the global and the local factors just provide different views to analyze the same graphs, and the proposed method actually designs a new way to leverage multi-view information. It is not clear whether the views are disentangled and whether the improvements come from ""disentanglement"". + +If the authors can provide an example that explains their ""disentanglement"" as simply as the MNIST case does, this work will be more convincing. Otherwise, this work suffers from the risk of overclaiming.",ICLR2021, +sDXZ8uI4EXw,1642700000000.0,1642700000000.0,1,WN2Sup7qLdw,WN2Sup7qLdw,Paper Decision,Reject,"### Description + +The paper enhances flow-based generative models by putting them into a coarse-to-fine multi-resolution framework. The key technical challenge, as I understand it, is designing up-scaling conditional flow modules. Since the operation needs to be invertible, the paper carefully designs what degrees of freedom need to be injected in addition to the low-resolution image to compose a higher-resolution one. + +### Decision +The paper received 5 expert and rather detailed reviews. I have read and understood the paper and all reviews. Reviewers remark that the paper is well written and addresses a challenging problem. However, the reviews were in consensus that the contribution of the paper is marginal. The average score was 4.4. The authors did not respond to reviewers and did not update the paper. There was no post-rebuttal discussion or additional feedback from reviewers. Therefore, I must reject. + +### Comments +I have only minor comments on the writing and organization of the paper. There are many self-repetitions in the text, restating what was already said above in the same or very similar sentences. Some questions studied in the appendices are not presented in the main paper.
This includes results on additional, more complex object categories; discussion on why the performance of the proposed method is not even comparable to that of BSP-Net (Table 4); and others. Further, while the authors have significantly refactored the paper to address the concerns about presentation and clarity, the changes are too major for the reviewers to review during the response period (the reviewers are expected to check minor updates, but not to review a new paper during the response period). + +Considering all pros and cons, the AC recommends rejection. The authors are encouraged to revise the paper for the next venue.",ICLR2022, +n18iTdrYgn,1576800000000.0,1576800000000.0,1,Skl8EkSFDr,Skl8EkSFDr,Paper Decision,Reject,"The paper develops a new method for pruning generators of GANs. It has received a mixed set of reviews. Basically, the reviewers agree that the problem is interesting and appreciate that the authors have tried some baseline approaches and verified/demonstrated that they do not work. + +Where the reviewers diverge is on whether the authors have been successful with the new method. In the opinion of the first reviewer, there is little value in achieving low levels (e.g. 50%) of fine-grained sparsity, while the authors have not managed to achieve good performance with filter-level sparsity (as evidenced by Figure 7 and Table 3, as well as figures in the appendices). The authors admit that the sparsity levels achieved with their approach cannot be turned into a speed improvement without future work. + +Furthermore, as pointed out by the first reviewer, the comparison with prior art, in particular with the LIT method, which has been reported to successfully compress the same GAN, is missing, and the results of LIT have been misrepresented. While the authors argue that their pruning is an ""orthogonal technique"" and can be applied on top of LIT, this is not verified in any way. In practice, combining different compression techniques is known to be non-trivial, since they aim to exploit the same types of redundancies. + +Overall, while this paper comes close, the problems highlighted by the first reviewer have not been resolved convincingly enough for acceptance.
It is unfortunate that the authors have not engaged in discussion with the reviewers to resolve these issues; however, they are encouraged to consider the reviewer feedback in order to improve the paper.",ICLR2019,5: The area chair is absolutely certain +qNKCWuPa7v,1610040000000.0,1610470000000.0,1,s0Chrsstpv2,s0Chrsstpv2,Final Decision,Reject,"The overall impression of the paper is rather positive; however, even after the rebuttal, it still seems that the paper requires further work, and definitely a second review round, before being ready for publication. Thus, I encourage the authors to continue the work started during the rebuttal to address the reviewers' comments; it moved in the right direction but would still benefit from further effort. In particular, I believe the experiments could be significantly improved (for example, by bringing some results into the main paper). Moreover, a more thorough theoretical and empirical comparison with previous work would increase the impact of the paper. ",ICLR2021, 
Generally, the paper is well-organized, and its contributions and experimental support are clearly presented, which left all reviewers with positive impressions. +Although the technical contribution of the method seems marginal, as it is essentially a combination of established methods, it fits well into a novel and practical application scenario, and its usefulness is convincingly demonstrated in extensive experiments. We conclude that the insights the paper provides outweigh the weakness in technical novelty, so I’d like to recommend acceptance. +
I do think the contribution is a bit limited, particularly as it is in an area which has seen many papers over the years (and thus has a high bar for new work). However, with some additional work this paper could definitely be acceptable. I think it could use an additional round of editing and review, and I'd encourage the authors to submit this paper to another venue.",ICLR2021, +kSlI-l19e,1576800000000.0,1576800000000.0,1,ryxf9CEKDr,ryxf9CEKDr,Paper Decision,Reject,"The paper presents an efficient approach to computing saliency measures by exploiting saliency map order equivalence (SMOE), and a visualization of individual layer contributions via a layer-ordered visualization of information. + +The authors did a good job of addressing most issues raised in the reviews. In the end, two major concerns remained not fully addressed: one is the motivation of efficiency, and the other is how much better SMOE is compared with existing statistics. I think these two issues also determine how significant the work is. + +After discussion, we agree that while the revised draft is much improved, the work itself is nothing groundbreaking. Given many other excellent papers on related topics, the paper cannot make the cut for ICLR. 
+ +Pros: +- Approach is novel for this setting +- Paper is clear and easy to understand +- Performance beats several prior methods +- Experiments are thorough +- Fundamental problem is worthy of investigation + +Cons: +- Some concerns among multiple reviewers on how hyperparameters are selected. Authors have provided more information and tables in the paper. +- Training process is multi-step and not unified. Authors provided additional information about unified training results, which yielded poorer results, likely due to overfitting from training many parameters at once. + +Overall recommendation based on the consensus of reviewers and AC expertise: accept.",ICLR2022, +ByXILypHM,1517250000000.0,1517260000000.0,786,BkIkkseAZ,BkIkkseAZ,ICLR 2018 Conference Acceptance Decision,Reject,"Understanding the quality of the solutions found by gradient descent for optimizing deep nets is certainly an important area of research. The reviewers found several intermediate results to be interesting. At the same time, the reviewers have unanimously pointed out various technical aspects of the paper that are unclear, particularly the new contributions relative to recent prior work. As such, at this time, the paper is not ready for ICLR-2018 acceptance.",ICLR2018, +PREw6ArXRTh,1642700000000.0,1642700000000.0,1,gzeruP-0J29,gzeruP-0J29,Paper Decision,Reject,"This paper investigates fast adversarial training methods as a bilevel optimization problem. The proposed algorithm compares well with the existing techniques in overall runtime, obtaining better clean-test accuracy (which is not the goal) and matching the robust accuracy of existing adversarial training methods. The proposed framework, however, is more general and flexible and is theoretically grounded. The problem studied here is exciting and the approach the authors take is interesting. + +The current version, unfortunately, has some serious shortcomings. The empirical comparisons are a bit lacking: in general, wall-clock time is not a very good measure, as it depends heavily on the implementation and various optimizations therein. A more suitable comparison would be in terms of floating-point operations, or in terms of iteration complexity. + +The paper reports other interesting findings, such as how the proposed method avoids robust overfitting. However, there is little theoretical evidence or insight into how the proposed method avoids it. 
+ +The writing can be improved with more emphasis on the novelty and significance of the contributions; some of the statements regarding improvements over prior work are somewhat misleading given the incremental gains (e.g., see Table 1). I believe the comments from the reviewers have already helped improve the quality of the paper. I encourage the authors to further incorporate the feedback and work towards a stronger submission.",ICLR2022, +qWCtEM8kdLT,1642700000000.0,1642700000000.0,1,AUGBfDIV9rL,AUGBfDIV9rL,Paper Decision,Accept (Spotlight),This manuscript expands the range of recent work in reinforcement learning for language games to much larger and more realistic datasets. It is a timely and relevant contribution and one that is well evaluated. Further work in stabilizing RL approaches for such large-scale problems is likely to have other far-reaching consequences. Reviewers were unanimous that this is a strong submission after the author discussion period.,ICLR2022, +u99t-unx3kK,1642700000000.0,1642700000000.0,1,yrD7B9N_54F,yrD7B9N_54F,Paper Decision,Reject,"This paper has been reviewed by four expert reviewers who gave diverging scores. The three negative reviewers provided significant constructive feedback. The main criticism is the lack of novelty and clarity in the paper. The authors submitted a rebuttal, which did not improve the scores of these reviewers. After the discussion phase, the paper did not obtain any support for acceptance and remained below the acceptance threshold. Following the reviewers' recommendation, the meta reviewer recommends rejection.",ICLR2022, +m7AdKfY4OSh,1642700000000.0,1642700000000.0,1,OnpFa95RVqs,OnpFa95RVqs,Paper Decision,Accept (Poster),"This paper proposes a methodology to create cheap NAS surrogate benchmarks for arbitrary search spaces. Certainly, the work is interesting and useful, with comprehensive studies to validate such an approach. It should be credited as among the first efforts to introduce and comprehensively study the concept of surrogate NAS benchmarks. In the AC's opinion, it is a solid paper that will inspire (or has already inspired) many follow-up works. The paper is well written. 
+ +This paper received highly mixed ratings. Although the authors might not see it, all reviewers actually participated in the private discussions. Reviewer 1eb8 indicated hesitation in her/his support. Reviewer yTPb stated that, setting aside the arXiv complication, she/he ""would certainly raise score by one level"". The AC also reached out to Reviewer yTPb about the mentioned possibility of updating the score, and confirmed that her/his original opinion wasn't changing after the rebuttals. Besides, the AC agrees that the arXiv/NeurIPS complication shouldn't be brought into the current discussion, and ignored that factor during decision making. + +The main sticking point (and a critique considered valid) is the relatively outdated and incomplete selection of baselines. As a benchmark paper, it should capture a diverse set of recent methods. For example, the authors might consider adding: https://botorch.org/docs/papers (latest methods in Bayesian Optimization); https://github.com/facebookresearch/LaMCTS (latest methods in Monte Carlo Tree Search); https://facebookresearch.github.io/nevergrad/ (latest methods in Evolutionary algorithms). + +Given the above concerns, the AC considers this paper to sit on the borderline, with the pros perhaps outweighing the cons. Hence, a weak accept decision is recommended at this moment.
",ICLR2020, +r1gnoGlNxE,1544980000000.0,1545350000000.0,1,HJMC_iA5tm,HJMC_iA5tm,Area chair recommendation,Accept (Poster),"The submission proposes a machine learning approach to directly train a prediction system for whether a boolean sentence is satisfiable. The strengths of the paper seem to be largely in proposing an architecture for SAT problems and the analysis of the generalization performance of the resulting classifier on classes of problems not directly seen during training. + +Although the resulting system cannot be claimed to be a state of the art system, and it does not have a correctness guarantee like DPLL based approaches, the paper is a nice re-introduction of SAT in a machine learning context using deep networks. It may be nice to mention e.g. (W. Ruml. Adaptive Tree Search. PhD thesis, Harvard University, 2002) which applied reinforcement learning techniques to SAT problems. The empirical validation on variable sized problems, etc. is a nice contribution showing interesting generalization properties of the proposed approach. + +The reviewers were unanimous in their recommendation that the paper be accepted, and the review process attracted a number of additional comments showing the broader interest of the setting.",ICLR2019,5: The area chair is absolutely certain +rkgUkUexxN,1544710000000.0,1545350000000.0,1,BkesGnCcFX,BkesGnCcFX,Important subject matter but novelty & results insufficient for acceptance.,Reject,"This manuscript presents a reinterpretation of hindsight experience replay which aims to avoid recomputing the reward function, and investigates Floyd-Warshall RL in the function approximation setting. + +The paper was judged as relatively clear. The authors report a slight improvement in computational cost, which some reviewers called into question. However, all of the reviewers pointed out that the experimental evidence for the method's superiority is weak. Two reviewers additionally raised that this wasn't significantly different than the standard formulation of Hindsight Experience Replay, which doesn't require the computation of rewards for relabeled goals. + +Ultimately, reviewers were in agreement that the novelty of the method and quality of the obtained results rendered the work insufficient for publication. The Area Chair concurs, and urges the authors to consider the reviewers' pointers to the existing literature in order to clarify their contribution for subsequent submission.",ICLR2019,4: The area chair is confident but not absolutely certain +SyGMTGU_x,1486400000000.0,1486400000000.0,1,ByZvfijeg,ByZvfijeg,ICLR committee final decision,Reject,"Paper presents the idea of using higher order recurrence in LSTMs. The ideas are well presented and easy to follow. + However, the results are far from convincing, easily being below well established numbers in the domain. Since the mode is but a very simple extension of the baseline recurrent models using LSTMs that are state of the art on language modelling, it should have been easy to make a fair comparison by replacing the LSTMs of the state of the art models with higher order versions, but the authors did not do that. Their claimed numbers for SOTA are far from previously reported numbers in that setup, as pointed out by reviewers.",ICLR2017, +F5KL4g70Fq,1576800000000.0,1576800000000.0,1,HyeX7aVKvr,HyeX7aVKvr,Paper Decision,Reject,"The authors presents a method for adapting models to new tasks in a zero shot manner using learned meta-mappings. 
The reviewers largely agreed that this is an interesting and creative research direction. However, there was also agreement that the writing was unclear in many sections, that the appropriate metalearning baselines were not compared to, and that the power of the method was unclear due to overly simplistic domains. While the baseline issue was mostly cleared up in rebuttal and discussion, the other issues remain. Thus, I recommend rejection at this time.",ICLR2020, +ByEN7y6Hz,1517250000000.0,1517260000000.0,116,Hk9Xc_lR-,Hk9Xc_lR-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),I recommend acceptance. The two positive reviews point out the theoretical contributions. The authors have responded extensively to the negative review and I see no serious flaw as claimed by the negative review.,ICLR2018, +gyg254z4lz,1576800000000.0,1576800000000.0,1,S1xJ4JHFvS,S1xJ4JHFvS,Paper Decision,Reject,"The paper addresses an important problem of finding a good trade-off between generalization and convergence speed of stochastic gradient methods for training deep nets. However, there is a consensus among the reviewers, even after rebuttals provided by the authors, that the contribution is somewhat limited and the paper may require additional work before it is ready to be published.",ICLR2020, +ByXILypHM,1517250000000.0,1517260000000.0,786,BkIkkseAZ,BkIkkseAZ,ICLR 2018 Conference Acceptance Decision,Reject,"Understanding the quality of the solutions found by gradient descent for optimizing deep nets is certainly an important area of research. The reviewers found several intermediate results to be interesting. At the same time, the reviewers unanimously have pointed out various technical aspects of the paper that are unclear, particularly new contributions relative to recent prior work. As such, at this time, the paper is not ready for ICLR-2018 acceptance.",ICLR2018, +RSBmYBVy80d4,1642700000000.0,1642700000000.0,1,3tbDrs77LJ5,3tbDrs77LJ5,Paper Decision,Accept (Poster),"The paper studies gradient descent for matrix factorization with a learning rate that is large relative to the a certain notion of the scale of the problem. In particular, they show that the use of large learning rates leads to balancing between the two factors in the factorization. + +The discussion between the authors and the reviewers was fruitful in dispelling some of the reviewers' doubts and at the same time improving the paper. + +The paper seems to make some contribution on a relevant problem for the ICLR community. However, even in the restricted settings they consider, the problem does not appear to be completely solved. That said, I agree with the majority of the reviewers that the step forward seems enough to warrant the acceptance. + +I would still encourage the authors to take into account the reviewers' comments in preparing the camera-ready version. In particular, in the internal discussion it was suggested that the presentation of the paper could be improved by clearly stating the limitations of the current approach (e.g., the assumption of convergence in Theorem 5.1, a better discussion on large vs small learning rates w.r.t. the balancing effect).",ICLR2022, +31nsbO9r2wU6,1642700000000.0,1642700000000.0,1,xaTensJtCP5,xaTensJtCP5,Paper Decision,Reject,"This paper proposes guiding principles with which to design objective functions for proposal distributions for MCMC. They design one such objective based on GSM (Titsias and Dellaportas, 2019). 
The two concerns raised by reviewers that resonated the most with me were: + +- it was not clear that the actual proposed objective was the best way to implement these guiding principles +- a weak empirical evaluation that did not consider online tuning and high-dim, highly non-Gaussian targets. + +After rebuttal, revision, and discussion, reviewers felt that the authors did a reasonable job of addressing the issue of online tuning, but very highly non-Gaussian targets were not addressed. There was still a sense that the ultimate instantiation of the design principles was a somewhat adhoc loss. Ultimately, I think that this work is just below the bar for acceptance and it can be improved by clarifying the choices made in implementing the objective and some more ambitious experiments.",ICLR2022, +eGUetFiNk3g,1610040000000.0,1610470000000.0,1,2CjEVW-RGOJ,2CjEVW-RGOJ,Final Decision,Accept (Poster),"The authors did a good job responding to reviewer concerns. While the reviewers still consider the method described in the paper to not be especially novel, at least one is impressed by the practicality. imo the authors' attention detailed ablations and analysis post-review makes the paper worth including in the conference.",ICLR2021, +rJlyUaeNlN,1544980000000.0,1545350000000.0,1,HklJV3A9Ym,HklJV3A9Ym,Rewrite needed to address importance of result,Reject,"Several reviewers thought the results were not surprising in light of existing universality results, and thought the results were of limited relevance, given that the formalization is not quite in line with real-world networks for MIL. The authors draw out some further justifications in the rebuttal. These should be reintegrated. I agree with the general criticisms regarding relevance to ICLR. Ultimately, this work may belong in a journal.",ICLR2019,3: The area chair is somewhat confident +DXPtJkGNM6,1642700000000.0,1642700000000.0,1,yXBb-0cPSKO,yXBb-0cPSKO,Paper Decision,Reject,"This paper tackles the contextual bandit problem with general function classes and introduces a novel algorithm called regularized optimism in face of uncertainty (ROFU). Although this is an important and relevant problem, the theoretical contribution is rather weak due to the strong assumptions, which also results in a lack of consistency with the motivation and the empirical settings. Moreover, although experimental results suggest that the proposed ROFU method may have potential, the empirical contribution is unclear as the paper currently lacks a comparison with appropriate previous work. All these concerns were raised in the reviews, but unfortunately, none were addressed in the rebuttal phase.",ICLR2022, +B1etpz_ex4,1544750000000.0,1545350000000.0,1,rkl42iA5t7,rkl42iA5t7,Interesting approach to compression based on analyzing filter activations.,Reject,"The authors propose a technique for compressing neural networks by examining the correlations between filter responses, by removing filters which are highly correlated. This differentiates the authors’ work from many other works which compress the weights independent of the task/domain. 
+ +Strengths: +Clearly written paper +PFA-KL does not require additional hyperparameter tuning (apart from those implicit in choosing \psi) +Experiments demonstrate that the number of filters determined by the algorithm scale with complexity of the task + +Weaknesses: +Results on large-scale tasks such as Imagenet (subsequently added by the authors during the rebuttal period) +Compression after the fact may not be as good as training with a modified loss function that does compression jointly +Insufficient comparisons on ResNet architectures which make comparisons against previous works harder + +Overall, the reviewers were in agreement that this work (particularly, the revised version) was close to the acceptance threshold. In the ACs view, the authors addressed many of the concerns raised by the reviewers in the revisions. However, after much deliberation, the AC decided that the weaknesses 2, and 3 above were significant, and that these should be addressed in a subsequent submission.",ICLR2019,4: The area chair is confident but not absolutely certain +wjh2U9960P9,1610040000000.0,1610470000000.0,1,tH6_VWZjoq,tH6_VWZjoq,Final Decision,Accept (Poster),Please clarify as early as the abstract that you refine the analysis of the algorithm proposed by Shalev-Shwartz et al (which is a great contribution given the importance of the problem).,ICLR2021, +#NAME?,1610040000000.0,1610470000000.0,1,eHG7asK_v-k,eHG7asK_v-k,Final Decision,Reject,"There was some slight disagreement on the paper, but the majority of reviewers agree that although some answers of the authors on questions brought good clarification, other issues still remain problematic. Some of the assumptions remain unclear (w.r.t CDTE), and reviewers still have doubts about the global convergence and weak stable fixed point concept, that lack clear math details. +The experiments are also still a bit too immature, more comparison is needed, as well as an evaluation on other domains.",ICLR2021, +rJZ22MLue,1486400000000.0,1486400000000.0,1,H1wgawqxl,H1wgawqxl,ICLR committee final decision,Invite to Workshop Track,"The authors propose a nonparametric regression approach to learn the activation functions in deep neural networks. The proposed theoretical analysis, based on stability arguments, is quite interesting. Experiments on MNIST and CIFAR-10 illustrate the potential of the approach. + + Reviewers were somewhat positive, but preliminary empirical evidence on small datasets makes this contribution better suited for the workshop track.",ICLR2017, +lTD-xLRqmc,1576800000000.0,1576800000000.0,1,H1eA7AEtvS,H1eA7AEtvS,Paper Decision,Accept (Spotlight),"This paper proposes three modifications of BERT type models two of which is concerned with parameter sharing and one with a new auxiliary loss. New SOTA on downstream tasks are demonstrated. + +All reviewers liked the paper and so did a lot of comments. + +Acceptance is recommended.",ICLR2020, +BJlzFRDZeE,1544810000000.0,1545350000000.0,1,BJxssoA5KX,BJxssoA5KX,metareview,Accept (Poster),"This paper proposes a novel dataset of bouncing balls and a way to learn the dynamics of the balls when colliding. The reviewers found the paper well-written, tackling an interesting and hard problem in a novel way. The main concern (that I share with one of the reviewers) is about the fact that the paper proposes both a new dataset/environment *and* a solution for the problem. This made it difficult the for the authors to provide baselines to compare to. 
The ensuing back and forth had the authors relax some of the assumptions of the environment and made it possible to evaluate with interaction nets. + +The main weakness of the paper is the relatively contrived setup that the authors have come up with. I will summarize some of the discussion that happened as a result of this point: it is relatively difficult to see how the setup that the authors have built and studied (esp. knowing the ground-truth impact locations and the timing of the impact) can generalize outside of the proposed approach. There is some concern that the comparison with interaction nets was not entirely fair. + +I would recommend the authors redo the comparisons with interaction nets in a careful way, with the right ablations, and understand whether the methods have access to the same input data (e.g. are interaction nets provided with the bounce location?). + +Despite the relatively high average score, I think of this paper as quite borderline, specifically because of the issues related to the setup being too niche. Nonetheless, the work does have a lot of scientific value to it, in addition to a new simulation environment/dataset that other researchers can then use. Assuming the baselines are done in a way that is trustworthy, the ablation experiments and discussion will be something interesting to the ICLR community.",ICLR2019,4: The area chair is confident but not absolutely certain +nbWFbGaoWjI,1642700000000.0,1642700000000.0,1,ZTsoE8G3GG,ZTsoE8G3GG,Paper Decision,Accept (Poster),"Most reviewers were positive about the paper, seeing that the proposed method is practical and has convincing experimental performance. One reviewer was a bit negative and raised questions about clarity. After the authors responded, the negative reviewer didn't respond further. After reviewing all the comments, the AC feels that there is enough support from reviewers to accept this paper.",ICLR2022, +rklQpGUde,1486400000000.0,1486400000000.0,1,SkkTMpjex,SkkTMpjex,ICLR committee final decision,Accept (Poster),"All reviewers agreed that this is an interesting contribution, but all agreed that the testing environment was somewhat small-scale and that significant difficulties could arise in scaling it up. However, the overall sentiment was still sufficiently positive.",ICLR2017, +H35TF6Onzmh,1610040000000.0,1610470000000.0,1,5slGDu_bVc6,5slGDu_bVc6,Final Decision,Reject,"This paper proposed a new variant of knowledge distillation. The basic idea is interesting, although similar ideas have more or less appeared in the literature, as pointed out by the reviewers. Our main concern with this work is that the real empirical improvements are too limited to conclude that the proposed method really performs better than the baseline. At the same time, the proposed method is much more computationally expensive. ",ICLR2021, +KgDgbK41irq,1610040000000.0,1610470000000.0,1,mLtPtH2SIHX,mLtPtH2SIHX,Final Decision,Reject,"This work proposes to improve Mixup by using soft labels, removing the need for input mixup. The reviewers found the paper clear and the experiments promising. The reviewers raised concerns about the lack of experiments comparing this approach to Mixup+Label smoothing, which were addressed during the rebuttal by the authors. However, the reviewers did not find the empirical evidence strong enough given that this is mostly an empirical contribution. 
The authors do not necessarily need to train on the full Imagenet, but it would be beneficial to evaluate on more standard settings on the dataset considered to facilitate comparison to previous work.",ICLR2021, +LaWkb7nlPM,1610040000000.0,1610470000000.0,1,M_eaMB2DOxw,M_eaMB2DOxw,Final Decision,Reject,"The paper received reviews from experts in representation of invariant functions. They all have expressed concerns regarding the novelty of the technical contributions, and the lack of appropriate comparisons to existing results. This applies in particular to representation of symmetric functions using neural networks which was largely covered by previous works, as acknowledged by the authors. The authors are encouraged to consider the valuable inputs by the reviewers and revise accordingly. ",ICLR2021, +S1pANkpSf,1517250000000.0,1517260000000.0,471,SyvCD-b0W,SyvCD-b0W,ICLR 2018 Conference Acceptance Decision,Reject,The reviewers have pointed out that there is a substantial amount of related work that this paper should be acknowledging and building on.,ICLR2018, +gGMwTQ9nqyZ,1642700000000.0,1642700000000.0,1,1gEb_H1DEqZ,1gEb_H1DEqZ,Paper Decision,Reject,"This paper proposes a pre-training technique for improving the logical abilities of pre-trained language models. +Reviewers point to many issues with clarity and experimental evaluation. No response was given by authors.",ICLR2022, +stOwmVYw8f5,1610040000000.0,1610470000000.0,1,6BRLOfrMhW,6BRLOfrMhW,Final Decision,Accept (Poster),All of the reviewers thought that this paper addresses an interesting and important problem. Several of the reviewers thought that the paper gave a creative approach for training bloom filters and this would be of interest to the community. ,ICLR2021, +YAN7dg977kM,1642700000000.0,1642700000000.0,1,DTXZqTNV5nW,DTXZqTNV5nW,Paper Decision,Accept (Poster),"This paper presents a Actor-Critic Hedge (ACH) method for 1-on-1 Mahjong. It is is an actor-critic method for approximating Nash equilibrium strategies in large extensive-form games. ACH extends the CFR family of algorithms that uses deep learning and model-free training (not using full game traversal). The propose ACH agent defeats several human players, including a Mahjong champion. This is impressive. + +The reviewers and authors have extensive discussions and the authors managed to address most of the concerns from the reviewers. The overall opinions from the the reviewers favor acceptance. Below are some of the strength and weakness summarized from the reviewers: + +Strength: +* Extensive experiments and impressive performance. +* New policy based algorithm for competitive environments. +* Reviewers' questions are well addressed. + +Weakness: +Lack of more tabular theoretical analysis. Need experiments to compare to existing methods. Theory and experiment does not match.",ICLR2022, +2RGSEpCm-I,1576800000000.0,1576800000000.0,1,ryeFY0EFwS,ryeFY0EFwS,Paper Decision,Accept (Poster),"The paper proposes an intuitive causal explanation for the generalization properties of GD methods. The reviewers appreciated the insights, with one reviewer claiming that there was significant overlap with existing work. + +I ultimately decided to accept this paper as I believe intuitive explanations are critical to the propagation of ideas. That being said, there is a tendency in this community to erase past, especially theoretical, work, for that very reason that theoretical work is less popular. 
+ +Hence, I want to make it clear that the acceptance of this paper is based on the premise that the authors will incorporate all of reviewer 3's comments and give enough credit to all relevant work (namely, all the papers cited by the reviewer) with a proper discussion on the link between these.",ICLR2020, +HJgLwNIxgN,1544740000000.0,1545350000000.0,1,Syl7OsRqY7,Syl7OsRqY7,Meta Review,Accept (Poster),"The paper presents a method for coarse and fine inference for question answering. It originally measured performance only on WikiHop and then later added experiments on TriviaQA. The results are good. + +One of the concerns regarding the paper was the novelty of the work, and lack of enough experiments. However, the addition of TriviaQA results allays some of that concern. I'd suggest citing the paper by Swayamdipta et al from last year that attempted coarse to fine inference for TriviaQA: + +Multi-Mention Learning for Reading Comprehension with Neural Cascades. +Swabha Swayamdipta, Ankur P. Parikh and Tom Kwiatkowski. +Proceedings of ICLR 2018. + +Overall, there is relative consensus that the paper is good with a new method and some strong results.",ICLR2019,4: The area chair is confident but not absolutely certain +S1lyvAMggN,1544720000000.0,1545350000000.0,1,BJxYEsAqY7,BJxYEsAqY7,lack of novelty,Reject,"The paper describes knowledge distillation methods. As noted by all reviewers, the methods are very similar to the prior art, so there is not enough novelty for the paper to be accepted. The reviewers' opinion didn't change after the rebuttal.",ICLR2019,5: The area chair is absolutely certain +KjPnxd3BFj,1576800000000.0,1576800000000.0,1,Byx0PREtDH,Byx0PREtDH,Paper Decision,Reject,"The paper focuses on attribute-object pairs image recognition, leveraging some novel ""attractor network"". + +At this stage, all reviewers agree the paper needs a lot of improvements in the writing. There are also concerns regarding (i) novelty: the proposed approach being two encoder-decoder networks; (ii) lack of motivation for such architecture (iii) possible flow in the approach (are the authors using test labels?) and (iv) weak experiments.",ICLR2020, +Hy72Ek6BG,1517250000000.0,1517260000000.0,436,r1h2DllAW,r1h2DllAW,ICLR 2018 Conference Acceptance Decision,Reject,"This paper presents a somewhat new approach to training neural nets with ternary or low-precision weights. However the Bayesian motivation doesn't translate into an elegant and self-tuning method, and ends up seeming kind of complicated and ad-hoc. The results also seem somewhat toy. The paper is fairly clearly written, however.",ICLR2018, +bgS_sukJIJ7t,1642700000000.0,1642700000000.0,1,K3bGe_-aMV,K3bGe_-aMV,Paper Decision,Reject,"This paper presents a semantically controllable generative framework by integrating explicit knowledge. In particular, a tree-structured generative model is proposed based on knowledge categorization. Reviewers raised concerns about technical details, experiments, and missing references. In the revised paper, the authors provided more justifications and clarifications, such as the definition of adversarial attack. During the discussion, reviewers agreed that the previous concerns have been partially addressed, but there are still concerns on experiments, e.g., more recent work should be considered as baselines. + +Overall, I recommend to reject this paper. 
I encourage the authors to take the review feedback into account and submit a future version to another venue.",ICLR2022, +BylkZeRLyN,1544110000000.0,1545350000000.0,1,BkMq0oRqFQ,BkMq0oRqFQ,"a new interpretation of batch norm, but not clear what we gain from it",Reject,"This paper interprets batch norm in terms of normalizing the backpropagated gradients. All of the reviewers believe this interpretation is novel and potentially interesting, but that the paper doesn't make the case that this helps explain batch norm, or provide useful insights into how to improve it. The authors have responded to the original set of reviews by toning down some of the claims in the original paper, but haven't addressed the reviewers' more substantive concerns. There may potentially be interesting ideas here, but I don't think it's ready for publication at ICLR. + +",ICLR2019,5: The area chair is absolutely certain +hgWMH_08-FP,1610040000000.0,1610470000000.0,1,YwpZmcAehZ,YwpZmcAehZ,Final Decision,Accept (Poster),"This paper improves the dynamic convolution operation by replacing the dynamic attention over channel groups with channel fusion in a low-dimensional space. It includes extensive experiments with reasonable baselines. Dynamic convolutions are a fruitful method for making convnets more efficient, and this paper further improves their efficiency and efficacy with a novel technique. Reviewers all agreed that the paper was clearly written (though some parts were improved after rebuttal).",ICLR2021, +CIwDpoJ9ZX,1576800000000.0,1576800000000.0,1,SklKcRNYDH,SklKcRNYDH,Paper Decision,Accept (Poster),"Post author rebuttal the score of this paper increased. +Discussions with reviewers were substantive and the AC recommends acceptance.",ICLR2020, +fbgK9NG51,1576800000000.0,1593040000000.0,1,ryeRwlSYPH,ryeRwlSYPH,Paper Decision,Reject,"The submission has two issues, identified by the reviewers; (1) the description of the proposed method was found to be confusing at times and could be improved, and (2) the proposed transitional skills were not well motivated/justified as a solution to the problem the authors propose to solve.",ICLR2020, +N6tgYPin8_,1576800000000.0,1576800000000.0,1,rJeuMREKwS,rJeuMREKwS,Paper Decision,Reject,"The reviewers generally agreed that the technical novelty of the work was limited, and the experimental evaluation was insufficient to make up for this, evaluating the method only on relatively simple toy tasks. As much, I do not think that the paper is ready for publication at this time.",ICLR2020, +Bkxmf4flgE,1544720000000.0,1545350000000.0,1,rJeEqiC5KQ,rJeEqiC5KQ,"interesting problem, but contribution is hard to assess due to lack of details and clarity",Reject,"1. Describe the strengths of the paper. As pointed out by the reviewers and based on your expert opinion. + +The paper tackles an interesting and relevant problem for ICLR: incremental classifier learning applied to image data streams. + +2. Describe the weaknesses of the paper. As pointed out by the reviewers and based on your expert opinion. Be sure to indicate which weaknesses are seen as salient for the decision (i.e., potential critical flaws), as opposed to weaknesses that the authors can likely fix in a revision. + +- The proposed method is not clearly explained and not reproducible. In particular the contribution on top of the baseline iCaRL method is unclear. It seems to be mainly the use of CAE which is a minor change. +- The experimental comparisons are incomplete. 
For example, in Table 4 the authors don't discuss the storage requirements of the GAN and FearNet baselines. +- The authors state that one of their main contributions is fulfilling privacy and legal requirements. They claim this is done by using CAEs, which generate image embeddings that they store rather than the original images. However, it's quite well known that a lot of data about the original images can be recovered from such embeddings (e.g. Dosovitskiy & Brox. ""Inverting visual representations with convolutional networks."" CVPR 2016.). +These concerns all impacted the final decision. + +3. Discuss any major points of contention. As raised by the authors or reviewers in the discussion, and how these might have influenced the decision. If the authors provide a rebuttal to a potential reviewer concern, it’s a good idea to acknowledge this and note whether it influenced the final decision or not. This makes sure that author responses are addressed adequately. + +There were no major points of contention and no author feedback. + +4. If consensus was reached, say so. Otherwise, explain what the source of reviewer disagreement was and why the decision on the paper aligns with one set of reviewers or another. + +The reviewers reached a consensus that the paper should be rejected. +",ICLR2019,4: The area chair is confident but not absolutely certain +PGGbytEDQd,1642700000000.0,1642700000000.0,1,0sgntlpKDOz,0sgntlpKDOz,Paper Decision,Accept (Poster),"This paper studies graphon mean-field games, whereby a continuum of agents are connected by a graphon. 
They study a discrete-time version and show the existence of a Nash equilibrium (under Lipschitz conditions). Moreover, they prove that it corresponds to an approximate Nash equilibrium for the game with a finite number of players, thereby validating graphon mean-field games as a natural abstraction when the number of players is sufficiently large. Finally, they give algorithms based on fixed-point iterations (one based on discretizing the graphon index, the other based on reformulating it as a classical mean-field game) for computing such an equilibrium. They give numerical experiments to validate their approach. The reviewers pointed out various writing issues or other results that would help complete the picture. Many of these were addressed and/or clarified by the authors in their revision. Overall, the paper provides an appealing and relatively complete characterization of equilibria in graphon mean-field games.",ICLR2022, +cPvhaTN1k30,1642700000000.0,1642700000000.0,1,QbFfqWAEmMr,QbFfqWAEmMr,Paper Decision,Reject,"This paper proposes a novel method for improving domain generalization based on the idea of learning different subspaces for each domain. The authors provide theoretical analysis related to their proposal and further evaluate their proposed method on a subset of the DomainBed benchmark. + + +**Strong Points:** + +- The paper is well-written. + +- The proposed method is novel. + +- The authors provide theoretical analysis in support of their proposal. + +- The theoretical results seem to be correct. + +- Empirical evaluation shows that the proposed method improves over baselines on a subset of datasets included in the DomainBed benchmark. + +**Weak Points:** + +- The complexity of the theoretical results makes it very difficult for the reader to get any intuition about the underlying mechanisms at play. + +- The theoretical analysis is disconnected from the proposed algorithm. It is hard to see how one could end up proposing such an algorithm following the theoretical results. I suggest that the authors consider reorganizing the paper with less emphasis on the theoretical part, perhaps simplifying the theoretical results and pushing the rest to the appendix. + +- The empirical evaluation can be improved significantly. Domain generalization is a very well-established area at this point. WILDS is a carefully designed and well-known benchmark, and showing improvement on that benchmark would be very convincing, but unfortunately the authors do not discuss or even refer to it. They instead report their results on a subset of datasets used in the DomainBed benchmark. The DomainBed benchmark is less challenging than WILDS, but even following DomainBed closely and reporting the 3 evaluation metrics on all 7 datasets would have been satisfying. However, the authors only report the results on 3 datasets. Reporting the results on a diverse group of datasets is particularly important in the case of domain generalization because we know that many methods are able to show improvements on a few datasets, but it is challenging to beat the baselines on a significant majority of datasets. + +**Final Decision Rationale**: + +This is a borderline paper. On one hand, the proposed method is interesting and novel. On the other hand, the theoretical contributions are very limited and the empirical evaluation is not strong enough for acceptance. 
Given that all weak points mentioned above can be addressed, I recommend rejection, and I sincerely hope that the authors will strengthen their paper by addressing them before resubmitting their work.",ICLR2022, +pxupoAD8W,1576800000000.0,1576800000000.0,1,HJg3HyStwB,HJg3HyStwB,Paper Decision,Reject,"The method proposed and explored here is to introduce small spatial distortions, with the goal of making them undetectable by humans but affecting the classification of the images. As reviewers point out, very similar methods have been tested before. The methods are also only tested on a few low-resolution datasets. + +The reviewers are unanimous in their judgement that the method is not novel enough, and the authors' rebuttals have not convinced the reviewers or me of the opposite.",ICLR2020, +-55OfVU8qF,1576800000000.0,1576800000000.0,1,B1xewR4KvH,B1xewR4KvH,Paper Decision,Reject,"This work explores how to leverage the structure of the input in decision trees, the way this is done for example in convolutional networks. +All reviewers agree that the experimental validation of the method as presented is extremely weak. The authors have not provided a response to answer the many concerns raised by reviewers. +Therefore, we recommend rejection.",ICLR2020, +arrQRX9dsx,1576800000000.0,1576800000000.0,1,BygacxrFwS,BygacxrFwS,Paper Decision,Reject,"This paper proposes a fractional graph convolutional network for semi-supervised learning, using a classification function repurposed from previous work, as well as parallelization and weighted combinations of pooling functions. This leads to good results on several tasks. +Reviewers had concerns about the part played by each piece and the lack of comparison to recent related work, and asked for a better explanation of the rationale of the method and more experimental details. The authors provided explanations and details, and a more thorough set of comparisons to other work, showing better performance in some but not all cases. +However, concerns that the proposed innovations are too incremental remain. +Therefore, we cannot recommend acceptance.",ICLR2020, +5RY6HNBWF8,1610040000000.0,1610470000000.0,1,SZ3wtsXfzQR,SZ3wtsXfzQR,Final Decision,Accept (Poster),"The authors study the theoretical performance of meta-learning in two settings. In the first one, the overall number of possible tasks is limited and tasks are close in KL-divergence. The second setting is MAP estimation (in a hierarchical Bayesian framework) for a family of linear regression tasks. Lower and upper bounds are provided on the minimax parameter estimation error. +This paper has spurred a lot of discussion among reviewers and (competent) external commentators. Most of these criticisms were right on target, but the authors managed to convince the reviewers and myself that there was simply an issue of presentation of the main results. I suggest the authors take into serious consideration all the aspects raised by the reviewers that have generated misinterpretations of the presented results.",ICLR2021, +dIBw3_LrAU,1576800000000.0,1576800000000.0,1,rJeGJaEtPH,rJeGJaEtPH,Paper Decision,Reject,"Two reviewers are negative about this paper while the other one is slightly positive. Overall, this paper does not meet the bar of ICLR. A reject is recommended.",ICLR2020, +cFLZ-afIppn,1610040000000.0,1610470000000.0,1,hSjxQ3B7GWq,hSjxQ3B7GWq,Final Decision,Accept (Poster),"This paper tackles a very important topic in deep RL, namely automatic (non-differentiable) hyper-parameter tuning. 
It does so by combining ideas from genetic algorithms and neural architecture search with shared experience replay in order to obtain the key property of sample efficiency. The proposed solution is communicated clearly, and the results are compelling (often 10x improvements), as well as qualitatively interesting. + +Unfortunately for the authors, their original submission contained only part of the intended results, hence the borderline scores by some reviewers. In the meanwhile, a second suite of experiments have been added, which I think are compelling enough evidence to validate the paper's approach.",ICLR2021, +3WTmOgLmLEB,1642700000000.0,1642700000000.0,1,0RDcd5Axok,0RDcd5Axok,Paper Decision,Accept (Spotlight),"The paper reviews and draws connections between several parameter-efficient fine-tuning methods. + +All reviewers found the paper addresses an important research problem, and the theoretical justification and empirical analyses are convincing.",ICLR2022, +6w-SYMGk8h,1642700000000.0,1642700000000.0,1,dgxFTxuJ50e,dgxFTxuJ50e,Paper Decision,Accept (Spotlight),"This work studies the approximation and estimation errors of using neural networks (NNs) to fit functions on infinite-dimensional inputs that admit smoothness constraints. By considering a certain notion of anisotropic smoothness, the authors show that convolutional neural networks avoid the curse of dimensionality. + +Reviewers all agreed that this is a strong submission, tackling a core question in the mathematics of DL, namely developing functional spaces that are compatible with efficient learning in high-dimensional structured data. The AC thus recommends acceptance.",ICLR2022, +DyhjYJiQLX,1642700000000.0,1642700000000.0,1,2PSrjVtj6gU,2PSrjVtj6gU,Paper Decision,Reject,"The paper proposes an improvement to graph based neural network, by improving their attention mechanism (introducing recursive attention and jumping knowledge attention) to flexibly attend to its neighborhood. The paper shows solid experimental results over competitive baselines, as acknowledged by reviewers. The reviewers agree that the paper is clearly written, but overall have issues with the novelty of the approach. The paper combines multiple components (last residual connection module, improved attention mechanism) to show gains, but none of the pieces are very new.",ICLR2022, +tvrkNB35gSd,1642700000000.0,1642700000000.0,1,XIZaWGCPl0b,XIZaWGCPl0b,Paper Decision,Reject,"The paper presents a defense against the gradient sign flip attacks on federated learning. The proposed method is novel, technically sound and well evaluated. The crucial issue of the paper is, however, that this defense is specific to gradient-flip attacks. The authors show the robustness of their method against white-box attacks adhering to this threat model and claim that ""an adaptive white-box attacker with access to all internals of TESSERACT, including dynamically determined threshold parameters, cannot bypass its defense"". The latter statement does not seem to be well justified, and following the extensive discussion of the paper, the reviewers were still not convinced that the proposed method is secure by its design. The AC therefore feel that the specific arguments of the paper should be revised - or the claim of robustness further substantiated - in order for the paper to be accepted. 
+ +Furthermore, as a comment related to ethical consideration, the AC remarks that the paper's acronym, Tesseract, is used by an open source OCR software (https://tesseract-ocr.github.io/) as well as in a recent paper: Pendlebury et al., TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time, USENIX Security 2019. + +All of the above mentioned reservations essentially add up to a ""major revision"" recommendation which, given the decision logic of ACLR, translates into the rejection option.",ICLR2022, +rJe-iA8Vl4,1545000000000.0,1545350000000.0,1,B1GAUs0cKQ,B1GAUs0cKQ,Interesting and counter-intuitive result,Accept (Poster),The authors describe a very counterintuitive type of layer: one with mean zero Gaussian weights. They show that various Bayesian deep learning algorithms tend to converge to layers of this variety. This work represents a step forward in our understanding of bayesian deep learning methods and potentially may shine light on how to improve those methods. ,ICLR2019,5: The area chair is absolutely certain +N_ce7c942A-,1610040000000.0,1610470000000.0,1,PS3IMnScugk,PS3IMnScugk,Final Decision,Accept (Poster),"The paper addresses generalization to compositions of rare and unseen sequences. It proposes an unstructured data augmentation, that achieves comparable generalization to structured approaches (e.g. using grammars). The idea is based on recombining prototypes and oversampling in the tail. + +The paper provides a novel approach to an important problem. All four reviewers recommended accept. + +",ICLR2021, +gv7i6VHCZX,1642700000000.0,1642700000000.0,1,RriDjddCLN,RriDjddCLN,Paper Decision,Accept (Poster),"The paper presents an approach to semantic segmentation based on text embedding of class labels. This enables zero-shot semantic segmentation with class labels that were not seen during training. I appreciate the new ablation against a ResNet-101 backbone. I don't find the similarity with CLIP substantial, and I recommend that the paper is accepted.",ICLR2022, +D96GsTE09g,1642700000000.0,1642700000000.0,1,kroqZZb-6s,kroqZZb-6s,Paper Decision,Reject,"This paper has been independently assessed by four expert reviewers. Two of them recommend acceptance (one straight, one marginal), and two rejection (both marginal). Among the main limitations of the presented work, found by the reviewers, was the limited reproducibility of the results due to the use of private data. The authors attempted to defend their experimental design choice to use only one private dataset by stating difficulty in obtaining public data that would be suitable for their method. I am personally in a strong disagreement with that statement, and with the sub-statement of the authors that some publicly available ICU data may not be suitable. That either signals limited practical utility of the presented approach to non-ICU settings only, or is simply incorrect. The presented approach nonetheless has an intriguing potential and I would be inclined to recommends its acceptance. Alas, I find the lack of reproducibility to be a significant drawback of the way this work is currently presented and this limitation could not be easily resolved. Therefore I am leaning towards recommending a rejection.",ICLR2022, +Pdmd8ohicsf,1610040000000.0,1610470000000.0,1,jznizqvr15J,jznizqvr15J,Final Decision,Accept (Poster),"The paper addresses the problem of improving generalization when few annotated data is available by leveraging available auxiliary information. 
The authors consider the respective merits of two alternatives: using auxiliary information as complementary inputs or as additional outputs in a multi-task or transfer setting. For linear regression, they show theoretically that the former can help improve in distribution error but may hurt OOD error, while the latter may help improve OOD error. They propose a framework for combining the two alternatives and show empirically that it does so on three different datasets. + + +All the reviewers agree on the novelty, interest and impact of the method. The rebuttal clarified the reviewers’ questions. I propose an accept. +",ICLR2021, +534w7qBV3A,1576800000000.0,1576800000000.0,1,HJxiMAVtPH,HJxiMAVtPH,Paper Decision,Reject,"This paper constitutes interesting progress on an important topic; the reviewers identify certain improvements and directions for future work, and I urge the authors to continue to develop refinements and extensions.",ICLR2020, +SkuiiGIOg,1486400000000.0,1486400000000.0,1,Bk8N0RLxx,Bk8N0RLxx,ICLR committee final decision,Reject,"The reviewers agree that the method is exciting as practical contributions go, but the case for originality is not strong enough.",ICLR2017, +w1Ppku-klHo,1642700000000.0,1642700000000.0,1,L3_SsSNMmy,L3_SsSNMmy,Paper Decision,Accept (Spotlight),"All three reviewers recommend acceptance. The paper introduces an interesting study and insights on the connection between local attention and dynamic depth-wise convolution, in terms of sparse connectivity, weight sharing, and dynamic weight. The reviews included questions such as the novelty over [Cordonnier et al 2020] and the connection to Multi-scale vision longformer, which were adequately addressed by the authors. The findings in this paper should be interesting to the ICLR community.",ICLR2022, +TQXcZiBbcND,1642700000000.0,1642700000000.0,1,djwnKXz1B2,djwnKXz1B2,Paper Decision,Reject,"This paper presents a Bayesian GAN approach designed for a federated learning setting. In contrast to recent Bayesian GAN approaches that use Gaussian priors or iteratively-updated priors on GAN parameters, this paper proposes a more complex prior motivated by expectation propagation, dubbed as EP-GAN, and uses this formulation to construct a federated GAN. The paper claims that this prior better captures the multimodal distribution structure of the non-iid heterogeneous data across the different clients. + +The paper looks at an interesting problem, i.e., federated training of GANs, which is indeed a problem that has received a lot of interest lately. The paper received mixed reviews. The reviewers raised several concerns, some of which included (1) weak baselines, (2) not considering what happens when we switch to more advanced GAN models, (3) performance of the approach when the number of clients is large, and (4) lack of clarity in the presentation. + +The authors responded to some of these concerns and it is commendable that they reported some additional results during the discussion phase. However, after an extensive discussion among the reviewers and between reviewers and authors, and after my own reading of the manuscript, concerns still lingers over many of the above-mentioned points. Another concern is the overly complex nature of the approach as compared to other recent federated GAN approaches which raises the question as to whether the actual improvements warrant the complexity of the proposed approach. From the report experiments, the improvements appear to be rather slim. 
+ +Considering these aspects, unfortunately, the paper in its current shape does not seem ready for acceptance. The authors are advised to take the reviewers' feedback into account, which will strengthen a future submission.",ICLR2022, +uWv0ZY8eRX4,1642700000000.0,1642700000000.0,1,CyKQiiCPBEv,CyKQiiCPBEv,Paper Decision,Reject,"The paper builds fast and high-quality SMILES-based molecular embeddings by distilling state-of-the-art graph-based teacher models. +This has the advantage of speeding up inference time w.r.t. graph-based methods. + +The reviews were split regarding the motivation of the work, namely why not train directly on SMILES instead of distilling graph-based methods that are, on some tasks, behind the SMILES transformer. The authors provided clarifications in the rebuttal showing that knowledge distillation of graph models surpasses SMILES-only model training. + +Given the experimental nature of the paper, I think its main motivation should be better clarified and supported with more experimentation and downstream tasks.",ICLR2022, +CQaoL3CA_j,1576800000000.0,1576800000000.0,1,BylsKkHYvH,BylsKkHYvH,Paper Decision,Accept (Poster),"This paper investigates the problem of using zero imputation when input features are missing. The authors study this problem, propose a solution, and evaluate on several benchmark datasets. The reviewers were generally positive about the paper, but had some questions and concerns about the experimental results. The authors addressed these concerns in the rebuttal. The reviewers are generally satisfied and believe that the paper should be accepted.",ICLR2020, +O4Jiy6aCIMD,1610040000000.0,1610470000000.0,1,4zr9e5xwZ9Y,4zr9e5xwZ9Y,Final Decision,Reject,"The paper proposes a new distributed training method for graph convolutional networks, using subgraph approximation. The reviewers raised multiple concerns, such as novelty, validity of experiments, and some technical issues. The authors did not respond to the reviewers' comments. The AC agreed with the reviewers that the paper, in its current form, is not ready for publication.",ICLR2021, +HJeISun-xN,1544830000000.0,1545350000000.0,1,BkeDEoCctQ,BkeDEoCctQ,Meta-review,Reject,"Pros: +- novel idea of intra-life curiosity that encourages diverse behavior within each episode rather than across episodes. + +Cons: +- privileged/ad-hoc information (RAM state, distinguishing rooms) +- lack of sufficient ablations/analysis +- insufficient revision/rebuttal + +The reviewers reached consensus that the paper should be rejected in its current form.",ICLR2019,4: The area chair is confident but not absolutely certain +e4JgcBQO3lt,1610040000000.0,1610470000000.0,1,XQQA6-So14,XQQA6-So14,Final Decision,Accept (Poster),"This paper presents a model for spatiotemporal point processes using neural ODEs. Some technical innovations are introduced to allow the conditional intensity to change discontinuously in response to new events. Likewise, the spatial intensity builds upon that proposed in prior work on neural SDEs. Reviewers were generally positive about the contributions and the empirical assessments, and the authors made substantial improvements during the discussion phase.",ICLR2021, +eF_zBk5y90g,1642700000000.0,1642700000000.0,1,97r5Y5DrJTo,97r5Y5DrJTo,Paper Decision,Reject,"This paper set out to show that increasing task diversity during the meta-training process does not boost performance. 
The reviewers mostly agreed (only reviewer wVFn dissented) that the empirical set up of the paper was convincing, but they also felt it over-emphasized empirics over a deeper understanding of the phenomena observed. In turn, this resulted in discussions around how the experiments and the explanations didn't fully prove that increasing task diversity does not help. Overall, the discussion and the additional analysis tools provided by the authors (such as the diversity metric) will greatly improve the paper.",ICLR2022, +V6x1Xb529f,1576800000000.0,1576800000000.0,1,rJggX0EKwS,rJggX0EKwS,Paper Decision,Reject,"The article studies benefits of over-parametrization and theoretical properties at initialization in ReLU networks. The reviewers raised concerns about the work being very close to previous works and also about the validity of some assumptions and derivations. Nonetheless, some reviewers mentioned that the analysis might be a starting point in understanding other phenomena and made some suggestions. However, the authors did not provide a rebuttal nor a revision. ",ICLR2020, +scY8IKeJyh,1576800000000.0,1576800000000.0,1,Hye5TaVtDH,Hye5TaVtDH,Paper Decision,Reject,"This paper introduces a novel architecture and loss for estimating PSD matrices using neural networks. There is some theoretical justification for the architecture, and a small-scale but encouraging experiment. + +Overall, I think there is a sensible contribution here, but there are so many architectural and computational choices presented together at once that it's hard to tell what the important parts are. + +The main problems with this paper are: +1) The scalability of the approach O(N^3) +2) The derivation of the architecture and gradient computations wasn't clear about what choices were available and why. Several alternative choices were mentioned but not evaluated. I think the authors also need to improve their understanding of automatic differentiation. Backprop through eigendecomposition is already available in most autodiff packages. It was claimed that a certain kind of matrix derivative provided better generalization, which seems like a strong claim to make in general. +3) The experimental setup seemed contrived, except for the heteroskedastic regression experiments, which lacked competitive baselines. Why were the GP and MLPs homoskedastic? + +As a matter of personal preference, I found that having 4 different ""H""s differing only in font and capitalization for the network architecture was hard to keep track of. + +I agree that R1 had some unjustified comments and R2's review was contentless. I apologize for these inadequate reviews. ",ICLR2020, +HyCcrJTHf,1517250000000.0,1517260000000.0,633,By0ANxbRW,By0ANxbRW,ICLR 2018 Conference Acceptance Decision,Reject,"Proposed network compression method offers limited technical novelty over existing approaches, and empirical evaluations do not clearly demonstrate an advantage over current state-of-the-art. +Paper presentation quality also needs to be improved. ",ICLR2018, +1mjEPEjums,1576800000000.0,1576800000000.0,1,SkxOhANKDr,SkxOhANKDr,Paper Decision,Reject,"This paper presents a method to defend neural networks from adversarial attack. The proposed generative cleaning network has a trainable quantization module which is claimed to be able to eliminate adversarial noise and recover the original image. 
+After the intensive interaction with authors and discussion, one expert reviewer (R3) admitted that the experimental procedure basically makes sense and increased the score to Weak Reject. Yet, R3 is still not satisfied with some details such as the number of BPDA iterations, and more importantly, concludes that the meaningful numbers reported in the paper show only small gains, making the claim of the paper less convincing. As authors seem to have less interest in providing theoretical analysis and support, this issue is critical for decision, and there was no objection from other reviewers. After carefully reading the paper myself, I decided to support the opinion and therefore would like to recommend rejection. +",ICLR2020, +BKpeC-Jy3OG,1642700000000.0,1642700000000.0,1,c7S4WIlmu5,c7S4WIlmu5,Paper Decision,Reject,"Good premise: What unsupervised training supports IR? This is a key question for IR and is a focus for papers in TREC 2019 Deep Learning Track, for instance. Also, historically, empirical work in the IR community is a very high standard. + +One reviewer says the contrastive loss for learning Siamese Transformers is not new, and prior experimental work was listed. Several reviewers suggested extensions to the empirical work, some of which was subsequently done. Results are ""promising"" according to one reviewer, but not strong enough. Another reviewer says a different use context is needed since its hard to compete with efficient BM2t in its own terms. The authors made some good changes to their paper: updated intro and related work, extended results and discussion, but the 4 reviewers remained in agreement, a reject. However, some reviewers felt this was a good contribution, so with further empirical work and polish it should be good.",ICLR2022, +LWP2v64SVX,1576800000000.0,1576800000000.0,1,ByexElSYDr,ByexElSYDr,Paper Decision,Accept (Poster),"This manuscript proposes and analyzes a federated learning procedure with more uniform performance across devices, motivated as resulting in a fairer performance distribution. The resulting algorithm is tunable in terms of the fairness-performance tradeoff and is evaluated on a variety of datasets. + +The reviewers and AC agree that the problem studied is timely and interesting, as there is limited work on fairness in federated learning. However, this manuscript also received quite divergent reviews, resulting from differences in opinion about the novelty and clarity of the conceptual and empirical results. In reviews and discussion, the reviewers noted insufficient justification of the approach and results, particularly in terms of broad empirical evaluation, and sensitivity of the results to misestimation of various constants. In the opinion of the AC, while the paper can be much improved, it seems to be technically correct, and the results are of sufficiently broad interest to consider publication.",ICLR2020, +8jg-DUvJdCs,1642700000000.0,1642700000000.0,1,rJvY_5OzoI,rJvY_5OzoI,Paper Decision,Accept (Poster),"In this paper, the authors investigate a multi-task RL actor-critic technique, where a single actor is used while multiple critics are trained (one per task, where each task corresponds to a different reward function). Experiments on several environments demonstrate that this method works quite well in practice. + +All reviewers found the proposed approach sensible and effective, in spite of its simplicity. 
The main concerns were: +- Lack of novelty: although this is indeed not a particularly original idea, the specific instantiation in the actor-critic setup is novel and well motivated +- Some confusing / unconvincing experimental results: after receiving this feedback, the authors were able to upload a new revision that addressed the main concerns +- Focusing on the ""multi-style"" aspect when this is essentially a multi-task algorithm: although I agree that framing it as a specific case of multi-task learning would make sense and would probably make it more appealing to multi-task RL researchers, I do not consider this to be a major issue + +In spite of being a relatively straightforward paper, I believe it is good to have strong empirical evaluation of such basic techniques disseminated to the research community, and I thus recommend acceptance, in accordance with the reviewers' recommendations after the discussion period.",ICLR2022, +seWRbvb0vy,1576800000000.0,1576800000000.0,1,r1lnigSFDr,r1lnigSFDr,Paper Decision,Reject,"This submission proposes a new gating mechanism to improve gradient information propagation during back-propagation when training recurrent neural networks. + +Strengths: +-The problem is interesting and important. +-The proposed method is novel. + +Weaknesses: +-The justification and motivation of the UGI mechanism were not clear and/or convincing. 
-The experimental validation is sometimes hard to interpret, and the proposed improvements of the gating mechanism are not well reflected in the quantitative results. +-The submission was hard to read and some images were initially illegible. + +The authors addressed several of the weaknesses, but not to the desired level. + +The AC agrees with the majority recommendation to reject.",ICLR2020, +c9TwXMtqy4,1576800000000.0,1576800000000.0,1,Byl1W1rtvH,Byl1W1rtvH,Paper Decision,Reject,"This paper was a very difficult case. All three original reviewers of the paper had never published in the area, and all of them advocated for acceptance of the paper. I, on the other hand, am an expert in the area who has published many papers, and I thought that while the paper is well written and the experimental evaluation is not incorrect, the method was perhaps less relevant given current state-of-the-art models. In addition, the somewhat non-standard evaluation was perhaps causing this fact to be masked. I asked the original reviewers to consider my comments multiple times both during the rebuttal period and after, and unfortunately none of them replied. + +Because of this, I elicited two additional reviews from people I knew were experts in the field. The reviews are below. I sent the PDF to the reviewers directly, and asked them not to look at the existing reviews (or my comments) when doing their review in order to make sure that they were making a fair assessment. + +Long story short, Reviewer 4 essentially agreed with my concerns and pointed out a few additional clarity issues. Reviewer 5 pointed out a number of clarity issues and was also concerned with the fact that d_j has access to all other sentences (including those following the current sentence). I know that at the end of Section 2 it is noted that at test time d_j only refers to previous sentences, but if so there is also a training-testing disconnect in model training, and it seems that this would hurt the model results. + +Based on this, I have decided to favor the opinions of three experts (me and the two additional reviewers) over the opinions of the original three reviewers, and not recommend the paper for acceptance at this time. 
In order to improve the paper I would suggest the following: (1) an acknowledgement of standard methods for incorporating context by processing sequences consisting of multiple sentences simultaneously, and (2) a more thorough comparison with state-of-the-art models that consider cross-sentential context on standard datasets such as WikiText or PTB. I would encourage the authors to consider this as they revise their paper. + +Finally, I would like to apologize to the authors that they did not get a chance to reply to the second set of reviews. As I noted above, I did make my best effort to encourage discussion during the rebuttal period.",ICLR2020, +8FzKflIt_vU,1610040000000.0,1610470000000.0,1,LIR3aVGIlln,LIR3aVGIlln,Final Decision,Reject,"The authors model point processes with equivariant normalizing flows. Reviewers agreed that the paper is well written and addresses a problem of interest to the ICLR community; some reviewers considered the contribution to be incremental. Perhaps the biggest contribution is a closed-form expression for the trace that needs to be computed as part of the normalizing flow, which is valuable but not particularly emphasized. The authors combine this trace formulation with an equivariant normalizing flow to model the conditional density of point locations given cardinality. 
(As an aside, it was unclear to me if and how those conditional distributions share parameters; in some contexts, the conditional density could look very different depending on the number of points in the set.) Overall, the paper is interesting but needs a little more to lift it over the bar.",ICLR2021, +rkpaB1prz,1517250000000.0,1517260000000.0,674,ByzvHagA-,ByzvHagA-,ICLR 2018 Conference Acceptance Decision,Reject,"The novelty of the paper is limited and it lacks comparisons with relevant baselines, as pointed out by the reviewers. ",ICLR2018, +BJ-EmJaBz,1517250000000.0,1517260000000.0,113,rJm7VfZA-,rJm7VfZA-,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper considers Markov potential games (MPGs), where the agents share some common resource. They consider MPGs with continuous state-action variables, coupled constraints and nonconvex rewards, which is novel. The reviews are all positive and point out the novel contributions in the paper.",ICLR2018, +KsdDpO9MBR2,1642700000000.0,1642700000000.0,1,WKWAkkXGpWN,WKWAkkXGpWN,Paper Decision,Reject,"This paper has conflicting reviews with no strong advocate. One of the positive reviewers states the caveat that the paper is ""very dense to read and needs to be improved"". Having looked at the paper myself, I would agree with this criticism. One of the negative reviewers states that the paper gives ""an incremental variant of the NLM model"". I am less confident in this judgement. However, I find the density of the paper and the use of synthetic data to be significant drawbacks. With the lack of any real champions for the paper, I do not see a path to acceptance.",ICLR2022, +N6YZNAvJX,1576800000000.0,1576800000000.0,1,BJgd81SYwr,BJgd81SYwr,Paper Decision,Accept (Poster),"This paper proposes a type of adaptive dropout to regularize gradient-based meta-learning models. The reviewers found the idea interesting, and it is supported by improvements on standard benchmarks. The authors addressed several concerns of the reviewers during the rebuttal phase. In particular, revisions added results against other regularization methods. We recommend that further attention be given to ablations, in particular the baseline proposed by Reviewer 1.",ICLR2020, +xiOkvVySEx,1576800000000.0,1576800000000.0,1,S1enmaVFvS,S1enmaVFvS,Paper Decision,Reject,"This paper presents an encoder-decoder based approach to construct a compressed latent-space representation of each molecule. A second neural network then segments the output and assigns an atomic number. Unlike previous works using 1D or 2D representations, the proposed method focuses on 3D representations. + +The reviewers have several major concerns. Firstly, the novelty of the paper seems to be limited, as the proposed method mainly uses existing techniques. Secondly, there is no clear baseline to compare with. Finally, there are no clear quantitative results to measure the proposed method. The rebuttal did not adequately address these problems. + +Overall, this paper does not meet the standard of ICLR, and I choose to reject it. +",ICLR2020, +dN57YMIaym,1576800000000.0,1576800000000.0,1,rJeB22VFvS,rJeB22VFvS,Paper Decision,Reject,"This paper proposes two contributions to improve uncertainty in deep learning. The first is a Mahalanobis-distance-based statistical test, and the second is a model architecture. Unfortunately, the reviewers found the message of the paper somewhat confusing and particularly didn't understand the connection between these two contributions.
A major question from the reviewers is why the proposed statistical test is better than using a proper scoring rule such as negative log-likelihood. Some empirical justification of this should be presented.",ICLR2020, +8jlAURBDwx,1576800000000.0,1576800000000.0,1,H1lK5kBKvr,H1lK5kBKvr,Paper Decision,Reject,"This paper proposes a semi-supervised method for reconstructing 3D faces from images via a disentangled representation. The method builds on previous work by Tran et al. (2018, 2019). While some results presented in the paper show that this method works well, all reviewers agree that the authors should have provided more experimental evidence to convincingly demonstrate the benefits of their method. The reviewers are also concerned by how computationally expensive this method is, and unconvinced of the contribution of the unlabelled data to the performance of the proposed model. Given that the authors did not address the reviewers' concerns, and for the reasons stated above, I recommend rejecting this paper.",ICLR2020, +ryHwm1TSf,1517250000000.0,1517260000000.0,156,rJUYGxbCW,rJUYGxbCW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper studies the use of PixelCNN density models for the detection of adversarial images, which tend to lie in low-probability parts of image space. The work is novel, relevant to the ICLR community, and appears to be technically sound. + +A downside of the paper is its limited empirical evaluation: there is evidence suggesting that defenses against adversarial examples that work well on MNIST/CIFAR do not necessarily transfer well to much higher-dimensional datasets, for instance, ImageNet. The paper would, therefore, benefit from empirical evaluations of the defense on a dataset like ImageNet.",ICLR2018, +mFzwCBY3T24,1610040000000.0,1610470000000.0,1,ToWi1RjuEr8,ToWi1RjuEr8,Final Decision,Reject,"This paper aims to develop a simple yet efficient deep RL algorithm for off-policy RL. The proposed method uses advantages as weights in regression, which is an extension of the known method of reward-weighted regression. The paper is in general nicely written, and it comes with a set of theoretical analyses and experiments. While all reviewers admit that the approach is interesting and the work makes an attempt to solve an important yet open problem, there are several aspects of the paper that make it not ready for publication in its current form: + +- Novelty: As pointed out by reviewers, the proposed method appears to be a minor modification of existing off-policy solvers. Although the use of advantages as weights makes intuitive sense, it is unclear why and how the new method significantly differs from and outperforms existing methods. Going forward, it would be helpful if the authors could present more convincing arguments/experiments to demonstrate the power of AWR relative to similar existing methods. + +- Experiments provide some insights into the differences between several algorithms, but the results are not strong enough to support the claims of the paper. Please see reviewers' comments for more details. We strongly recommend that the authors take these comments into consideration and develop more rigorous experiments to demonstrate the advantages of AWR. + +- Theoretical analysis is limited. As R#2 and R#3 mentioned, the theoretical analysis in the paper seems not to match the algorithm, and there remain bugs, so it doesn't add to the paper.
Although theory might not be the focus of the paper, if the authors decide to include theoretical analysis, the analysis would hopefully provide insights into why and by how much the approach is better. + +",ICLR2021, +qwODnKHUE5,1642700000000.0,1642700000000.0,1,NrB52z3eOTY,NrB52z3eOTY,Paper Decision,Reject,"This submission received 4 diverging ratings: 6, 5, 5, 3. On the positive side, reviewers appreciated the central idea and a quality manuscript. At the same time, they raised important concerns around unfair comparisons with baselines, experiments not fully supporting the claims, and a lack of comparisons with some prior methods. After discussions with the authors, most reviewers stayed with their original ratings. +The AC agrees that the weaknesses in this case outweigh the strengths. The final recommendation is to reject.",ICLR2022, +V1xucriK59l,1642700000000.0,1642700000000.0,1,#NAME?,#NAME?,Paper Decision,Reject,"The paper considers learning a fair classifier under distribution shift. The proposal involves an additional MMD penalty between the model curvatures on the data subgroups defined by the sensitive attribute. Reviewers generally found the problem setting to be well motivated, and the paper to have interesting ideas. Some concerns were raised in the initial set of reviews: + +(1) _Relation between local curvatures and fairness robustness_. The concern was that the paper does not make sufficiently clear how similarity of the distributions of local curvatures ensures fairness robustness, and that there is no explicit definition of fairness robust to distribution shift. + +(2) _Comparison to related work_. The concern was that works such as FARF are also considering the issue of distribution shift. + +(3) _Technical novelty_. The concern was that the technical depth of the proposal may be limited, as it builds on existing ideas (e.g., adversarial learning, Hessian to measure curvature). + +(4) _Significance of results_. The concern was that the improvements of the proposed method are not significant, statistically and/or practically. + +For point (1), the response clarified that the proposal is to ensure that the local curvature (and hence robustness) across data subgroups is similar. The relevant reviewer was still unclear as to whether this ensures what one might intuitively consider ""robust fairness"". On my review of the paper, I do concur that from the Introduction, and the para preceding Eqn 4, it appears that one natural notion is + +$ \sup_{\mathbb{Q} \in \mathcal{U}( \mathbb{P} )} \Delta( \mathbb{Q}( \hat{Y}, Y \mid A = 0 ), \mathbb{Q}( \hat{Y}, Y \mid A = 1 ) ) $ + +where $\mathbb{P}$ is the observed data distribution, $\mathcal{U}$ is some uncertainty set, and $\Delta$ is some fairness measure (e.g., DP). Assuming this is indeed the ideal, it would be useful to mathematically contrast it to the proposal adopted in the present paper. The para preceding Eqn 4 correctly notes that the above notion would require specifying $\mathcal{U}$. This may be challenging, but an apparently reasonable strategy that follows the distributionally-robust optimization literature would be to use a specific ball around the training distribution (e.g., all distributions with bounded KL divergence against $\mathbb{P}$). Further, it is of interest to ask whether the proposed objective in any way approximates this one; put another way, is there any implicit assumption made as to which class of distributions one is likely to encounter?
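+(To make the ball-based suggestion concrete, as an illustrative instantiation on my part rather than anything appearing in the paper: one could take $ \mathcal{U}( \mathbb{P} ) = \{ \mathbb{Q} : \mathrm{KL}( \mathbb{Q} \| \mathbb{P} ) \le \epsilon \} $ for some radius $ \epsilon > 0 $, under which the display above becomes a standard distributionally-robust fairness objective over an $ \epsilon $-ball of candidate test distributions.)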
+ +Further discussion would also be useful on the following alternative to the objective presented in the paper: rather than matching the curvatures for the subgroups, simply minimise their unweighted average. This ought also to ensure robustness under the two different distributions; page 2 hints that this might not work owing to the different scales of these terms (i.e., the minority subgroup being much less robust), but the point does not seem to be discussed very explicitly subsequently. + +For point (2), the response noted that FARF is designed for online learning, whereas the present paper involves a single, static training set drawn iid from a single distribution. In the present paper, the drift happens at test time, and the learner has no access to samples from this distribution. The authors argued that FARF can be applied as-is to this setting. From my reading of this and the FARF paper, I agree that while the latter should be cited, it is not clearly applicable to the present setting. + +This said, the present paper primarily focusses on the covariate shift setting, for which there have been some relevant recent works; see: + +Singh et al., ""Fairness Violations and Mitigation under Covariate Shift"", FAccT '21. + +Rezaei et al., ""Robust Fairness under Covariate Shift"", AAAI '21. + +The former uses tools from joint causal graphs, while the latter assumes access to an unlabelled sample from the target distribution. The present work is certainly different in technical details, but at a minimum it seems prudent to acknowledge that there are relevant works on ensuring fairness outside the observed training distribution, and thus tone down statements such as ""As a pioneer work..."". There also seems scope to compare against the latter, e.g., to see how valuable having a few samples from the target domain is. + +Another work relevant to the spirit of ensuring fairness beyond the observed data is + +Mandal et al., ""Ensuring Fairness Beyond the Training Data"", NeurIPS 2020. + +This is in line with the distributionally-robust objective suggested in point (1), where one considers test distributions that can be arbitrary re-weightings of the training distribution. + +For point (3), from my reading, the technical content is reasonable. I would, however, have liked more mathematical discussion of point (1) above, which is important as it is the foundation of the strategy followed. + +For point (4), the response asserts that their improvements are significant practically and statistically. From my reading, I am inclined to agree with this claim. I would however note that another reviewer raised the question of whether Gaussian and uniform noise are reflective of real-world distribution shifts. I concur with this concern; this part of the paper seems a little disappointing. The response mentioned results on a new setting with more realistic shift, which we suggest be incorporated into future versions of the paper. + +Overall, the paper has some interesting ideas for a topical and important problem. At the same time, there is scope for tightening the work per the comments above, particularly on points (1) and (2), and to some extent (4). We believe that addressing these would help properly situate the work, and thus increase its clarity and potential impact.
We thus encourage the authors to consider incorporating these for a future submission.",ICLR2022, +dWpJWfFqeyl,1642700000000.0,1642700000000.0,1,QjOQkpzKbNk,QjOQkpzKbNk,Paper Decision,Accept (Poster),"This paper studies the problem of distilling the knowledge present in different GAN-based image generation tasks. The paper received mixed reviews. The reviewers had difficulty understanding some details regarding the approach, and requested ablations and clarifications of the existing empirical evaluation. The authors provided a strong, thoughtful rebuttal that addressed many of those concerns. The paper was discussed, and two reviewers updated their reviews in the post-rebuttal phase. Reviewers generally agree that the paper should be accepted but still have concerns regarding contribution and writing. The AC agrees with the reviewers and suggests acceptance. However, the authors are urged to look at the reviewers' feedback and incorporate their comments in the camera-ready.",ICLR2022, +2QWpDiQ0b,1576800000000.0,1576800000000.0,1,HygaikBKvS,HygaikBKvS,Paper Decision,Reject,"The paper presents an off-policy actor-critic scheme where i) a buffer storing the trajectories from several agents is used (off-policy replay) and mixed with the online data from the current agent; and ii) a trust-region estimator is used to select trajectories that are sufficiently close to the current policy (e.g. in the sense of a KL divergence). + +As noted by the reviewers, the results are impressive. + +Quite a few concerns still remain: +* After Fig. 1 (revised version), what matters is the shared replay, where the agent actually benefits from the experience of 9 other different agents; this implies that the population-based training observes 9x more frames than the non-shared version, raising the question of whether the comparison is fair;
+- Reviewer 2 is concerned with the use of a bi-directional RNN for the comparison of memory entries since it may overfit to order. +- Reviewer 1 is somewhat concerned with novelty over other memory management schemes. + +[1] Scalable Recollections for Continual Lifelong Learning. https://arxiv.org/pdf/1711.06761.pdf",ICLR2019,4: The area chair is confident but not absolutely certain +1I-KxSS9ds,1610040000000.0,1610470000000.0,1,MDsQkFP1Aw,MDsQkFP1Aw,Final Decision,Accept (Poster),"This paper presents a new, large-scale, open-domain dataset for on-screen audio-visual separation, and provides an initial solution to this task. As the setting is quite specialized, the authors proposed a neural architecture based on spatial-temporal attentions (while using existing learning objective for audio separation). The reviewers were initially concerned that, while reasonably motivated, the architecture seemed some arbitrary. The authors then provided extensive ablation studies to evaluate the significance of each component with existing datasets, and these efforts are appreciated by reviewers. The authors may consider re-organizing the paper and moving some ablation studies to the main text. On the other hand, the reviewers believe that the dataset will be very useful for the community due to its diversity in content and label quality.",ICLR2021, +w6qeEwpYFM,1576800000000.0,1576800000000.0,1,rygw7aNYDS,rygw7aNYDS,Paper Decision,Reject,"This paper examined a pure exploration method for efficient action value estimates in tabular reinforcement learning. The paper is on the theoretical properties of value estimates in the large sample regime. The method is shown to outperform baseline algorithms for this task in tabular reinforcement learning. + +The reviewers were divided on the merits of this work. The use of the central limit theorem was viewed as elegant, and the results were thought to be potentially useful. However, the reviewers several limitations. They found the assumption of a communicating MDP to be overly restrictive (reviewer 1). The algorithm may be computationally inefficient (reviewer 2). The nature of ""exploration"" in this work is not the conventional meaning in reinforcement learning (reviewer 3). + +The paper is not yet ready for publication at ICLR. The theoretical results do not clearly convey insights for reinforcement learning with function approximation, and the reviewers are also not in agreement that the current results are applicable to a general MDP setting.",ICLR2020, +omHTnnTmbvJ,1610040000000.0,1610470000000.0,1,lo7GKwmakFZ,lo7GKwmakFZ,Final Decision,Reject,"This paper proposes an extension of the monotonic policy improvement approach to the average reward case. +Although the reviewers acknowledge that this work has merits (well written, clearly organized, well-motivated, technically sound) the reviewers have raised several concerns, which have been only partially addressed by the authors' responses. In particular, Reviewer4 is still concerned about the discrepancy between the theorem and the implemented algorithm, and the proposed simplification used in the implementation boils down to an algorithm that is very similar to TRPO, thus making the contribution quite incremental as also stressed by Reviewer1. Furthermore, I share the concerns raised about the fairness of comparing algorithms that optimize different objective functions. 
+I suggest the authors take into serious consideration the suggestions provided by the reviewers in order to produce an improved version of their work. +The paper is borderline and I think that it needs another round of fresh reviews before being ready for publication. ",ICLR2021, +SJloa-SHe4,1545060000000.0,1545350000000.0,1,HJNJws0cF7,HJNJws0cF7,"Nice approach, a bit more work is required",Reject,"The paper proposes a novel approach to neural net construction using dynamical systems approach, such as higher order Runge-Kutta method; this approach also allows a dynamical systems interpretation of DenseNets and CliqueNets. While all reviewers agree that this is an intersting a novel approach, along the lines of recent developments in the field on dynamical systems approaches to deep nets, they also suggest to further improve the writing/clarity of the paper and also strengthen the empirical results (currently, the method only provided advantage on CIFAR-10, while being somewhat suboptimal on other datasets, and more evidence for empirical advantages of the proposed approach would be great). Overall, this is a very interesting and promising work, and with a few more empirical demonstrations of the method's superiority as well as more polished wiriting the paper would make a nice contribution to ML community.",ICLR2019,4: The area chair is confident but not absolutely certain +UeMYSBQaV1fY,1642700000000.0,1642700000000.0,1,V09OhBn8iR,V09OhBn8iR,Paper Decision,Reject,"The manuscript proposes a method to adjust a biased model without requiring explicit annotations of biases. The main hypothesis of the manuscript is that there are differences in the direction and magnitude of the loss gradients for underrepresented samples compared to majority patterns in the training data. Based on this hypothesis, the manuscript proposes a rejection sampling method that tries to balance samples in a minibatch. However, a sample with a noisy label can appear to be an underrepresented sample with a correct label which can affect the proposed method. To tackle this, the manuscript also proposes a denoising module that successfully eliminates the effects of noisy labels on the debiasing algorithm proposed. Experiments are performed on various synthetic and real-world biased sets. + +Positive aspects of the manuscript includes: +1. The results for varying levels of ""bias"" as well as the success of the proposed ""denoising"" setup is remarkable for the datasets tested; +2. An interesting hypothesis about the differences between gradient magnitude and direction (as measured by its proximity to an ""average"" gradient direction for all samples) look different for underrepresented samples as compared to ""regular"" data sample. + +There are also several major concerns, including: +1. Lack of motivation and analysis on the connections between per-sample gradients and the majority/minority splits in more complex datasets; +2. The key assumption which motivates the proposed method, namely that minority samples have different gradient distributions than majority ones, deserves a more rigorous validation; +3. The ""scalability"" of the proposed method. One common theme across these datasets is that they can be ""learned"" (at least the biased version) with a much smaller amount of data than is present in the training set. Hence, a rejection sampling-based method can work even when the minority set diminishes; +4. Assumption about known bias. 
The proposed method assumes knowledge about which of the factors were biased so that a suitable ""bias-only"" model can be trained by leveraging only the ""bias"". + +Post-rebuttal, reviewers stayed with borderline ratings, and they have suggested further improvements: more details about Biased MNIST numbers (to address concerns about known bias), and ablation studies on real datasets (e.g. compare to results without denoising, or denoised using FINE) to fully justify the practical importance of proposed denoising module.",ICLR2022, +3s4kQ3fChB,1576800000000.0,1576800000000.0,1,H1eUz1rKPr,H1eUz1rKPr,Paper Decision,Reject,"While the reviewers appreciated the problem to learn a multiset representation, two reviewers found the technical contribution to be minor, as well as limited experiments. The rebuttal and revision addressed concerns about the motivation of the approach, but the experimental issues remain. The paper would likely substantially improve with additional experiments.",ICLR2020, +ZR3jbPaDAF,1576800000000.0,1576800000000.0,1,Skg5r1BFvB,Skg5r1BFvB,Paper Decision,Reject,"This work considers the popular LQR objective but with [A,B] unknown and dynamically changing. At each time a context [C,D] is observed and it is assumed there exist a linear map Theta from [C,D] to [A,B]. The particular problem statement is novel, but is heavily influenced by other MDP settings and the also follows very closely to previous works. The algorithm seems computationally intractable (a problem shared by previous work this work builds on) and so in experiments a gross approximation is used. + +Reviewers found the work very stylized and did not adequately review related work. For example, little attention is paid to switching linear systems and the recent LQR advances are relegated to a list of references with no discussion. The reviewers also questioned how the theory relates to the traditional setting of LQR regret, say, if [C,D] were identity at all times so that Theta = [A,B]. + +This paper received 3 reviews (a third was added late to the process) and my own opinion influenced the decision. While the problem statement is interesting, the work fails to put the paper in context with the existing work, and there are some questions of algorithm methods. ",ICLR2020, +SyvUSk6BG,1517250000000.0,1517260000000.0,572,B1KFAGWAZ,B1KFAGWAZ,ICLR 2018 Conference Acceptance Decision,Reject,"The authors present a centralized neural controller for multi-agent reinforcement learning. The reviewers are are not convinced that there is sufficient novelty, considering the authors setup as essentially a special case of other recent works, with added adjustments to the neural-networks that are standard in the literature. + +I personally am more bullish about this paper than the reviewers, as I think engineering an architecture to perform well in interesting scenarios is worth reporting. However, the reviewers are mostly in agreement, and their reviews were neither sloppy nor factually incorrect. So I will recommend rejection, following their judgement. + +Nevertheless, I encourage the authors to continue strengthening the results and the presentation and resubmit. ",ICLR2018, +OEdZbkjnr6g,1642700000000.0,1642700000000.0,1,fUhxuop_Q1r,fUhxuop_Q1r,Paper Decision,Reject,"This work proposes to study the generalization capabilities of RL algorithms using contextual decision processes (CDPs). 
CDPs allows to study generalization similar to how we are used to studying generalization in supervised learning, and can separate the generalization capabilities of a learned agent wrt observation, state and action space. This proposed measure for generalization is used in an extensive study on grid world domains to evaluate existing algorithms that aim to improve generalization. + +**Strengths** +This manuscript is well written and the work is well motivated +A novel perspective and way of measuring generalization of learned agents +An empirical study that compares existing algorithms on how well they generalize in observation, state, action spaces + +**Weaknesses** +Some clarity issues existed (missing links to existing literature, experimental details) +empirical study is (out of necessity) limited to small scale grid worlds +no deeper analysis of the results, why do algorithms perform the way they do from this novel perspective of generalization, which makes it hard to understand how one could choose an algorithm for larger scale settings which don't allow for this type of analysis + +**Rebuttal** +The authors updated the paper to improve the parts that were unclear, and had an extensive discussion with reviewers on the intuition of the results and converging on take-aways. Unfortunately, this intuition and take-aways have not been added. + +**Summary** +While I understand the authors wish to not speculate on intuition, I agree with the reviewers that without (experimentally supported) take-aways the provided analysis is incomplete. Understanding why each algorithm achieves the performance they do wrt this novel way of measuring generalization is the only way the proposed method to measure generalization and the evaluation can be used to draw conclusions about more general problem settings. Thus, although this is a very promising direction on an important problem, the manuscript is not ready yet for publication.",ICLR2022, +a2_ECC1BO_o,1642700000000.0,1642700000000.0,1,Kef8cKdHWpP,Kef8cKdHWpP,Paper Decision,Accept (Poster),"Manipulating deformable objects is an up-and-coming topic in robotics and machine learning, and it creates many interesting scientific and real-world challenges. The paper looks into long horizon tasks of manipulation of deformable objects, using an interesting mix of more local trajectory optimization and differentiable physics. The reviewers agree on the interesting significance of the suggest work, all above acceptance threshold, but also a bit bimodal in terms of “just above” vs. “solidly good”. Thus, the paper appears an useful and discussion-provoking accept for ICLR.",ICLR2022, +r1Q98y6SG,1517250000000.0,1517260000000.0,840,r1DPFCyA-,r1DPFCyA-,ICLR 2018 Conference Acceptance Decision,Reject,"This submission presents intriguingly good results on k-shot learning and I agree with the authors that the results are better than the presented previous work, and that the method is simple, so I took a deeper look into the paper despite the overall negative reviews. However, I think in its current form, the paper is not suitable for publication: + +- The previous work, that the authors compare to, were not really using comparable architectures: in fact, likely much worse base models with fewer parameters etc. I think any future version of this work would need to control for architecture capacity, otherwise how can we be sure where the gains come from? To me, this is a major unknown in terms of the credit assignment for the great results. 
+- The authors should be comparing with MAML (and follow-up work) by Finn et al. (2017). +- I don't really understand why the authors claim to have no need for validation sets. That's a very strong claim: are ALL the hyper-parameters (model architectures etc.) just chosen in another, principled way? This issue would definitely need to be addressed in follow-up work.",ICLR2018, +BkizhGU_e,1486400000000.0,1486400000000.0,1,SJkXfE5xx,SJkXfE5xx,ICLR committee final decision,Accept (Poster),"The reviewers agree that the paper is a valuable contribution to the literature.",ICLR2017, +yvnRFvxKkw,1576800000000.0,1576800000000.0,1,SJeQGJrKwH,SJeQGJrKwH,Paper Decision,Reject,"This work is interesting because its aim is to push the work in intrinsic motivation towards crisp definitions, and thus it reads like an algorithmic paper rather than yet another reward-heuristic and system-building paper. There is some nice theory here, integration with options, and clear connections to existing work. + +However, the paper is not ready for publication. There were several issues that could not be resolved in the reviewers' minds (even after the author response and extensive discussion). The primary issues were: (1) There was significant confusion around the beta sensitivity---Figs. 6, 7, 8 appear misleading or at least contradictory to the message of the paper. (2) The need for x,y env states. (3) Several reviewers found the decision states unintuitive, and the focus on quantitative analysis of them confusing, given that the authors' primary focus is transfer performance. (4) All reviewers found the experiments lacking. Overall, the results generally don't support the claims of the paper, and there are too many missing details and odd empirical choices. + +Again, there was extensive discussion because all agreed this is an interesting line of work. Taking the reviewers' excellent suggestions on board will almost certainly result in an excellent paper. Keep going!",ICLR2020, +mt64CPWVf,1576800000000.0,1576800000000.0,1,B1gXWCVtvr,B1gXWCVtvr,Paper Decision,Reject,"The paper introduces a non-stationary bandit strategy for adapting the exploration rate in Deep RL algorithms. The authors consider exploration algorithms with a tunable parameter (e.g. the epsilon probability in epsilon-greedy) and attempt to adjust this parameter in an online fashion using a proxy for the learning progress. The proposed approach is empirically compared with using fixed exploration parameters and with adjusting the parameter using a bandit strategy that doesn't model the learning process.
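+(Schematically, and as my own illustration of the setup rather than notation taken from the paper: at each meta-step the strategy selects $ \epsilon_t $ from a discretized set $ \{ \epsilon^{(1)}, \dots, \epsilon^{(K)} \} $ via a non-stationary bandit whose reward is a learning-progress proxy, such as the recent change in the agent's evaluation return.)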
+Unfortunately, the proposed approach is not theoretically grounded, and the experiments lack the comparisons with good baselines needed to be convincing. A comparison with other, provably efficient, non-stationary bandit algorithms such as exponential weight methods (Besbes et al 2014) or Thompson sampling (Raj & Kalyani 2017), which are cited in the paper, is missing. Moreover, given the whole set of results and how they are presented, the improvement due to the proposed method is not clear. In light of these concerns I recommend rejecting this paper.",ICLR2020, +1_aqcBTQj_v,1642700000000.0,1642700000000.0,1,HiHWMiLP035,HiHWMiLP035,Paper Decision,Reject,"This paper proposes an early-exit method that uses class means of samples, is gradient-free, and is aimed at low-compute settings such as mobile and edge devices. The idea is novel in this setting (though class means have been used in other settings such as few-shot classification), and empirical results show that it works well. There are two main reviewer concerns that were not addressed by the author rebuttal: first, the applicability of the model in the real world due to its memory requirements, and second, the lack of experiments that show performance on more realistic datasets such as ImageNet. The reason the latter is required is the promise of mobile applications for the proposed method. I suggest the authors explain the first concern more and add the requested experiments in the upcoming version of the paper.",ICLR2022, +KSa0oQJyF5p,1610040000000.0,1610470000000.0,1,cbtV7xGO9pS,cbtV7xGO9pS,Final Decision,Reject,"The paper proposes a reinforcement learning algorithm that combines trust region policy optimization and entropy maximization. The starting point is the Lagrangian of a constrained optimization problem that upper bounds the change in the policy and lower bounds the entropy of the policy. The paper proves that the algorithm converges, and evaluates it experimentally in MuJoCo domains. + +The main issues raised by the reviewers were related to the proofs (see especially R1) and the experimental evaluation (R4). The authors did a great job improving the paper during the discussion phase, but some of the issues remain unresolved, and thus the reviewers find the paper not to be ready for publication.
Thus, I'm recommending rejection.",ICLR2021, +4eGwGaxTxYI,1642700000000.0,1642700000000.0,1,jT5vnpqlrSN,jT5vnpqlrSN,Paper Decision,Reject,"This paper proposes to encode positions of nodes in graphs by an anchor-based GNN with customized message-passing steps. All reviewers raised significant concerns about this paper, including the novelty of the message-passing steps, experiments, writing and clarity, etc. The authors have actively responded to reviewer comments, but many of the concerns are still not addressed. Thus, the paper needs some work in order to be competitive.",ICLR2022, +VO-d6CtidfI,1610040000000.0,1610470000000.0,1,0XXpJ4OtjW,0XXpJ4OtjW,Final Decision,Accept (Oral),"This paper proposes a meta-learning algorithm for reinforcement learning. The work is very interesting for the RL community, and it is clear and well-organized. The work is impressive, and it contributes to the state of the art. ",ICLR2021, +00VbffDrrdg,1610040000000.0,1610470000000.0,1,mSAKhLYLSsl,mSAKhLYLSsl,Final Decision,Accept (Oral),"The paper introduces a novel dataset condensation technique that generates synthetic samples (images) by matching model gradients with those obtained on the original input samples (images). The authors also show that these synthetic images are not architecture-dependent and can be used to train different deep neural networks. The approach is validated on several smaller datasets like MNIST, SVHN and CIFAR10. This work is well motivated and the methodological contributions are convincing. All reviewers were enthusiastic and indicated that there were no flaws in this work. The rebuttal clarified outstanding questions and made the paper stronger.",ICLR2021, +XcWSXnGZZ,1576800000000.0,1576800000000.0,1,SJxNzgSKvH,SJxNzgSKvH,Paper Decision,Reject,"The paper proposes a method to speed up the training of deep nets by re-weighting samples based on their distance to the decision boundary. However, the paper seems hastily written and the method is not backed by sufficient experimental evidence.",ICLR2020, +7JN7MZsHqEvt,1642700000000.0,1642700000000.0,1,miA4AkGK00R,miA4AkGK00R,Paper Decision,Reject,"This paper presents several variants and extensions (including stochastic and proximal) of the error-feedback method EF21, provides convergence rates for each of them, and shows that they improve upon the previous state of the art. Despite the much-broadened application scenarios and SOTA convergence rates/complexity, the main and common concern from the reviewers is the novelty of the paper beyond the original EF21 work. There are also concerns about the empirical evaluations, which do not fully support the theoretical promises. I agree with the reviewers and regrettably have to recommend rejection for ICLR.",ICLR2022, +ZH9SO-0wr75,1642700000000.0,1642700000000.0,1,7gRvcAulxa,7gRvcAulxa,Paper Decision,Reject,"The paper investigates adversarial examples in deep neural networks from a frequency-based perspective. Its main conclusion is that adversarial examples are in neither the high- nor the low-frequency components, but instead depend on the data. The topic is clearly important, and the paper is overall clearly written and makes some interesting observations, backed up by empirical evidence. + +However, the reviewers raised a number of critical concerns, including: +- Discussion of prior work is not adequate. The paper should better explain its contribution in contrast to prior work. Specifically, the authors mention Bernhard et al.
(2021) as concurrent work, although the reviewers note that the work was published 5 months before. I realize the authors most likely develop their own line of work without knowing about Bernhard et al. (2021), but I would still suggest focusing more on the differences between them. I did not take this factor into account in the final decision. +- Novelty. Prior work has already shown adversarial examples are data-dependent +- Concerns about experimental setup (only investigate one particular attack, measure of average noise gradient not completely justified, ...) + +After discussion, one reviewer downgraded their score and two others kept a more negative score. Only one reviewer was more positive with somewhat low confidence. + +Overall, the paper is more on the reject side for now. Further work is needed and I strongly encourage the authors to clearly highlight the contributions of the paper in contrast to prior work. On the plus side, the work clearly has some potential and addresses an interesting topic.",ICLR2022, +E2i9G4IFp5N,1642700000000.0,1642700000000.0,1,y8zhHLm7FsP,y8zhHLm7FsP,Paper Decision,Reject,"This paper presents the use of the Ensemble Kalman Filter (EnKF) to solve the linear quadratic Gaussian (LQG) optimal control problem. After reviewing the paper and taking into consideration of the reviewing process, here are my comments: +- The related work is limited and needs more improvements to contextualize the problem and the solution. +- The reinforcement learning paradigm is not really appreciated in the proposal. +- The results are rather limited, so more experiments are needed to clearly validate the solution. +From the above, the paper does not fulfill the standards of the ICLR. I suggest improving the paper accordingly and submitting it to a control systems venue.",ICLR2022, +APCxqwApje,1576800000000.0,1576800000000.0,1,SklGryBtwr,SklGryBtwr,Paper Decision,Accept (Poster),"The paper studies out-of-sample generalisation that require an agent to respond to never-seen-before instructions by manipulating and positioning objects in a 3D Unity simulated room, and analyzes factors which promote combinatorial generalization in such environment. + +The paper is a very thought provoking work, and would make a valuable contribution to the line of works on systematic generalization in embodied agents. The draft has been improved significantly after the rebuttal. After the discussion, we agree that it is worthwhile presenting at ICLR. ",ICLR2020, +iZ6RY8-lop64,1642700000000.0,1642700000000.0,1,lpkGn3k2YdD,lpkGn3k2YdD,Paper Decision,Accept (Spotlight),"Description of paper content: + +The paper addresses the problem of credit assignment for delayed reward problems. Their method, Randomized Return Decomposition, learns a reward function that provides immediate reward. The algorithm works by randomly subsampling trajectories and predicting the empirical return by regression using a sum of rewards on the included states. The method is compared to a variety of existing methods on Mujoco problems in “episodic reward” settings, where the reward is zero except for the final step of the episode, where it is the sum of rewards from the original task. Theoretical argument suggests the method is an interpolation of return decomposition (regress based on all states, not a subsample) and uniform reward distribution (send episodic reward to all states equally). By regressing with a subset of states, the method reduces compute for longer problems and is suggested to be more scalable. 
+ +Summary of paper discussion: + +The reviewers largely commended the simplicity of the method, the simplicity of the presentation, the novelty of the algorithm, and the quality of the empirical results. The negative reviewer maintained their initial review’s score on account of a bias introduced by the algorithm.",ICLR2022, +KaiHtgXXB,1576800000000.0,1576800000000.0,1,HkxBJT4YvB,HkxBJT4YvB,Paper Decision,Accept (Poster),The paper proposes a new way of estimating treatment effects from observational data. The text is clear and experiments support the proposed model.,ICLR2020, +SklarkTrf,1517250000000.0,1517260000000.0,663,By5SY2gA-,By5SY2gA-,ICLR 2018 Conference Acceptance Decision,Reject,"This work attempts to incorporate affect information from additional resources into word embeddings. This is a valuable goal, but the methods used are very similar to existing ones, and the experimental results are not quite convincing enough to make a strong enough case for accepting the paper.",ICLR2018, +UE_QFkeZhgp,1642700000000.0,1642700000000.0,1,68n2s9ZJWF8,68n2s9ZJWF8,Paper Decision,Accept (Poster),"This paper proposes a new paradigm --- called in-sample Q learning --- to tackle offline reinforcement learning. Based on the novel idea of using expectile regression, the proposed algorithm enjoys stable performance by focusing on in-sample actions and avoiding querying the values of unseen actions. The empirical performance of the proposed algorithm is appealing, outperforming existing baselines on several tasks. The paper is also well written.",ICLR2022, +quQ8KruzWM,1610040000000.0,1610470000000.0,1,J40FkbdldTX,J40FkbdldTX,Final Decision,Reject,"This submission received reviews with a very wide range of scores (initially 3,5,5,9; then 5,5,5,9). In the discussion, all reviewers maintained their general position (although a private message by the reviewer giving a score of 9 said he/she would consider going down to an 8). + +Because of the high variance, I read the paper in detail myself. I agree with all reviewers that NAS is a very important field of study, that the experiments are interesting, and that purely empirical papers studying what works and what doesn't work (rather than introducing a new method) are definitely needed in the NAS community. But overall, for this particular paper, I agree with the 3 rejecting reviewers. The paper presents a lot of experiments, but I am missing novel deep insights or lasting overarching take-aways. The papers reads a bit like a log book of all the experiments the authors did, before having gone through the next iteration in the process to consolidate findings and gain lasting insight. + +In a bit more detail, half the results in Section 4 use medium-sized super networks, which seem broken to me, yielding much worse performance than small super networks. I did not find any motivation for studying these medium-sized networks, no reason given for them to perform poorly, and none stating why the results are still interesting when the networks perform so poorly (apologies if I overlooked these). The poor performance may be due to using a training pipeline that works poorly for these larger networks, but this is hard to know exactly without further experiments. I would either try to fix these networks' performance or drop them from the paper entirely, as I do not see any insights that can be reliable gained from the current results. 
As is, I believe these results (accounting for half the plots in the paper) only muddy the water and are preventing a crisp presentation of insightful results. + +Another factor that I find unfortunate about the paper is that it only uses NAS-Bench-201 for its empirical study, and even for that dataset, mostly only the CIFAR-10 part. After getting rid of isomorphic graphs from the original 15625 architectures, NAS-Bench-201 only has 6466 unique architectures (see Appendix A of NAS-Bench-201), while, e.g., NAS-Bench-101 has 423k unique architectures. As the authors indicate themselves in their section ""Grains of Salt"", it is unclear whether insights gained on the very small NAS-Bench-201 space generalize to larger spaces. I therefore believe that there should also be some experiments on another, larger space, to study how well some of the findings generalize. An additional benchmark that the authors could have directly used without performing additional experiments themselves is the NAS benchmark NAS-Bench-1shot1 (ICLR 2020: https://openreview.net/forum?id=SJx9ngStPH), which studies 3 different subsets of NAS-Bench-101, and which was created to allow one-shot methods to use the larger space of evaluated architectures in NAS-Bench-101. + +Minor comments: +- It reads as if the authors performed 5 runs, computed averages of the outcomes, and then computed correlation coefficients. That would be a suboptimal experimental setup, though; in practical applications, only one run of the super network would be run, and therefore, in order to assess performance reliably, one should compute correlation coefficients for one run at a time, and then obtain a measurement of reliability of these correlation coefficients across the 5 runs. +- The y axis in Figure 2 appears to be broken: for example, in the left column it goes from 99.978 to 99.994, and the caption says these should be accuracy predictions of NAS-Bench201. However, even the best architectures in NAS-Bench201 only achieve around 95% accuracy. + + +Overall, I recommend rejection for the current version of the paper. +Going forward, I encourage the authors to continue this line of work and recommend that they iterate over their experiments and extract crisp insights from their experiments. I also recommend performing experiments with a much larger search space than that of NAS-Bench-201 to assess whether the findings generalize.",ICLR2021, +15y3r9kA-3r,1642700000000.0,1642700000000.0,1,gD0KBsQcGKg,gD0KBsQcGKg,Paper Decision,Reject,"This paper presents a method for producing a mixture of (disjoint) predictive distributions for deep learning models rather than a single predictive distribution. The reviewers in general found that the idea had strong potential, was well motivated and addresses an important and under-appreciated problem in deep learning. They seemed to find the proposed approach of using mixture density networks to be sensible. However, the reviewers seemed to find that the paper was unclear in presentation and grammatically, as if hastily written. One reviewer noted that they would not be able to reproduce the method given the confusing presentation. The reviewers also found that the experiments didn't adequately evaluate their method empirically. Unfortunately, the reviewers all agreed that the paper is not quite ready for publication (5, 3, 5). 
Careful rewriting of the paper and the technical contributions and strengthening the experiments would go a long way towards improving this paper for a future submission.",ICLR2022, +qc23UUiNj4N,1642700000000.0,1642700000000.0,1,wIzUeM3TAU,wIzUeM3TAU,Paper Decision,Accept (Oral),"This paper gives a new theoretical framework to characterize the expressive power of graph neural networks that describes GNN by tensor language (TL) and then makes it possible to analyze its expressive power through the lens of TL. The authors connect the expressive ability of TL to the color refinement algorithms and (vertex/graph) k-WL algorithms. By doing so, the several existing results can be recovered in a unifying manner. In addition to that, the function approximation ability is also investigated. + +The paper gives a novel theoretical framework that gives a clear perspective to the problem of expressive power of GNN, which would be quite beneficial to the community and open up a new research direction. The reviewers have raised several questions on the paper, but the authors addressed all the concerns properly. Therefore, I recommend acceptance to ICLR2022.",ICLR2022, +Q45nRnJSGu,1576800000000.0,1576800000000.0,1,Hke0V1rKPS,Hke0V1rKPS,Paper Decision,Accept (Poster),"This paper extends previous observations (Tsipars, Etmann etc) in relations between Jacobian and robustness and directly train a model that improves robustness using Jacobians that look like images. The questions regarding computation time (suggested by two reviewers, including one of the most negative reviewers) are appropriately addressed by the authors (added experiments). Reviewers agree that the idea is novel, and some conjectured why the paper’s idea is a very sensible one. We think this paper would be an interest for ICLR readers. Please address any remaining comments from the reviewers before the final copy. +",ICLR2020, +k1svhpY8TCn,1610040000000.0,1610470000000.0,1,0SPUQoRMAvc,0SPUQoRMAvc,Final Decision,Reject,"The authors address the problem of self-supervised monocular depth estimation via training with only monocular videos. They propose to use additional information extracted from semantic segmentation at training time to (i) provide additional “semantic context” supervision and (ii) to improve depth estimation at discontinuities through an edge guided point sampling based approach. Results are presented on the KITTI and Cityscapes datasets. + +One of the main concerns is related to the utility of the semantic supervision given the relative cost required to obtain semantic training data in the first place. The authors state that ""the pixel-wise local depth information can not be well represented by current depth network"". However, Guizilini 2020a can generate detailed depth edges and they do NOT require any semantic information during training. The authors also state that ""the required labeled semantic dataset only accounts for a very tiny proportion, which indicates a relatively lower cost."" This is a bit misleading. The proposed method uses per pixel semantic ground truth from three datasets (Mapillary Vistas, Cityscapes, and KITTI). It takes a lot of effort to provide this ground truth compared to self-supervised methods which do not require any ground truth depth. It is encouraging that dataset specific semantic finetuning does not seem to have a large impact (Table 3), but this still requires access to a large enough initial semantic training set. 
Finally, the quantitative results are not much better than methods that don't require any semantics, e.g. Guizilini 2020a, Johnston and Carneiro. Clearly, methods that do not require semantics are much more scalable, especially when adapting to new types of scenes. + +Regarding the specific contributions of the paper, the SEEM module is the most novel component of the model. However, the addition of the SEEM module does not improve quantitative performance by much (see Table 2). In addition, the qualitative improvement it provides is also very subtle. This can be seen by comparing the last two rows of Fig 7, i.e. without and with. The authors need to make a stronger case, either quantitatively or qualitatively, as to why this is valuable. + +Finally, but only a minor concern, the following relevant reference is missing: Jiao et al. Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss, ECCV 2018. + +In conclusion, there were mixed views from the reviewers, with some supportive of the paper (R2&3) and others not as enthusiastic (R1&4). The authors should be commended for their detailed responses and the changes already made based on reviewer comments and suggestions. Unfortunately, this did not change the minds of the reviewers. It is the opinion of this AC that there is still more work required to fully show the utility of the proposed approach, especially considering the non-trivial effort that is required to obtain semantic supervision in novel domains. ",ICLR2021, +TmZFHyRMNrZu,1642700000000.0,1642700000000.0,1,5i7lJLuhTm,5i7lJLuhTm,Paper Decision,Accept (Poster),"This paper adapts a method called ""real-time recurrent learning"" for training recurrent neural networks. The idea is to project the true gradient onto a subspace of desired dimensionality along a candidate direction. There are a variety of possible candidates: random directions, backpropagation through time, meta-learning approaches, etc. + +The main strength of the paper is that it is a very simple idea that seems to have practical utility. + +While often presented in different contexts, it should be clearly noted by the authors that the general idea of using low-dimensional directional derivatives for computational efficiency is fairly common in optimization. Reviewers mention sketch-and-project methods. This has also been looked at, for example, in the context of Bayesian optimization, with [random selection](https://bayesopt.github.io/papers/2016/Ahmed.pdf) and [value of information based](https://proceedings.neurips.cc/paper/2017/file/64a08e5f1e6c39faeb90108c430eb120-Paper.pdf) criteria. + +Reviewers appreciated aspects of the paper, though they had concerns about relations to sketch-and-project methods, computational costs, and experimental demonstrations and baselines. Through the rebuttal period, reviewers were mostly satisfied that the concerns about computational costs were well addressed. A better job could still be done of describing the relation to other work. There was also still some desire for more thorough experimental demonstrations and consistent baselines, as described in the reviews. The paper could also use some additional proof-reading, as it contains several grammatical errors. On the whole, the paper makes a nice simple practical contribution.
Please carefully account for reviewer comments in updated versions.",ICLR2022, +0ZEOUUwqU6,1642700000000.0,1642700000000.0,1,oVE1z8NlNe,oVE1z8NlNe,Paper Decision,Accept (Poster),"The paper focuses on self-supervised learning (SSL) in the federated learning setting (FedSSL). Research in this area is timely and of significance. The authors phrase their work as primarily being an empirical study providing insights into the building blocks of FedSSL. The evaluation in the paper is quite thorough and the authors have been active in a detailed exchange regarding questions raised in the reviews. I would encourage the authors to fully implement the changes they promised into the revised manuscript and work towards timely release of open-source code. (I appreciate internal policies of various institutions, but I do agree with the reviewers that it is more important that the code and experimental details be made public for papers such as this one, compared to some other papers.) I have chosen to disagree with some of the concerns raised in one of the reviews, in particular, I do agree with the authors that insights into the building blocks through empirical studies is a significant contribution, and also that FedEMA is a novel contribution. The discussion on this forum will remain for interested readers to come to their own conclusions about the relative performance of various methods.",ICLR2022, +iXMxXrnNtUs2,1642700000000.0,1642700000000.0,1,0J98XyjlQ1,0J98XyjlQ1,Paper Decision,Reject,"The paper proposes Data-Dependent GCN (D2-GCN), which improves the efficiency of vanilla GCN by node-wise skipping, edgewise skipping, and bit-wise skipping. Gate functions are learned to prune the unimportant neighbor nodes in combinations, unimportant edge connections, and in the bit-precision. The proposed method boosts efficiency while achieving comparable performance over benchmark datasets. Most reviewers agree that the paper is well motivated, and the writing is clear. However, two of the reviewers found the novelty of the paper compared to previous work (for example, [1]) is limited. Three reviewers raised concerns about the lack of theoretical or empirical analysis on how D2-GCN can alleviate the over-smoothing problem, and how the proposed method can serve as an implicit regularization. + +For the novelty concerns, the authors provided a detailed comparison with previous work during the rebuttal. For the lack of analysis on over-smoothing, the authors provided an additional empirical analysis using the distance of different intermediate layers’ output as the metric for measuring over-smoothing. But at least one reviewer is still not satisfied with those. + +Given the current review scores (3, 5, 5, 6), the paper is below the acceptance threshold for the conference. The AC believes that the proposed method seems to be an effective and simple way towards more efficient graph neural networks and hence encourages the authors to submit the revised paper to another venue after addressing the reviewers’ concerns, especially on theoretical or empirical analysis on over-smoothing and implicit regularization. + + +[1]: Gated graph sequence neural networks",ICLR2022, +ryeijQ7GeN,1544860000000.0,1545350000000.0,1,B1xhQhRcK7,B1xhQhRcK7,"reasonable approach, convincing experiments, important topic",Accept (Poster)," +* Strengths + +The paper addresses a timely topic, and reviewers generally agreed that the approach is reasonable and the experiments are convincing. 
Reviewers raised a number of specific concerns (which could be addressed in a revised version or future work), described below. + +* Weaknesses + +Some reviewers were concerned the baselines are weak. Several reviewers were concerned that relying on failures observed during training could create issues by narrowing the proposal distribution (Reviewer 3 characterizes this in a particularly precise manner). In addition, there was a general feeling that more steps are needed before the method can be used in practice (but this could be said of most research). + +* Recommendation + +All reviewers agreed that the paper should be accepted, although there was also consensus that the paper would benefit from stronger baselines and more close attention to issues that could be caused by an overly narrow proposal distribution. The authors should consider addressing or commenting on these issues in the final version.",ICLR2019,3: The area chair is somewhat confident +PaIHxnClyUu,1642700000000.0,1642700000000.0,1,Nus6fOfh1HW,Nus6fOfh1HW,Paper Decision,Reject,"This work studies the relation between graph heterophily and the robustness of GNNs and theoretically shows that effective structural attacks on GNNs for homophilous graphs lead to increased heterophily level, while for heterophils graphs they alter the homophily level contingent on node degrees under some specific assumptions. + +Overall, the findings in the paper are interesting and can be useful for other researchers trying to improve GNNs' robustness on homophilic and heterophilic datasets. After the discussion and rebuttal, the main concerns are: +- while the paper has shown some interesting observations, no new methodology was proposed based on these findings. +- The authors have attempted to relax assumptions and justified their setup on experiments, however the explanations are still limited. For example, Theorem 1 does not allow attention mechanism, different choices of aggregator, skip-connection, and more GNN layers.",ICLR2022, +k2ACyhCUmuo,1610040000000.0,1610470000000.0,1,HI0j7omXTaG,HI0j7omXTaG,Final Decision,Reject,"This paper proposed a new measure of effective gradient flow (EGF), and also compared sparse vs. dense networks on CIFAR-10 and CIFAR-100. The notion of EGF would be interesting, but the paper did not present enough evidence to support this notion.",ICLR2021, +qpy-bZtrhLL,1642700000000.0,1642700000000.0,1,5MLb3cLCJY,5MLb3cLCJY,Paper Decision,Accept (Poster),"The submission initially received mixed reviews. The authors presented convincing answers during the author response period, after which all reviewers recommended weak accepts. The AC has carefully read the reviews, responses, and discussions, and agreed with the reviewers' recommendation. Despite the marginal performance gains, the submission has presented a useful and inspiring way of learning shape representations. The AC, therefore, recommends acceptance. + +The authors are encouraged to further revise the paper based on the reviews. In addition, the authors should use $\citep$ for all citations that are not used as a pronoun, including all citations in the tables. Please find more information here: https://journals.aas.org/natbib/",ICLR2022, +5cXaeN8qXc,1610040000000.0,1610470000000.0,1,3q5IqUrkcF,3q5IqUrkcF,Final Decision,Accept (Poster),"The paper presents a mathematical analysis of the discrepancy between GD and GF trajectories. 
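For reference, the two dynamics being compared are the standard ones (notation mine, not necessarily the paper's):

$$x_{k+1} = x_k - \eta\,\nabla f(x_k) \quad \text{(gradient descent)}, \qquad \dot{x}(t) = -\nabla f(x(t)) \quad \text{(gradient flow)},$$

with the discrepancy measured between the iterate $x_k$ and the flow solution at time $t = k\eta$.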
Following the discussion period, a knowledgeable R3 updated his/her initial rating from 4 to 6, a knowledgeable R4 raised his/her score from 5 to 6. Finally, a very confident R1 considers this a good paper that should be accepted. He/she indicates that this paper provides a unique and very illuminating perspective on gradient descent through an extremely simple idea. The topic is very timely. I agree with R1 that the paper contributes a refreshing perspective with important elements which should be of interest to a good number of researchers. Taking into account the discussion, confidence levels and ratings of the reviewers, I am recommending the paper to be accepted. I would like to encourage the authors to take the reviewers' comments carefully into consideration when preparing the final version of the article. ",ICLR2021, +qC2ErkfsUY,1610040000000.0,1610470000000.0,1,AHOs7Sm5H7R,AHOs7Sm5H7R,Final Decision,Accept (Poster),"The article is easy to read, of interest for the community, and provide some advance towards understanding the implicit bias of gradient descent. +The results and the methodology for the rank-1 case are very interesting and convincing. +Yet, some results could be made more explicit and the comments by the reviewers should be addressed for the camera ready paper, in particular the one on the organization. +",ICLR2021, +qn_a7pjqMj,1576800000000.0,1576800000000.0,1,B1eYlgBYPH,B1eYlgBYPH,Paper Decision,Reject,"This paper presents a novel RNN algorithm based on unfolding a reweighted L1-L1 minimization problem. Authors derive the generalization error bound which is tighter than existing methods. +All reviewers appreciate the theoretical contributions of the paper, particularly the derivation of generalization error bounds. However, at a higher-level, the overall idea is incremental because RNN by unfolding L1-L1 minimization problem (Le+,2019) and reweighted L1 minimization (Candes+,2008) are both known techniques. The proposed method is essentially a simple combination of them and therefore the result seems somewhat obvious. Also, I agree with reviewers that some experiments are not deep enough to support the theory. For example, for over-parameterization (large model parameters) issue, one can compare the models with the same number of parameters and observe how they generalize. +Overall, this is the very borderline paper that provides a good theoretical contribution with limited conceptual novelty and empirical evidences. As a conclusion, I decided to recommend rejection but could be accepted if there is a room. +",ICLR2020, +jQjcxH5-yhuE,1642700000000.0,1642700000000.0,1,mZsZy481_F,mZsZy481_F,Paper Decision,Reject,"The paper proposes few-shot robust (FROB) model for classification and few-shot OOD detection. While the paper has some interesting contributions, all the reviewers felt that the current version falls short of the ICLR acceptance threshold and the consensus decision was to reject. I encourage the authors to revise the paper based on the reviewers' feedback and resubmit to a different venue. + +As Reviewer r838 pointed out, that this paper uses TinyImages dataset which has been since retracted. 
I appreciate that prior work used TinyImages, but please see ""Why it is important to withdraw the dataset"" https://groups.csail.mit.edu/vision/TinyImages/ and consider not using the TinyImages dataset for future revisions.",ICLR2022, +81bIZBmUfj8,1610040000000.0,1610470000000.0,1,LcPefbNSwx_,LcPefbNSwx_,Final Decision,Reject,"The authors proposed to pre-process the original input features into a low dimensional term and its corresponding residual term via SVD. The paper empirically demonstrated the neural networks trained on such factorized exhibit faster convergence in training. Several issues of clarity were addressed during the rebuttal period by the authors. + +However, the reviewers still felt that there were some remaining fundamental issues with the paper, + +1) The motivation is not echoed in the experiments, namely most of the experiments on CIFAR and CatDog dataset using a low dimensional factorization of d=1 which is trivial and often part of the whitening preprocessing. + +2) The proposed factorization via SVD will be difficult to scale up to high dimensional features, large training sets and higher d >> 1. + +3) The empirical experiments show a marginal improvement in the training speed, especially in the image recognition tasks, yet there seems an early plateau in test performance when compared to the baselines. + +4) The theoretical analysis in Section 2 studied linear models. Yet, the rest of the paper focuses on non-linear neural networks. It is difficult to see the connection between the analysis and the rest of the paper. + +Thus, I recommend rejection of the paper at this time as the current version of the paper needs further development, and non-trivial modifications, to be broadly applicable.",ICLR2021, +NWITiJkxOo3,1642700000000.0,1642700000000.0,1,fQTlgI2qZqE,fQTlgI2qZqE,Paper Decision,Accept (Poster),"This paper tackles the problem of feature interactions identification in black-box models, which is an important problem towards achieving explainable AI/ML. The authors formulate the problem under the multi-armed bandit setting and propose a solution based on the UCB algorithm. This simplification of the problem leads to a computationally feasible solution, for which the authors provide several theoretical analyses. The importance of the learned interactions is showcased in a new deep learning model leveraging these interactions, leading to a reduction in model size (thereby competing against pruning methods) as well as an improvement in accuracy (thereby competing against generalization methods). Although the proposed approach essentially builds on the specific UCB algorithm, it could likely be extended/modified to other (potentially more efficient) bandit strategies. A drawback of this work resides in the experiments being entirely synthetics. In order to close the gap with practice, experiments on real datasets of higher dimensionality should be conducted.",ICLR2022, +u2A9Wteonm0,1642700000000.0,1642700000000.0,1,oaKw-GmBZZ,oaKw-GmBZZ,Paper Decision,Reject,"Thank you for your submission to ICLR. While all reviewers felt that there were some interesting aspects to the proposed work, the consensus was also that the work didn't properly situate itself within the existing literature on related methods. 
In particular, I agree with Reviewer kLFD that a numerical comparison to Pfaff et al., is notably missing here; while the authors did provide qualitative comparisons in their discussion, it's not clear to me that these differences are ultimately that significant, and the methods need to be compared directly if a case is to be made for the advantages of the proposed approach.",ICLR2022, +sjPo1lLu6G,1576800000000.0,1576800000000.0,1,r1xfECEKvr,r1xfECEKvr,Paper Decision,Reject,"The paper considers an important problem in medical applications of deep learning, such as variability/stability of model's predictions in face of various perturbations in the model (e.g., random seed), and evaluates different approaches to capturing model uncertainty. However, it appears to be little innovation in terms of machine-learning methodology, so ICLR might not be the best venue for this work, while perhaps other venues focused more on medical applications might be a better fit. + ",ICLR2020, +yPkHCBfCOcx,1642700000000.0,1642700000000.0,1,youe3QQepVB,youe3QQepVB,Paper Decision,Reject,"The paper received borderline reviews. While the reviewers acknowledged good motivation, good number of experiments and good numeral results that demonstrated the proposed method outperforms the existing state of the art, there are shared concerns: the experimental setup is not really a ""low data"" regime, generative models jointly trained with the multi-task model only led to marginal improvements, and the prediction quality is quite low for all methods. In addition, it's unclear why the images generated by MGM have a lot of artifacts, and how the artifacts affect the performance. Overall, the reviewers were not convinced after the rebuttal.",ICLR2022, +7Q3bYjchueh,1610040000000.0,1610470000000.0,1,naSAkn2Xo46,naSAkn2Xo46,Final Decision,Reject,"This paper studies two techniques for handling high dimensional action spaces in deep RL, namely selecting action components independently or selecting components sequentially in an autoregressive approach. The methods are developed for two deep RL algorithms (PPO and SAC) and tested on multiple domains. + +The reviewers recognized the significance of this research topic but found significant problems in the presentation. The reviewers raised concerns on the relationship to prior work in the literature (R2), baseline comparisons that are missing in the experiments (R2, R4), and a lack of clarity in the intended message of the experiments (R3). The authors responded favorably to the reviews, answered clarification questions, and acknowledged the limitations of the submitted work. The authors expressed their intent to release a stronger paper sometime in the future. The reviewers acknowledged the author response and were in consensus that the submission needs more work. + +Three reviewers have indicated reject for the reasons described above. The paper is therefore rejected.",ICLR2021, +BJxW71mgxE,1544720000000.0,1545350000000.0,1,HJepJh0qKX,HJepJh0qKX,Meta-Review,Reject,"There is no author response for this paper. The paper formulates a definition of easy and hard examples for training a neural network (NN) in terms of their frequency of being classified correctly over several repeats. One repeat corresponds to training the NN from scratch. Top 10% and bottom 10% of the samples with the highest and the lowest frequency define easy and hard instances for training. The authors also compare easy and hard examples across different architectures of NNs. 
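For concreteness, a minimal sketch of the frequency computation described above (illustrative only; `correct` is a hypothetical runs-by-examples boolean matrix of per-repeat classification outcomes, and the 10% split follows the description above):

```python
import numpy as np

def split_easy_hard(correct, frac=0.10):
    # correct: (n_runs, n_examples) boolean array; entry [r, i] is True
    # iff example i was classified correctly by the model trained in run r.
    freq = correct.mean(axis=0)       # per-example frequency of success
    order = np.argsort(freq)          # ascending: lowest frequency first
    k = max(1, int(frac * freq.size))
    hard = order[:k]                  # bottom 10% -> "hard" examples
    easy = order[-k:]                 # top 10% -> "easy" examples
    return easy, hard

# Toy usage: 8 training repeats over 1000 examples of varying difficulty.
rng = np.random.default_rng(0)
outcomes = rng.random((8, 1000)) < np.linspace(0.1, 0.9, 1000)
easy_idx, hard_idx = split_easy_hard(outcomes)
```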
+
On the positive side, all the reviewers acknowledge the potential usefulness of quantifying easy and hard examples in training NNs, and R1 was ready to improve his/her initial rating if the authors revisited the paper.
On the other hand, all the reviewers and the AC agreed that the paper requires (1) major improvement in presentation clarity -- see the detailed comments of R1 on how to improve, as well as comments/questions from R3 and R2; try to avoid confusing terminology such as 'contradicted patterns'.
R1 raised important concerns that the proposed notion of easiness is drawn from the experiment in Fig. 1 of Arpit et al. (2017), which is not properly attributed. R3 and R2 agreed that in its current state the experimental results are not conclusive and often uninformative. To strengthen the paper, the reviewers suggested including more experiments in terms of different datasets and proposing a better metric for defining easy and hard samples (see R3's suggestions).
We hope the reviews are useful for improving the paper.
",ICLR2019,
X6vMEhBtg,1642700000000.0,1642700000000.0,1,V3C8p78sDa,V3C8p78sDa,Paper Decision,Accept (Spotlight),"This paper provides a very large-scale study on the pretraining of image recognition models. Specifically, three scaling factors (model sizes, dataset sizes, and training time) are extensively investigated. One important phenomenon observed by this paper is that stronger upstream accuracy may not necessarily contribute to stronger performance on downstream tasks---actually, sometimes these two types of performance can even be at odds with each other.

Overall, all the reviewers enjoyed reading this paper and highly appreciated the empirical results presented in it. There were only a few concerns raised by the reviewers, but most were well addressed during the discussion period. All reviewers reached a consensus on accepting this paper and believe this study deserves to be heard by the community.",ICLR2022,
PcnqsMEHQz,1642700000000.0,1642700000000.0,1,7xzVpAP5Cm,7xzVpAP5Cm,Paper Decision,Reject,"This paper proposes an algorithm, which the authors call DEO*-SGD, that combines the ideas of the generalized DEO scheme, denoted by DEO*, to facilitate exploration (Section 3.1); the adoption of stochastic gradient descent (SGD) in the exploration chains (i.e., those chains except the one with the lowest temperature) (Section 4); and the use of adaptive tuning of learning rates (Section 4.2). The proposal is applied experimentally in Section 5 to demonstrate its superiority over existing approaches.

The initial review scores of the four reviewers were one positive and three negative. Most reviewers positively evaluated the proposal, including the proposal of DEO* and its theoretical analysis, as well as its empirical usefulness in deep learning for a computer-vision task. On the other hand, some reviewers showed concern about the soundness of the proposal. Upon reading the reviews and the author responses, as well as the paper itself, I think that this paper lacks a clear statement of its objective. 
+
* **What does ""uncertainty approximation"" mean?:** The paper title would imply that the objective of the proposal in this paper is ""uncertainty approximation,"" but I could not find any concrete description of what it exactly is.
* **Sampling versus optimization:** The methods of Langevin dynamics, or more generally Markov-chain Monte-Carlo methods, have been used for two distinct purposes: sampling and optimization. In either case, fast relaxation towards equilibrium would be of practical importance. For sampling purposes it is also important to ensure that the stationary distribution of the Markov chain corresponds to the target distribution (in Langevin dynamics the target distribution would be the canonical ensemble defined by the energy $U(\cdot)$ and the temperature $\tau$). For optimization purposes, however, the assurance that the stationary distribution equals the target distribution would be of less concern. It seems that the authors' interest would be in optimization rather than in sampling, but this is not clearly stated.
* **Soundness issue:** As Reviewer mbau pointed out, DEO* does not have a guarantee of convergence to the target distribution. I thought that if the objective of this paper were optimization rather than sampling, the existence of approximation already in DEO* might be thought of as a minor problem, as the proposal already has other approximations introduced in Section 4. The authors claim that this problem does not affect the main body of the paper, but I feel that it would affect the overall organization of the paper, as the current organization seems to presume that approximation only resides in the adoption of the SGD-based exploration kernels with deterministic swap. In any case, this problem has been acknowledged by the authors themselves, as well as by Reviewer ofJx.

In particular, the detailed discussion between the authors and Reviewer mbau has been very fruitful in clarifying technical subtleties in this manuscript, including the soundness issue mentioned above. At the same time, it would imply that this paper still has room for improvement.

An additional point I would like to mention is that this paper is not really self-contained, in the sense that several key notions and quantities are not defined or are only defined in the Supplementary Materials ($\tilde{U}$ is not explicitly defined at all, the terms ""swap time"" and ""round trip time"" are defined in Appendix A.5, and $\sigma_p$ in Corollary 1 and Lemma 2 is defined in Appendix A.1).

All these weaknesses make me think that another round of revision would be appropriate to properly judge the quality of this paper, whereas there is no such option within the review procedure of ICLR. I therefore cannot recommend acceptance of this paper, at least in its current form.

Minor points (page and line numbers refer to the revised version):
- Abstract, line 5: ""given sufficient many $P$ chains"" would be better phrased as ""given $P$ chains"", as the big-O notation usually assumes the large-$P$ asymptotics.
- In several places, there are periods after ""Figure"" and ""Table"", which are not needed.
- Page 3, line 32: In Lemma 2 there is apparently no such term found as ""the second quadratic term"". It should appear only after having assumed the equi-acceptance/rejection rates in equation (4), so that the sum becomes proportional to $P$.
- Theorem 1: ""the maximal round trip time"" should certainly be ""the minimal round trip time"". ($\lceil\cdot\rceil$ is the ceiling function, and $T$ denotes the round trip time.)
- Table 1: I did not really understand what ""non-asymptotic"" / ""asymptotic"" mean, as the big-O notation used here should by definition be asymptotic.
- Corollary 1: the optimal (number of) chains
- Page 4, line 34: The abbreviation SGLD is not defined in this paper.
- Page 4, line 36: similar(ly) to
- Equation (6): The sign of the last term should be ""-"".",ICLR2022,
WEr5T6kGwQj,1642700000000.0,1642700000000.0,1,8gX3bY78aCb,8gX3bY78aCb,Paper Decision,Reject,"The paper introduces a graph neural network for molecules which takes motif-level relationships into account. The paper received borderline reviews, with three reviewers voting for reject and one for accept. After the rebuttal, the reviewers did not change their scores. Overall, it seems that the paper has some merit, with good experimental results. Nevertheless, it suffers from two issues: (i) the positioning with respect to other motif-based approaches is not clear enough, making the novelty hard to assess; and (ii) there is a lot of room for improvement in terms of clarity. Therefore, the area chair follows the majority of the reviewers' recommendations and recommends a reject.",ICLR2022,
B1lTjvNxxE,1544730000000.0,1545350000000.0,1,SkxZFoAqtQ,SkxZFoAqtQ,"Interesting framing, but limited contributions",Reject,"This paper offers a new angle through which to study the development of comparison functions for sentence pair classification tasks by drawing on the literature on statistical relational learning. All three reviewers seemed happy to see an attempt to unify these two closely related relation-learning problems. However, none of the reviewers were fully convinced that this attempt has yielded any substantial new knowledge: many of the ideas that come out of this synthesis have already appeared in the sentence-pair modeling literature (in work cited in the paper under review), and the proposed new methods do not yield substantial improvements for the tasks they're tested on. 
+ +I'm happy to accept the authors' arguments that sentence-to-vector models have practical value, and I'm not placing too much weight on the reviewer's comments about the choice to use that modeling framework. I am slightly concerned that the reviewers (especially R2) observed some overly broad statements in the paper, and I urge the authors to take those comments very seriously. + +I'm mostly concerned, though, about the lack of an impactful positive contribution: I'd have hoped for a paper of this kind to offer a a method with clear empirical advantages over prior work, or else a formal result which is more clearly new, and the reviewers are not convinced that this paper makes a contribution of either kind. +",ICLR2019,3: The area chair is somewhat confident +ryoD4ypSf,1517250000000.0,1517260000000.0,376,r1pW0WZAW,r1pW0WZAW,ICLR 2018 Conference Acceptance Decision,Invite to Workshop Track,"I think the model itself is not very novel, as pointed by the reviewers and the analysis is not very insightful either. However, the results themselves are interesting and quite good (on the copy task and pMnist, but not so much the other datasets presented (timit etc) where it not clear that long term dependencies would lead to better results). Since the method itself is not very novel, the onus is upon the authors to make a strong case for the merits of the paper -- It would be worth exploring these architectures further to see if there are useful elements for real world tasks -- more so than is demonstrated in the paper -- for example showing it on tasks such as machine translation or language modelling tasks requiring long term propagation of information or even real speech recognition, not just basic TIMIT phone frame classification rate. + +As a result, while I think the paper could make for an interesting contribution, in its present form, I have settled on recommending the paper for the workshop track. + + +As a side note, paper is related to paper 874 in that an attention model is used to look at the past. The difference is in how the past is connected to the current model. ",ICLR2018, +H1lKqyb8x4,1545110000000.0,1545350000000.0,1,ryE98iR5tm,ryE98iR5tm,"Novel improved lossless compression scheme using VAEs, with limited empirical validation",Accept (Poster),"The paper proposes a novel lossless compression scheme that leverages latent-variable models such as VAEs. Its main original contribution is to improve the bits back coding scheme [B. Frey 1997] through the use of asymmetric numeral systems (ANS) instead of arithmetic coding. The developed practical algorithm is also able to use continuous latents. The paper is well written but the reader will benefit from prior familiarity with compression schemes. Resulting message bit-length is shown empirically to be close to ELBO on MNIST. The main weakness pointed out by reviewers is that the empirical evaluation is limited to MNIST and to a simple VAE, while applicability to other models (autoregressive) and data (PixelVAE on ImageNet) is only hinted to and expected bit-length merely extrapolated from previously reported log-likelihood. The work could be much more convincing if its compression was empirically demonstrated on larger and better models and larger scale data. 
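For intuition, the standard bits-back accounting (my summary, not the paper's exact derivation) gives an expected message length of roughly

$$L(x) \;\approx\; \frac{\mathbb{E}_{q(z|x)}\big[\log q(z|x) - \log p(x,z)\big]}{\ln 2} \;=\; -\frac{\mathrm{ELBO}(x)}{\ln 2}\ \text{bits},$$

which is why bit-lengths close to the negative ELBO are the natural target.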
Nevertheless reviewers agreed that it sufficiently advanced the field to warrant acceptance.",ICLR2019,3: The area chair is somewhat confident +UioeClPhAZR,1642700000000.0,1642700000000.0,1,V3C8p78sDa,V3C8p78sDa,Paper Decision,Accept (Spotlight),"This paper provides a very large-scale study on the pretraining of image recognition models. Specifically, three scaling factors (model sizes, dataset sizes, and training time) are extensively investigated. One important phenomenon observed by this paper is that stronger upstream accuracy may not necessarily contribute to stronger performance on downstream tasks---actually sometimes these two types of performance could even be at odds with each other + +Overall, all the reviewers enjoy reading this paper and highly appreciate the empirical results presented in this paper. There were only a few concerns raised by the reviewers but most were well addressed during the discussion period. All reviewers reach a consensus on accepting this paper and believe this study is worthy to be heard by the community.",ICLR2022, +rpHacn2m_rF,1610040000000.0,1610470000000.0,1,RkqYJw5TMD7,RkqYJw5TMD7,Final Decision,Reject,"This paper studies test-time adaptation in the context of adversarial robustness. The key idea is to use a maximin framework, which illustrates non-trivial robustness (under transfer attack) using domain adversarial neural network (DANN) to Linf-norm and unseen adversarial attacks. The approach is sound, well grounded, and quite logical. Results demonstrate the effectiveness. + +However, there exists some limitations, for example, 1) The adaptive attack results are concerning. Comparing Table 1 and Table 3, with the adaptive attack (J-FPAM), the accuracy in the homogeneous setting is below that of adversarial training (Adv S) in Table 1, which somehow echoes my concern on not testing the transductive setting using strong attacks. It seems that adversarially trained models can better defend the adaptive attacks. 2) The paper says ""This threat model enables us to study whether a large test set can benefit a defender for adversarial robustness"", yet there is no any experiments in the main paper that correspond to this discussion. The appendix seems lacking this discussion either. 3) Due to the page limit, a lot of details have been moved into the appendix, making the paper difficult to read. + +In the end, I think that this paper may not be ready for publication at ICLR, but the next version must be a strong paper if above limitations can be well addressed.",ICLR2021, +X6vMEhBtg,1576800000000.0,1576800000000.0,1,rkgbYyHtwB,rkgbYyHtwB,Paper Decision,Accept (Spotlight),"This paper presents an approach for interactive imitation learning while avoiding an adversarial optimization by using ensembles. The reviewers agreed that the contributions were significant and the results were compelling. Hence, the paper should be accepted.",ICLR2020, +orO9BcZIOz7S,1642700000000.0,1642700000000.0,1,lu_DAxnWsh,lu_DAxnWsh,Paper Decision,Reject,"This paper offers new ideas about the key question of how to extend modern Transformer architectures to solve problems that require more reasoning steps than the model can implement in a single-step forward pass. Reviewers were unanimous that the problem is important, and that the paper is a step in a promising direction. 
However, reviewers were also unanimous that the proposed experiments are too narrow to be the basis for any confident new claims in this area and that, in addition, the experimental design has a confound that makes it difficult to interpret, even after the addition of a new condition during discussion.",ICLR2022, +DqCGUbTRMp0,1610040000000.0,1610470000000.0,1,Ldau9eHU-qO,Ldau9eHU-qO,Final Decision,Accept (Poster),"The paper considers the problem of learning interpretable, low-dimensional representations from high-dimensional multimodal input via weak supervision in a learning from demonstration (LfD) context. To mitigate the disparity between the abstractions that humans reason over and the robot's low-level action and observation spaces, the paper argues for learning a low-dimensional embedding that captures the underlying concepts. The primary contribution of the paper is the ability to learn disentangled low-dimensional representations that are interpretable from weak supervision using conditional latent variable models. + +The paper was reviewed by three knowledgeable referees, who read the author response and discussed the paper. The paper considers a challenging problem in learning from demonstration, namely dealing with the disparity that exists between the ways in which humans and robots model and observe the world, a problem that is exacerbated when reasoning over high-dimensional multimodal observations. As the reviewers note, the use of variational inference to learn low-dimensional interpretable representations from weak supervision is compelling. The primary concerns are that the contributions need to be more clearly scoped and that the experimental evaluation is a bit narrow. The authors make an effort to resolve some of these issues, in part through the inclusion of an additional experiment that considers pouring tasks. However, the extent to which this second task mitigates concerns about the narrow evaluation is not fully clear. The paper would be strengthened by the inclusion of experiments in a less contrived setting (and one for which the concepts are not necessarily disjoint) as well as a clearer discussion of the primary contributions.",ICLR2021, +1tbSyrLgCLh,1610040000000.0,1610470000000.0,1,J150Q1eQfJ4,J150Q1eQfJ4,Final Decision,Reject,"Reviews were somewhat mixed here, but the consensus is to reject, with at least one voice (R2) urging rejection. Across reviewers, the recommendation to reject is primarily based on the level of originality with the proposed U-Net architecture and on weakness of experiments, especially in comparing to baselines. + +Reviewers found strengths in the paper's writing and in its demonstration of generalization to unseen geometries. + +However, reviewers noted that the architecture does not win originality/significance points (including R3, the most positive reviewer): +* R3: ""The weakness of this paper is that it doesn't present any novel techniques. It's an existing architecture (U-Net) applied in a new domain (wave simulation)."" +* R2: ""The proposed approach is a straightforward application of U-net to predict a spatial field given past few spatial fields (stacked together). However, U-Nets, LSTMs, conv-LSTMs and other architectures have been tried before. It is unclear what the novel contribution in this paper is [...] and why it would be instrumental in handling unseen geometries over longer periods of time."" +* R2 post-response: ""This paper is a clear reject. 
None of the contributions are novel [...]"" +* R4: ""The paper lacks a novel contribution from the architectural and application side"" +* R1: ""Some previous works also used the U-net to predict wave dynamics [...] It is not clear what is the novelty (if any) in the proposed network architecture"" + +Reviewers also noted weaknesses in the experiments (acknowledged by R3, the most positive reviewer, though that review did not consider them a fatal flaw): +* R1: ""Not enough Experiments. How does the model generalize with more complicated initial conditions, for example, five or ten droplets? Furthermore, there is no comparison to other existing work."" +* R2: ""There is no evaluation against the state of the art [...]"" +* R2 post-response: ""Application of DNNs to this problem, speed-ups over numerical solvers, etc. have all been explored by SOTA works which have not been compared against. There is no clear articulation of the claimed novel contributions over the SOTA and empirical validation (or theoretical reasoning) of the same."" +* R4: ""There are no comparisons to other baselines [...] "" +* R3: ""Reviewer 4 brings up some fair points [about experimental issues]. I'm not as concerned about the lack of a baseline comparison; that doesn't seem to be the point of this paper [...] there is only so much that can be done in an 8 page conference paper [...] However, given that the other 2 reviewers think the paper could use more work, it would be completely reasonable for the chairs to reject it based on those reviews."" + +Based on this consensus of reviews, my recommendation is to reject. I hope the feedback from the reviews is helpful to the authors.",ICLR2021, +f_b5yKWWonc,1610040000000.0,1610470000000.0,1,NsMLjcFaO8O,NsMLjcFaO8O,Final Decision,Accept (Poster),"Reviewers and myself agree that the contribution is clear, significant, and has enough originality. Hence, my recommendation is to ACCEPT the paper. As a brief summary, I highlight below some pros and cons that arose during the review and meta-review processes. + +Pros: +- Solid technical contribution, specially the use of continuous noise levels. +- Clever application of diffusion/score-matching models to a new domain and task, with conditioning. +- Good empirical results, both objective and subjective. +- Listening samples provided. + +Cons: +- Lack of formal comparison with flow-based vocoders. +- Potentially limited novelty. +- No official code available. + +Note: Readers may also be interested in concurrent work https://openreview.net/forum?id=a-xFK8Ymz5J (""DiffWave: A Versatile Diffusion Model for Audio Synthesis"").",ICLR2021, +v_Ze4Oqm-,1576800000000.0,1576800000000.0,1,HkldyTNYwH,HkldyTNYwH,Paper Decision,Accept (Poster),"The authors present a different perspective on the mode collapse and mode mixture problems in GAN based on some recent theoretical results. + +This is an interesting work. However, two reviewers have raised some concerns about the results and hence given a low rating of the paper. After reading the reviews and the rebuttal carefully I feel that the authors have addressed all the concerns of the reviewers. In particular, at least for one reviewer I felt that there was a slight misunderstanding on the reviewer's part which was clarified in the rebuttal. The concerns of R1 about a simpler baseline have also been addressed by the authors with the help of additional experiments. I am convinced that the original concerns of the reviewers are addressed. Hence, I recommend that this paper be accepted. 
+ +Having said that, I strongly recommend that in the final version, the authors should be a bit more clear in motivating the problem. In particular, please make it clear that you are only dealing with the generator and do not have an adversarial component in the training. Also, as suggested by R3 add more intuitive descriptions to make the paper accessible to a wider audience. + +",ICLR2020, +ThDp14kXir,1576800000000.0,1576800000000.0,1,Hyl8yANFDB,Hyl8yANFDB,Paper Decision,Reject,"This paper received three reviews. R1 recommends Weak Reject, and identifies a variety of concerns about the motivation, presentation, clarity and soundness of results, and experimental design (e.g. choice of metrics). In a short review, R2 recommends Weak Accept, but indicates they are not an expert in this area. R3 also recommends Weak Accept, but identifies concerns also centering around clarity and completeness of the paper as well as some specific technical questions. In their response, authors address these issues, and have a constructive back-and-forth conversation with R1, who remains unconvinced about significance of the empirical results and thus the conclusion of the overall paper. After the discussion period, R3 indicated that they weakly favored acceptance but agreed that the paper had significant presentation issues and would not strongly advocate for it. R1 advocated for Reject, given the concerns identified in their reviews and followup comments. Given the split decision, the AC also read the paper. While the work clearly has merit, we agree with R1's comment that it is overall a ""potentially interesting idea, but the justification and presentation/quantification of results is not good enough in the submitted paper,"" and feel the paper really needs a revision and another round of peer review before publication. ",ICLR2020, +wnSPXulBncC,1610040000000.0,1610470000000.0,1,xGZG2kS5bFk,xGZG2kS5bFk,Final Decision,Accept (Poster),"Reviews for this paper were quite mixed (7744), and none were exactly borderline. All reviews were detailed and informative, as was the rebuttal. The main criticisms were (1) lack of detail in the experiments, and some missing evaluation (2) missing related work, (3) overall lack of polish (mentioned among positive reviews too), and (4) some unsubstantiated claims. Positively, reviewers praise the novelty, dataset, the demo, and some reviewers found the experiments mostly convincing. + +Ultimately this is still a borderline decision. The rebuttal does appear to address many of the claims about missing evaluation, and the complaints about polish can be easily addressed. I think the unsubstantiated claims are reasonably rebutted too. Related work doesn't seem to be addressed in the rebuttal.",ICLR2021, +rcMYFfzF4M,1610040000000.0,1610470000000.0,1,EGdFhBzmAwB,EGdFhBzmAwB,Final Decision,Accept (Spotlight),"This paper provides a novel generalization bound for neural networks using knowledge distillation. In particular, they argue that + +""test error <= training error + distillation error + distillation complexity"" where the distillation complexity is typically much smaller than the original complexity of the neural network. This is motivated by the empirical findings that neural networks can typically be significantly compressed in practice using KD without losing too much accuracy. + + +I found this result novel and the direction is very promising. This is a clear accept for ICLR. 
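For readers, the claimed decomposition can be restated schematically (my paraphrase of the quoted inequality):

$$\mathcal{E}_{\mathrm{test}} \;\lesssim\; \mathcal{E}_{\mathrm{train}} + \mathcal{E}_{\mathrm{distill}} + \mathcal{C}_{\mathrm{distill}},$$

where the key point is that the distillation complexity $\mathcal{C}_{\mathrm{distill}}$ of the compressed student is typically much smaller than the complexity term one would otherwise pay for the full network.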
",ICLR2021, +9sUnjBcKp26,1642700000000.0,1642700000000.0,1,3mm5rjb7nR8,3mm5rjb7nR8,Paper Decision,Reject,"This submission received four high-quality reviews and there are a lot of meaningful discussions during the author response period. After the discussions, all four reviewers agreed that the submission can be strengthened in a number of ways, including more solid experimental results and a justification for the correctness of the ELBO. The AC agrees. The authors are encouraged to revise the paper based on the reviews for the next venue.",ICLR2022, +DfU1kBE-ej,1576800000000.0,1576800000000.0,1,HyevIJStwH,HyevIJStwH,Paper Decision,Accept (Spotlight),"Quoting a reviewer for a very nice summary: + +""In this work, the authors suggest a new point of view on generalization through the lens of the distribution of the per-sample gradients. The authors consider the variance and mean of the per-sample gradients for each parameter of the model and define for each parameter the Gradient Signal to Noise ratio (GSNR). The GSNR of a parameter is the ratio between the mean squared of the gradient per parameter per sample (computed over the samples) and the variance of the gradient per parameter per sample (also computed over the samples). The GSNR is promising as a measure of generalization and the authors provide a nice leading order derivation of the GSNR as a proxy for the measure of the generalization gap in the model."" + +The majority of the reviewers vote to accept this paper. We can view the 3 as a weak signal as that reviewer stated in his review that he struggled to rate the paper because it contained a lot of math.",ICLR2020, +pOsxAsCqfRo,1642700000000.0,1642700000000.0,1,qynwf18DgXM,qynwf18DgXM,Paper Decision,Reject,"Ricci flow is a central topic in geometric analysis. It has had stunning applications in mathematics, most notably the proof of the Poincare conjecture. The major issue is that it, while it can be used to make a manifold more well-behaved, it frequently develops singularities. The main contribution fo this paper is in introducing linearly nearly Euclidean metrics. They give a proof of convergence in both short and infinite time, under the Ricci-DeTurk flow, and exploit connections to information geometric and mirror descent to develop methods for approximating the gradient flow. The paper is confusingly written (compounded by poor organizational structure and many grammatical mistakes). Perhaps the biggest issue is that it does not have any clear relevance to machine learning. Some sections mention connections to neural networks, but the reviewers found these sections to be indecipherable.",ICLR2022, +7S5DU03Dhd,1576800000000.0,1576800000000.0,1,HyxwZRNtDr,HyxwZRNtDr,Paper Decision,Reject,"While the reviewers agreed that the problem of learning robust policies is an important one, there were a number of major concerns raised about the paper, and as a result I would recommend that the paper not be accepted at this time. The important points are: (1) limited novelty in light of prior work in this area (see R2 and R3); (2) a number of missing comparisons (see R2). There is also a bit of confusion in the reviews, which I think stems from a somewhat unclear statement in the paper of the problem formulation. While there is nothing wrong with assuming access to a parameterized simulator and studying robustness under parametric variation, this is of course a much stronger assumption than some prior work on robust reinforcement learning. 
Clarity on this point is crucial, and there are a large number of prior methods that can likely do well in this setting (e.g., based on system ID, etc.).",ICLR2020, +H8JxNMBdVe,1576800000000.0,1576800000000.0,1,S1ervgHFwS,S1ervgHFwS,Paper Decision,Reject,"This paper shows an theoretical equivalence between the L2 PGD adversarial training and operator norm regularization. It gives an interesting observation and support it from both theoretical arguments and practical experiments. There has been a significant discussion between the reviewers and authors. Although the authors made efforts in rebuttal, it still leaves many places to improve and clarify, especially in improving the mathematical rigor of the proof and experiments using state-of-the-art networks. + +",ICLR2020, +rkg622tZx4,1544820000000.0,1545350000000.0,1,ByxAOoR5K7,ByxAOoR5K7,"Interesting and novel work, but with a severe theoretical flaw",Reject,"The paper studies RL from a rate-distortion (RD) theory perspective. A new actor-critic algorithm is developed and evaluated on a series of 2D grid worlds. + +The paper has some novel idea, and the connection of RL to RD is quite new. This seems like an interesting direction that is worth further investigation. On the other hand, all reviewers agreed there is a severe flaw in this work, casting a doubt where RD can be directly applied to an RL setting because the distribution is not fixed (unlike in standard RD). This issue could have been addressed empirically, by running controlled experiments, something the the paper might include in a future version.",ICLR2019,4: The area chair is confident but not absolutely certain +tx63Bx9D5U5,1610040000000.0,1610470000000.0,1,u2YNJPcQlwq,u2YNJPcQlwq,Final Decision,Accept (Poster),"The paper advocates empowerment for stabilising dynamical systems, the dynamics of which is estimated with Gaussian channels. Baseline comparisons have improved and that makes the experimental section good. While initial versions of the paper were problematic, all reviewer issues have been addressed and acceptance is almost unanimous.",ICLR2021, +B1gbRJzexN,1544720000000.0,1545350000000.0,1,BkesJ3R9YX,BkesJ3R9YX,Solid work but novelty concerns held back the paper from rising above acceptance threshold,Reject,"Strengths: The paper presentation was assessed as being of high quality. Experiments were diverse in terms of datasets and tasks. + +Weaknesses: Multiple reviewers commented that the paper does not present substantial novelty compared to previous work. + +Contention: One reviewer holding out on giving a stronger rating to the paper due to the issue of novelty. + +Consensus: Final scores were two 6s one 3. + +This work has merit, but the degree of concern over the level of novelty leads to an aggregate rating that is too low to justify acceptance. Authors are encourage to re-submit to another venue. +",ICLR2019,4: The area chair is confident but not absolutely certain +0PxM0kJ6Uj,1610040000000.0,1610470000000.0,1,CBmJwzneppz,CBmJwzneppz,Final Decision,Accept (Poster),"This paper analyzes a version of optimistic value iteration with generalized linear function approximation. Under an optimistic closure assumption, the algorithm is shown to enjoy sublinear regret. The paper also studies error propagation through backups that do not require closed-form characterization of dynamics and reward functions. 
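As a rough illustration of the setting (a generic optimism template for generalized linear function approximation, not necessarily this paper's exact construction), such backups typically inflate value estimates along the lines of

$$Q_k(s,a) \;=\; f\!\big(\phi(s,a)^\top \hat{\theta}_k\big) \;+\; \beta\,\|\phi(s,a)\|_{\Sigma_k^{-1}},$$

where $f$ is a known link function, $\hat{\theta}_k$ a regression estimate, and the bonus term supplies the optimism.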
+ +Overall, this is a solid contribution and the consensus is to accept.",ICLR2021, +Kk6JiG3VBQ,1576800000000.0,1576800000000.0,1,HJgS7p4FPH,HJgS7p4FPH,Paper Decision,Reject,"The paper presented a detailed discussion on the implementation of a library emulating Atari games on GPU for efficient reinforcement learning. The analysis is very thoroughly done. The major concern is whether this paper is a good fit to this conference. The developed library would be useful to researchers and the discussion is interesting with respect to system design and implementation, but the technical depth seems not sufficient.",ICLR2020, +WEr5T6kGwQj,1642700000000.0,1642700000000.0,1,8gX3bY78aCb,8gX3bY78aCb,Paper Decision,Reject,"The paper introduces a graph neural network for molecules which takes into account motif-level relationships. The paper received borderline reviews, with three reviewers voting for reject, and one for accept. After the rebuttal, the reviewers did not change their scores. Overall, it seems that the paper has some merit, with good experimental results. Nevertheless, it suffers from two issues (i) the positioning with respect to other motif-based approaches is not clear enough, making the novelty hard to assess; (ii) there is a lot of room for improvement in terms of clarity. Therefore, the area chair follows the majority of the reviewers' recommendations and recommends a reject.",ICLR2022, +r1NprkTSG,1517250000000.0,1517260000000.0,666,ByhthReRb,ByhthReRb,ICLR 2018 Conference Acceptance Decision,Reject,"This work deals with the important task of capturing named entities in a goal-directed setting. The description of the work and the experiments are not ready for publication; for example, it is unclear whether the proposed method would have an advantage over existing methods such as the match type features that are only mentioned in Table 3 for establishing the baseline on the original bAbI dialogue dataset, but not even discussed in the paper.",ICLR2018, +YJswe4ejDS,1576800000000.0,1576800000000.0,1,SkxWnkStvS,SkxWnkStvS,Paper Decision,Reject,"This paper proposes a graphon-based search space for neural architecture search. Unfortunately, the paper as currently stands and the small effect sizes in the experimental results raise questions about the merits of actually employing such a search space for the specific task of NAS. The reviewers expressed concerns that the results do not convincingly support graphon being a superior search space as claimed in the paper. +",ICLR2020, +HylSlRTGeV,1544900000000.0,1545350000000.0,1,S1MB-3RcF7,S1MB-3RcF7,"A well written paper on multi-discriminator GAN training, but just below the bar in empirical results",Reject,"The reviewers found that paper is well written, clear and that the authors did a good job placing the work in the relevant literature. The proposed method for using multiple discriminators in a multi-objective setting to train GANs seems interesting and compelling. However, all the reviewers found the paper to be on the borderline. The main concern was the significance of the work in the context of existing literature. Specifically, the reviewers did not find the experimental results significant enough to be convinced that this work presents a major advance in GAN training. 
",ICLR2019,5: The area chair is absolutely certain +Bm_X1119cx,1642700000000.0,1642700000000.0,1,TCl7CbQ29hH,TCl7CbQ29hH,Paper Decision,Reject,"In my opinion, this is a cool idea, but could use a few more test settings to evaluate the general applicability of their method. It would be interesting to see if the method generalizes to a non-reference based task. + +Strengths: +Novel method that explores the interaction of color masks for learning to prompt about regions in images by identifying the color region they correspond to +Paper contains extensive ablation studies & discussions + +Weaknesses: +Experimental results are run on uncommon benchmarks, making it difficult to compare to SOTA V+L methods +Consequently, it’s not clear that this method would generalize beyond visual grounding to tasks such as VQA or captioning",ICLR2022, +Q1PzWudaULg,1642700000000.0,1642700000000.0,1,EVqFdCB5PfV,EVqFdCB5PfV,Paper Decision,Reject,"Strength +* The paper is relatively clearly written. +* The proposed method appears to be sound. + +Weakness +* The novelty of the work seems to be limited. +* The experiment part needs significant improvements. The comparison with existing methods may not be fair. Evaluation of efficiency should be given. There are also detailed investigations that need to be conducted, as indicated by the reviewers. +* There are technical issues that need to be addressed.",ICLR2022, +B1l91_a0J4,1544640000000.0,1545350000000.0,1,SygvZ209F7,SygvZ209F7,worth discussing more,Accept (Poster),"This heavily disputed paper discusses a biologically motivated alternative to back-propagation learning. In particular, methods focussing on sign-symmetry rather than weight-symmetry are investigated and, importantly, scaled to large problems. The paper demonstrates the viability of the approach. If nothing else, it instigates a wonderful platform for debate. + +The results are convincing and the paper is well-presented. But the biological plausibility of the methods needed for these algorithms can be disputed. In my opinion, these are best tackled in a poster session, following the good practice at neuroscience meetings. + +On an aside note, the use of the approach to ResNet should be questioned. The skip-connections in ResNet may be all but biologically relevant.",ICLR2019,4: The area chair is confident but not absolutely certain +ByaciMLul,1486400000000.0,1486400000000.0,1,HJ0NvFzxl,HJ0NvFzxl,ICLR committee final decision,Accept (Oral),"The idea of building a graph-based differentiable memory is very good. The proposed approach is quite complex, but it is likely to lead to future developments and extensions. The paper has been much improved since the original submission. The results could be strengthened, with more comparisons to existing results on bAbI and baselines on the experiments here. Exploring how it performs with less supervision, and different types of supervision, from entirely labeled graphs versus just node labels, would be valuable.",ICLR2017, +EaIRmtuVpGK,1642700000000.0,1642700000000.0,1,iMqTLyfwnOO,iMqTLyfwnOO,Paper Decision,Accept (Poster),"The paper presents a variant of sliced wasserstein distance , where the slicing operation is performed with a neural network. The resulting distance is studied and experiments on synthetic data and as cost in generative modeling are performed. + +While the idea of the paper is not that novel, the work is overall well executed. Reviewers agreed that the paper is borderline weak accept. 
Accept as a poster.",ICLR2022,
xwAsBveKR54,1642700000000.0,1642700000000.0,1,rczz7TUKIIB,rczz7TUKIIB,Paper Decision,Reject,"The meta-learning framework based on learning the loss function for time series forecasting is an interesting and important topic. However, the reviewers think the literature review, baselines, and experimental results need significant improvement.",ICLR2022,
KGxzqHQITOb,1610040000000.0,1610470000000.0,1,4artD3N3xB0,4artD3N3xB0,Final Decision,Reject,"The reviewers felt that the idea of learning a posterior distribution on optimization algorithms is very novel. However, the flip side of this novelty was that it was not clear how the prior and likelihood were defined so that Bayes rule could be approximated. The three reviewers appeared to find the paper somewhat confusing, and while the authors made significant changes, it would be better to resubmit for a new set of reviews of the revised paper.",ICLR2021,
U7MVuakyqpW,1642700000000.0,1642700000000.0,1,di0r7vfKrq5,di0r7vfKrq5,Paper Decision,Reject,"This paper presents a method to improve search engines; the method is designed based on the BM25 retrieval method and is evaluated on the NQ-open dataset. The reviewers agree that the motivation is interesting and the implementation is reasonable, but the authors have only shown the impact of their approach on one retrieval method and one dataset, which is limited and does not show whether the method is general enough.",ICLR2022,
7qPDF3AmMk,1610040000000.0,1610470000000.0,1,RGJbergVIoO,RGJbergVIoO,Final Decision,Accept (Oral),"Two knowledgeable reviewers were positive (7) and very positive (10) about this paper, considering it an important contribution that illuminates previously unknown aspects of two classic models, namely RBMs and Hopfield networks. They considered the work very well developed, theoretically interesting, and also of potential practical relevance. A third reviewer initially expressed some reservations in regard to the inverse map from RBMs to HNs and the experiments. Following the authors' responses, which the reviewer found detailed and informative, he/she significantly raised his/her score to 7, also emphasizing that he/she hoped to see the paper accepted. With the unanimously positive feedback, I am recommending the paper to be accepted. ",ICLR2021,
S1g5LhAXl4,1544970000000.0,1545350000000.0,1,r1ez_sRcFQ,r1ez_sRcFQ,Majority reject.,Reject,"Based on the majority of reviewers recommending rejection (ratings: 4, 6, 3), the current version of the paper is proposed for rejection. ",ICLR2019,3: The area chair is somewhat confident
sGliUlczcs,1610040000000.0,1610470000000.0,1,3rRgu7OGgBI,3rRgu7OGgBI,Final Decision,Reject,"The authors propose an alternative fine-tuning procedure by introducing a projection head and two new losses to be combined with the vanilla cross-entropy loss. The authors introduce and jointly optimize the standard cross-entropy loss, the contrastive cross-entropy loss for the classifier head, and the categorical contrastive learning loss for the projector head in an end-to-end fashion. The authors empirically confirmed that this setup compares favorably to existing baselines.

The reviewers found the setting challenging and worth investigating. The idea of exploring the intrinsic structure of the downstream task to help with fine-tuning was deemed useful. The reviewers appreciated the thorough empirical validation. While the proposed approach was not yet explored in this specific context, most reviewers were concerned with the lack of novelty. 
In addition, there seems to be a large gap between the quality of exposition in the introduction and results sections and that of the rest of the paper, which introduces confusion.

As it currently stands, the paper is not yet ready for publication and I will recommend rejection. To improve the manuscript, the authors should incorporate the received feedback and significantly improve the exposition and justification of the proposed loss. In terms of empirical results, the authors should also explore alternative neural architectures to validate whether the proposed approach is general and whether the need for hyperparameter tuning arises.
",ICLR2021,
Z39eSiMEp-,1576800000000.0,1576800000000.0,1,ryeRn3NtPH,ryeRn3NtPH,Paper Decision,Reject,"The paper proposes an adversarial inductive transfer learning method that handles distribution changes in both input and output spaces.

While the studied problem is interesting, reviewers have major concerns about the incremental modeling contribution, the lack of a comparative study against existing methods, and the lack of an ablation study disentangling the different modules. Overall, the current study is not convincing in terms of either theoretical analysis or experimental results.

Hence I recommend rejection.",ICLR2020,
HJEgNJaHf,1517250000000.0,1517260000000.0,277,S1cZsf-RW,S1cZsf-RW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper proposes a new approach for scalable training of deep topic models based on amortized inference for the local parameters and stochastic-gradient MCMC for the global ones. The key aspect of the method involves using Weibull distributions (instead of Gammas) to model the variational posteriors over the local parameters, enabling the use of the reparameterization trick. The resulting methods perform slightly worse than the Gibbs-sampling-based approaches but are much faster at test time. Amortized inference has already been applied to topic models, but the use of Weibull posteriors proposed here appears novel. However, there seems to be no clear advantage to using stochastic-gradient MCMC instead of vanilla SGD to infer the global parameters, so the value of this aspect of WHAI is unclear.",ICLR2018,
gR4qJy_xhkY,1642700000000.0,1642700000000.0,1,YRq0ZUnzKoZ,YRq0ZUnzKoZ,Paper Decision,Accept (Poster),"This paper tackles a very timely problem.
Scores of 5,6,6,8 put it in the borderline region, but in the private discussion the more negative reviewer noted that they would also be OK with the paper being accepted. I therefore recommend acceptance.
Going through the paper I missed any mention of available source code. 
I strongly recommend that the authors make code available; this would greatly increase the paper's impact.",ICLR2022,
srBQrsdaMUt,1642700000000.0,1642700000000.0,1,ba81PoR_k1p,ba81PoR_k1p,Paper Decision,Reject,"This work proposed a nested evolutionary algorithm to choose image filters and filter parameters for black-box attacks, with an emphasis on high transferability.

After reading the manuscript, the comments of the reviewers, and the authors' responses, I think the main issues of this work include:
1. The limited novelty of the main idea, since there have been many filter-based attacks, and this work is very close to an existing work;
2. The solution is not new, since the evolutionary method is also well adopted in adversarial attacks;
3. Many black-box attack methods are not cited and compared, though the authors argued that their perturbation upper bounds are different such that they cannot be compared, which is not convincing;
4. The claimed high transferability is not well explained, and may be due to the model ensemble (as indicated by reviewer eN8o). Besides, many existing works that studied transferability are not cited and compared.
5. Experiments are inadequate. The authors added some results in the revised version, but the current shape is still not ready for publication.

Thus, my recommendation is reject. I hope the reviews can help to improve this work in the future.",ICLR2022,
SyxXGCsEl4,1545020000000.0,1545350000000.0,1,S1gQ5sRcFm,S1gQ5sRcFm,"interesting model, weak experimental section",Reject,"This paper proposes a probabilistic model for data indexed by an observed parameter (such as time in video frames, or camera locations in 3d scenes), which enables a global encoding of all available frames and is able to sample consistently at arbitrary indexes. Experiments are reported on several synthetic datasets.

Reviewers acknowledged the significance of the proposed model, noted that the paper is well-written, and that the design choices are sound. However, they also expressed concerns about the experimental setup, which only includes synthetic examples. Although the authors acknowledged during the response phase that this is indeed a current limitation, they argued it is not specific to their particular architecture, but to the task itself. 
Another concern raised by R1 is the lack of clarity in some experimental setups (for instance where only a subset of the best runs are used to compute error bars, and this subset appears to be of different size depending on the experiment, cf fig 5), and the fact that the datasets used in this paper to compare against GQNs are specifically designed. + +Overall, this is a really borderline submission, with several strengths and weaknesses. After taking the reviewer discussion into account and making his/her own assessment, the AC recommends rejection at this time, but strongly encourages the authors to resubmit their work after improving their experimental setup, which will make the paper much stronger.",ICLR2019,4: The area chair is confident but not absolutely certain +B1eEebvBlE,1545070000000.0,1545350000000.0,1,rJgYxn09Fm,rJgYxn09Fm,a promising idea,Accept (Poster),This paper proposed an interesting approach to weight sharing among CNN layers via shared weight templates to save parameters. It's well written with convincing results. Reviewers have a consensus on accept.,ICLR2019,5: The area chair is absolutely certain +rk4CjGUux,1486400000000.0,1486400000000.0,1,Sywh5KYex,Sywh5KYex,ICLR committee final decision,Reject,"Although this was a borderline paper, the reviewers ultimately concluded that, given how easy it would be for a practitioner to independently devise the methodological trick of the paper, the paper did not demonstrate that the idea was sufficiently useful to merit acceptance.",ICLR2017, +Hkl0IAVxgN,1544730000000.0,1545350000000.0,1,BkltNhC9FX,BkltNhC9FX,One of the better papers at the conference,Accept (Poster),"The reviewers of this paper agreed that it has done a stellar job of presenting a novel and principled approach to attention as a latent variable, providing a new and sound set of inference techniques to this end. This builds on top of a discussion of the limitations of existing deterministic approaches to attention, and frames the contribution well in relation to other recurrent and stochastic approaches to attention. While there are a few issues with clarity surrounding some aspects of the proposed method, which the authors are encouraged to fine-tune in their final version, paying careful attention to the review comments, this paper is more or less ready for publication with a few tweaks. It makes a clear, significant, and well-evaluate contribution to the field of attention models in sequence to sequence architectures, and will be of great interest to many attendees at ICLR.",ICLR2019,5: The area chair is absolutely certain +38xYJdWcURv,1610040000000.0,1610470000000.0,1,ZqB2GD-Ixn,ZqB2GD-Ixn,Final Decision,Reject,"This paper deals with domain generalization with causal modeling. Specifically, it considers a broader class of distribution shifts, arising from the system intervention perspective, and proposes some robust learning principle to achieve domain generalization. The paper is well written and has some interesting ideas. However, as pointed by Reviewers #1 and #4, the exact problem setting should be made more explicit, the theory and algorithm should be more consistent, and some very relevant contributions in the literature should be discussed or compared with. 
",ICLR2021, +ByesZWe-lE,1544780000000.0,1545350000000.0,1,B1x5KiCcFX,B1x5KiCcFX,Interesting theoretical results but should clarify contribution and improve math rigor ,Reject,"This paper provides a theoretical analysis of GANs, showing its advantages when the measure satisfies the disconnected support property. Its main theoretical results are interesting, but the reviews and discussion shows some misleading places. It was also found some of the claims and proof are not mathematically rigorous. ",ICLR2019,5: The area chair is absolutely certain +yM4Y_BPtYj,1576800000000.0,1576800000000.0,1,HkxnclHKDr,HkxnclHKDr,Paper Decision,Reject,"This paper proposes a methodology for learning a representation given multiple demonstrations, by optimizing the representation as well as the learned policy parameters. The paper includes some theoretical results showing that this is a sensible thing to do, and an empirical evaluation. + +Post-discussion, the reviewers (and me!) agreed that this is an interesting approach that has a lot of promise. But there was still concern about he empirical evaluation and the writing. Hence I am recommending rejection.",ICLR2020, +Y6qohXMNn,1576800000000.0,1576800000000.0,1,Hkl1iRNFwS,Hkl1iRNFwS,Paper Decision,Accept (Poster),"This paper studies numerous ways in which the statistics of network weights evolve during network training. Reviewers are not entirely sure what conclusions to make from these studies, and training dynamics can be strongly impacted by arbitrary choices made in the training process. Despite these issues, the reviewers think the observed results are interesting enough to clear the bar for publication.",ICLR2020, +r1gcb0RBgV,1545100000000.0,1545350000000.0,1,Bkg8jjC9KQ,Bkg8jjC9KQ,Some progress for analysis of non-monotone variational inequalities,Accept (Poster),"This paper investigates the usage of the extragradient step for solving saddle-point problems with non-monotone stochastic variational inequalities, motivated by GANs. The authors propose an assumption weaker/diffrerent than the pseudo-monotonicity of the variational inequality for their convergence analysis (that they call ""coherence""). Interestingly, they are able to show the (asympotic) last iterate convergence for the extragradient algorithm in this case (in contrast to standard results which normally requires averaging of the iterates for the stochastic *and* mototone variational inequality such as the cited work by Gidel et al.). The authors also describe an interesting difference between the gradient method without the extragradient step (mirror descent) vs. with (that they called optimistic mirror descent). + +R2 thought the coherence condition was too related to the notion of pseudo-monoticity for which one could easily extend previous known convergence results for stochastic variational inequality. The AC thinks that this point was well answered by the authors rebuttal and in their revision: the conditions are sufficiently different, and while there is still much to do to analyze non variational inequalities or having realistic assumptions, this paper makes some non-trivial and interesting steps in this direction. 
The AC thus sides with expert reviewer R1 and recommends acceptance.",ICLR2019,4: The area chair is confident but not absolutely certain +srBQrsdaMUt,1642700000000.0,1642700000000.0,1,ba81PoR_k1p,ba81PoR_k1p,Paper Decision,Reject,"This work proposed a nested evolutionary algorithm to choose image filters and filter parameters for back-box attacks, with the emphasize of high transferability. + +After reading the manuscript, the comments of reviewers and the authors' responses, I think the main issues of this work include: +1. The limited novelty of the main idea, since there have been many filter-based attacks, and this work is very close to an existing work; +2. The solution is not new, since the evolutionary method is also well adopted in adversarial attacks; +3. Many many black-box attack methods are not cited and compared, though the authors argued that their perturbation upper bound are different such that they cannot be compared, which is not convincing; +4. The claimed high transferability is not well explained, maybe due to the model ensemble (as indicated by reviewer eN8o). Besides, many existing works that studied transferability are not cited and compared. +5. Experiments are inadequate. The authors added some results in the revised version, but the current shape is still not ready for publication. + +Thus, my recommendation is reject. Hope the reviews can help to improve this work in future.",ICLR2022, +n3ovhjtUFE,1576800000000.0,1576800000000.0,1,B1e5TA4FPr,B1e5TA4FPr,Paper Decision,Reject,"This manuscript outlines procedures to address fairness as measured by disparity in risk across groups. The manuscript is primarily motivated by methods that can achieve ""no-harm"" fairness, i.e., achieving fairness without increasing the risk in subgroups. + +The reviewers and AC agree that the problem studied is timely and interesting. However, in reviews and discussion, the reviewers noted issues with clarity of the presentation, and sufficient justification of the results. The consensus was that the manuscript in its current state is borderline, and would have to be significantly improved in terms of clarity of the discussion, and possibly improved methods that result in more convincing results. +",ICLR2020, +_AaL1eANdZc,1642700000000.0,1642700000000.0,1,pzgENfIRBil,pzgENfIRBil,Paper Decision,Reject,"This paper presents a numerical approach to solving the multi-body Schrodinger equation. Three reviews give low confidence scores and the one review with high confidence, and high score, is very brief and the reviewer appears to have a weak background in this area. My feeling is that the ICLR reviewer pool does not contain reviewers who are really competent to review this paper. There is a large literature in the physics community on this problem and the paper should be reviewed in an appropriate venue. This is especially true for evaluating the empirical results. If the mathematical techniques are relevant to general machine learning, and the authors want to have an impact on machine learning community, then it should be possible to give empirical results on a problem commonly used to evaluate machine learning methods at machine learning venues. Whether or not this is important for physics should be judged by physicists. In any case, the reviews are for the most part not enthusiastic.",ICLR2022, +H1gkAJGeg4,1544720000000.0,1545350000000.0,1,SkGuG2R5tm,SkGuG2R5tm,novel and effective method for more effective quantisation of vectorial representations ,Accept (Poster),". 
Strengths of the paper (as pointed out by the reviewers and based on the AC's expert opinion):
- The proposed method is novel and effective.
- The paper is clear, and the experiments and literature review are sufficient (especially after revision).

2. Weaknesses of the paper: The original weaknesses (mainly clarity and missing details) were adequately addressed in the revisions.

3. Major points of contention: No major points of contention.

4. Consensus: The reviewers reached a consensus that the paper should be accepted.",ICLR2019,4: The area chair is confident but not absolutely certain
WcSVD0lF87,1610040000000.0,1610470000000.0,1,jNTeYscgSw8,jNTeYscgSw8,Final Decision,Reject,"The paper presents an extensive empirical evaluation of several loss functions and regularization techniques used in deep networks. The authors conclude that the classical softmax is significantly outperformed by the other approaches, but that there is no clear winner among them. Moreover, the authors have noticed two interesting facts: (1) the choice of loss function affects only the upper layers of neural networks, with the lower layers being very similar to each other; (2) losses that result in greater class separation also result in higher accuracy, but their features are less transferable to other tasks.

I agree with the authors that the comments of Reviewer 2 are shallow and not informative. Therefore, they were not taken into account in making the final decision and, as AC, I read the paper very carefully. Regarding Reviewer 4, however, I found his comments to be valid. There is a message that the authors want to communicate, and the reader needs to decode this message through a noisy channel. Therefore, I encourage the authors to revise the paper accordingly to make the message much clearer.

Experimental papers that compare a wide spectrum of methods are always hard to judge, and this judgement is often very subjective. There are several seminal papers of this type, but not so many, for several reasons. I agree with the authors that such studies are very valuable and provide an evaluation that is not biased by the authors of a given method. They are also very time- and resource-consuming. But there should be a general consensus on how such an experiment should be performed. The authors of the particular methods should also be able to give the right feedback so that their methods are run appropriately. Therefore, there exist several websites and initiatives that try to fulfill these requirements. As said above, any paper of this type will be judged very subjectively.

The discoveries made by the authors should also be presented in a different way. 
One should start with a hypothesis that, for example, the lower layers are not affected by the loss function and then perform appropriate theoretical and empirical studies to verify the hypothesis. The same applies to the other discovery. In that way the message of the paper would be much clearer. I suppose that analysis of each of the discoveries deserves its own paper. + +",ICLR2021, +BSaIY281T_3,1642700000000.0,1642700000000.0,1,u6ybkty-bL,u6ybkty-bL,Paper Decision,Reject,"This paper has been reviewed by four experts. Their independent evaluations were consistent, all recommended rejection. I agree with that assessment as this paper is not ready for publication at ICLR in its current form. The reviewers have provided the authors with ample constructive feedback and the authors have been encouraged to consider this feedback if they choose to continue the work on this topic.",ICLR2022, +WynniSQQ5z,1576800000000.0,1576800000000.0,1,HJeFmkBtvB,HJeFmkBtvB,Paper Decision,Reject,"This paper presents a variant of the Noise Conditional Score Network (NCSN) which does score matching using a single Gaussian scale mixture noise model. Unlike the NCSN, it learns a single energy-based model, and therefore can be compared directly to other models in terms of compression. I've read the paper, and the methods, exposition, and experiments all seem solid. Numerically, the score is slightly below the cutoff; reviewers generally think the paper is well-executed, but lacking in novelty and quality of results relative to Song & Ermon (2019). +",ICLR2020, +VCj0bB3Pxu,1576800000000.0,1576800000000.0,1,HyxY6JHKwr,HyxY6JHKwr,Paper Decision,Accept (Poster),"The paper proposes and validates a simple idea of training a neural network for a parametric family of losses, using a popular AdaIN mechanism. +Following the rebuttal and the revision, all three reviewers recommend acceptance (though weakly). There is a valid concern about the overlap with an ICLR19-workshop paper with essentially the same idea, however the submission is broader in scope and validates the idea on several applications.",ICLR2020, +HJEgNJaHf,1517250000000.0,1517260000000.0,277,S1cZsf-RW,S1cZsf-RW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The paper proposes a new approach for scalable training of deep topic models based on amortized inference for the local parameters and stochastic-gradient MCMC for the global ones. The key aspect of the method involves using Weibull distributions (instead of Gammas) to model the variational posteriors over the local parameters, enabling the use of the reparameterization trick. The resulting methods perform slightly worse that the Gibbs-sampling-based approaches but are much faster at test time. Amortized inference has already been applied to topic models, but the use of Weibull posteriors proposed here appears novel. However, there seems to be no clear advantage to using stochastic-gradient MCMC instead of vanilla SGD to infer the global parameters, so the value of this aspect of WHAI unclear.",ICLR2018, +SkgvUTqegV,1544760000000.0,1545350000000.0,1,r1espiA9YQ,r1espiA9YQ,Interesting paper but Improvement and Clarification are needed,Reject,"This paper proposes a combination of SVGD and SLGD and analyzes its non-asymptotic properties based on gradient flow. This is an interesting direction to explore. Unfortunately, two major concerns have been raised regarding this paper: 1) the reviewers identified multiple technical flaws. Authors provided rebuttal and addressed some of the problems. 
But the reviewers think it requires significantly more improvement and clarification to fully address the issues. 2) the motivation of the combination of SVGD and SLGD, despite of being very interesting, is not very clearly motivated; by combining SVGD and SLGD, one get convergence rate for free from the SLGD part, but not much insight is shed on the SVGD part (meaning if the contribution of SLGD is zero, then the bound because vacuum). This could be misleading given that one of the claimed contribution is non-asymptotic theory of ''SVGD-style algorithms"" (rather than SLGD style..). We encourage the authors to addresses the technical questions and clarify the contribution and motivation of the paper in revision for future submissions. +",ICLR2019,5: The area chair is absolutely certain +v1KObiYty_,1576800000000.0,1576800000000.0,1,Syx9Q1rYvH,Syx9Q1rYvH,Paper Decision,Reject,"The manuscript concerns a mutual information maximization objective for dynamics model learning, with the aim of using this representation for planning / skill learning. The central claim is that this objective promotes robustness to visual distractors, compared with reconstruction-based objectives. The proposed method is evaluated on DeepMind Control Suite tasks from rendered pixel observations, modified to include simple visual distractors. + +Reviewers concurred that the problem under consideration is important, and (for the most part) that the presentation was clear, though one reviewer disagreed, remarking that the method is only introduced on the 5th page. A central sticking point was whether the method would reliably give rise to representations that ignore distractors and preferentially encode task information. (I would note that a very similar phenomenon to the behaviour they describe has been empirically demonstrated before in Warde-Farley et al 2018, also on DM Control Suite tasks, where the most predictable/controllable elements of a scene are reliably imitated by a goal-conditioned policy trained against a MI-based reward). The distractors evaluated were criticized as unrealistically stochastic, that fully deterministic distractors may confound the procedure; while a revised version of the manuscript experimented with *less* random distractors, these distractors were still unpredictable at the scale of more than a few frames. + +While the manuscript has improved considerably in several ways based on reviewer feedback, reviewers remain unconvinced by the empirical investigation, particularly the choice of distractors. I therefore recommend rejection at this time, while encouraging the authors to incorporate criticisms to strengthen a resubmission.",ICLR2020, +qVzMXAAfopn,1642700000000.0,1642700000000.0,1,qwBK94cP1y,qwBK94cP1y,Paper Decision,Accept (Spotlight),"This paper is a solid contribution to researchers in this field, as it provides a new idea for the basic problem of determining the direction of causality between two variables, using the functional causal model as a dynamical system and optimal transport.",ICLR2022, +zNhs71lX3,1576800000000.0,1576800000000.0,1,rkeYL1SFvH,rkeYL1SFvH,Paper Decision,Reject,"The authors present an approach to large scale bitext extraction from Wikipedia. This builds heavily on previous work, with the novelty being somewhat minor efficient approximate K-nearest neighbor search and language agnostic parameters such as cutoffs. These techniques have not been validated on other data sets and it is unclear how well they generalise. 
The major contribution of the paper is the corpus created, consisting of 85 languages, 1620 language pairs and 135M parallel sentences, most of which do not include English. This corpus is very valuable and already in use in the field, but IMO ICLR is not the right venue for this kind of publication. There were four reviews, all broadly in agreement, and some discussion with the authors.
",ICLR2020,
sL5yr3jv-C-,1642700000000.0,1642700000000.0,1,I-nQMZfQz7F,I-nQMZfQz7F,Paper Decision,Reject,"The paper initially received negative reviews; the authors did a good job during the response period: two reviewers updated their scores to 6. The AC has carefully read the reviews, responses, and discussions, and agreed that the authors have mostly addressed the concerns of reviewer gsUt as well. It is unprofessional for reviewer gsUt to not engage in discussions after multiple requests.

However, the AC also agrees with reviewer seqp that the new changes are major, and submissions are supposed to be evaluated in their initial form. Further, neither of the positive reviewers would like to champion the paper.

The final recommendation is to reject the paper. The authors are encouraged to further improve and flesh out the paper based on the reviews for the next venue.",ICLR2022,
P-62BKv03Ac,1642700000000.0,1642700000000.0,1,qDx6DXD3Fzt,qDx6DXD3Fzt,Paper Decision,Reject,"This paper aims to detect not only clean OOD data but also their adversarially manipulated counterparts. The authors propose a method for this goal with no/marginal loss in clean test accuracy (say, Acc) and clean OOD detection accuracy (say, AUC), while existing methods targeting the same goal suffer from low Acc and AUC. 
3 reviewers are positive and 2 reviewers are negative. Reviewers and AC think that the proposed idea of merging a certified binary classifier for in-versus out-distribution with a classifier for the in-distribution task is interesting. However, AC thinks that experimental results are arguable as pointed out by reviewers. For example, in CIFAR-10, the proposed method outperforms the baseline (GOOD) with respect to Acc and AUC, but often significantly underperforms it with respect to GAUC (guaranteed AUC) or AAUC (adversarial AUC). Then, the question is which metric is more important? It is arguable to say whether Acc is more important than GAUC or AAUC. But, at least, AC thinks that AUC and AAUC (or GAUC) are equally important as adversarially manipulated OOD data is nothing but another OOD data made from the original clean OOD data. Hence, the superiority of the proposed method over the baseline is arguable in the experiments, and AC tends to suggest rejection. + +ps ... AC is also a bit skeptical on the motivation of this paper. What is the value of obtaining ""guaranteed AUC""? It is not the ""real/true"" worst case OOD performance, as it varies with respect to the tested clean OOD data. Namely, it is the worst case OOD performance just in a certain ""subset"" of OOD data, i.e., adversarially manipulated OOD data made from a certain clean OOD data. Hence, AC is curious about what is the value of establishing such a ""partial"" lower bound (rather than ""true"" lower bound considering all possible OOD data). AC thinks that the problem setup studied in this paper (and some previous papers) looks interesting/reasonable at the first glance, but feels somewhat artificial after a deeper look.",ICLR2022, +xtD-4hfVb1,1576800000000.0,1576800000000.0,1,HJgCF0VFwr,HJgCF0VFwr,Paper Decision,Accept (Poster),"This paper proposes a novel approach for pruning deep neural networks using non-parametric statistical tests to detect 3-way interactions among two nodes and the output. While the reviewers agree that this is a neat idea, the paper has been limited in terms of experimental validation. The authors provided further experimental results during the discussion period and the reviewers agree that the paper is now acceptable for publication at ICLR-2020. ",ICLR2020, +SkOJX1arM,1517250000000.0,1517260000000.0,49,BywyFQlAW,BywyFQlAW,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The submission formulates self paced learning as a specific iterative mini-max optimization, which incorporates both a risk minimization step and a submodular maximization for selecting the next training examples. + +The strengths of the paper lie primarily in the theoretical analysis, while the experiments are somewhat limited to simple datasets: News20, MNIST, & CIFAR10. Additionally, the main paper is probably too long in its current form, and could benefit from some of the proof details being moved to the appendix. + +",ICLR2018, +TDvtFBHBRW,1576800000000.0,1576800000000.0,1,SJg5J6NtDr,SJg5J6NtDr,Paper Decision,Accept (Poster),"The paper proposed a meta-learning approach that learns from demonstrations and subsequent RL tasks. +The reviewers found this work interesting and promising. There have been some concerns regarding the clarity of presentation, which seems to be addressed in the revised version. 
Therefore, I recommend acceptance for this paper.",ICLR2020, +IWiiXEhimzy,1642700000000.0,1642700000000.0,1,6P6-N1gLQDC,6P6-N1gLQDC,Paper Decision,Reject,"The paper contains *fresh* new ideas connecting mental models and SCMs and providing interpretations (explanations) from DAG models learned from data, including those learned by using deep learning. The usefulness of the theory is illustrated with experiments. The paper contributes some theoretical results, but the presentation has serious issues. In general, the reviewers found the paper hard to follow due to a lack of clarity in some notations, definitions, and assumptions. + +The paper was discussed in-depth and at length, including the reviewers, the AC, and the senior AC. After all, the gap between the current writing and what is expected from the camera-ready is a bit too large, and we feel it could be a disservice to the authors and community to have the paper accepted in its current form, without passing through another round of reviews. Unfortunately, we do not have any version of ""conditional acceptance."" + +Having said that, we feel the paper has the potential for having a significant impact, and we appreciate the novelty of the proposed approach and the connection among different fields. To avoid issues in the future, we would like to suggest the authors pay attention to the detailed feedback provided by the reviewers, including the discussion and the conversation with the AC, following the exchange on Nov/28. Some examples of points that could make the presentation clearer include 1) clarifying the contributions and providing more examples of the theoretical results, 2) making explicit that the results work for Markovian and additivity models, and 3) perhaps changing the title accordingly.",ICLR2022, +ObWLaNgPaK,1610040000000.0,1610470000000.0,1,NZj7TnMr01,NZj7TnMr01,Final Decision,Reject,"This paper studies the problem of uncertainty estimation under distribution shift. The proposed approach (PAD) addresses this under-estimation issue, by augmenting the training data with inputs that the network has unjustified low uncertainty estimates, and asking the model to correct this under-estimation at those augmented datapoints. Results show promising improvement over a set of common benchmark tasks in uncertainty estimation, with comparisons to a number of existing approaches. + +All the reviewer agreed that the experiments are well conducted and the empirical results are very promising. However, they also had a shared concern on the justification of the approach. Reviewers are less willing to accept a paper merely for commending its empirical performance. + +I share the above concern as the reviewers, and I personally found the presentation of the approach a bit rush and disconnected from the motivation. For example, the current presentation feels like the method is motivated by BNNs but it is not clear to me how the proposed objective connects to the motivation. Also no derivation of the objective is included in either main text or appendix. + +In revision, I would suggest a focus on improving the clarity and theoretical justification of the proposed objective function.",ICLR2021, +YG9O6L-xoKZ,1610040000000.0,1610470000000.0,1,tHgJoMfy6nI,tHgJoMfy6nI,Final Decision,Accept (Poster),"The paper touches upon the problem of catastrophic forgetting in continual learning. The idea is to enhance experience reply by explanations of the decision/predictions made. 
Technically, this "Remembering for the Right Reasons" loss adds an explanation loss to continual learning. This is an interesting idea, as the reviewers also agree. I would like to encourage the authors to consider a different abbreviation: RRR also stands for the "Right for the Right Reasons" loss due to Ross et al., so the authors should use a different abbreviation and also mention the work of Ross et al. (Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez: Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. IJCAI 2017: 2662-2670). Moreover, it might actually be interesting to move towards interactive learning here as well, because continual learning may also suffer from confounders. There is also a connection to HINT (Ramprasaath Ramasamy Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry P. Heck, Dhruv Batra, Devi Parikh: Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded. ICCV 2019: 2591-2600), as it also aims at keeping explanations close to each other; they use a ranking loss, however, and do not consider continual learning. Overall, this is a simple method that is shown empirically to help improve existing replay methods for class-incremental learning.",ICLR2021,
AZuvyQm-7Q,1610040000000.0,1610470000000.0,1,rmd-D7h_2zP,rmd-D7h_2zP,Final Decision,Reject,"The authors explore modeling the relationship between domain-slot pairs in multi-domain dialogue state tracking via the use of special tokens in pre-trained contextualized word embeddings (i.e., one special token for each domain-slot pair, or special tokens for the domain and the slot that are merged). Beyond this, the basic architecture is very similar to the TRADE architecture (and papers that build on this general slot-gate + slot-value classifier) for the fixed vocabulary setting. Experiments are conducted on the MultiWOZ 2.1/2.2 datasets, demonstrating impressive improvements over recent results.

== Pros ==
+ They demonstrate that domain-slot interdependencies can be modeled through special tokens for use with pre-trained embeddings.
+ The top-line empirical results are impressive.

== Cons ==
- Lack of a deep dive on the empirical analysis to show precisely why/where the proposed method works better than existing work.
- The methodological advance is minimal beyond using better pre-trained embeddings.
- Only one dataset is used when others exist, and this is largely an empirical paper.
- The writing is rushed and reads like a 'late-breaking' paper.

Evaluating along the specified dimensions:
* Quality: The quality of the work was the primary concern of the reviewers. Specifically, this reads like a 'late-breaking' paper where the table of results is impressive, but there isn't significant examination of the empirical results showing why/when it works relative to competing methods. Focusing just on Tables 2 & 3, much of the improvement is ostensibly due to the more powerful embeddings. Contextualizing this wrt {SimpleTOD, TRADE, DSTQA, Picklist}, this appears a minor methodological innovation centered around the input embeddings. The empirical results are impressive, but may very well be a result of the more powerful pre-trained embeddings -- additional empirical analysis and discussion might be able to convince the reader otherwise, but is lacking here.
* Clarity: This is a very simple idea, so it should be easily understood by most familiar with the research area. 
That being said, the paper seems very rushed in general.
* Originality: This applies ideas used in many NLP applications to the dialogue-state tracking problem. As previously stated, the architecture is similar to several existing DST formulations -- where the core idea is to model slot-value interdependencies through the contextualized embeddings using special tokens. While not a trivial idea, it also is something that many could/would have put together. Until it is abundantly clear that this isn't really a study of how to apply larger pre-trained embeddings to DST problems, it isn't clear that this is a significant dialogue-systems advance beyond the strong performance.
* Significance: As stated, this isn't a significant methodological advance. However, the empirical results appear very impressive -- although the reviewers expressed some concerns regarding the evaluation. Since this is largely empirical, one of the reviewers pointed out that additional relevant datasets now exist, which would significantly strengthen the case.

In summary, the empirical results appear impressive, ostensibly setting the SoTA. However, there were several concerns regarding the novelty of the approach, whether it is actually working better for the reasons stated, and the sufficiency of the analysis of the empirical results, amongst other things. Thus, despite the impressive results, the consensus evaluation was that this work is not ready for publication in its current form (even if the top-line results should be disseminated).",ICLR2021,
tycScIiriSb,1610040000000.0,1610470000000.0,1,lSijhyKKsct,lSijhyKKsct,Final Decision,Reject,"This paper provides a simple approach to incorporating temporal information in RL algorithms. The AC agrees with the authors that simplicity is a virtue. However, as reviewers point out, the approach is not conclusively better experimentally (given that environments might be hand-chosen). Even R3 believes some reported improvements are within variance. Given the discussions, the AC agrees that the results do not seem convincing enough.",ICLR2021,
n3ovhjtUFE,1576800000000.0,1576800000000.0,1,B1e5TA4FPr,B1e5TA4FPr,Paper Decision,Reject,"This paper proposes to augment training data for theorem provers by learning a deep neural generator that generates data to train a prover, resulting in an improvement over the Holophrasm baseline prover. 
The results were restricted to one particular mathematical formalism -- MetaMath, a limitation raised by one reviewer.

All reviewers agree that it's an interesting method for addressing an important problem. However, there were some concerns about the strength of the experimental results from R4 and R1. R4 in particular wanted to see results on more datasets, an assessment with which I agree. Although the authors argued vigorously against using other datasets, I am not convinced. For instance, they claim that other datasets do not afford the opportunity to generate new theorems, or that the human proofs provided cannot be understood by an automatic prover. In their words,

"The idea of theorem generation can be applied to other systems beyond Metamath, but realizing it on another system is highly nontrivial. It can even involve new research challenges. In particular, due to large differences in logic foundations, grammar, inference rules, and benchmarking environments, the generation process, which is a key component of our approach, would be almost completely different for a new system. And the entire pipeline essentially needs to be re-designed and re-coded from scratch for a new formal system, which can require an unreasonable amount of engineering."

It sounds like they've essentially tailored their approach to this one dataset, which limits its generality, a limitation that was not discussed in the paper.

There is also only one baseline considered, which renders their experimental findings rather weak. For these reasons, I think this work is not quite ready for publication at ICLR 2020, although future versions with stronger baselines and experiments could be quite impactful.
",ICLR2020,
Bkxlc8JHxN,1545040000000.0,1545350000000.0,1,HJeOMhA5K7,HJeOMhA5K7,"Poor results, Evaluating knowledge or method to incorporate it?",Reject,"The paper considers the task of incorporating knowledge expressed as rules into column networks. The reviewers acknowledge the need for such techniques, like the flexibility of the proposed approach, and appreciate the improvements to convergence speed and accuracy afforded by the proposed work.

The reviewers and the AC note the following as the primary concerns of the paper:
(1) The primary concern raised by the reviewers was that the evaluation is focused on whether KCLN with the knowledge can beat one without it, instead of measuring the efficacy of incorporating the knowledge itself (e.g. by comparing with other forms of incorporating knowledge, or by varying the quality of the rules that were introduced), (2) Even otherwise, the empirical results are not significant, offering slight improvements over the vanilla CLN (reviewer 1), (3) There are concerns that the rule-based gates are introduced but gradients are only computed on the final layer, which might lead to instability, and (4) There are a number of issues in the presentation, where space is used on redundant information and descriptions of datasets, instead of focusing on the proposed model.

The comments by the authors address some of these concerns, in particular clarifying that the forms of knowledge/rules are not limited; however, they focused on simple rules in the paper. 
However, the primary concerns in the evaluation still remain: (1) it seems to focus on comparing against Vanilla-CLN, instead of focusing on the source of the knowledge, or on the efficacy in incorporating it (see earlier work on examples of how to evaluate these), and (2) the results are not considerably better with the proposed work, making the reviewers doubtful about the significance of the proposed work. + +The reviewers agree that the paper is not ready for publication.",ICLR2019,4: The area chair is confident but not absolutely certain +gR4qJy_xhkY,1642700000000.0,1642700000000.0,1,YRq0ZUnzKoZ,YRq0ZUnzKoZ,Paper Decision,Accept (Poster),"Description of paper content: + +The authors propose a dynamics model that can generalize to novel environments. The train and test MDPs have the same state and action spaces but different dynamics. Environment specific inference is achieved by estimating latent vectors Z that describe the non-stationary or variable part of the dynamics. These Z-s are inferred from trajectory segments in unlabeled environments. The Z-s are learned contrastively: Z-s from the same trajectory are pulled together, and Z-s from separate trajectories are pushed apart. However, to mitigate the error of distancing Z-s from different trajectories but the same environment, Z-s on trajectories with similar transitions are also pushed together using a soft clustering penalty. These losses are justified based on ideas from Pearl’s causal inference. + +Summary of paper discussion: + +The reviewers concluded that the contributions are conceptually interesting and “somewhat” novel. The reviewers felt that the empirical performance gains of the method over baselines were demonstrated but not extremely impressive.",ICLR2022, +RzkXwNrJp_J,1642700000000.0,1642700000000.0,1,9rKTy4oZAQt,9rKTy4oZAQt,Paper Decision,Reject,"This paper introduces a new approach for risk sensitive RL by using an objective that depends on the full distribution and can apply a weight to the resulting trajectory. The reviewers thought that focusing on more general and expressive objectives for RL is well motivated. However, they had a number of concerns of the current paper state, including its clarity in a number of sections and its relation to other work in risk-sensitive RL. The authors provided thoughtful responses but some concerns lingered around the prior concerns.",ICLR2022, +CwLv7g5VFJs,1642700000000.0,1642700000000.0,1,nhN-fqxmNGx,nhN-fqxmNGx,Paper Decision,Accept (Poster),"In the end, all reviewers agreed that this is a solid piece of work. However, there were also some doubts regarding the relevance of the block diagonal design and the underlying assumptions about the p/n ratio. The majority of the reviewers, on the other hand, had the impression that the positive aspects dominate the potential problems, and I also share this viewpoint. However, I'd like to encourage the authors to carefully address the points of criticism raised by the reviewers in their final version.",ICLR2022, +u1Rbfnelj,1576800000000.0,1576800000000.0,1,Byg-An4tPr,Byg-An4tPr,Paper Decision,Reject,"The authors propose a framework for relating adversarial robustness, privacy and utility and show how one can train models to simultaneously attain these properties. The paper also makes interesting connections between the DP literature and the robustness literature thereby porting over composition theorems to this new setting. 

The paper makes very interesting contributions, but a few key points require some improvement:
1) The initial version of the paper relied on an approximation of the objective function in order to obtain DP guarantees. While the authors clarified how the approximation impacts model performance in the rebuttal and revision, the reviewers still had concerns about the utility-privacy-robustness tradeoff achieved by the algorithm.

2) The presentation of the paper seems tailored to audiences familiar with DP and is not easy for a broader audience to follow.

Despite these limitations, the paper does make significant novel contributions on an important problem (simultaneously achieving privacy, robustness and utility) and could be of interest.

Overall, I consider this paper borderline and vote for rejection, but strongly encourage the authors to improve the paper wrt the above concerns and resubmit to a future venue.",ICLR2020,
KGjxW_WODHv,1642700000000.0,1642700000000.0,1,dKLoUvtnq0C,dKLoUvtnq0C,Paper Decision,Reject,"This paper presents an approach to learning the solution operator of Markovian partial differential equations (PDEs) by combining the Fourier Neural Operator (FNO) with a hyper-network. In short, the hyper-network g_\theta(t) is trained to output the weights of an FNO f_w(x), which (given an initial condition) outputs the PDE solution at the time given to the hyper-network. The main claimed contributions of the proposed approach (as compared to, e.g., the original FNO architecture) are that the obtained solutions improve the learning accuracy at the supervision time points and that the solutions are able to interpolate and extrapolate to arbitrary times.

The reviewers seemed to like the idea of using hyper-networks for modelling continuous-time FNOs. Several issues were raised by the reviewers, e.g., with respect to related work by Li et al. (2021, https://arxiv.org/abs/2106.06898), which I believe were addressed by the authors satisfactorily. However, it is still concerning that the reported performance for FNO in, e.g., 1d-Burgers does not quite match that recently published (Kovachki et al., 2021, https://arxiv.org/abs/2108.08481, Table 2). Although the authors did report additional results using GeLU, these results are still very different from those in Kovachki et al. (2021), and it is unclear whether the improved performance is due to a lack of tuning of the baseline FNO. I commend the authors for, as suggested by the reviewers, running more extrapolation experiments. However, I believe the reviewers also made a point about considering (training) times much longer than 1, as even the original FNO paper did this for Navier Stokes (NS) with T=50. 
+ +Overall, the paper provides a modest improvement over FNOs, although it does extend their capabilities to interpolation and extrapolation. The paper would also benefit from providing a brief overview of FNOs.",ICLR2022, +gDmxugn5toz,1642700000000.0,1642700000000.0,1,Ng8wWGXXIXh,Ng8wWGXXIXh,Paper Decision,Reject,"This manuscript was the object of a rich and lengthy discussion. The AC also felt compelled to read the paper in detail and discussed it further with the SAC. + +The authors did a thorough job of addressing some of the reviewers' points. The added results on cross-entropy loss and additional discussion, as well as the points made in ""Further Discussion on the Numerical Experiments"", are very much appreciated. + +However, significant concerns remain on establishing connections with prior work, including related ideas on invariance from the causality literature, so as to gain a deeper understanding of the implications of the proposed objective. We also strongly encourage the authors to further work on strengthening their theoretical analysis to clearly demonstrate the value of the proposed approach. + +The proposed formulation is certainly thought-provoking and we urge the authors to pursue their work in view of the above comments.",ICLR2022, +BYrMFwxlC5b,1642700000000.0,1642700000000.0,1,vJb4I2ANmy,vJb4I2ANmy,Paper Decision,Accept (Poster),"This paper introduces Noisy Feature Mixup: an extension of input mixup and manifold mixup to all layers of a neural net, for the purpose of improving robustness and generalization in supervised learning. Experimental validation supports the increased robustness to attacks on the input data. The reviewers find the paper well written and they appreciate the theoretical analysis as well as the empirical results. The reviewers did not identify any big problems, and their minor concerns were sufficiently addressed in the author response. I'm therefore happy to recommend accepting this paper.",ICLR2022, +IN20aY-YlNn,1610040000000.0,1610470000000.0,1,6s480DdlRQQ,6s480DdlRQQ,Final Decision,Reject,"The authors present a new set of trigger-based backdoor attacks that use dynamic patterns that make detection harder. These attacks seem to be stronger than state-of-the-art attacks. + +One weakness is the need for full whitebox access to the model. Several key references are missing, and the comparison with other backdoor attacks is unclear. + +Moreover, although the authors compare with trigger-based backdoors, there are plenty of triggerless backdoors that can be viewed as dynamic, e.g., the attacks by +Bagdasaryan et al. http://proceedings.mlr.press/v108/bagdasaryan20a.html and works referencing this paper. + +The paper indeed provides an interesting path towards dynamic attacks, but the lack of comparisons with state-of-the-art literature, and also the need for whitebox access/high poisoning rate, significantly limit the novelty of this work. + +",ICLR2021, +Hydf2M8ux,1486400000000.0,1486660000000.0,1,Bk0FWVcgx,Bk0FWVcgx,ICLR committee final decision,Accept (Poster),"The paper presents an analysis of deep ReLU networks, and contrasts it with linear networks. It makes good progress towards providing a theoretical explanation of the difficult problem of characterizing the critical points of this highly nonconvex function. + +I agree with the authors that their approach is superior to earlier works based on spin-glass models or other approximations that make highly unrealistic assumptions. 
The ReLU structure allows them to provide a concrete analysis, without making such approximations, as opposed to more general activation units. + +The relationship between the smoothness of the data distribution and the level of model overparameterization is used to characterize the ability to reach the global optimum. The intuition that having no approximation error (due to overparameterization) leads to a more tractable optimization problem is natural. Such findings have been seen before (e.g., in tensor decomposition, the global optima can be reached when the amount of noise is small). I recommend that the authors connect it to such findings for other nonconvex problems. The paper does not address overfitting, as they themselves point out in the conclusion. However, I do expect it to be a very challenging problem.",ICLR2017, +7jxCANGwsP,1576800000000.0,1576800000000.0,1,S1gfu3EtDr,S1gfu3EtDr,Paper Decision,Reject,"This paper presents a spatially structured neural memory architecture that supports navigation tasks. The paper describes a complex neural architecture that integrates visual information, camera parameters, egocentric velocities, and a differentiable 2D map canvas. This structure is trained end-to-end with A2C in the VizDoom environment. The strong inductive priors captured by these geometric transformations are demonstrated to be effective on navigation-related tasks in the experiments in this environment. + +The reviewers found many strengths and a few weaknesses in this paper. One strength is that the paper pulls together many related ideas in the mapping literature and combines them in one integrated system. The reviewers liked the method's ability to leverage semantic reasoning and spatial computation. They liked the careful updating of the maps and the use of projective geometry. + +The reviewers were less convinced of the generality of this method. The lack of realism in these simulated environments left the reviewers unconvinced that the benefits observed from using projective geometry in this setting will continue to hold in more realistic environments. The use of fixed geometric transformations with RGBD inputs instead of learned transformations also makes this approach less general than a system that could handle RGB inputs. Finally, the reviewers noted that the contributions of this paper are not well aligned with the paper's claims. + +This paper is not yet ready for publication as the paper's claims and experiments were not sufficiently convincing to the reviewers. ",ICLR2020, +H1m2fkaHG,1517250000000.0,1517260000000.0,14,B1gJ1L2aW,B1gJ1L2aW,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"The paper characterizes the latent space of adversarial examples and introduces the concept of local intrinsic dimensionality (LID). LID can be used to detect adversaries as well as build better attacks, as it characterizes the space in which DNNs might be vulnerable. The experiments strongly support their claim.",ICLR2018, +rJFXnf8_e,1486400000000.0,1486400000000.0,1,HkpLeH9el,HkpLeH9el,ICLR committee final decision,Invite to Workshop Track,"Quality, Clarity: There is no consensus on this, with the readers having varying backgrounds, and one reviewer commenting that they found it to be unreadable. + + Originality, Significance: + The reviews are mixed on this, with the high score (7) acknowledging a lack of expertise on program induction. + The paper is based on the published TerpreT system, and some think that it is marginal and contradictory with respect to the TerpreT paper. 
In the rebuttal, point (3) from the authors points to the need to better understand gradient-based program search, even if it is not always better. This leaves me torn about a decision on this paper, although currently it does not have strong support from the most knowledgeable reviewers. + That said, due to the originality of this work, the PCs are inclined to invite this work to be presented as a workshop contribution.",ICLR2017, +XUhzLYsSec,1610040000000.0,1610470000000.0,1,XkI_ggnfLZ4,XkI_ggnfLZ4,Final Decision,Reject,"This paper explores the role of hyperparameters in the separate phases of a classic pruning pipeline: mask identification and retraining. Key observations include a set of the hyperparameters to search relative to a standard regime as well as the identification that the layerwise pruning rates from mask finding are intertwined with these hyperparameters and are what chiefly affects the eventual performance of the pruned network. + +The pros of this paper are that it works against the contemporary wisdom that the default hyperparameters for a model are the best for finding a mask for the model. Instead, there are improvements to be had by identifying a set of hyperparameters that lead to worse overall model accuracy, but better masks. Second, the work shows that the layerwise pruning rates are the key elements of these hyperparameters' effect. The rates can in fact be transferred to more poorly performing network configurations and improve performance. + +The cons of this paper, as noted by the reviewers, are the somewhat unclear implications of the technique. The added guidance on directions to improve hyperparameters is valuable but does not necessarily provide a cost-effective strategy to find these. At its strongest, this guidance offers practitioners a recommendation to also consider hyperparameters for the initial model. + +The stronger, forward-looking implication is, instead, the connection to layerwise pruning rates. Specifically, while layerwise pruning rates have been demonstrated to be important in the literature (e.g., [1]), there has been limited study into the exact nature of a good set of pruning rates versus a bad set of pruning rates. Where this paper stops short of a clear result is in connecting excessive pruning of the earlier layers, or the layerwise rates themselves, to another property of the network (e.g., gradient flow, or capacity) that indicates the improved eventual performance. + +My recommendation is to reject. The paper's core experiments are well executed. However, this final detail, closing the gap between the portability of these layerwise rates and a conceptual understanding, is a key missing component. Once done, that will make for a very strong paper. + +[1] AMC: AutoML for Model Compression and Acceleration on Mobile Devices. Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han. ECCV, 2018",ICLR2021, +AeSeQ-S0lZK,1642700000000.0,1642700000000.0,1,BB4e8Atc1eR,BB4e8Atc1eR,Paper Decision,Accept (Spotlight),"This is an exciting paper that provides efficient algorithms for exact sampling from NDPPs along with theoretical results that are very pertinent in and of themselves. The AC agrees with the reviewers that the authors satisfactorily addressed the concerns raised in the reviews, and is convinced that the revised version will be greatly appreciated by the community. 
We very much encourage the authors to pursue this line of work and in particular to overcome the practical restriction to the ONDPP subclass.",ICLR2022, +c8b-LRNzTx,1576800000000.0,1576800000000.0,1,S1eIw0NFvr,S1eIw0NFvr,Paper Decision,Reject,"This work investigates neural network pruning through the lens of its influence over specific exemplars (which are often found to be lower-quality or mislabelled images) and how removing them greatly helps metrics. +The insight from the paper is interesting, as recognized by reviewers. However, the experiments do not suggest that the findings shown in the paper would generalize to more pruning methods. Nor do the authors give directions for tackling the ""hard exemplar"" problem. The authors' response did provide justifications and clarifications; however, the core of the concern remains. +Therefore, we recommend rejection.",ICLR2020, +BypfNJaSz,1517250000000.0,1517260000000.0,311,ByrZyglCb,ByrZyglCb,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"The idea of universal perturbation is definitely interesting and well carried out in this paper.",ICLR2018, +gntHm9nCNh,1610040000000.0,1610470000000.0,1,0gfSzsRDZFw,0gfSzsRDZFw,Final Decision,Reject,"While the reviewers in general liked the ideas proposed in the paper, the experimental evaluation has several issues that need fixing before it can be accepted.",ICLR2021, +k-KwiTze8B,1642700000000.0,1642700000000.0,1,LK8bvVSw6rn,LK8bvVSw6rn,Paper Decision,Reject,"As an empirical paper, this paper studies uncertainty estimation with respect to various architectures and learning schemes. Three reviewers suggested acceptance based on the strengths of the paper (fairly extensive experiments were conducted, and some new observations were discovered, such as the superiority of ViT). On the other hand, two reviewers proposed rejection due to a lack of rigor in the writing and a lack of novelty. No consensus was reached through additional discussion. In particular, the reviewer's point that the experiments were not well controlled (different models were trained with different hyperparameters, etc.) seems quite important, and it weakens the significance of the contribution of the paper. + +All reviewers agreed that it is a potentially interesting and important paper. I encourage the authors to resubmit in the future after carefully addressing the reviewers' concerns.",ICLR2022, +Y9L1-n6ZI8,1576800000000.0,1576800000000.0,1,H1lmyRNFvr,H1lmyRNFvr,Paper Decision,Accept (Poster),"Paper received reviews of A, WA, WR. AC has carefully read all reviews/responses. R1 is less experienced in this area. AC sides with R2, R3 and feels the paper should be accepted. Interesting topic and interesting problem. Authors are encouraged to strengthen the experiments in the final version. ",ICLR2020, +ctONccCa2J,1610040000000.0,1610470000000.0,1,KOtxfjpQsq,KOtxfjpQsq,Final Decision,Reject,"The paper presents a meta-learning method for model-based RL that introduces branched rollouts to improve the sample efficiency of the learned model. + +While the paper addresses the important topic of sample efficiency in RL, and provides theoretical analysis, the reviewers raised concerns with the novelty and clarity. The extension to the POMDP setting is certainly an important technological contribution, albeit a straightforward one. To be suitable for publication, the work needs to make a stronger case for the significance of the method. 
",ICLR2021, +PX1s3BSL_ZE,1642700000000.0,1642700000000.0,1,gc8zLQWf2k,gc8zLQWf2k,Paper Decision,Reject,"This paper tries to improve the training of adversarial deep neural networks by avoiding fitting the “harmful” atypical samples and fitting more “benign” atypical samples. + +Overall, the main concerns are + +1. The current presentation can easily cause some misunderstandings on the observations made in Section 3, especially [1] and [3] mentioned by the reviewer iXiX. +- The authors may consider moving ""related work"" to the first half of the submission, and organize existing findings with rare/hard/atypical in a more principle manner. +- Besides, as author mentioned in Section 3.1: ""it is equivalent to a classification task based on an extremely small dataset, with one or a few training samples given"". Such findings are natural and not novel to the deep learning community. Authors may consider shorten Section 3.1 and elaborate more in Section 3.2. + +2. Theorem 1 and 2 do not help much. +- It does not talk about the training algorithm and models, which over simplifies the learning problem. +- Besides, the authors can consider some theoretical results how BAT improve the performance of typical samples, but still preserve the ability to fit those ""useful"" atypical samples. +This helps to bridge the gap between motivation behind BAT and its algorithm design (raise by reviewer ytJj and sm19). + +3. It is also suggested to make observations more convincing. +- Since authors want to claim their findings are universal, it is better to consider more adv training methods and datasets; it is also better to change the ratio of ""normal samples"" v.s. ""atypical samples"". In this way, the effect of atypical samples in adversarial training can be more carefully quantized.",ICLR2022, +vhUZYzg19fO,1610290000000.0,1610470000000.0,1,OBI5QuStBz3,OBI5QuStBz3,Final Decision,Reject,"This work presents an improved lower bound on the communciation complexity of distributed optimization in some settings. While reviewers agree that the paper is addressing a challenging and important question, all reviewers questioned the significance of the contributions of this work. In particular, two reviewers felt that the novelty of this work is limited. Unfortunately, the author response was unable to adequately address these concerns.",ICLR2021, +Sylhc8PgeE,1544740000000.0,1545350000000.0,1,ryfz73C9KQ,ryfz73C9KQ,Concerns about the experiments and paper clarity.,Reject,"This paper proposed an unsupervised learning algorithm for predictive modeling. The key idea of using NCE/CPC for predictive modeling is interesting. However, major concerns were raised by reviewers on the experimental design/empirical comparisons and paper writing. Overall, this paper cannot be published in its current form, but I think it may be dramatically improved for a future publication. ",ICLR2019,4: The area chair is confident but not absolutely certain +Psrzw-jbyVf,1642700000000.0,1642700000000.0,1,6PvWo1kEvlT,6PvWo1kEvlT,Paper Decision,Accept (Poster),"This paper interprets pre-trained masked language models (MLMs) as energy-based sequence models and designs a tractable MCMC sampling algorithm based on Metropolis-Hastings with proposals derived from MLMs themselves. + +The strategy is simple, reasonably elegant, and fixes technical mistakes of prior work. 
The proposed algorithm addresses intractabilities of some naive MCMC schemes, does not require modifications to MLM training, and makes good use of MLMs themselves as proposals, thus being crucially economical about resources. + +We had some concerns about the speed of generation and the paper's positioning with regard to existing strategies for sampling from energy-based models (already during parameter estimation). While I understand that for many applications speed of generation is crucial, I think that on its own should not keep this line of research outside our best venues. And I hope steps like this one will lead to faster algorithms in the near future. I do relate to the issue of positioning, and I am glad the authors did not take it lightly. In the rebuttal phase the related work and positioning have been improved, but the authors remarked that the limited space for the camera-ready was preventing them from expanding the discussion. A note to authors: it's not a bad idea to have an expanded related work section in the appendix.",ICLR2022, +r1lyUmTgTm,1541620000000.0,1545350000000.0,1,Skz-3j05tm,Skz-3j05tm,Good performance but not much novelty,Reject,"This paper describes a graph convolutional network (GCN) approach to capture relational information in natural language as well as knowledge sources for goal-oriented dialogue systems. Relational information is captured by dependency parses, and when there is code switching in the input language, word co-occurrence information is used instead. Experiments on the modified DSTC2 dataset show significant improvements over baselines. +The original version of the paper lacked comparisons to some SOTA baselines, as also raised by the reviewers; these are included in the revised version. +Although the results show improvements over other approaches, it is arguable whether BLEU and ROUGE scores are good enough for this task. Inclusion of human evaluation in the results would be very useful. +",ICLR2019,4: The area chair is confident but not absolutely certain +SkeIqjMelV,1544720000000.0,1545350000000.0,1,SJxFN3RcFX,SJxFN3RcFX,Interesting idea but has significant technical flaws and lacks clarity,Reject,"This paper addresses a promising and challenging idea in Bayesian deep learning, namely thinking about distributions over functions rather than distributions over parameters. This is formulated by doing MCMC in a functional space rather than directly in the parameter space. The reviewers were unfortunately not convinced by the approach, citing a variety of technical flaws, a lack of clarity of exposition, and a lack of critical experiments. In general, it seems that the motivation of the paper is compelling and the idea promising, but perhaps the paper was hastily written before the ideas were fully developed and comprehensive experiments could be run. Hopefully the reviewer feedback will be helpful to further develop the work and lead to a future submission. + +Note: Unfortunately one review was too short to be informative. However, fortunately the other two reviews were sufficiently thorough to provide enough signal. ",ICLR2019,5: The area chair is absolutely certain +VHxYD5BXGe,1576800000000.0,1576800000000.0,1,H1xzdlStvB,H1xzdlStvB,Paper Decision,Reject,"The submission presents an approach to speed up network training time by using lower-precision representations and computation to begin with and then dynamically increasing the precision from 8 to 32 bits over the course of training. 
The results show that the same accuracy can be obtained while achieving a moderate speedup. + +The reviewers agreed that the paper did not offer a significant advantage or novelty, and that the method was somewhat ad hoc and unclear. Unfortunately, the authors' rebuttal did not clarify all of these points, and the recommendation after discussion is for rejection. ",ICLR2020, +B1j6LyaSM,1517250000000.0,1517260000000.0,887,Bk-ofQZRb,Bk-ofQZRb,ICLR 2018 Conference Acceptance Decision,Reject,"The reviewers agree this paper is not yet ready for publication.",ICLR2018, +HJurQkTrz,1517250000000.0,1517260000000.0,132,rkPLzgZAZ,rkPLzgZAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Important problem (modular continual RL) and novel contributions. The initial submission was judged to be a little dense and hard to read, but the authors have been responsive in addressing comments and updating the paper. I support accepting this paper. ",ICLR2018, +Nnu648mFMQbe,1642700000000.0,1642700000000.0,1,z7DAilcTx7,z7DAilcTx7,Paper Decision,Reject,"In this paper, the authors study adversarial examples from a distributional robustness point of view. Reviewers had several concerns about the work, and all thought the paper is not above the acceptance threshold. In particular, they mentioned that the presentation and writing of the paper need to be improved and that the results (especially the ones presented in Section 2) are neither significant nor novel contributions. Given all this, I think the paper needs more work before being accepted.",ICLR2022, +axb_l2zDv6,1576800000000.0,1576800000000.0,1,BkgUB1SYPS,BkgUB1SYPS,Paper Decision,Reject,"This paper presents a theoretical interpretation of separation rank as a measure of a recurrent network's ability to capture contextual dependencies in text, and introduces a novel bidirectional NLP variant and tests it on several NLP tasks to verify the analysis. + +Reviewer 3 found that the paper does not provide a clear description of the method and that a focus on a single message would have worked better. Reviewer 2 claimed several shortcomings in the paper relating to lack of clarity, limited details on the method, reliance on a 'false dichotomy', and failure to report performance. Reviewer 1 found the goals of the work to be interesting but felt that the paper was not clear, that the proofs were not rigorous enough, and that the experiments lacked clarity. The authors responded to all the comments. The reviewers felt that their comments were still valid and did not adjust their ratings. + +Overall, the paper is not yet ready in its current form. We hope that the authors will find valuable feedback for their ongoing research.",ICLR2020, +#NAME?,1610040000000.0,1610470000000.0,1,O-XJwyoIF-k,O-XJwyoIF-k,Final Decision,Accept (Spotlight),"Two knowledgeable reviewers and one fairly confident reviewer were positive (7) about this submission. 
The authors' response clarified a few questions and comments from the initial reviews. The paper provides exact bounds that close the gap between lower and upper bounds, and this helps us understand these networks better. With the unanimously positive feedback, I am recommending the paper be accepted. ",ICLR2021, +GbpTc4ZtdC,1642700000000.0,1642700000000.0,1,oxxUMeFwEHd,oxxUMeFwEHd,Paper Decision,Accept (Poster),"This paper presents a new graph neural network layer that is sensitive to topological structure in the graph. Reviewers all believe the work is technically sound, and the experiments (particularly after author revisions) show clear benefits in cases where topological structure is important. The main questions are about whether the experimental evaluation is sufficient. While there are always more experiments that could be run, I tend to agree with the authors that the chosen experiments support the key claims in the paper, so it seems ok. The other question about the experiments is whether they sufficiently convince the reader that topological structure is useful in practice. This seems more mixed. The paper would certainly be improved if there were a motivating application with a clear win. For example, molecular structures are used as motivation in the intro, but the best performing method on proteins doesn't use the topological layer. All-in-all, though, there do appear to be clear improvements on carefully constructed cases, and there appear to be some benefits in real-world datasets.",ICLR2022, +to07RT7J8dh,1610040000000.0,1610470000000.0,1,VD_ozqvBy4W,VD_ozqvBy4W,Final Decision,Accept (Poster)," +The paper aims at controllable generation by introducing an additional ""content-conditioner"" block in the Transformer models. The paper further provides 4 different variants of a pre-training task to train the content-conditioner model. + +While the proposed approach seems to be an incremental contribution over CTRL and PPLM, certain reviews praised the approach as being novel while keeping the architecture changes minimal. Overall, reviews indicate that the proposed method of fine-grained controlled generation with self-supervision is valuable, and empirical results support its effectiveness. 
+ +All reviewers initially raised concerns regarding clarity and the lack of human evaluation. However, the clarity issues seem to have been resolved through author/reviewer discussions and the updated revision. + +R3 had important concerns regarding the topic and sentiment relevance evaluations. +While the reviewer remains unconvinced after discussions with the authors, after carefully reading the revised paper and discussions, I feel that the authors tried to address this point fairly through their additional experiments and also edited their contribution statement accordingly. + +Overall, at least two reviewers sounded very excited about this work and, other than R3's concerns, the general sentiment about this work was positive. Therefore, I recommend weak accept. + +There are still some writing issues that I strongly encourage the authors to carefully address in future versions. Quoting from reviewer discussions: + +> Differentiability of the adversarial loss. Authors just added one statement saying "" Through continuous approximation.."" without any more details are given, which continuous approx was used (Gumbel softmax?) and how they overcame the problem of its training instability. + +> Table 6, can be misleading, authors bold the results when cocon+ is performing better than baselines (mostly in content similarity) but not the other way around topic/sentiment accuracy. The latter is arguably more important.",ICLR2021, +zaIChzWmacc,1642700000000.0,1642700000000.0,1,WXy4C-RjET,WXy4C-RjET,Paper Decision,Reject,"The reviewers unanimously recommend rejecting this submission and I concur with this recommendation. The submission essentially introduces a regularization technique to solve the alleged problem of Adam getting worse out-of-sample error for typical image classification problems, e.g. training ResNets on ImageNet. Reviewers raised a variety of issues with the submission. Some found the experiments unconvincing, some were concerned that the submission duplicated closely related work without engaging with and citing that work, and some were concerned by what they viewed as insufficient analysis and comparisons. To me, the most severe issue with the submission is that the experimental evidence for its claims is not sufficiently convincing and the problem it purports to solve has not been convincingly demonstrated, making the work hard to motivate. The other issues raised by the reviewers are less damaging in my view. + +Although this is a meta review and not a full de novo review, I would be remiss not to raise a few of the severe issues I see with the results that make it hard for them to be convincing. +The Adam results in Table 1 are far weaker than they should be, raising questions about the experiments as a whole. For example, https://arxiv.org/abs/2102.06356 reports 76.4% top 1 accuracy for ResNet-50 on ImageNet with Adam without increasing the epsilon parameter to a larger value as Choi et al. 2019 did (who also report good Adam results for ResNet-50 on ImageNet). This should also lead us to question one premise of the paper that there is some problem with adaptive optimizers for image classification. 
With modern regularization techniques, it isn't hard to get 77%+ top 1 validation accuracy on ImageNet with ResNet-50. See, for example https://arxiv.org/abs/2010.01412v1 which gets 77.5% in 100 epochs and as high as 79.1 with longer training. Since LAWN is claiming to improve generalization, it must be compared with other regularization techniques. It is a type error to primarily compare it with optimizers so even if there weren't concerns with the performance of the existing baselines, there would need to be additional comparisons. + +The claims about fixing issues that arise at large batch sizes are prima facie problematic since there isn't strong evidence of an actual problem at the batch sizes considered in the submission.",ICLR2022, +HkOW6MUde,1486400000000.0,1486400000000.0,1,rywUcQogx,rywUcQogx,ICLR committee final decision,Reject,"The authors propose to use CCA as a transformation within a network that optimally correlates two views. The authors then back-propagate gradients through the CCA. Promising experimental results on for cross-modality retrieval experiments on two public image-to-text datasets are presented. + + The main concern with the paper is the clarity of the exposition. The novelty and motivation of the approach remains unclear, despite significant effort from the reviewers to understand. + + A major rewriting of the paper will generate a stronger submission to a future venue.",ICLR2017, +HkgK3YNll4,1544730000000.0,1545350000000.0,1,BJgQB20qFQ,BJgQB20qFQ,Borderline paper,Reject,"This paper provides a new approach for progressive planning on discrete state and action spaces. The authors use LSTM architectures to iteratively select and improve local segments of an existing plan. They formulate the rewriting task as a reinforcement learning problem where the action space is the application of a set of possible rewriting rules. These models are then evaluated on a simulated job scheduling dataset and Halide expression simplification. This is an interesting paper dealing with an important problem. The proposed solution based on combining several existing pieces is novel. On the negative side, the reviewers thought the writing could be improved, and the main ideas are not explained clearly. Furthermore, the experimental evaluation is weak.",ICLR2019,3: The area chair is somewhat confident +BJBzr16SM,1517250000000.0,1517260000000.0,519,S1fduCl0b,S1fduCl0b,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting you paper to ICLR. The paper studies an interesting problem and the solution, which fuses student-teacher approaches to continual learning and variational auto-encoders, is interesting. The revision of the paper has improved readability. However, although the framework is flexible, it is complex and appears rather ad hoc as currently presented. Exploration of the effect of the many hyper-parameters or some more supporting theoretical work / justification would help. The experimental comparisons were varied, but adding more baselines e.g. comparing to a parameter regularisation approach like EWC or synaptic intelligence applied to a standard VAE would have been enlightening. 
+ +Summary: There is the basis of a good paper here, but a comprehensive experimental evaluation of design choices or supporting theory would be useful for assessing what is a complex approach.",ICLR2018, +041w74ei91s,1610040000000.0,1610470000000.0,1,_lV1OrJIgiG,_lV1OrJIgiG,Final Decision,Reject,"The paper considers the problem of 2D point-goal navigation in novel environments given access to an abstract occupancy grid map of the environment, together with knowledge of the agent's state and the goal location typical of point-goal navigation. The paper proposes learning a navigation policy in a model-based fashion, whereby the architecture predicts the parameters of the transition function and then uses this learned transition function to plan the agent's actions. The authors also describe a model-free approach that extends a version of DQN to reason over the 2D maps. + +The paper was reviewed by four knowledgeable referees, who read the author response. The general problem of learning to navigate a priori unknown environments to reach a desired goal is an interesting problem that has received significant attention of-late in the learning community. In its current form, however, the paper does not adequately convey why this is a difficult problem that can not be solved using existing planning techniques or why it benefits from learning, particularly given access to an abstract map. These concerns apply more generally to point-goal navigation, namely the assumption that the pose of the agent and goal are fully known throughout (or the agent-relative pose of the goal) and that there is no uncertainty in the agent's motion. The practicality of these assumptions is unclear, and they are inconsistent with decades of research in robotics and robot learning, which addresses the more realistic setting in which there is uncertainty in pose and motion. The author response helps to clarify some of these questions, but it is still not fully clear why existing methods are insufficient for this task, whether they use traditional planning methods or are learned. Revisiting the discussion of why this is a hard problem would strengthen the paper, as would a more thorough evaluation that compares against other baselines.",ICLR2021, +qOZ16sHWK_LX,1642700000000.0,1642700000000.0,1,C5u6Z9voQ1,C5u6Z9voQ1,Paper Decision,Reject,"The paper investigates attacks against time series analysis methods such as GNN and DNN for anomaly and intrusion detection. Standard attacks such as FGSM and PGD are extended for the time series domain and evaluated on several datasets including automotive, aerospace and resource utilization datasets. While the authors claim to be the first to investigate such attacks, some related work was not considered in the paper, which was pointed out by reviewers. Also some other weaknesses of the proposed method, e.g., its focus on feature space perturbations were pointed out. Hence, while acknowledging the importance and the novelty of this paper's contributions, the reviewers agree that the paper must be better positioned in the context of the related work in order to be accepted.",ICLR2022, +Syqfryprz,1517250000000.0,1517260000000.0,523,rkcya1ZAW,rkcya1ZAW,ICLR 2018 Conference Acceptance Decision,Reject,"Thank you for submitting you paper to ICLR. The consensus from the reviewers is that there are some interesting theoretical contributions and some promising experimental support. 
However, although the paper is moving in the right direction, they believe that it is not quite ready for publication.",ICLR2018, +qkNKYOR_9rW,1642700000000.0,1642700000000.0,1,tlkHrUlNTiL,tlkHrUlNTiL,Paper Decision,Reject,"The paper examines a sum-over-paths representation of ReLU networks, for which learning can be broken into two parts: learning the gates, and learning the weights given the gates, the latter of which being described by the Neural Path Kernel. The paper introduces a dual architecture, Deep Linear Gated Networks (DLGN) that parameterizes these two processes separately. The DLGN is argued to aid in interpretability of ReLU networks, with a main conclusion being that the neural network is learned path-by-path instead of layer-by-layer. + +The reviewers generally found strength in the motivation and perspective and thought that the DLGN could serve as a useful architecture for aiding interpretability. Some reviewers found the presentation hard to follow, and others were not entirely convinced by the ultimate conclusions. Overall, the reviewers opinions were mixed. + +I believe the ICLR community would generally find interest in the DLGN and the interpretations it might afford to deep ReLU networks. However, the number and strength of the conclusions obtained in the current analysis are rather weak. The conclusion that networks learn path-by-path instead of layer-by-layer was emphasized but the implications were not highlighted, and it remains unclear to me and at least some reviewers what the concrete significance of this observation actually is. Another major claim is that the DLGN recovers more than 83.5% of the performance of state-of-the-art DNNs, but a priori it is not obvious what this number means, or if it is even good or bad performance. A more detailed analysis with additional common baselines, ablations, etc., would really help readers understand the significance of the performance gap. + +Overall, this is an interesting direction with significant potential, but for the above reasons I cannot recommend the current version for acceptance.",ICLR2022, +prbsw6WouXc,1610040000000.0,1610470000000.0,1,6X_32jLUaDg,6X_32jLUaDg,Final Decision,Reject,"This paper proposed an unsupervised domain adaptation method for 3D lidar-based object detection. Four reviewers provided detailed reviews: 3 rated “Marginally above acceptance threshold”, and 1 rated “Ok but not good enough - rejection”. The reviewers appreciated simple yet effective idea, the well motivated method, the comprehensiveness of the experiments, and well written paper. However, major concerns are also raised regarding the core technical contributions on the proposed approach. The ACs look at the paper, the review, the rebuttal, and the discussion. Given the concerns on the core technical contributions, the high competitiveness of the ICLR field, and the lack of enthusiastic endorsements from reviewers, the ACs believe this work is not ready to be accepted to ICLR yet and hence a rejection decision is recommended. +",ICLR2021, +Lod8Mka7qIF,1642700000000.0,1642700000000.0,1,Vzh1BFUCiIX,Vzh1BFUCiIX,Paper Decision,Accept (Poster),"This paper explores large scale supervised multi-task training across 107 NLP tasks combined with self-supervised C4 masked span infilling, using the T5 sequence-to-sequence model. The results improve over prior strong T5 baselines on several NLP benchmarks such as SuperGLUE, GEM, and Rainbow. 
+ +The paper's main strengths are the scale and large number of tasks, the release of the trained models and data, as well as the clarity and presentation. Reviewers had concerns with the novelty, limitations in the evaluation (to just T5, and to just SuperGLUE in portions of the paper), and the potential impact of hyperparameters on the results. During the discussion period, the authors noted that it is not obvious a priori that their approach would work, and that their evaluations on other tasks made it unlikely to be overfitting to SuperGLUE. They also noted that running the additional hyperparameter experiments suggested during the reviews were computationally prohibitive. + +Overall, despite the drawbacks and relative lack of novelty, the extensive experiments and released models provide significant value and will be of interest to the research community.",ICLR2022, +D08MD0cs-St,1610040000000.0,1610470000000.0,1,aGmEDl1NWJ-,aGmEDl1NWJ-,Final Decision,Reject,"The paper proposes to augment the original model to introduce the ""luring effect"", which can be used for detection and black-box defense. Despite being an interesting setup, there are several weaknesses in the threat model (whether it is practical) and the evaluation (lack of adaptive attacks). Those concerns remain after the rebuttal phase. + +Threat model: see the concerns raised by Reviewer 1 and the updated comments after the rebuttal phase. + +Lack of adaptive attack: the authors assume that the attacker has very limited knowledge to how the system works. This could be viewed as a ""black-box setting"" for adversarial detection evaluation, and actually many other detection works can perform almost perfectly in this setting, so it's not clear how significant are the results. The authors tried to consider some adaptive attacks in their rebuttal but the reviewers are still not fully convinced. +",ICLR2021, +M3o4uHKhZTE,1610040000000.0,1610470000000.0,1,hpH98mK5Puk,hpH98mK5Puk,Final Decision,Accept (Poster),"This paper introduces two regularizers that are meant to improve out-of-domain robustness when used in the fine-tuning of pretrained transformers like BERT. Results with ANLI and Adversarial SQuAD are encouraging. + +Pros: +- New method with concrete improvements in several difficult task settings. +- New framing of adversarial generalization. + +Cons: +- The ablations that are highlighted in the main paper body don't do a good job of isolating the specific new contributions. (Though the appendix provides enough detail that I'm satisfied that the main empirical contribution is sound.) +- Reviewers found the theoretical motivation very difficult to follow in places.",ICLR2021, +vjzouX-rS_s,1642700000000.0,1642700000000.0,1,0kNbTghw7q,0kNbTghw7q,Paper Decision,Reject,"To improve the generative adversarial nets, the paper proposes to add an implicit transformation of the Gaussian latent variables before the top-down generator. To further obtain better generations with respect to quality and diversity, this paper introduces targeted latent transforms into a bi-level optimization of GAN. Experiments are conducted to verify the effectiveness of the proposed method. The paper is highly motivated and well-written, but the experiment part still needs to be strengthened because the goal of the paper is to improve the GAN training, comprehensive and thorough evaluation of the proposed method is necessary. 
+ +After the first round of review, in addition to the clarification issue and missing reference issue, two reviewers point out that the method is only tested in small-scale datasets, and suggest authors evaluate the performance of the proposed method in more complex datasets. Two reviewers point out that the experimental validation and comparison to prior approaches are insufficient. During the rebuttal, the authors provide extra experiment results to partially address some issues. However, most of the major concerns from other reviewers, such as (i) how are the performance of the method in large scale datasets that have complex latent space manifolds, (ii) non-convincing performance gain, and unclear problem setup, still remain. After an internal discussion, AC agrees with all reviewers that the current paper is not ready for publication, thus recommending rejecting the paper. AC urges the authors to improve their paper by taking into account all the suggestions provided by the reviewers, and then resubmit it to the next venue.",ICLR2022, +zpihRmWvHXI,1642700000000.0,1642700000000.0,1,tFgdrQbbaa,tFgdrQbbaa,Paper Decision,Accept (Poster),"After carefully reading the reviews and rebuttal, I believe this work is of sufficient quality for acceptance. Understanding continual learning from a theoretical stand point is a very important topic. I find that one of the main issue raised by reviewers was about the exact meaning of Continual learning, and whether what the authors studied was more akin to sequential learning. While I don't mind the term sequential learning, and is quite descriptive of the work as well, I disagree that the considered setup is not continual learning.",ICLR2022, +BJp92fU_l,1486400000000.0,1486400000000.0,1,HyGTuv9eg,HyGTuv9eg,ICLR committee final decision,Accept (Poster),"The program committee appreciates the authors' response to concerns raised in the reviews. While there are some concerns about the computational speed of the approach as well as its advantage over existing methods for some textures, reviewers are excited by the ability of this work to produce structured texture that requires long-range interactions. Overall, the work has contributions that are worth presenting at ICLR.",ICLR2017, +B1j6LyaSM,1517250000000.0,1517260000000.0,887,Bk-ofQZRb,Bk-ofQZRb,ICLR 2018 Conference Acceptance Decision,Reject,The reviewers agree this paper is not yet ready for publication.,ICLR2018, +HJurQkTrz,1517250000000.0,1517260000000.0,132,rkPLzgZAZ,rkPLzgZAZ,ICLR 2018 Conference Acceptance Decision,Accept (Poster),"Important problem (modular continual RL) and novel contributions. The initial submission was judged to be a little dense and hard to read, but the authors have been responsive in responding and updating the paper. I support accepting this paper. ",ICLR2018, +Nnu648mFMQbe,1642700000000.0,1642700000000.0,1,z7DAilcTx7,z7DAilcTx7,Paper Decision,Reject,"In this paper, authors study adversarial examples from a distributional robustness point of view. Reviewers had several concerns about the work and all thought the paper is not above the accept threshold. In particular, they mentioned that the presentation and writing of the paper need to be improved and results (specially the ones presented in Section 2) are not significant contributions and novel. 
Given all, I think the paper needs more work before being accepted.",ICLR2022, +axb_l2zDv6,1576800000000.0,1576800000000.0,1,BkgUB1SYPS,BkgUB1SYPS,Paper Decision,Reject,"This paper a theoretical interpretation of separation rank as a measure of a recurrent network's ability to capture contextual dependencies in text, and introduces a novel bidirectional NLP variant and tests it on several NLP tasks to verify their analysis. + +Reviewer 3 found that the paper does not provide a clear description of the method and that a focus on single message would have worked better. Reviewer 2 made a claim of several shortcomings in the paper relating to lack of clarity, limited details on method, reliance on a 'false dichotomy', and failure to report performance. Reviewer 1 found the goals of the work to be interesting but that the paper was not clear, that the proofs were not rigorous enough, and clarity of experiments. The authors responded to the all the comments. The reviewers felt that their comments were still valid and did not adjust their ratings. + +Overall, the paper is not yet ready in its current form. We hope that the authors will find valuable feedback for their ongoing research.",ICLR2020, +rbXaE85IO4p,1642700000000.0,1642700000000.0,1,-AOEi-5VTU8,-AOEi-5VTU8,Paper Decision,Accept (Poster),"The paper presents some efficiency improvements over existing methods to compute matrix square root and its gradient. Reviewers find that the novelty over existing methods is sufficient, and that the improvements are valuable. + +I propose a poster despite the relatively high numerical scores, because the group of practitioners who will use the result is somewhat niche -- the reviewers are of course selected from this group and hence value the paper more highly. + +In addition the real-world speedups are modest, but it is nevertheless important to document this approach.",ICLR2022, +r13j8kTHf,1517250000000.0,1517260000000.0,861,HyBbjW-RW,HyBbjW-RW,ICLR 2018 Conference Acceptance Decision,Reject,"The idea of using the determinant of the covariance matrix over inputs to select experiments to run is a foundational concept of experimental design. Thus it is natural to think about extending such a strategy to sequential model based optimization for the hyperparameters of machine learning models, using recent advances in determinantal point processes. The idea of sampling from k-DPPs to do parallel hyperparameter search, balancing quality and diversity of expected outcomes, seems neat. While the reviewers found the idea interesting, they saw weaknesses in the approach and most importantly were not convinced by the empirical results. All reviewers thought that the baselines were inappropriate given recent work in hyperparameter optimization (and classic work in statistics). + +Pros: +- Useful to a large portion of the community (if it works) +- An interesting idea that seems timely + +Cons: +- Only slightly outperforms baselines that are too weak +- Not empirically compared to recent literature +- Some of the design and methodology require more justification +- Experiments are limited to small scale problems",ICLR2018, +vrhf8or8S8,1576800000000.0,1576800000000.0,1,H1lKNp4Fvr,H1lKNp4Fvr,Paper Decision,Reject,"The paper proposed the use of a shallow layers with large receptive fields for feature extraction to be used in stereo matching tasks. It showed on the KITTI2015 dataset this method leads to large model size reducetion while maintaining a comparable performance. 
+ +The main concern with this paper is the lack of technical contributions: +* The task of stereo matching is a very specialized one, and simply presenting the model size reduction and performance is not interesting to general readers. Adding more analysis that helps in understanding why the proposed method helps in this particular task, and for what kind of tasks a shallow feature instead of a deeper one is preferred, would be valuable. In that way, the paper would address much wider audiences. +* The discussion of related work is not thorough enough, lacking an analysis of the pros and cons of the different methods.",ICLR2020, +SJgwQO0VeV,1545030000000.0,1545350000000.0,1,S14h9sCqYm,S14h9sCqYm,Limited evaluation,Reject,"This paper considers an important problem of aligning two knowledge graphs (the entities and relations therein). The reviewers found the use of adversarial training quite novel and appropriate for the task, especially as it works in the unsupervised setting as well. 
The reviewers were also impressed that the proposed work outperforms existing approaches in terms of the accuracy of the alignment. + +The following potential weaknesses were raised by the reviewers and the AC: (1) Reviewer 3 brings up the fact that the hyperaparameters were set different from the original publications of the baselines, and thus are not convinced of the soundness of the results, (2) Reviewer 2 notes that the evaluation is limited, and more variations should be considered, such as varying the overlap, taking larger subsets of knowledge graphs, and going beyond TranE as the choice for embedding, and (3) Reviewer 3 notes that a simpler baseline based on alignment discrepancy should be considered, which would alleviate the need for RL based training. + +Although the reviewers raised very different concerns with the paper, none of them were addressed in a response or revision, and thus they agree that the paper should be rejected.",ICLR2019,4: The area chair is confident but not absolutely certain +6AalO2oVLmN,1642700000000.0,1642700000000.0,1,N1WI0vJLER,N1WI0vJLER,Paper Decision,Accept (Poster),"This paper presents a way of using multigrid techniques to parallelize GRU networks across the time dimension. Reviewers are uniformly in favor of accepting the paper. The main strength is that the paper provides a new perspective on dealing with long input sequences by parallelizing RNNs across time. The main weaknesses are around the experiments: only CPU experiments are run, and sequences are not very long (max 128 length). All-in-all, though, it provides an interesting perspective that should be valuable to the community.",ICLR2022, +maRpdTbYwv,1576800000000.0,1576800000000.0,1,B1lgUkBFwr,B1lgUkBFwr,Paper Decision,Reject,"This paper addresses the problem of performing unsupervised domain adaptation when some target domain data is missing is a potentially non-stochastic way. The proposed solution consists of applying a version of domain adversarial learning for adaptation together with an MSE based imputation loss learned using complete source data. The method is evaluated on both the standard digit recognition datasets and a real-world advertising dataset. + +The reviewers had mixed recommendations for this work, with two recommending weak reject and one recommending acceptance. The key positive point from R3 who recommended acceptance was that this work addresses a new problem statement which may be of practical importance. The other two reviewers expressed concerns over the contribution of the work and the validity of the problem setting. Namely, both R2 and R4 had significant confusion over the problem specification and/or under what conditions the proposed setting is valid. + +It is a difficult decision for this paper as there is a core disagreement between the reviewers. All reviewers seem to agree that the proposed solution is a combination of prior methods in a new way to address the specific problem setting of this work. However, the reviewers differ in precisely whether they determine the proposed problem setting to be valid and justified. Due to this discrepancy, the AC does not recommend acceptance at this time. If the core contribution is to be an application of existing techniques to a new problem statement than that should be clarified and motivated further. 
",ICLR2020,
Pz232eteghY,1610040000000.0,1610470000000.0,1,Oz_4sa7hKhl,Oz_4sa7hKhl,Final Decision,Reject,"The paper suggests a simple variant of BERT training that improves classification for smaller training samples. So it has a very specific applicability, unlike other published variants which generally improve a broad range of tasks. The variant adds a self-supervised classification task based on clustering. Experiments are done, but they only show improvement for small training sizes.

AnonReviewer4 suggested a BOW experiment/baseline, which was added by the authors in an updated version; this confirmed the authors' line. AnonReviewer3 asked for computational details, which were added. AnonReviewer1 lists a number of limitations which the authors need to address, rephrasing the corresponding statements in their paper.

So it is publishable work, but somewhat marginal due to its specialised nature, and thus rejected.",ICLR2021,
GUm-Vn9oTI,1576800000000.0,1576800000000.0,1,BJlzm64tDH,BJlzm64tDH,Paper Decision,Accept (Poster),"This submission proposes a secondary objective when learning language models like BERT that improves the ability of such models to learn entity-centric information. This additional objective involves predicting whether an entity has been replaced. Replacement entities are mined using Wikidata.

Strengths:
- The proposed method is simple and shows significant performance improvements for various tasks including fact completion and question answering.

Weaknesses:
- The experimental settings and data splits were not always clear. This was sufficiently addressed in a revised version.
- The paper could have probed performance on tasks involving less common entities.

The reviewer consensus was to accept this submission.",ICLR2020,
6Il7MRP3E0w,1642700000000.0,1642700000000.0,1,NeRrtif_hfa,NeRrtif_hfa,Paper Decision,Reject,"This well-written paper introduces an improved exploration strategy by exploiting knowledge about sequences of actions that lead to the same state. The idea is straightforward and easy to understand and apply, which makes it potentially interesting. An important downside is the limited applicability of the method, as there mainly seems to be an advantage in (mostly deterministic) grid-like MDPs. In addition, priors about action-sequence equivalences have to be available. Overall, the contribution of the paper is not deemed significant enough for publication at a top-tier conference like ICLR by the majority of the reviewers as well as myself. For these reasons, I recommend rejection.",ICLR2022,
mYeOyEOTFE4,1642700000000.0,1642700000000.0,1,Sq0-tgDyHe4,Sq0-tgDyHe4,Paper Decision,Accept (Poster),"This paper presents a novel regularization technique for CNNs based on swapping feature vectors in the final layer. It is demonstrated that this simple technique helps with generalization in supervised learning and RL with image inputs.
Following the author rebuttal, all reviewers agreed that the simplicity of this method and the nice empirical performance it obtains is important to report to the community. In this respect, I agree with the reviewers, and recommend acceptance.

One important issue that came up during the discussion is how much this work is related to RL, and the authors' SL experiments helped to put the contribution in a broader context.
Indeed, one way to see the results of this work is that if such performance improvement is obtained in the Procgen benchmark with just image-based regularization, perhaps this benchmark is not very suitable for studying generalization in RL (where we expect that more sophisticated techniques would be required). In addition, I can think of RL domains (e.g., Tetris, which was mentioned in the discussion) where I would not expect the proposed method to help. It would be good if the authors discuss these issues in some capacity in their final version.

Please take all reviewer comments into account when preparing the final version.",ICLR2022,
HygTZubNeN,1544980000000.0,1545350000000.0,1,HyxpNnRcFX,HyxpNnRcFX,Timely idea... but paper lacks in results and conclusive insights.,Reject,"This paper extends the meta-learning MAML method to the mixture case. Specifically, the global parameters of the method are now modeled as a mixture. The authors also derive the elaborate associated inference for this approach.

The paper is well written, although Rev2 raises some presentation issues that can surely improve the quality of the paper if addressed in depth.

The results do not convince any of the three reviewers. Rev3 asks for a clearer exposition of the results to increase convincingness. Rev2 and Rev1 also make similar comments.

Rev1 also questions the motivation of the approach, although the other two reviewers seem to find the approach well motivated. Although it certainly helps to prove the motivation within an application very tailored to the method, the AC weighted the opinion of all reviewers and did not consider the paper to be lacking in the motivation aspect.

The reviewers were overall not very impressed with this paper, and that does not seem to stem from a lack of novelty or technical correctness. Instead, it seems that this work is rather inconclusive (or at least it is presented in an inconclusive manner): Rev1 says that the important questions (like trade-offs and other practical issues) are not answered, Rev2 suggests that maybe this paper is trying to address too much, and all three reviewers are not convinced by the experiments and derived insights.

Finally, Rev2 points out some inherent caveats of the method; although they do not seem to be severe enough to undermine the overall quality of the approach, it would be instructive to have them investigated more thoroughly (even if not completely solving them).",ICLR2019,5: The area chair is absolutely certain
tF-ejoOfDt,1576800000000.0,1576800000000.0,1,ByxhOyHYwH,ByxhOyHYwH,Paper Decision,Reject,"This paper develops a new few-shot image classification algorithm by using a metric-softmax loss for non-episodic training and a linear transformation to modify the model towards few-shot training data for task-agnostic adaptation.

Reviewers acknowledge that some of the results in the paper are impressive, especially in domain shift settings as well as with a fine-tuning approach. However, they also raise very detailed and constructive concerns on the 1) lack of novelty, 2) improper claim of contribution, and 3) inconsistent evaluation protocol with the de facto ones in existing work. The authors' rebuttal failed to convince the reviewers in regards to a majority of the critiques.

Hence I recommend rejection.",ICLR2020,
HJlurooWl4,1544830000000.0,1545350000000.0,1,ByeDojRcYQ,ByeDojRcYQ,Meta-review,Reject,"Pros:
- interesting novel formulation of policy learning in homogeneous swarms
- multi-stage learning process that trades off diversity and consistency (fig 1)

Cons:
- implausible mechanisms like averaging weights of multiple networks
- minor novelty
- missing ablations of which aspect is crucial
- dubious baseline results
- no rebuttal

One reviewer out of three would have accepted the paper; the other two have major concerns. Unfortunately the authors did not revise the paper or engage with the reviewers to clear up these points, so as it stands the paper should be rejected.",ICLR2019,4: The area chair is confident but not absolutely certain
SZWM3Zmn1-S,1610040000000.0,1610470000000.0,1,tij5dHg5Hk,tij5dHg5Hk,Final Decision,Reject,Most of the reviewers and the AC found many claims of this submission unsubstantiated.,ICLR2021,
GSaiWE_MbP6,1610040000000.0,1610470000000.0,1,5WcLI0e3cAY,5WcLI0e3cAY,Final Decision,Reject,"This paper proposes a new method for pre-training of language models in the e-commerce domain. It introduces five objectives for pre-training by incorporating domain knowledge into the model.

Pros
• The paper is generally easy to follow.
• Design of the pre-training objectives is reasonable.
• Experimental results are solid and convincing.
• A useful method is proposed, and its effectiveness has been verified in the e-commerce domain.

Cons
• Novelty of the work might not be enough.
• Presentation can be improved.
• It is not clear whether the proposed approach can be applied to other domains which may not have enough structured data.

The authors have made several things clearer in the rebuttal. They have also added new experimental results. However, the overall quality of the paper does not reach the level of ICLR from the viewpoint of novelty, significance, and clarity.",ICLR2021,
_xzlypBWa1,1576800000000.0,1576800000000.0,1,rJlDO64KPH,rJlDO64KPH,Paper Decision,Reject,"The paper proposed local prior matching, which utilizes a language model to rescore the hypotheses generated by a teacher model on unlabeled data; these are then used to train the student model for improvement. The experimental results on LibriSpeech are thorough. But two concerns on this paper are: 1) limited novelty: an LM trained on large text data is already used in weak distillation, and the only difference is the use of multiple hypotheses. As pointed out by the reviewers, the method is better understood through distillation even though the authors try to derive it from a Bayesian perspective. 2) LibriSpeech is a medium-sized dataset; justification on a much larger dataset for ASR would make it more convincing.",ICLR2020,
6e2xsLRPt5-,1610040000000.0,1610470000000.0,1,k2Hm5Szfl5Z,k2Hm5Szfl5Z,Final Decision,Reject,"This paper studies the tensor principal component analysis problem, where we observe a tensor T = \beta v^{\otimes k} + Z where v is a spike and Z is a Gaussian noise tensor. The goal is to recover an accurate estimate of the spike for as small a signal-to-noise ratio \beta as possible. There has been considerable interest in this problem, mainly coming from the statistics and theoretical computer science communities, and the best known algorithms succeed when \beta \geq n^{k/4} where n is the dimension of v.
The main contribution of this paper is to leverage ideas from theoretical physics and build a matrix whose top eigenvector is correlated with v for sufficiently large \beta, using trace invariants. On synthetic data, the algorithms achieve better performance than existing methods.

The main negative of this paper is that it is not so clear how tensor PCA is relevant in machine learning applications. The authors gave some references to applications of tensor methods, but I want to point out that all of those works are about using tensor decompositions, which, despite the fact that both are about tensors, are rather different sorts of tools. Many of the reviewers also found the paper difficult to follow. I do think exposition is particularly challenging when making connections between different communities, as this work needs to introduce several notions from theoretical physics. I am also not sure how novel the methods are, since a somewhat recent paper (Moitra and Wein, ""Spectral Methods from Tensor Networks"", STOC 2019) also uses tensor networks to build large matrices whose top eigenvalue is correlated with a planted signal, albeit for a different problem called orbit retrieval.",ICLR2021,
BJHRrypBf,1517250000000.0,1517260000000.0,681,rJTGkKxAZ,rJTGkKxAZ,ICLR 2018 Conference Acceptance Decision,Reject,Reviewers recognize the proposed method of hierarchical extension to ALI to be potentially novel and interesting but have expressed strong concerns on the experiments section. The paper also needs to have comparisons with relevant hierarchical generative model baselines. Not suitable for publication in its current form.,ICLR2018,
SPJKIx-hZVX,1642700000000.0,1642700000000.0,1,5ueTHF0yAlZ,5ueTHF0yAlZ,Paper Decision,Reject,"The paper presents an improvement to the core-set active learning algorithm by leveraging distance measures weighted by uncertainty scores and using beam search instead of greedy search.

The reviewers agreed that the paper provides a nice theoretical analysis as well as motivation for the proposal, as well as an ablation that shows the proposal indeed empirically outperforms the original core-set algorithm. However, the reviewers also agreed that additional important comparisons would make the paper more convincing, including Bayesian core-set algorithms as well as other recent proposals based on the original core-set algorithm.",ICLR2022,
HkDjf1prG,1517250000000.0,1517260000000.0,4,HJGXzmspb,HJGXzmspb,ICLR 2018 Conference Acceptance Decision,Accept (Oral),"High quality paper, appreciated by reviewers, likely to be of substantial interest to the community. It's worth an oral to facilitate a group discussion.",ICLR2018,
IzDz_e6NDRJ,1610040000000.0,1610470000000.0,1,a0yodLze7gs,a0yodLze7gs,Final Decision,Reject,"The initial round of reviews showed a consensus among the reviewers that the presentation of the paper was poor, the novelty was unclear, claims were not properly justified, and the experimental evaluation and discussion were quite insufficient. The authors provided a rebuttal and an updated version of the paper. Although the updated paper demonstrated that the proposed approach indeed provides some benefits, it appears that the authors were not successful in addressing the numerous but constructive reviewers' comments.

The paper is not ready for publication in ICLR 2021 and can benefit from major revisions and careful proofreading.
",ICLR2021,
Sj_akjrADi1,1642700000000.0,1642700000000.0,1,3ILxkQ7yElm,3ILxkQ7yElm,Paper Decision,Accept (Poster),"This paper proposes environment fields, a representation that models reaching distances within a scene. Dense environment fields are learnt using a neural network, and the effectiveness of this representation is shown on 2D maze environments and 3D indoor environments. This paper received hugely contrasting reviews, with two reviewers being very supportive and one reviewer providing the lowest score of 1. In light of this, I'll start by providing my takeaways on the review and discussion with reviewer z3Y4 (rating of 1) and then proceed to the remaining discussion.

Reviewer z3Y4 has provided the score of 1 and has made strong remarks that include: ""what is proposed in this paper is simply not comprehensible"", ""description of the method itself is simply devoid of all required detail"", ""The main claims of the paper are incorrect or not at all supported by theory or empirical results."" and ""what is being proposed in this paper is simply too unclear and vague to be assessed"". **Such dismissive remarks, in my opinion, are completely unnecessary and create a toxic discussion and review environment.**

Reviewer z3Y4 has many criticisms of the submission, but the primary ones include: (a) the lack of details throughout the paper, (b) the positioning of the paper in the abstract and introduction, and (c) the lack of experiments in continuous environments. Re (a): It is well understood in our research community that providing every last detail in the main submission is nearly impossible due to the restriction on the number of pages. Providing excess details in the main paper also often reduces the readability of the paper. Such details are better addressed in the appendix and, crucially, the code. The authors have provided some details in the appendix and have indicated that they will release a code base. I also agree with the authors that justifying every last detail of the network architecture, such as the choice of activation function, is not necessary for this submission. The same goes for describing methods in past works in detail vs. referring the reader to the appropriate citation. As a result, I believe that the authors have addressed (a) well. Re (b): This has also been addressed by the authors, by pointing out relevant parts of the paper that had the necessary details. Re (c): In this regard, the paper clearly contains a well laid out experiment in 3D indoor scenes, so as far as I am concerned, this has been addressed in the main submission.

Reviewers AhgQ and fAEP have supported this submission but also laid out some concerns that include:
(1) Are the gradients suboptimal?
(2) Positioning the paper with regard to past works
(3) Motivation behind using the VAE
(4) Qualitative analysis and failures
The authors have addressed these 4 concerns well using the rebuttal as well as via a revision of the appendix. The reviewers, post discussion, have indicated their satisfaction with the revised submission.

I think this paper is interesting and proposes a novel scene representation which can be useful for others in the Embodied AI community.
I am in agreement with reviewers AhgQ and fAEP, and in spite of the strong reject score by z3Y4, I recommend accepting this paper.",ICLR2022,
Hkk4IyTBG,1517250000000.0,1517260000000.0,754,HkJ1rgbCb,HkJ1rgbCb,ICLR 2018 Conference Acceptance Decision,Reject,"Pro:
- Interesting approach to tie together reinforcement Q-learning with a CNN for prediction and reward function learning in predicting downstream effects of chemical structures, while providing relevant areas for decision-making.

Con:
- Datasets are small, generalizability not clear.
- Performance is not high (although performance wasn't the goal necessarily).
- Sometimes test performance is higher than training performance, making results questionable.
- Should include comparison to other wrapper-based combinatorial approaches.
- Too targeted an appeal/audience (better for a chemical journal).",ICLR2018,
W-NnySSAIKA,1610040000000.0,1610470000000.0,1,fESskTMMSv,fESskTMMSv,Final Decision,Reject,"The paper is about an approach that combines successor representation with marginalized importance sampling.
Although the reviewers acknowledge that the paper has some merits (interesting idea, good discussion, extensive experimental analysis) and the authors' responses have solved most of the reviewers' issues, the paper is borderline and the reviewers did not reach a consensus about its acceptance. In particular, the reviewers feel that the contributions of this paper are not significant enough.
I encourage the authors to modify their paper by taking into consideration the suggestions provided by the reviewers and try to submit it to one of the forthcoming machine learning conferences.",ICLR2021,
Mz8vg_BjQn,1576800000000.0,1576800000000.0,1,SJxUjlBtwB,SJxUjlBtwB,Paper Decision,Accept (Spotlight),"The paper introduces a generative approach to reconstruct 3D images for cryo-electron microscopy (cryo-EM).

All reviewers really liked the paper, and appreciate the challenging problem tackled and the proposed solution.

Acceptance is therefore recommended.",ICLR2020,
A2F7gqAPho,1576800000000.0,1576800000000.0,1,BJg1fgBYwH,BJg1fgBYwH,Paper Decision,Reject,"The paper proposes to improve the noise robustness of the network's learned features by augmenting deep networks with Spike-Time-Dependent Plasticity (STDP). The new network shows improved noise robustness, with better classification accuracy on CIFAR-10 and an ImageNet subset when the input data are noisy. While this paper is well written, a number of concerns are raised by the reviewers. They include that the proposed method would not be favored from a computer vision perspective, that it is not convincing why spiking nets are more robust to random noise, and that the method fails to address works on adversarial perturbations and adversarial training. Also, Reviewer #2 pointed out the low level of methodological novelty. The authors provided responses to the questions, but these did not change the ratings of the reviewers.
Given the various concerns raised, the ACs recommend rejection.",ICLR2020,
1CBLNgZcA,1576800000000.0,1576800000000.0,1,HJxcP2EFDS,HJxcP2EFDS,Paper Decision,Reject,"Main content:

This paper presents negation handling approaches for Amharic sentiment classification.

--

Discussion:

All reviewers agree the paper is poorly written, uses outdated approaches, and requires better organization and formatting.

--

Recommendation and justification:

This paper, after more work, might be better submitted to an NLP workshop on low-resource languages rather than ICLR, which is more focused on new machine learning methods.",ICLR2020,
iau3a9RjS6n,1642700000000.0,1642700000000.0,1,JYQYysrNT3M,JYQYysrNT3M,Paper Decision,Reject,"This paper studies an RL problem with vector rewards, where the goal is to maximize the expected minimum total reward (ex-post max-min fairness). This is different from prior works on a similar topic, where the goal is to maximize the minimum expected total reward (ex-ante max-min fairness). The authors propose an algorithm for solving the problem with $O(T^{2/3})$ regret and evaluate it.

This paper received two borderline reject and two reject reviews. The reviewers recognize the novelty of the objective. However, they are also concerned with its motivation and with the fact that the proposed algorithm relies on strong assumptions, such as that the used oracle knows the underlying reward and transition models, or at least has some estimate of them. In the end, the scores of this paper are not good enough for acceptance. Therefore, it is rejected.",ICLR2022,
rJg_y1mgx4,1544720000000.0,1545350000000.0,1,S1G_cj05YQ,S1G_cj05YQ,Meta-Review,Reject,"There is no author response for this paper. The paper presents a multi-task learning framework as a unified view of the previous methods for tackling catastrophic forgetting in continual learning. In light of this framework, the authors propose to minimize the KL-divergence between the predictions of the previous optimal model and the current model using some stored samples from the previous tasks.

The consensus among all three reviewers and the AC is that the paper lacks (1) novelty, as the proposed approach is similar if not identical to Learning without Forgetting (LwF) [Li & Hoiem 2017], with the difference that the KL-divergence is computed on samples kept from the previous tasks (whereas LwF uses samples from the current task). Methodological and experimental comparison to LwF is crucial to assess the benefits and novelty of the proposed approach.

Also, the reviewers address other potential weaknesses and give suggestions for improvement: (2) empirical evaluations can be substantially improved with sensitivity analysis of the hyper-parameters on the validation data (R3), indicating errors and error bars for all results (R3 and R2), using more challenging and realistic experimental settings where the data comes from different domains (R1), and justifying the results better -- see R2's questions; (3) lack of clarity and motivation in Section 3.1 -- see R2's and R1's suggestions for how to improve clarity and potentially take advantage of the current task to correct the previous model's prediction when it was wrong.

The AC suggests that, in its current state, the manuscript is not ready for publication. We hope the reviews are useful for improving and revising the paper.
+",ICLR2019,5: The area chair is absolutely certain +rJl7TvFmgE,1544950000000.0,1545350000000.0,1,B1MAJhR5YX,B1MAJhR5YX,"Developments on counting linear regions, applicability uncertain ",Reject,"The paper seeks to obtain faster means to count or approximately count of the number of linear regions of a neural network. The paper improves bounds and makes an interesting contribution to a long line of work. + +A consistent concern of the reviewers is the limited applicability of the method. The empirical evaluation can serve to better assess the accuracy of theoretical bounds that have been obtained in previous works, but the practical utility is not as clear yet. + +This is a borderline case. The reviewers lean towards a positive rating of the paper, but are not particularly enthusiastic about the paper. The paper makes good contributions, but is just not convincing enough. + +I think that the work program that the authors suggest in their responses could lead to a stronger paper in the future. In particular, the exploration of necessary and sufficient conditions for different neural networks to be equivalent and the use of number of linear regions when analyzing neural networks, seem to be very promising directions. ",ICLR2019,4: The area chair is confident but not absolutely certain +XgPJjMUD9Pf,1610040000000.0,1610470000000.0,1,86PW5gch8VZ,86PW5gch8VZ,Final Decision,Reject,"The reviewers have a strong consensus towards rejection here, and I agree with this consensus , although I think some of the reviewers' concerns are misplaced. For example, the paper does not appear to use a magnitude upper bound that would be vacuous together with a strong convexity assumption (although variance bounds + strong convexity do cover only a small fraction of strongly convex learning tasks, these assumptions aren't vacuous). Some feedback I have that perhaps was not covered by the reviewers: + +Pros: + + - Studying the setting where the number of bits varies dynamically is very interesting (although, as Reviewer 3 points out, not entirely novel). There is significant possibility for improvement from this method, and your theory seems to back this up. + +Cons: + + - The experimental setup is weak, and is measuring the wrong thing. When we run SGD to train a model, what we really care about is when the training finishes: the total wall clock time to train on some system. For compression methods with fixed compression rates, it's fine to use the number of bits transmitted as a proxy, because (when the number of bits transmitted is uniform over time) this will be monotonic in the wall-clock time. However, when the bits transmitted per iteration can change over time, this can have a difficult-to-predict effect on the wall-clock time, because of the potential for overlap between communication and computation (where below a certain number of bits sent, the system is not communication-bound). Wall-clock time experiments comparing against other more modern compression methods would significantly improve this paper.",ICLR2021, +HygdBreglN,1544710000000.0,1545350000000.0,1,B1lKS2AqtX,B1lKS2AqtX,Well executed exploration of a 3D CNN LSTM method,Accept (Poster),"Strengths: Strong results on future frame video prediction using a 3D convolutional network. Use of future video prediction to jointly learn auxiliary tasks shown to to increase performance. Good ablation study. + +Weaknesses: Comparisons with older action recognition methods. 
Some concerns about novelty: the main contribution is the E3D-LSTM architecture, which R1 characterized as an LSTM with an extra gate and attention mechanism.

Contention: Authors point to novelty in the 3D convolutions inside the RNN.

Consensus: All reviewers give a final score of 7 -- well done experiments helped address concerns around novelty. Easy to recommend acceptance given the agreement.",ICLR2019,4: The area chair is confident but not absolutely certain
r6z1mEQxrSJ,1610040000000.0,1610470000000.0,1,Ma0S4RcfpR_,Ma0S4RcfpR_,Final Decision,Reject,"The authors propose a recurrent model of self-position, with a handcrafted expression of the rotational structure in terms of a matrix Lie group. As noted by the reviewers, this work strongly builds upon Gao et al. (ICLR 2019). This really is mentioned too late and not prominently enough in the manuscript, and furthermore, the difference to this work is not clearly explored in the paper (there are just two sentences immediately prior to the conclusion and no experimental comparison). The reviewers pointed out that the phenomena observed here are handcrafted into the structure of the model, rather than being emergent. The reviewers raised concerns that it is not clear what conclusion to draw from this work.
For these reasons, I recommend rejection at this stage.",ICLR2021,
fqy7LkRkRY,1642700000000.0,1642700000000.0,1,vr39r4Rjt3z,vr39r4Rjt3z,Paper Decision,Reject,"Unfortunately, I feel the paper is not quite ready for ICLR, even if the reviews seem in general quite positive (though of low confidence).
After reading the reviews and rebuttal, and going over the paper, I have to make the following comments:
* The paper makes two modifications to the backbone architecture that have an impact on the ability of these systems to continually learn; these changes are adding layer normalization and a mask.
* The paper is mostly empirical in nature; while there are some intuitions presented clearly about these ideas, their efficiency is proved empirically, which is completely fine.

However:
* The empirical validation seems not sufficient; the main results are permuted MNIST, incremental CIFAR-10, and incremental CUB200. The results on permuted MNIST in terms of final accuracy seem surprisingly low (particularly when involving CL solutions like EWC, ER, HAT -- see Table 2; e.g. FA1 < 80% seems very surprising). This seems strange to me and casts some doubt on the results.
* The proposed methods are simple; there is a strong message behind them, namely that the choice of the backbone (architecture size, normalization layers) has a huge impact on learning. But being a purely empirical result, this really needs to be backed up with analysis and an attempt at understanding what is going on. E.g. looking at the masks over time -- do they converge to be task-specific? Anything that would give a bit of depth to the results. Discussing the figures (e.g. I'm looking at Fig 3 and I was grep-ing the text to see a discussion of how one would interpret those results). Why is FashionMNIST used to produce Fig 3, and why is something like this not done for one of the CL benchmarks considered? Providing additional typical measures for CL (e.g. showing learning curves) would also help.
* Just overall, it does not seem that the work provides sufficient insight or analysis.

I do think there is something really interesting in this work, and I do hope the authors will resubmit this work after some modification.
And I do agree that there are many aspects of the backbone or architecture that have big impacts on CL, and this is an understudied and not well understood topic. So in that sense I think the idea of this work is good. But I just feel it falls short in terms of results and analysis.",ICLR2022,
M_1jb9JvRFS,1610040000000.0,1610470000000.0,1,ls8D_-g8-ne,ls8D_-g8-ne,Final Decision,Reject,"This paper considers the problem of (biological) sequence design and optimization. The authors made an interesting and important case that in certain sequence design tasks, a simple evolutionary greedy algorithm could be competitive with the increasingly complex contemporary black-box optimization models.

Most reviewers appreciate the design of the open-source simulation environment for benchmarking AdaLead (and other competing algorithms) in a number of biological sequence design tasks (e.g. TF binding, RNA, and protein). However, there is a common concern that the experimental results are not fine-grained enough to explain the outperforming results of the proposed algorithm. There are also unresolved comments on missing important BO baselines in the empirical study. As a purely empirical work, these appear to be important concerns. While these results appear to be useful for the domain of biological sequence design, the reviewers are unconvinced that the proposed algorithm is significantly novel, or that the results are sufficiently compelling to merit an acceptance to this venue.",ICLR2021,
rklI0PATJE,1544570000000.0,1545350000000.0,1,ByeLmn0qtX,ByeLmn0qtX,"Additional discussion and experiments appreciated, more improvements needed",Reject,"This paper proposes using conditional VAEs for multi-domain transfer and presents results on CelebA and SCUT. As mentioned by reviewers, the presentation and clarity of the work could be improved. It is quite difficult to determine the new/proposed aspects of the work from a first read-through. Though we recognize and appreciate that the authors updated their manuscript to improve its clarity, another editing pass, with particular focus on clarifying prior work on conditional VAEs and their proposed new application to domain transfer, would be beneficial.

In addition, as DIS is the main metric for comparison to prior work and for evaluation of the final approach, the conclusions about the effectiveness of this method would be easier to see if a more detailed description of the metric and analysis of the results were provided.

Given the limited technical novelty and discussion amongst reviewers of the desire for more experimental evidence, this work is not quite ready for publication.",ICLR2019,4: The area chair is confident but not absolutely certain
Wp7vGgux92e,1610040000000.0,1610470000000.0,1,nLYMajjctMh,nLYMajjctMh,Final Decision,Reject,"The paper studies an elegant formulation of personalized federated learning, which balances between a global model and locally trained models. It then analyzes algorithm variants inspired by local-update SGD in this setting. The problem formulation using the explicit trade-off between model differences and the global objective was received positively, as mentioned by R1 and R2. After a productive discussion including the authors and reviewers, unfortunately the consensus remained that the paper is below the bar in its current form. The contributions are not presented clearly enough in context; the set of competing algorithms (including e.g.
EA-SGD, ADMM, SVRG/Scaffold for the heterogeneous setting, and others) needs to be clarified, in particular for the modified formulation compared to traditional FL, since the objectives are different. Some smaller concerns also remained about the applicability to more general non-convex settings in practice. We hope the feedback helps to strengthen the paper for a future occasion.",ICLR2021,
Z1EJt8DTeFl,1642700000000.0,1642700000000.0,1,psQ6wcNXjS1,psQ6wcNXjS1,Paper Decision,Reject,"This paper proposed a strategy to train EBMs according to the length of the MCMC trajectories required. The paper covers three settings with different lengths of MCMC: image synthesis, adversarial defense, and density estimation. The reviewers generally find that there are interesting ideas and promising results in the paper, but the paper is not ready for publication at its current stage. The argument regarding density estimation and FID evaluation is not convincing. The proposed method is also more complicated than the baseline methods (CoopNets and PCD), and we would need a stronger argument for the added complexity.",ICLR2022,
n48TE60Q7p,1576800000000.0,1576800000000.0,1,SJgXs1HtwH,SJgXs1HtwH,Paper Decision,Reject,"This paper proposes an application of capsule networks to code modeling.
I see the potential in this approach, but as the reviewers pointed out, in the current draft there are significant issues with respect to both the clarity of the motivation of the work and the empirical results (which start at a much lower baseline than previous work). I am not recommending acceptance at this time, but would encourage the authors to clarify the issues raised in the reviews for a future submission.",ICLR2020,
vz5aeamg4W,1610040000000.0,1610470000000.0,1,CxGPf2BPVA,CxGPf2BPVA,Final Decision,Reject,"All four reviewers recommend rejecting the paper. However, there is agreement that this is an interesting line of research, and the AC agrees. Reviewers provided extensive and well-educated feedback. The authors did not respond to the raised concerns.",ICLR2021,
rkXRhfI_e,1486400000000.0,1486400000000.0,1,BJC8LF9ex,BJC8LF9ex,ICLR committee final decision,Reject,"This paper presents a modification of GRU-RNNs to handle missing data explicitly, allowing them to exploit data not missing at random. The method is presented clearly enough, but the reviewers felt that the claims were overreaching. It's also unsatisfying that the method depends on specific modifications of RNN architectures for a particular domain, instead of being a more general approach.",ICLR2017,